How to get structured output from an LLM with Ollama and Pydantic

Published: 2025-07-29

Reference:

https://ollama.com/blog/structured-outputs

First, here is what an ordinary, unconstrained LLM reply looks like:

from ollama import chat

response = chat(
    messages=[
        {
            'role': 'user',
            'content': 'Tell me about Canada.',
        }
    ],
    model='llama3.1',
)

print(response.message.content)

Output:

Canada! The Great White North, as Canadians affectionately call it. Here's a comprehensive overview:

**Geography**

Canada is the world's second-largest country by land area (after Russia), spanning over 5 million square miles (13 million square kilometers). It shares borders with the United States to the south and three territories (Yukon, Northwest Territories, and Nunavut) to the north. The landscape varies from rugged mountains to vast prairies, forests, and coastlines along the Atlantic Ocean, Pacific Ocean, and Arctic Ocean.

**Cities**

Some of Canada's most notable cities include:

1. **Toronto**: Ontario's capital, known for its diverse culture, finance industry, and iconic CN Tower.
2. **Vancouver**: A hub on the West Coast, famous for its mild climate, mountains, and beaches.
3. **Montreal**: Quebec's largest city, with a rich history, vibrant arts scene, and delicious cuisine (poutine, anyone?).
4. **Calgary**: Alberta's energy capital, close to Banff National Park and the Rocky Mountains.
5. **Ottawa**: The nation's capital, featuring stunning Parliament Hill and Rideau Canal (a UNESCO World Heritage Site).

**Culture**

Canadian culture is a melting pot of influences from Indigenous peoples, European settlers, and immigrants from around the world. This diversity is reflected in:

1. **Multiculturalism**: Canada celebrates its diverse heritage through festivals, cuisine, and traditions.
2. **Indigenous Peoples**: The country has a significant Native population (over 4% of the population), with their own languages, customs, and contributions to Canadian society.
3. **Quebecois Culture**: Quebec has its own distinct culture, language (French), and history, which is protected by laws like Bill 101 (official language law).
4. **Canadian Identity**: A shared sense of national pride, humor (think apologetic Canadians!), and values such as politeness, diversity, and respect for others.

**Economy**

Canada has a mixed economy, with:

1. **Natural Resources**: Energy (oil, gas), mining (metals, minerals), forestry, and agriculture contribute significantly to the country's GDP.
2. **Manufacturing**: Canada is home to diverse industries like automotive, aerospace, and technology.
3. **Service Sector**: Tourism, finance, healthcare, education, and tourism are also significant contributors.
4. **Trade**: The country has a strong trading relationship with the United States, as well as trade agreements with other countries.

**Demographics**

Canada's population is approximately 37 million people (2020 estimate), with:

1. **Diversity**: Over 20% of Canadians were born outside Canada.
2. **Age Structure**: A relatively young population, with a median age of 41 years.
3. **Language**: English and French are the official languages; many other languages are spoken by immigrants.

**History**

Canada's history spans thousands of years, from Indigenous peoples to European exploration and settlement:

1. **Indigenous Peoples**: For centuries, various Native groups inhabited the land before European contact.
2. **Fur Trade**: The early 17th-century fur trade marked the beginning of European presence.
3. **British and French Colonization**: Successive British and French colonies were established in the 18th century.
4. **Confederation**: The Canadian Confederation was formed in 1867, uniting provinces under a single government.

**Tourism**

Canada is an excellent destination for outdoor enthusiasts and city lovers alike:

1. **National Parks**: Over 40 national parks and historic sites offer diverse landscapes and experiences (e.g., Banff National Park, Jasper National Park).
2. **Wildlife**: Encounter bears, moose, whales, and other animals in their natural habitats.
3. **Outdoor Activities**: Enjoy skiing, hiking, kayaking, fishing, or simply take a scenic drive through the countryside.
4. **Cities**: Explore vibrant cities like Toronto, Vancouver, Montreal, and more.

**Fun Facts**

1. **Maple Syrup**: Canada produces 70% of the world's maple syrup!
2. **Cows and Moose**: The country has more cows than people in some provinces (e.g., Quebec).
3. **Beaver**: The beaver is a national animal, celebrated for its industry and engineering prowess.
4. **Mounties**: The Royal Canadian Mounted Police are Canada's iconic law enforcement agency.

Now that you know more about Canada, do you have any specific questions or would you like to learn more about a particular aspect?

Now let's constrain the LLM so that it returns structured data.

Official example:

from ollama import chat
from pydantic import BaseModel

class Country(BaseModel):
  name: str
  capital: str
  languages: list[str]

response = chat(
  messages=[
    {
      'role': 'user',
      'content': 'Tell me about Canada.',
    }
  ],
  model='llama3.1',
  format=Country.model_json_schema(),
)

country = Country.model_validate_json(response.message.content)
print(country)

Output:

name='Canada' capital='Ottawa' languages=['English', 'French']

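How to use it:

With the Ollama Python library, pass the schema as a JSON object to the format parameter. It can be a plain dict, or (recommended) a Pydantic model serialized with model_json_schema().

  1. Define the structured message, either as a class such as Country or as a dict.
  2. Pass the schema to the format parameter of the chat function.
  3. Parse the structured message: country = Country.model_validate_json(response.message.content)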
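
To make the "dict or Pydantic" choice above concrete, here is a minimal sketch that prints what model_json_schema() produces for the Country model and compares it with a hand-written dict. The name manual_schema is introduced only for illustration. It makes no call to Ollama, so it runs without a server.

```python
import json
from typing import List

from pydantic import BaseModel


class Country(BaseModel):
    name: str
    capital: str
    languages: List[str]


# model_json_schema() returns a plain Python dict describing the model
schema = Country.model_json_schema()
print(json.dumps(schema, indent=2))

# A hand-written dict expressing the same structure (illustrative name)
manual_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
        "languages": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "capital", "languages"],
}

# Both describe the same fields and the same required keys
print(sorted(schema["properties"]) == sorted(manual_schema["properties"]))
print(schema["required"] == manual_schema["required"])
```

Either dict can be passed as format. The Pydantic route has the advantage that the same class also parses and validates the reply.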

Other examples

Data extraction

from ollama import chat
from pydantic import BaseModel

class Pet(BaseModel):
  name: str
  animal: str
  age: int
  color: str | None
  favorite_toy: str | None

class PetList(BaseModel):
  pets: list[Pet]

response = chat(
  messages=[
    {
      'role': 'user',
      'content': '''
        I have two pets.
        A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur.
        I also have a 2 year old black cat named Loki who loves tennis balls.
      ''',
    }
  ],
  model='llama3.1',
  format=PetList.model_json_schema(),
)

pets = PetList.model_validate_json(response.message.content)
print(pets)

Output:

pets=[Pet(name='Luna', animal='cat', age=5, color='grey', favorite_toy='yarn'), Pet(name='Loki', animal='cat', age=2, color='black', favorite_toy='tennis balls')]
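
One detail of the Pet model is easy to miss: in Pydantic v2, a field annotated as str | None without a default is still required. The key must be present, although its value may be null. The sketch below shows this offline with hand-written JSON strings, which are made up for illustration and are not model output. It uses Optional[str], which is equivalent to str | None.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Pet(BaseModel):
    name: str
    animal: str
    age: int
    color: Optional[str]         # key is required, value may be null
    favorite_toy: Optional[str]  # key is required, value may be null


# An explicit null is accepted
pet = Pet.model_validate_json(
    '{"name": "Loki", "animal": "cat", "age": 2, "color": "black", "favorite_toy": null}'
)
print(pet.favorite_toy)

# Leaving the keys out entirely fails validation
missing = []
try:
    Pet.model_validate_json('{"name": "Loki", "animal": "cat", "age": 2}')
except ValidationError as exc:
    missing = sorted(err["loc"][0] for err in exc.errors() if err["type"] == "missing")
print(missing)
```

If a field should be allowed to be absent, give it a default, as text_content: Optional[str] = None does in the image example.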

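Image description

Structured outputs also work with vision models. The following code describes the image below and returns structured output. The original text names llama3.2-vision, while the code as listed passes model='llava'; use whichever vision model you have pulled.

[Image not preserved in this copy]

Official code: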

from ollama import chat
from pydantic import BaseModel
from typing import List, Literal, Optional

class Object(BaseModel):
  name: str
  confidence: float
  attributes: str 

class ImageDescription(BaseModel):
  summary: str
  objects: List[Object]
  scene: str
  colors: List[str]
  time_of_day: Literal['Morning', 'Afternoon', 'Evening', 'Night']
  setting: Literal['Indoor', 'Outdoor', 'Unknown']
  text_content: Optional[str] = None

path = './image.jpg'

response = chat(
  model='llava',
  format=ImageDescription.model_json_schema(),  # Pass in the schema for the response
  messages=[
    {
      'role': 'user',
      'content': 'Analyze this image and describe what you see, including any objects, the scene, colors and any text you can detect.',
      'images': [path],
    },
  ],
  options={'temperature': 0},  # Set temperature to 0 for more deterministic output
)

image_description = ImageDescription.model_validate_json(response.message.content)
print(image_description)

Output:

summary="The image shows a bottle of wine with a label that reads 'BENGOD'S HIGHLAND PARK VIP 407', accompanied by a glass of red wine and a bunch of grapes on a table. The background is neutral, emphasizing the wine as the main subject." 

objects=[Object(name='Wine bottle', confidence=95.0, attributes="The bottle has a label that reads 'BENGOD'S HIGHLAND PARK VIP 407'. It is standing upright on the table."), Object(name='Glass of wine', confidence=85.0, attributes='The glass contains red wine and is placed next to the bottle.'), Object(name='Grapes', confidence=90.0, attributes='A bunch of grapes is on the table near the wine bottle and glass.'), Object(name='Table', confidence=85.0, attributes='The table is a dark surface with a few items on it. The background is blurred but appears to be an indoor setting.')] 

scene='The scene suggests a wine tasting or a moment of relaxation with a glass of wine and some grapes, possibly indicating the enjoyment of fine wines.' 
colors=['#FFC0', '#8B00', '#E64A', '#3F3F'] 
time_of_day='Night' 
setting='Indoor' 
text_content="The text on the wine bottle is 'BENGOD'S HIGHLAND PARK VIP 407'. There are no other texts visible in the image."
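
Note that the output above reports confidence values such as 95.0, because the schema only says the field is a float. If downstream code expects a score between 0 and 1 (an assumption; the article does not say which scale is intended), one option is to state the range on the field. The sketch below adds Pydantic Field constraints. The bounds and the description text are illustrative, and whether the model honors them during generation is not something the source confirms, so the client-side validation is the part to rely on.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Object(BaseModel):
    name: str
    # Illustrative constraint: a 0-1 scale is assumed, not taken from the article
    confidence: float = Field(ge=0, le=1, description="Confidence between 0 and 1")
    attributes: str


# The constraints are carried into the JSON schema that would be passed as format
field_schema = Object.model_json_schema()["properties"]["confidence"]
print(field_schema)


def parse_object(raw_json: str) -> Optional[Object]:
    """Return the parsed Object, or None if it violates the schema."""
    try:
        return Object.model_validate_json(raw_json)
    except ValidationError:
        return None


# Hand-written sample replies, not real model output
print(parse_object('{"name": "Grapes", "confidence": 0.9, "attributes": "on the table"}'))
print(parse_object('{"name": "Grapes", "confidence": 90.0, "attributes": "on the table"}'))
```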

Calling from JavaScript

With the Ollama JavaScript library, pass the schema as a JSON object to the format parameter, or (recommended) define it with Zod and serialize it with zodToJsonSchema().

import ollama from 'ollama';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const Country = z.object({
    name: z.string(),
    capital: z.string(), 
    languages: z.array(z.string()),
});

const response = await ollama.chat({
    model: 'llama3.1',
    messages: [{ role: 'user', content: 'Tell me about Canada.' }],
    format: zodToJsonSchema(Country),
});

const country = Country.parse(JSON.parse(response.message.content));
console.log(country);

Using the OpenAI-compatible API

from openai import OpenAI
import openai
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class Pet(BaseModel):
    name: str
    animal: str
    age: int
    color: str | None
    favorite_toy: str | None

class PetList(BaseModel):
    pets: list[Pet]

try:
    completion = client.beta.chat.completions.parse(
        temperature=0,
        model="llama3.1:8b",
        messages=[
            {"role": "user", "content": '''
                I have two pets.
                A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur.
                I also have a 2 year old black cat named Loki who loves tennis balls.
            '''}
        ],
        response_format=PetList,
    )

    pet_response = completion.choices[0].message
    if pet_response.parsed:
        print(pet_response.parsed)
    elif pet_response.refusal:
        print(pet_response.refusal)
except openai.LengthFinishReasonError as e:
    # The reply hit the token limit before the JSON was complete
    print("Too many tokens: ", e)
except Exception as e:
    print(e)

This is noticeably more verbose than calling Ollama's own client.

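Tips

For more reliable structured outputs:

  1. Define the response schema with Pydantic (Python) or Zod (JavaScript).
  2. Add "return as JSON" to the prompt to help the model understand the request.
  3. Set temperature to 0 for more deterministic output.

Note: structured outputs only guarantee the shape of the data. Whether the content is semantically correct depends entirely on the model, so the result still needs a second, semantic validation pass.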
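
The three tips and the note can be combined into one small helper. The sketch below is one possible arrangement, not the article's code. The function names check_country and ask_country and the specific semantic rules (name must match the question, capital and languages must be non-empty) are hypothetical. The check is kept separate from the network call so that it can be exercised without a running Ollama server.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Country(BaseModel):
    name: str
    capital: str
    languages: List[str]


def check_country(raw_json: str, expected_name: str) -> Optional[Country]:
    """Return the parsed Country, or None if it fails either check."""
    try:
        country = Country.model_validate_json(raw_json)  # structural check
    except ValidationError:
        return None
    # Semantic checks: illustrative rules, adapt them to your own data
    if country.name.strip().lower() != expected_name.strip().lower():
        return None
    if not country.capital.strip() or not country.languages:
        return None
    return country


def ask_country(name: str, model: str = "llama3.1") -> Optional[Country]:
    # Imported here so check_country works even without the Ollama client installed
    from ollama import chat

    response = chat(
        model=model,
        messages=[{
            "role": "user",
            # Tip 2: say explicitly that JSON is expected
            "content": f"Tell me about {name}. Return the answer as JSON.",
        }],
        format=Country.model_json_schema(),  # Tip 1: schema defined with Pydantic
        options={"temperature": 0},          # Tip 3: more deterministic output
    )
    return check_country(response.message.content, expected_name=name)
```

With a local server running and the model pulled, ask_country("Canada") returns a Country instance or None. A None result means the reply was well-formed JSON at best but did not pass the checks, which is the case the note above warns about.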
