How to parse JSON output
Prerequisites
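This guide assumes familiarity with the following concepts:
While some model providers support built-in ways to return structured output, not all do. We can use an output parser to let users specify an arbitrary JSON schema via the prompt, query a model for outputs that conform to that schema, and finally parse the model's output as JSON.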
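The three steps in that workflow (describe a schema in the prompt, query the model, parse the reply) can be sketched with the standard library alone. This is a simplified illustration, not how LangChain implements it: `JOKE_SCHEMA`, `build_prompt`, `fake_model` and `parse_output` are hypothetical names, and `fake_model` returns a canned reply in place of a real LLM call.

```python
import json

# Hypothetical schema, written by hand as a JSON-schema-style dict.
JOKE_SCHEMA = {
    "properties": {
        "setup": {"type": "string", "description": "question to set up a joke"},
        "punchline": {"type": "string", "description": "answer to resolve the joke"},
    },
    "required": ["setup", "punchline"],
}


def build_prompt(query: str, schema: dict) -> str:
    """Step 1: describe the desired JSON schema in the prompt itself."""
    instructions = (
        "Respond only with a JSON object that conforms to this JSON schema:\n"
        + json.dumps(schema)
    )
    return f"Answer the user query.\n{instructions}\n{query}\n"


def fake_model(prompt: str) -> str:
    """Step 2 (stand-in): a real application would send `prompt` to an LLM."""
    return '''{"setup": "Why couldn't the bicycle stand up by itself?",
               "punchline": "Because it was two tired!"}'''


def parse_output(text: str, schema: dict) -> dict:
    """Step 3: parse the raw reply as JSON and check the required keys."""
    data = json.loads(text)
    missing = [key for key in schema["required"] if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data


prompt = build_prompt("Tell me a joke.", JOKE_SCHEMA)
joke = parse_output(fake_model(prompt), JOKE_SCHEMA)
print(joke["punchline"])  # Because it was two tired!
```

Checking only the `required` keys keeps the sketch short; a real parser may validate more of the schema or none of it.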
Note
Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON.
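Because a model without enough capacity may answer in prose or emit almost-JSON, it helps to fail loudly on bad output and try again. Below is a minimal stdlib sketch of that pattern. `MalformedOutputError`, `parse_or_raise` and `parse_with_retries` are hypothetical names, and the list of replies stands in for re-querying a model.

```python
import json


class MalformedOutputError(Exception):
    """Hypothetical error type for replies that are not a JSON object."""


def parse_or_raise(text: str) -> dict:
    """Parse a model reply as a JSON object, failing loudly otherwise."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise MalformedOutputError(f"not valid JSON: {text!r}") from exc
    if not isinstance(data, dict):
        raise MalformedOutputError(f"expected a JSON object, got {type(data).__name__}")
    return data


def parse_with_retries(replies: list[str]) -> dict:
    """Return the first reply that parses; `replies` stands in for re-querying a model."""
    last_error = None
    for reply in replies:
        try:
            return parse_or_raise(reply)
        except MalformedOutputError as exc:
            last_error = exc
    raise MalformedOutputError("no reply was valid JSON") from last_error


# Two possible failure modes: prose instead of JSON, and a trailing comma.
replies = [
    "Sure! Here's a joke about a bicycle.",
    '{"setup": "Why?", "punchline": "Because.",}',
    '{"setup": "Why?", "punchline": "Because."}',
]
print(parse_with_retries(replies))  # {'setup': 'Why?', 'punchline': 'Because.'}
```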
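JsonOutputParser is one built-in option for prompting for and then parsing JSON output. While it is similar in functionality to PydanticOutputParser, it also supports streaming back partial JSON objects.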
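In practice, "parsing JSON output" usually means tolerating some wrapping around the object. For example, chat models often put their JSON inside a Markdown code fence. The stdlib sketch below strips an optional fence before calling `json.loads`. `strip_code_fence` and `parse_json_reply` are hypothetical helpers, and this is a simplification, not JsonOutputParser's actual implementation.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example's own code fence stays intact

# Matches a Markdown code fence (optionally tagged "json"); group 1 is the payload.
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL | re.IGNORECASE)


def strip_code_fence(text: str) -> str:
    """Return the contents of the first code fence, or the whole text stripped."""
    match = _FENCED.search(text)
    return match.group(1) if match else text.strip()


def parse_json_reply(text: str):
    """Parse a reply that may or may not wrap its JSON in a Markdown fence."""
    return json.loads(strip_code_fence(text))


fenced = "Here you go:\n" + FENCE + 'json\n{"setup": "Why?", "punchline": "Because."}\n' + FENCE
print(parse_json_reply(fenced))              # {'setup': 'Why?', 'punchline': 'Because.'}
print(parse_json_reply('  {"ok": true}  '))  # {'ok': True}
```

The non-greedy `(.*?)` stops at the first closing fence, so a JSON string value that itself contains three backticks would break this sketch.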
Here's an example of how to use JsonOutputParser with Pydantic to conveniently declare the expected schema:
%pip install -qU langchain langchain-openai
import os
from getpass import getpass

# Prompt for an OpenAI API key if one isn't already set in the environment.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
model = ChatOpenAI(temperature=0)
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."
# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = prompt | model | parser
chain.invoke({"query": joke_query})
{'setup': "Why couldn't the bicycle stand up by itself?",
'punchline': 'Because it was two tired!'}
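The parser's format instructions embed a JSON schema derived from the Joke class (Pydantic v2 models expose such a schema through `model_json_schema()`). To show roughly what that conversion involves, here is a stdlib sketch that uses a dataclass in place of Pydantic. `schema_from_dataclass` is a hypothetical helper that only handles a few primitive types; in real code, let Pydantic generate the schema.

```python
import json
from dataclasses import MISSING, dataclass, field, fields

# Simplified mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Joke:
    """Stdlib stand-in for the Pydantic Joke model above."""
    setup: str = field(metadata={"description": "question to set up a joke"})
    punchline: str = field(metadata={"description": "answer to resolve the joke"})


def schema_from_dataclass(cls) -> dict:
    """Build a Pydantic-style JSON schema dict from a dataclass (sketch)."""
    properties, required = {}, []
    for f in fields(cls):
        properties[f.name] = {
            "title": f.name.replace("_", " ").title(),
            "description": f.metadata.get("description", ""),
            "type": _JSON_TYPES[f.type],
        }
        # A field with no default must be supplied, so it is required.
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {"properties": properties, "required": required}


print(json.dumps(schema_from_dataclass(Joke)))
```

For this Joke, the printed schema lists `setup` and `punchline` as string properties titled "Setup" and "Punchline", both required, which is the same shape Pydantic produces for the model above.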
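Note that we are passing format_instructions from the parser directly into the prompt. You can and should experiment with adding your own formatting hints in other parts of your prompt to either augment or replace the default instructions: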
parser.get_format_instructions()
'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```'
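Because the format instructions are plain text substituted into the template, augmenting or replacing them is ordinary string composition. The stdlib sketch below uses `str.format` on the same template string as the PromptTemplate example. `DEFAULT_INSTRUCTIONS` is a shortened, hypothetical stand-in for `parser.get_format_instructions()`, and `EXTRA_HINTS` is an illustrative hint of our own. With LangChain, you would pass the combined string through `partial_variables` instead.

```python
# Same template string as in the PromptTemplate example.
TEMPLATE = "Answer the user query.\n{format_instructions}\n{query}\n"

# Hypothetical, shortened stand-in for parser.get_format_instructions().
DEFAULT_INSTRUCTIONS = (
    "The output should be formatted as a JSON instance that conforms to the "
    'JSON schema below.\n{"properties": {"setup": {"type": "string"}, '
    '"punchline": {"type": "string"}}, "required": ["setup", "punchline"]}'
)

# Option 1: augment the defaults with an extra hint of your own.
EXTRA_HINTS = "Return only the JSON object, with no surrounding prose."
augmented = TEMPLATE.format(
    format_instructions=DEFAULT_INSTRUCTIONS + "\n" + EXTRA_HINTS,
    query="Tell me a joke.",
)

# Option 2: replace the defaults with a shorter hand-written instruction.
replaced = TEMPLATE.format(
    format_instructions='Reply with JSON shaped like {"setup": "...", "punchline": "..."}.',
    query="Tell me a joke.",
)

print(augmented)
print(replaced)
```

Replacing the defaults saves prompt tokens but drops the field descriptions, so the model has less to go on; augmenting keeps the full schema.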
Streaming
As mentioned above, a key difference between JsonOutputParser and PydanticOutputParser is that JsonOutputParser supports streaming partial chunks. Here's what that looks like:
for s in chain.stream({"query": joke_query}):
    print(s)
{}
{'setup': ''}
{'setup': 'Why'}
{'setup': 'Why couldn'}
{'setup': "Why couldn't"}
{'setup': "Why couldn't the"}
{'setup': "Why couldn't the bicycle"}
{'setup': "Why couldn't the bicycle stand"}
{'setup': "Why couldn't the bicycle stand up"}
{'setup': "Why couldn't the bicycle stand up by"}
{'setup': "Why couldn't the bicycle stand up by itself"}
{'setup': "Why couldn't the bicycle stand up by itself?"}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': ''}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was two'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was two tired'}
{'setup': "Why couldn't the bicycle stand up by itself?", 'punchline': 'Because it was two tired!'}
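Each intermediate dict above comes from treating the text received so far as if it were complete JSON. Conceptually, the parser re-parses the accumulated output after each chunk and emits a new value when it changes. The stdlib sketch below illustrates the idea. `parse_partial_object` is a hypothetical helper, not LangChain's actual implementation, and the character-by-character loop stands in for model chunks.

```python
import json


def _closing_suffix(text: str) -> str:
    """Characters that would close any string/object/array left open in `text`."""
    stack = []  # closers still owed, innermost last
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return ('"' if in_string else "") + "".join(reversed(stack))


def parse_partial_object(prefix: str):
    """Best-effort parse of an unfinished JSON text; None if nothing parses yet."""
    text = prefix.rstrip()
    while text:
        try:
            return json.loads(text + _closing_suffix(text))
        except json.JSONDecodeError:
            # Drop the last character (e.g. part of a half-written key,
            # a dangling comma or colon) and retry.
            text = text[:-1].rstrip()
    return None


# Simulate a model streaming one character at a time and print each new snapshot.
full = '{"setup": "Why?", "punchline": "Because."}'
snapshots = []
for end in range(1, len(full) + 1):
    parsed = parse_partial_object(full[:end])
    if parsed is not None and (not snapshots or parsed != snapshots[-1]):
        snapshots.append(parsed)
        print(parsed)
```

As in the stream above, a key appears with an empty string as soon as its value's opening quote arrives. The trim-and-retry loop is quadratic in the worst case, which is acceptable for a sketch; a production parser would track its state incrementally.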
Without Pydantic
You can also use JsonOutputParser without Pydantic. This will prompt the model to return JSON, but doesn't provide specifics about what the schema should be.
joke_query = "Tell me a joke."
parser = JsonOutputParser()
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
chain = prompt | model | parser
chain.invoke({"query": joke_query})
{'response': "Sure! Here's a joke for you: Why couldn't the bicycle stand up by itself? Because it was two tired!"}
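Without a schema, the model decides the shape of the JSON (here it chose a single `response` key), and that shape may differ between runs or models. Downstream code therefore has to check what it actually received. Below is a minimal stdlib sketch. `joke_text` is a hypothetical helper, and the two dicts are the schemaless reply shown above plus a reply shaped like the Pydantic example's output.

```python
def joke_text(parsed: dict) -> str:
    """Turn parsed JSON into display text when no schema fixed its shape."""
    if all(key in parsed for key in ("setup", "punchline")):
        return f"{parsed['setup']} {parsed['punchline']}"
    # Unknown shape: fall back to whatever string values the model produced.
    return " ".join(value for value in parsed.values() if isinstance(value, str))


schemaless = {"response": "Sure! Here's a joke for you: Why couldn't the bicycle stand up by itself? Because it was two tired!"}
structured = {"setup": "Why couldn't the bicycle stand up by itself?", "punchline": "Because it was two tired!"}
print(joke_text(schemaless))
print(joke_text(structured))
```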
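Next steps
You've now learned one way to prompt a model to return structured JSON. Next, check out the broader guide on obtaining structured output for other techniques.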