LLM Pydantic Program

This guide shows you how to generate structured data with our `LLMTextCompletionProgram`. Given an LLM and an output Pydantic class, it generates a structured Pydantic object.

For the target object, you can either specify `output_cls` directly, or specify a `PydanticOutputParser` or any other `BaseOutputParser` that generates a Pydantic object.

In the examples below, we show different ways of extracting into an `Album` object (which can contain a list of `Song` objects).

Extract into `Album` class

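This is a simple example of parsing an output into an `Album` schema, which can contain multiple songs.

Just pass `Album` to the `output_cls` argument when initializing the `LLMTextCompletionProgram`.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.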

!pip install llama-index

from pydantic import BaseModel
from typing import List

from llama_index.core.program import LLMTextCompletionProgram

Define output schema

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

Define LLM Pydantic program

from llama_index.core.program import LLMTextCompletionProgram

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. Use the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)

Run the program to get structured output.

output = program(movie_name="The Shining")

The output is a valid Pydantic object that we can then use to call functions/APIs.

output

Out[ ]:

Album(name='The Overlook', artist='Jack Torrance', songs=[Song(title='Redrum', length_seconds=240), Song(title="Here's Johnny", length_seconds=180), Song(title='Room 237', length_seconds=300), Song(title='All Work and No Play', length_seconds=210), Song(title='The Maze', length_seconds=270)])
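
Because the result is an ordinary Pydantic model, it can be inspected and serialized like any other. The sketch below rebuilds a shortened version of the album above by hand, so that it runs without an LLM call. It assumes Pydantic v2; `total_seconds` and `payload` are illustrative names, not part of LlamaIndex.

```python
from typing import List

from pydantic import BaseModel


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


# Stand-in for the object returned by `program(...)`, built by hand here.
album = Album(
    name="The Overlook",
    artist="Jack Torrance",
    songs=[
        Song(title="Redrum", length_seconds=240),
        Song(title="Room 237", length_seconds=300),
    ],
)

# Fields are typed attributes, so downstream code can use them directly.
total_seconds = sum(song.length_seconds for song in album.songs)

# Convert to a plain dict, e.g. as the body of an API request (Pydantic v2).
payload = album.model_dump()
```

On Pydantic v1 the equivalent call would be `album.dict()`.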

Initialize with Pydantic output parser

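The code above is equivalent to defining a `PydanticOutputParser` and passing it in through `output_parser`, instead of passing `output_cls` directly.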

from llama_index.core.output_parsers import PydanticOutputParser

program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=Album),
    prompt_template_str=prompt_template_str,
    verbose=True,
)

output = program(movie_name="Lord of the Rings")
output

Out[ ]:

Album(name='The Fellowship of the Ring', artist='Middle-earth Ensemble', songs=[Song(title='The Shire', length_seconds=240), Song(title='Concerning Hobbits', length_seconds=180), Song(title='The Ring Goes South', length_seconds=300), Song(title='A Knife in the Dark', length_seconds=270), Song(title='Flight to the Ford', length_seconds=210), Song(title='Many Meetings', length_seconds=240), Song(title='The Council of Elrond', length_seconds=330), Song(title='The Great Eye', length_seconds=180), Song(title='The Breaking of the Fellowship', length_seconds=360)])
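
Conceptually, an output parser of this kind turns the model's text into a validated object. The following sketch illustrates that idea with plain Pydantic v2 calls on a hand-written JSON string. It is not the `PydanticOutputParser` implementation, and `raw_text` is a made-up stand-in for an LLM response.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


# Hypothetical LLM response containing a JSON document for the schema.
raw_text = (
    '{"name": "The Fellowship of the Ring", '
    '"artist": "Middle-earth Ensemble", '
    '"songs": [{"title": "The Shire", "length_seconds": 240}]}'
)

# Parse and validate in one step (Pydantic v2 API).
album = Album.model_validate_json(raw_text)

# Missing required fields are rejected rather than silently accepted.
try:
    Album.model_validate_json('{"name": "Untitled"}')
    rejected = False
except ValidationError:
    rejected = True
```

The point of validating against the class is that downstream code can rely on every field being present and correctly typed.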

Define a custom output parser

Sometimes you may want to parse the LLM output in your own way into a structured object (here, an `Album`).

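from llama_index.core.output_parsers import ChainableOutputParser


class CustomAlbumOutputParser(ChainableOutputParser):
    """Custom Album output parser.

    Assume the first line is the album name and artist.

    Assume each subsequent line is a song.

    """

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    def parse(self, output: str) -> Album:
        """Parse output."""
        if self.verbose:
            print(f"> Raw output: {output}")
        # Skip blank lines (e.g. a trailing newline) so unpacking does not fail.
        lines = [line for line in output.split("\n") if line.strip()]
        # Fields are not stripped, so values keep any space after the comma.
        name, artist = lines[0].split(",", 1)
        songs = []
        for line in lines[1:]:
            # Split on the last comma so song titles may contain commas.
            title, length_seconds = line.rsplit(",", 1)
            songs.append(Song(title=title, length_seconds=int(length_seconds)))

        return Album(name=name, artist=artist, songs=songs)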

prompt_template_str = """\
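Generate an example album, with an artist and a list of songs. Use the movie {movie_name} as inspiration.

Return the answer in the following format.
The first line is:
<album_name>, <album_artist>
Every subsequent line is a song with format:
<song_title>, <song_length_seconds>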
"""

program = LLMTextCompletionProgram.from_defaults(
    output_parser=CustomAlbumOutputParser(verbose=True),
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)

output = program(movie_name="The Dark Knight")

> Raw output: Gotham's Reckoning, The Dark Knight
A Dark Knight Rises, 240
The Joker's Symphony, 180
Harvey Dent's Lament, 210
Gotham's Guardian, 195
The Batmobile Chase, 225
The Dark Knight's Theme, 150
The Joker's Mind Games, 180
Rachel's Tragedy, 210
Gotham's Last Stand, 240
The Dark Knight's Triumph, 180

output

Out[ ]:

Album(name="Gotham's Reckoning", artist=' The Dark Knight', songs=[Song(title='A Dark Knight Rises', length_seconds=240), Song(title="The Joker's Symphony", length_seconds=180), Song(title="Harvey Dent's Lament", length_seconds=210), Song(title="Gotham's Guardian", length_seconds=195), Song(title='The Batmobile Chase', length_seconds=225), Song(title="The Dark Knight's Theme", length_seconds=150), Song(title="The Joker's Mind Games", length_seconds=180), Song(title="Rachel's Tragedy", length_seconds=210), Song(title="Gotham's Last Stand", length_seconds=240), Song(title="The Dark Knight's Triumph", length_seconds=180)])
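
The line-based format can be exercised without calling an LLM. The sketch below applies the same parsing steps to a hand-written sample. `parse_album_text` is a hypothetical helper (not a LlamaIndex API) that also strips whitespace around each field, so the artist name has no leading space. It assumes Pydantic is installed.

```python
from typing import List

from pydantic import BaseModel


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


def parse_album_text(text: str) -> Album:
    """Parse '<name>, <artist>' followed by '<title>, <seconds>' lines."""
    # Drop blank lines so a trailing newline does not break unpacking.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty output")
    name, artist = (part.strip() for part in lines[0].split(",", 1))
    songs = []
    for line in lines[1:]:
        # Split on the last comma so titles may themselves contain commas.
        title, length = line.rsplit(",", 1)
        songs.append(Song(title=title.strip(), length_seconds=int(length)))
    return Album(name=name, artist=artist, songs=songs)


# Shortened, hand-written sample in the format requested by the prompt.
sample = """Gotham's Reckoning, The Dark Knight
A Dark Knight Rises, 240
The Joker's Symphony, 180

"""
album = parse_album_text(sample)
```

A line without a comma still raises `ValueError`, which is a reasonable signal that the model did not follow the requested format.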
