ChatGPT
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-llms-openai
In [ ]:
!pip install llama-index
In [ ]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
Download Data
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load documents, build the VectorStoreIndex
In [ ]:
# Load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
In [ ]:
# Configure global settings
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
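Note that Settings.chunk_size is consumed when the documents are split into nodes, so it has to be set before the index is built in the next cell. If you'd rather not mutate global settings, recent llama_index versions also accept a per-index node parser; a minimal sketch (assuming the SentenceSplitter parser and the transformations argument are available in your installed version):

# Per-index alternative to the global Settings.chunk_size above
from llama_index.core.node_parser import SentenceSplitter

index = VectorStoreIndex.from_documents(
    documents, transformations=[SentenceSplitter(chunk_size=512)]
)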
In [ ]:
index = VectorStoreIndex.from_documents(documents)
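Building the index embeds every chunk, so re-running the cell above re-embeds the documents from scratch. To reuse the index across sessions you can persist it to disk; a minimal sketch using the standard storage APIs (the ./storage directory is just an example path):

from llama_index.core import StorageContext, load_index_from_storage

# Save the index (vector store + docstore) to a local directory
index.storage_context.persist(persist_dir="./storage")

# Later: reload it without re-embedding
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)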
Query Index
By default, with the help of langchain's PromptSelector abstraction, a modified refine prompt tailored to ChatGPT is used if a ChatGPT model is detected.
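You can check which templates a query engine actually resolved to via get_prompts(); the key names below are typical but may differ across versions:

# Inspect the prompts the query engine will use; with an OpenAI chat model,
# the refine template should resolve to the chat-style variant
query_engine = index.as_query_engine()
prompts = query_engine.get_prompts()
print(list(prompts.keys()))  # e.g. "response_synthesizer:refine_template"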
In [ ]:
query_engine = index.as_query_engine(
    similarity_top_k=3,
    streaming=True,
)
response = query_engine.query(
    "What did the author do growing up?",
)
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
response.print_response_stream()
Before college, the author worked on writing short stories and programming on an IBM 1401 using an early version of Fortran. They also worked on programming with microcomputers and eventually created a new dialect of Lisp called Arc. They later realized the potential of publishing essays on the web and began writing and publishing them. The author also worked on spam filters, painting, and cooking for groups.
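print_response_stream() simply drains the token stream to stdout. If you need the tokens yourself (for example to forward them to a UI), the streaming response exposes a generator; a minimal sketch using the response_gen attribute:

# Consume the token stream manually instead of printing it
streaming_response = query_engine.query("What did the author do growing up?")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)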
In [ ]:
query_engine = index.as_query_engine(
    similarity_top_k=5,
    streaming=True,
)
response = query_engine.query(
    "What did the author do during his time at RISD?",
)
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 12 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
response.print_response_stream()
The author attended RISD and took classes in fundamental subjects like drawing, color, and design. They also learned a lot in the color class they took, but otherwise, they were basically teaching themselves to paint. The author dropped out of RISD in 1993.
Refine Prompt: here is the chat refine prompt
In [ ]:
from llama_index.core.prompts.chat_prompts import CHAT_REFINE_PROMPT
In [ ]:
# CHAT_REFINE_PROMPT is a ChatPromptTemplate; its message_templates attribute
# holds the list of chat messages that make up the refine prompt
CHAT_REFINE_PROMPT.message_templates
Query Index (Using the standard Refine Prompt)
If we use the "standard" refine prompt (where the prompt is a single text template rather than a list of chat messages), we find that the results over ChatGPT are worse.
In [ ]:
from llama_index.core.prompts.default_prompts import DEFAULT_REFINE_PROMPT
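To see the contrast with the chat refine prompt above, you can print the single text template that DEFAULT_REFINE_PROMPT wraps (the template attribute on PromptTemplate objects):

# One flat text template, versus the list of chat messages in CHAT_REFINE_PROMPT
print(DEFAULT_REFINE_PROMPT.template)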
In [ ]:
query_engine = index.as_query_engine(
    refine_template=DEFAULT_REFINE_PROMPT,
    similarity_top_k=5,
    streaming=True,
)
response = query_engine.query(
    "What did the author do during his time at RISD?",
)
In [ ]:
response.print_response_stream()