Token Counting Handler

This notebook walks through how to use the TokenCountingHandler and how it can be used to track token usage for your prompts, completions, and embeddings.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-openai
!pip install llama-index
Setup

Here, we set up the callback and the global Settings. Using global settings means we don't have to worry about passing the LLM and callback manager into every index and query engine.
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
token_counter = TokenCountingHandler(
tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
Settings.callback_manager = CallbackManager([token_counter])
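TokenCountingHandler accepts any callable that maps a string to a list of tokens; the count is simply the length of that list (above, tiktoken's `encode` plays that role). A toy whitespace tokenizer illustrates the contract:

```python
# TokenCountingHandler counts tokens as len(tokenizer(text)), so any
# string -> list callable works. This toy tokenizer splits on whitespace;
# real counts should use the tokenizer matching your model (e.g. tiktoken).
def toy_tokenizer(text: str) -> list:
    return text.split()

print(len(toy_tokenizer("What did the author do growing up?")))  # 7 "tokens"
```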
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
print(token_counter.total_embedding_token_count)
20723
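A count like this translates directly into a cost estimate. A minimal sketch, assuming a hypothetical embedding price of $0.0001 per 1K tokens (check your provider's current pricing before relying on this):

```python
# Hypothetical embedding price in USD per 1K tokens -- an assumption for
# illustration, not real pricing.
EMBED_PRICE_PER_1K = 0.0001

def embedding_cost(token_count: int) -> float:
    """Rough USD cost estimate for a given embedding token count."""
    return token_count / 1000 * EMBED_PRICE_PER_1K

# Using the 20723 embedding tokens counted above:
print(round(embedding_cost(20723), 4))  # ~$0.0021
```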
Looks good! Let's reset the counts before moving on.
token_counter.reset_counts()
LLM + Embedding Token Usage

Next, let's run a query and see what the counts look like.
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What did the author do growing up?")
print(
"Embedding Tokens: ",
token_counter.total_embedding_token_count,
"\n",
"LLM Prompt Tokens: ",
token_counter.prompt_llm_token_count,
"\n",
"LLM Completion Tokens: ",
token_counter.completion_llm_token_count,
"\n",
"Total LLM Token Count: ",
token_counter.total_llm_token_count,
"\n",
)
Embedding Tokens:  8
LLM Prompt Tokens:  4518
LLM Completion Tokens:  45
Total LLM Token Count:  4563
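Prompt and completion tokens are usually priced differently, which is why the handler reports them separately. A minimal cost sketch using hypothetical per-1K prices (assumptions for illustration, not real pricing):

```python
# Hypothetical USD prices per 1K tokens -- check your provider's pricing.
PROMPT_PRICE_PER_1K = 0.0015
COMPLETION_PRICE_PER_1K = 0.002

def estimate_llm_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD cost estimate for one or more LLM calls."""
    return (
        prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    )

# Using the counts printed above (4518 prompt tokens, 45 completion tokens):
print(round(estimate_llm_cost(4518, 45), 4))  # ~$0.0069
```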
token_counter.reset_counts()

query_engine = index.as_query_engine(similarity_top_k=4, streaming=True)
response = query_engine.query("What happened at Interleaf?")

# finish the streaming response so all tokens are counted
for token in response.response_gen:
    # print(token, end="", flush=True)
    continue
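The `for` loop above matters: with streaming, completion tokens are only observed as the generator is consumed, so the counts are incomplete until the stream finishes. A toy generator mimics this behavior:

```python
# A toy stream that records tokens only as they are yielded, mimicking how
# streaming completion token counts accumulate while the response is read.
def stream_tokens(tokens, seen):
    for t in tokens:
        seen.append(t)
        yield t

seen = []
gen = stream_tokens(["Hello", ",", " world"], seen)
print(len(seen))  # 0 -- nothing counted before the stream is consumed
for _ in gen:
    pass
print(len(seen))  # 3 -- all tokens counted once the stream finishes
```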
print(
"Embedding Tokens: ",
token_counter.total_embedding_token_count,
"\n",
"LLM Prompt Tokens: ",
token_counter.prompt_llm_token_count,
"\n",
"LLM Completion Tokens: ",
token_counter.completion_llm_token_count,
"\n",
"Total LLM Token Count: ",
token_counter.total_llm_token_count,
"\n",
)
Embedding Tokens:  6
LLM Prompt Tokens:  4563
LLM Completion Tokens:  123
Total LLM Token Count:  4686
Advanced Usage

The token counter tracks each token usage event in an object called a TokenCountingEvent. This object has the following attributes:

- prompt -> The prompt string sent to the LLM or embedding model
- prompt_token_count -> The token count of the LLM prompt
- completion -> The string completion received from the LLM (not used for embeddings)
- completion_token_count -> The token count of the LLM completion (not used for embeddings)
- total_token_count -> The total prompt + completion tokens for the event
- event_id -> A string ID for the event, which aligns with other callback handlers

These events are tracked on the token counter in two lists:

- llm_token_counts
- embedding_token_counts

Let's see what these look like!
print("Num LLM token count events: ", len(token_counter.llm_token_counts))
print(
"Num Embedding token count events: ",
len(token_counter.embedding_token_counts),
)
Num LLM token count events:  2
Num Embedding token count events:  1
That makes sense! The previous query embedded the query text, and then made 2 LLM calls (since the top k was 4 and the default chunk size is 1024, two separate calls are needed so the LLM can read all the retrieved text).

Next, let's take a quick look at a single one of these events.
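As a rough back-of-the-envelope (the per-call budget below is an assumed number for illustration, not a LlamaIndex constant): 4 retrieved chunks of ~1024 tokens each exceed what a single synthesis prompt comfortably holds, so the work is split across calls.

```python
import math

# Assumed numbers for illustration only: top_k chunks of chunk_size tokens,
# and a hypothetical ~3000-token budget for retrieved text per LLM call.
top_k, chunk_size, per_call_budget = 4, 1024, 3000
calls = math.ceil(top_k * chunk_size / per_call_budget)
print(calls)  # 2 -- matching the two LLM token count events above
```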
print("prompt: ", token_counter.llm_token_counts[0].prompt[:100], "...\n")
print(
"prompt token count: ",
token_counter.llm_token_counts[0].prompt_token_count,
"\n",
)
print(
"completion: ", token_counter.llm_token_counts[0].completion[:100], "...\n"
)
print(
"completion token count: ",
token_counter.llm_token_counts[0].completion_token_count,
"\n",
)
print("total token count", token_counter.llm_token_counts[0].total_token_count)
prompt:  system: You are an expert Q&A system that is trusted around the world. Always answer the query using ...

prompt token count:  3873

completion:  assistant: At Interleaf, the company had added a scripting language inspired by Emacs and made it a ...

completion token count:  95

total token count 3968
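The per-event counts also add up to the aggregate totals printed earlier. A toy stand-in for TokenCountingEvent sketches this; the second event's numbers below are inferred by subtracting this first event from the aggregate output, so treat them as illustrative:

```python
from dataclasses import dataclass

# Toy stand-in for TokenCountingEvent, using the attributes described above.
@dataclass
class Event:
    prompt_token_count: int
    completion_token_count: int

    @property
    def total_token_count(self) -> int:
        return self.prompt_token_count + self.completion_token_count

# First event from the output above; second inferred from the aggregate.
events = [Event(3873, 95), Event(690, 28)]
print(sum(e.total_token_count for e in events))  # 4686, the aggregate total
```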