Token Counting Handler

This notebook walks through how to use the TokenCountingHandler and how it can be used to track token usage for your prompts, completions, and embeddings.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-openai
!pip install llama-index
Setup

Here, we set up the callback and the global Settings. Using global settings means we don't have to worry about passing the LLM and callback manager into every index and query engine.
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
token_counter = TokenCountingHandler(
tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
Settings.callback_manager = CallbackManager([token_counter])
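TokenCountingHandler accepts any callable that maps a string to a list of tokens; the count is simply the length of that list (above, tiktoken's `encode` plays that role). A toy whitespace tokenizer illustrates the contract:

```python
# TokenCountingHandler counts tokens as len(tokenizer(text)), so any
# string -> list callable works. This toy tokenizer splits on whitespace;
# real counts should use the tokenizer matching your model (e.g. tiktoken).
def toy_tokenizer(text: str) -> list:
    return text.split()

print(len(toy_tokenizer("What did the author do growing up?")))  # 7 "tokens"
```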
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
print(token_counter.total_embedding_token_count)
20723
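A count like this translates directly into a cost estimate. A minimal sketch, assuming a hypothetical embedding price of $0.0001 per 1K tokens (check your provider's current pricing before relying on this):

```python
# Hypothetical embedding price in USD per 1K tokens -- an assumption for
# illustration, not real pricing.
EMBED_PRICE_PER_1K = 0.0001

def embedding_cost(token_count: int) -> float:
    """Rough USD cost estimate for a given embedding token count."""
    return token_count / 1000 * EMBED_PRICE_PER_1K

# Using the 20723 embedding tokens counted above:
print(round(embedding_cost(20723), 4))  # ~$0.0021
```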
Looks good! Let's reset the counts before moving on.
token_counter.reset_counts()
LLM + Embedding Token Usage

Next, let's run a query and see what the counts look like.
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What did the author do growing up?")
print(
"Embedding Tokens: ",
token_counter.total_embedding_token_count,
"\n",
"LLM Prompt Tokens: ",
token_counter.prompt_llm_token_count,
"\n",
"LLM Completion Tokens: ",
token_counter.completion_llm_token_count,
"\n",
"Total LLM Token Count: ",
token_counter.total_llm_token_count,
"\n",
)
Embedding Tokens:  8
LLM Prompt Tokens:  4518
LLM Completion Tokens:  45
Total LLM Token Count:  4563
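Prompt and completion tokens are usually priced differently, which is why the handler reports them separately. A minimal cost sketch using hypothetical per-1K prices (assumptions for illustration, not real pricing):

```python
# Hypothetical USD prices per 1K tokens -- check your provider's pricing.
PROMPT_PRICE_PER_1K = 0.0015
COMPLETION_PRICE_PER_1K = 0.002

def estimate_llm_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD cost estimate for one or more LLM calls."""
    return (
        prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    )

# Using the counts printed above (4518 prompt tokens, 45 completion tokens):
print(round(estimate_llm_cost(4518, 45), 4))  # ~$0.0069
```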
token_counter.reset_counts()

query_engine = index.as_query_engine(similarity_top_k=4, streaming=True)
response = query_engine.query("What happened at Interleaf?")

# finish the streaming response so all tokens are counted
for token in response.response_gen:
    # print(token, end="", flush=True)
    continue
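The `for` loop above matters: with streaming, completion tokens are only observed as the generator is consumed, so the counts are incomplete until the stream finishes. A toy generator mimics this behavior:

```python
# A toy stream that records tokens only as they are yielded, mimicking how
# streaming completion token counts accumulate while the response is read.
def stream_tokens(tokens, seen):
    for t in tokens:
        seen.append(t)
        yield t

seen = []
gen = stream_tokens(["Hello", ",", " world"], seen)
print(len(seen))  # 0 -- nothing counted before the stream is consumed
for _ in gen:
    pass
print(len(seen))  # 3 -- all tokens counted once the stream finishes
```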
print(
"Embedding Tokens: ",
token_counter.total_embedding_token_count,
"\n",
"LLM Prompt Tokens: ",
token_counter.prompt_llm_token_count,
"\n",
"LLM Completion Tokens: ",
token_counter.completion_llm_token_count,
"\n",
"Total LLM Token Count: ",
token_counter.total_llm_token_count,
"\n",
)
Embedding Tokens:  6
LLM Prompt Tokens:  4563
LLM Completion Tokens:  123
Total LLM Token Count:  4686
Advanced Usage

The token counter tracks each token usage event in an object called a TokenCountingEvent. This object has the following attributes:

- prompt -> The prompt string sent to the LLM or embedding model
- prompt_token_count -> The token count of the LLM prompt
- completion -> The string completion received from the LLM (not used for embeddings)
- completion_token_count -> The token count of the LLM completion (not used for embeddings)
- total_token_count -> The total prompt + completion tokens for the event
- event_id -> A string ID for the event, which aligns with other callback handlers

These events are tracked on the token counter in two lists:

- llm_token_counts
- embedding_token_counts

Let's see what these look like!
print("Num LLM token count events: ", len(token_counter.llm_token_counts))
print(
"Num Embedding token count events: ",
len(token_counter.embedding_token_counts),
)
Num LLM token count events:  2
Num Embedding token count events:  1
That makes sense! The previous query embedded the query text, and then made 2 LLM calls (since the top k was 4 and the default chunk size is 1024, two separate calls are needed so the LLM can read all the retrieved text).

Next, let's take a quick look at a single one of these events.
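As a rough back-of-the-envelope (the per-call budget below is an assumed number for illustration, not a LlamaIndex constant): 4 retrieved chunks of ~1024 tokens each exceed what a single synthesis prompt comfortably holds, so the work is split across calls.

```python
import math

# Assumed numbers for illustration only: top_k chunks of chunk_size tokens,
# and a hypothetical ~3000-token budget for retrieved text per LLM call.
top_k, chunk_size, per_call_budget = 4, 1024, 3000
calls = math.ceil(top_k * chunk_size / per_call_budget)
print(calls)  # 2 -- matching the two LLM token count events above
```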
print("prompt: ", token_counter.llm_token_counts[0].prompt[:100], "...\n")
print(
"prompt token count: ",
token_counter.llm_token_counts[0].prompt_token_count,
"\n",
)
print(
"completion: ", token_counter.llm_token_counts[0].completion[:100], "...\n"
)
print(
"completion token count: ",
token_counter.llm_token_counts[0].completion_token_count,
"\n",
)
print("total token count", token_counter.llm_token_counts[0].total_token_count)
prompt:  system: You are an expert Q&A system that is trusted around the world. Always answer the query using ...

prompt token count:  3873

completion:  assistant: At Interleaf, the company had added a scripting language inspired by Emacs and made it a ...

completion token count:  95

total token count 3968
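The per-event counts also add up to the aggregate totals printed earlier. A toy stand-in for TokenCountingEvent sketches this; the second event's numbers below are inferred by subtracting this first event from the aggregate output, so treat them as illustrative:

```python
from dataclasses import dataclass

# Toy stand-in for TokenCountingEvent, using the attributes described above.
@dataclass
class Event:
    prompt_token_count: int
    completion_token_count: int

    @property
    def total_token_count(self) -> int:
        return self.prompt_token_count + self.completion_token_count

# First event from the output above; second inferred from the aggregate.
events = [Event(3873, 95), Event(690, 28)]
print(sum(e.total_token_count for e in events))  # 4686, the aggregate total
```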