HuggingFace LLM - Camel-5b¶
This is an example of using the HuggingFace LLM abstraction with the Camel-5b model.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-llms-huggingface
In [ ]:
!pip install llama-index
In [ ]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings
INFO:numexpr.utils:Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
/home/loganm/miniconda3/envs/gpt_index/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Download Data¶
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load documents, build the VectorStoreIndex¶
In [ ]:
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
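`SimpleDirectoryReader` returns a list of `Document` objects. As an optional sanity check (this cell is not part of the original notebook), you can confirm what was loaded:

# optional: number of loaded documents and a preview of the first one
print(len(documents))
print(documents[0].text[:200])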
In [ ]:
# Setup prompts - specific to Camel-5b
from llama_index.core import PromptTemplate

# This will wrap the default prompts that are internal to llama-index
# taken from https://huggingface.co/Writer/camel-5b-hf
query_wrapper_prompt = PromptTemplate(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)
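To see exactly what the model will receive, you can render the wrapper prompt for a sample query. This is purely illustrative (the sample instruction is arbitrary) and uses `PromptTemplate.format`:

# illustrative only: render the wrapped prompt for a sample query string
print(query_wrapper_prompt.format(query_str="What did the author do growing up?"))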
In [ ]:
import torch

llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.25, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="Writer/camel-5b-hf",
    model_name="Writer/camel-5b-hf",
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

Settings.chunk_size = 512
Settings.llm = llm
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:43<00:00, 14.34s/it]
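Before building the index, you can optionally verify that the model generates text by calling it directly. This cell is not part of the original notebook, and the sample prompt is arbitrary:

# optional: call the LLM directly to confirm generation works
completion = llm.complete("What is a large language model?")
print(completion.text)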
In [ ]:
index = VectorStoreIndex.from_documents(documents)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 27212 tokens
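To avoid re-embedding the documents on every run, the index can be persisted to disk and reloaded later. A minimal sketch, not in the original notebook; the ./storage directory name is arbitrary:

from llama_index.core import StorageContext, load_index_from_storage

# save the index to disk
index.storage_context.persist(persist_dir="./storage")

# later: rebuild the index from the persisted files
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)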
Query Index¶
In [ ]:
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens
Token indices sequence length is longer than the specified maximum sequence length for this model (954 > 512). Running this sequence through the model will result in indexing errors
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1026 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
print(response)
The author grew up in a small town in England, attended a prestigious private school, and then went to Cambridge University, where he studied computer science. Afterward, he worked on web infrastructure, wrote essays, and then realized he could write about startups. He then started giving talks, wrote a book, and started interviewing founders for a book on startups.
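The response object also carries the source chunks that were retrieved to ground the answer. A short sketch for inspecting them (not in the original notebook):

# inspect the retrieved source nodes and their similarity scores
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])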
Query Index - Streaming¶
In [ ]:
query_engine = index.as_query_engine(streaming=True)
In [ ]:
# set Logging to DEBUG for more detailed outputs
response_stream = query_engine.query("What did the author do growing up?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
In [ ]:
# can be slower to start streaming since llama-index often involves many LLM calls
response_stream.print_response_stream()
The author grew up in a small town in England, attended a prestigious private school, and then went to Cambridge University, where he studied computer science. Afterward, he worked on web infrastructure, wrote essays, and then realized he could write about startups. He then started giving talks, wrote a book, and started interviewing founders for a book on startups.<|endoftext|>
In [ ]:
# can also get a normal response object
response = response_stream.get_response()
print(response)
In [ ]:
# can also iterate over the generator yourself
generated_text = ""
for text in response.response_gen:
    generated_text += text
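Once the generator is exhausted, generated_text holds the full streamed completion:

# the accumulated string is the complete streamed answer
print(generated_text)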