Optimized BGE Embedding Models with Intel® Extension for Transformers
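LlamaIndex can load quantized BGE embedding models produced by Intel® Extension for Transformers (ITREX) and run them on ITREX Neural Engine, a high-performance NLP backend that speeds up model inference while preserving accuracy.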
For more details, see our blog post Efficient Natural Language Embedding Models with Intel Extension for Transformers and the BGE optimization example.
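For intuition about what "quantized" means here: judging by its name, the model loaded on this page is an int8, statically quantized checkpoint. The toy sketch below shows symmetric per-tensor int8 quantization in plain Python. It is only an illustration under that assumption, not ITREX's actual algorithm. In real static quantization, scales are calibrated ahead of time on sample data, whereas here a single scale is derived from the values themselves.

```python
# Toy illustration of symmetric per-tensor int8 quantization.
# Assumption: this is a generic textbook scheme, NOT the ITREX implementation.


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes using one scale factor for the whole tensor."""
    max_abs = max(abs(v) for v in values)
    # Use a dummy scale for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.0, 0.49]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Rounding error per value is at most half a quantization step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Storing 8-bit codes plus one scale, instead of 32-bit floats, reduces memory traffic and enables integer arithmetic. The price is a bounded rounding error per value.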
To load and use the quantized models, install the required dependencies:
pip install intel-extension-for-transformers torch accelerate datasets onnx
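To confirm that the packages from the pip command above are available in the current environment before loading the model, you can check whether each one is importable. This is a minimal sketch that assumes the usual import names for these distributions (for example, intel-extension-for-transformers is imported as intel_extension_for_transformers).

```python
import importlib.util

# Assumed import names for the pip packages listed above.
REQUIRED_MODULES = [
    "intel_extension_for_transformers",
    "torch",
    "accelerate",
    "datasets",
    "onnx",
]


def check_dependencies(modules: list[str]) -> dict[str, bool]:
    """Return a mapping of module name -> whether Python can locate it."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_dependencies(REQUIRED_MODULES)
missing = [name for name, ok in status.items() if not ok]
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies found.")
```

This only checks that the modules can be found. It does not import them, so it stays fast and avoids side effects.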
Loading is done with the ItrexQuantizedBgeEmbedding class, which is used in the same way as a local HuggingFace embedding model. See the example:
%pip install llama-index-embeddings-huggingface-itrex
from llama_index.embeddings.huggingface_itrex import ItrexQuantizedBgeEmbedding
embed_model = ItrexQuantizedBgeEmbedding(
"Intel/bge-small-en-v1.5-sts-int8-static-inc"
)
/home/yuwenzho/.conda/envs/yuwen/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2024-03-29 15:40:42 [INFO] Start to extarct onnx model ops...
2024-03-29 15:40:42 [INFO] Extract onnxruntime model done...
2024-03-29 15:40:42 [INFO] Start to implement Sub-Graph matching and replacing...
2024-03-29 15:40:43 [INFO] Sub-Graph match and replace done...
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
384
[-0.005477035418152809, -0.000541043293196708, 0.036467909812927246, -0.04861024394631386, 0.0288068987429142]
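get_text_embedding returns a plain Python list of floats: 384 values for this bge-small model, as printed above. A common next step is to compare two texts by the cosine similarity of their embeddings. The minimal pure-Python sketch below uses short, hypothetical toy vectors so that it runs without the model. In practice you would pass two lists returned by embed_model.get_text_embedding.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compute similarity with a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for two real embeddings, e.g.
# embed_model.get_text_embedding("Hello World!") and
# embed_model.get_text_embedding("Hi there!").
emb_a = [0.1, 0.3, -0.2, 0.4]
emb_b = [0.2, 0.25, -0.1, 0.35]
print(round(cosine_similarity(emb_a, emb_b), 4))
```

Values closer to 1 indicate more semantically similar texts. The explicit length check guards against mixing embeddings from models with different dimensions.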