Intel® Extension for Transformers Quantized Text Embeddings
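Load quantized BGE embedding models generated by Intel® Extension for Transformers (ITREX) and run them on the ITREX Neural Engine, a high-performance NLP backend, to accelerate inference without compromising accuracy.
For more details, see our blog post "Efficient Natural Language Embedding Models with Intel Extension for Transformers" and the BGE optimization example.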
from langchain_community.embeddings import QuantizedBgeEmbeddings
model_name = "Intel/bge-small-en-v1.5-sts-int8-static-inc"
encode_kwargs = {"normalize_embeddings": True} # set True to compute cosine similarity
model = QuantizedBgeEmbeddings(
    model_name=model_name,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)
API Reference: QuantizedBgeEmbeddings
2024-03-04 10:17:17 [INFO] Start to extarct onnx model ops...
2024-03-04 10:17:17 [INFO] Extract onnxruntime model done...
2024-03-04 10:17:17 [INFO] Start to implement Sub-Graph matching and replacing...
2024-03-04 10:17:18 [INFO] Sub-Graph match and replace done...
Usage
text = "This is a test document."
query_result = model.embed_query(text)
doc_result = model.embed_documents([text])
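
The embeddings returned above are plain lists of floats, so they can be compared directly. The following is a minimal sketch, not part of ITREX or LangChain, of scoring documents against a query by cosine similarity. The helper names `cosine_similarity` and `rank_documents` and the toy vectors are illustrative assumptions; in practice you would pass `query_result` and `doc_result` in place of the toy values.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical 2-dimensional vectors standing in for real model output.
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
order = rank_documents(toy_query, toy_docs)
print(order)
```

With the model loaded as shown earlier, `cosine_similarity(query_result, doc_result[0])` would score the single test document against the query. When `normalize_embeddings` is True the vectors already have unit length, so the division by the norms is redundant, but keeping it makes the helper safe for unnormalized input as well.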