Infinity
Infinity allows the creation of embeddings using an MIT-licensed embedding server.
This notebook goes over how to use LangChain with embeddings from the Infinity Github project.
Imports
from langchain_community.embeddings import InfinityEmbeddings, InfinityEmbeddingsLocal
Option 1: Use infinity from Python
Optional: install infinity
To install infinity, use the following command. For further details, check out the docs on Github.
Install the torch and onnx dependencies.
pip install infinity_emb[torch,optimum]
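If you are working inside a notebook, you can run the equivalent install in place (the same pattern the Option 2 section below uses):
%pip install --upgrade --quiet infinity_emb[torch,optimum]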
documents = [
    "Baguette is a dish.",
    "Paris is the capital of France.",
    "numpy is a lib for linear algebra",
    "You escaped what I've escaped - You'd be in Paris getting fucked up too",
]
query = "Where is Paris?"
embeddings = InfinityEmbeddingsLocal(
    model="sentence-transformers/all-MiniLM-L6-v2",
    # revision
    revision=None,
    # best to keep at 32
    batch_size=32,
    # for AMD/Nvidia GPUs via torch
    device="cuda",
    # warm up the model before execution
)
/home/michael/langchain/libs/langchain/.venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
/home/michael/langchain/libs/langchain/.venv/lib/python3.10/site-packages/optimum/bettertransformer/models/encoder_models.py:301: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.)
hidden_states = torch._nested_tensor_from_mask(hidden_states, ~attention_mask)
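The next cell awaits an embed() coroutine that is not defined above. Here is a minimal sketch of such a coroutine, assuming InfinityEmbeddingsLocal can be used as an async context manager (to start and stop its batching engine) and provides the async methods aembed_documents and aembed_query:
async def embed():
    # the batching engine is started on enter and stopped on exit;
    # prefer keeping it running instead of restarting it for every call
    async with embeddings:
        documents_embedded = await embeddings.aembed_documents(documents)
        query_result = await embeddings.aembed_query(query)
        print("embeddings created successfully")
    return documents_embedded, query_result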
# run the async code in your event loop however you like
# if you are in a jupyter notebook, you can use the following
documents_embedded, query_result = await embed()
# (demo) compute similarity
import numpy as np
scores = np.array(documents_embedded) @ np.array(query_result).T
dict(zip(documents, scores))
Option 2: Run the server, and connect via the API
Optional: Make sure to start the Infinity instance
To install infinity, use the following command. For further details, check out the docs on Github.
pip install infinity_emb[all]
# install the infinity package
%pip install --upgrade --quiet infinity_emb[all]
Start up the server - best to be done from a separate terminal, not inside this Jupyter notebook.
model=sentence-transformers/all-MiniLM-L6-v2
port=7797
infinity_emb --port $port --model-name-or-path $model
Alternatively, you can use Docker:
model=sentence-transformers/all-MiniLM-L6-v2
port=7797
docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port
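Optionally, you can verify that the instance is reachable before embedding anything. A minimal sketch, assuming the server serves its interactive API docs at /docs on the chosen port (the same URL the error handler further below points to):
import requests

# quick reachability check against the (assumed) docs endpoint of the running instance
resp = requests.get("http://localhost:7797/docs", timeout=5)
print("Infinity is reachable" if resp.ok else f"Unexpected status: {resp.status_code}")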
Embed your documents using your Infinity instance
documents = [
    "Baguette is a dish.",
    "Paris is the capital of France.",
    "numpy is a lib for linear algebra",
    "You escaped what I've escaped - You'd be in Paris getting fucked up too",
]
query = "Where is Paris?"
#
infinity_api_url = "http://localhost:7797/v1"
# model is currently not validated.
embeddings = InfinityEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2", infinity_api_url=infinity_api_url
)
try:
    documents_embedded = embeddings.embed_documents(documents)
    query_result = embeddings.embed_query(query)
    print("embeddings created successfully")
except Exception as ex:
    print(
        "Make sure the infinity instance is running. Verify by clicking on "
        f"{infinity_api_url.replace('v1', 'docs')} Exception: {ex}. "
    )
Make sure the infinity instance is running. Verify by clicking on http://localhost:7797/docs Exception: HTTPConnectionPool(host='localhost', port=7797): Max retries exceeded with url: /v1/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f91c35dbd30>: Failed to establish a new connection: [Errno 111] Connection refused')).
# (demo) compute similarity
import numpy as np
scores = np.array(documents_embedded) @ np.array(query_result).T
dict(zip(documents, scores))
{'Baguette is a dish.': 0.31344215908661155,
'Paris is the capital of France.': 0.8148670296896388,
'numpy is a lib for linear algebra': 0.004429399861302009,
"You escaped what I've escaped - You'd be in Paris getting fucked up too": 0.5088476180154582}