Note

Sentence Transformers v3.0 has just been released, introducing a new training API for Sentence Transformer models. Read SentenceTransformer > Training Overview to learn more about the training API, and check out the v3.0 Release Notes for details on the other changes.

SentenceTransformers Documentation

Sentence Transformers (a.k.a. SBERT) is the go-to Python module for accessing, using, and training state-of-the-art text and image embedding models. It can be used to compute embeddings using Sentence Transformer models (quickstart) or to calculate similarity scores using Cross-Encoder models (quickstart). This unlocks a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining.
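
Cross-Encoder models, in turn, score sentence pairs directly instead of producing embeddings. Here is a minimal sketch; the cross-encoder/stsb-roberta-base checkpoint from the Hub is an illustrative choice, not prescribed by this page:

from sentence_transformers import CrossEncoder

# Load a pretrained Cross-Encoder model (checkpoint choice is illustrative)
model = CrossEncoder("cross-encoder/stsb-roberta-base")

# Score sentence pairs directly; no intermediate embeddings are produced
scores = model.predict([
    ("The weather is lovely today.", "It's so sunny outside!"),
    ("The weather is lovely today.", "He drove to the stadium."),
])
print(scores)
# One similarity score per pair; higher means more similar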

On 🤗 Hugging Face, over 5,000 pretrained Sentence Transformers models are available for immediate use, including many of the state-of-the-art models from the Massive Text Embeddings Benchmark (MTEB) leaderboard. Additionally, it is easy to train or finetune your own models with Sentence Transformers, enabling you to create custom models for your specific use cases; see the finetuning sketch below.
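
As a taste of finetuning, here is a minimal sketch of the v3.0 training API mentioned in the note above; the tiny in-memory dataset and the choice of MultipleNegativesRankingLoss are illustrative assumptions, not requirements:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# 1. Load a model to finetune
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2. Prepare a tiny illustrative dataset of (anchor, positive) pairs
train_dataset = Dataset.from_dict({
    "anchor": ["The weather is lovely today.", "He drove to the stadium."],
    "positive": ["It's so sunny outside!", "He took the car to the arena."],
})

# 3. Pick a loss that matches the data format: pairs of related texts
loss = losses.MultipleNegativesRankingLoss(model)

# 4. Train; the trainer handles batching, logging, checkpointing, etc.
trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()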

Sentence Transformers was created by UKPLab and is maintained by 🤗 Hugging Face. Don't hesitate to open an issue on the Sentence Transformers repository if something is broken or if you have further questions.

Usage

See also

See the Quickstart for more quick information on how to use Sentence Transformers.

Using Sentence Transformer models is trivial:

from sentence_transformers import SentenceTransformer

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
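
Beyond pairwise similarity, the same embeddings support semantic search over a corpus. Here is a minimal sketch using util.semantic_search; the corpus, query, and top_k value are illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed a small illustrative corpus and a query
corpus = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
corpus_embeddings = model.encode(corpus)
query_embedding = model.encode("How is the weather?")

# Retrieve the top-2 most similar corpus entries for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])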

What's Next?

Consider reading one of the following sections to answer the related questions:

Citation

If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1908.10084",
}

If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:

@inproceedings{reimers-2020-multilingual-sentence-bert,
  title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2020",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/2004.09813",
}

If you use the code for data augmentation, feel free to cite our publication Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks:

@inproceedings{thakur-2020-AugSBERT,
  title = "Augmented {SBERT}: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
  author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes  and Gurevych, Iryna",
  booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
  month = jun,
  year = "2021",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2021.naacl-main.28",
  pages = "296--310",
}