ChatDatabricks
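The Databricks Lakehouse Platform unifies data, analytics, and AI on a single platform.
This notebook provides a quick overview for getting started with Databricks chat models. For detailed documentation of all ChatDatabricks features and configurations, head to the API reference.
Overview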
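The ChatDatabricks class wraps a chat model endpoint hosted on Databricks Model Serving. This example notebook shows how to wrap your serving endpoint and use it as a chat model in your LangChain application.
Integration details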
Class | Package | Local | Serializable | Package downloads | Package latest |
---|---|---|---|---|---|
ChatDatabricks | databricks-langchain | ❌ | beta | | |
Model features
Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
---|---|---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ |
Supported methods
ChatDatabricks supports all methods of ChatModel, including async APIs.
Endpoint requirement
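The serving endpoint that ChatDatabricks wraps must have an OpenAI-compatible chat input/output format (reference). As long as the input format is compatible, ChatDatabricks can be used with any endpoint type hosted on Databricks Model Serving:
- Foundation Models - A curated list of state-of-the-art foundation models such as DBRX, Llama3, Mixtral-8x7B, etc. These endpoints are ready to use in your Databricks workspace without any setup.
- Custom Models - You can also deploy custom models to a serving endpoint via MLflow with your choice of framework, such as LangChain, PyTorch, Transformers, etc.
- External Models - Databricks endpoints can serve as a proxy for models hosted outside Databricks, such as proprietary model services like OpenAI GPT-4.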
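To make "OpenAI-compatible chat format" concrete, here is a sketch of a request body and response body in that shape. The payloads are hypothetical example data, not captured from a real endpoint. `extract_reply` is an illustrative helper, not part of databricks-langchain. The `usage` keys mirror the `prompt_tokens` / `completion_tokens` / `total_tokens` fields that appear in `response_metadata` in the invocation outputs later in this notebook.

```python
# Hypothetical request body in the OpenAI-compatible chat format.
request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is MLflow?"},
    ],
    "temperature": 0.1,
    "max_tokens": 256,
}

# Hypothetical response body in the same format (example data only).
response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "MLflow is an open-source platform..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29},
}


def extract_reply(resp: dict) -> tuple[str, dict]:
    """Return the first choice's message text and the token-usage dict (illustrative helper)."""
    message = resp["choices"][0]["message"]
    return message["content"], resp.get("usage", {})


text, usage = extract_reply(response)
print(text, usage)
```

An endpoint whose request and response follow this shape is what ChatDatabricks expects to talk to. A custom model with a different schema would need to be adapted to this format first.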
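Setup
To access Databricks models, you need to create a Databricks account, set up credentials (only if you are outside a Databricks workspace), and install the required packages.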
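Credentials (only if you are outside Databricks)
If you are running the LangChain app inside Databricks, you can skip this step.
Otherwise, you need to manually set the Databricks workspace hostname and a personal access token in the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, respectively. See the Authentication documentation for how to get an access token.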
import getpass
import os

# Replace with your own workspace URL.
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"

# Prompt for the personal access token only if it is not already set.
if "DATABRICKS_TOKEN" not in os.environ:
    os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
        "Enter your Databricks access token: "
    )
Enter your Databricks access token: ········
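Outside Databricks, a missing variable usually surfaces later as an authentication error. A quick pre-flight check can catch it earlier. The sketch below uses only the standard library. `missing_databricks_env` is a hypothetical helper, not part of any Databricks or LangChain package. It treats an empty string the same as an unset variable.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required Databricks variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_databricks_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```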
Installation
The LangChain Databricks integration lives in the databricks-langchain package.
%pip install -qU databricks-langchain
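We first demonstrate how to use ChatDatabricks to query the DBRX-instruct model hosted as a Foundation Models endpoint.
For other types of endpoints, the way you set up the endpoint itself differs somewhat. However, once the endpoint is ready, there is no difference in how you query it with ChatDatabricks. Please refer to the bottom of this notebook for examples with other types of endpoints.
Instantiation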
from databricks_langchain import ChatDatabricks

chat_model = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
    temperature=0.1,
    max_tokens=256,
    # See https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html for other supported parameters
)
Invocation
chat_model.invoke("What is MLflow?")
AIMessage(content='MLflow is an open-source platform for managing end-to-end machine learning workflows. It was introduced by Databricks in 2018. MLflow provides tools for tracking experiments, packaging and sharing code, and deploying models. It is designed to work with any machine learning library and can be used in a variety of environments, including local machines, virtual machines, and cloud-based clusters. MLflow aims to streamline the machine learning development lifecycle, making it easier for data scientists and engineers to collaborate and deploy models into production.', response_metadata={'prompt_tokens': 229, 'completion_tokens': 104, 'total_tokens': 333}, id='run-d3fb4d06-3e10-4471-83c9-c282cc62b74d-0')
# You can also pass a list of messages
messages = [
    ("system", "You are a chatbot that can answer questions about Databricks."),
    ("user", "What is Databricks Model Serving?"),
]
chat_model.invoke(messages)
AIMessage(content='Databricks Model Serving is a feature of the Databricks platform that allows data scientists and engineers to easily deploy machine learning models into production. With Model Serving, you can host, manage, and serve machine learning models as APIs, making it easy to integrate them into applications and business processes. It supports a variety of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn, and provides tools for monitoring and managing the performance of deployed models. Model Serving is designed to be scalable, secure, and easy to use, making it a great choice for organizations that want to quickly and efficiently deploy machine learning models into production.', response_metadata={'prompt_tokens': 35, 'completion_tokens': 130, 'total_tokens': 165}, id='run-b3feea21-223e-4105-8627-41d647d5ccab-0')
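The `(role, content)` tuples above map onto the role/content dicts of the OpenAI-compatible format the endpoint expects. The sketch below is an illustrative conversion only. `to_openai_messages` and its role table are hypothetical and are not LangChain's internal implementation. It includes the `"human"` and `"ai"` aliases that LangChain also accepts in message tuples.

```python
# Hypothetical role table: LangChain-style tuple roles -> OpenAI-style roles.
ROLE_MAP = {
    "system": "system",
    "user": "user",
    "human": "user",
    "assistant": "assistant",
    "ai": "assistant",
}


def to_openai_messages(messages: list[tuple[str, str]]) -> list[dict]:
    """Convert (role, content) tuples into OpenAI-style message dicts (illustrative sketch)."""
    converted = []
    for role, content in messages:
        if role not in ROLE_MAP:
            raise ValueError(f"Unsupported role: {role!r}")
        converted.append({"role": ROLE_MAP[role], "content": content})
    return converted


print(to_openai_messages([
    ("system", "You are a chatbot that can answer questions about Databricks."),
    ("user", "What is Databricks Model Serving?"),
]))
```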
Chaining
Like other chat models, ChatDatabricks can be used as part of a complex chain.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a chatbot that can answer questions about {topic}.",
        ),
        ("user", "{question}"),
    ]
)

chain = prompt | chat_model
chain.invoke(
    {
        "topic": "Databricks",
        "question": "What is Unity Catalog?",
    }
)
AIMessage(content="Unity Catalog is a new data catalog feature in Databricks that allows you to discover, manage, and govern all your data assets across your data landscape, including data lakes, data warehouses, and data marts. It provides a centralized repository for storing and managing metadata, data lineage, and access controls for all your data assets. Unity Catalog enables data teams to easily discover and access the data they need, while ensuring compliance with data privacy and security regulations. It is designed to work seamlessly with Databricks' Lakehouse platform, providing a unified experience for managing and analyzing all your data.", response_metadata={'prompt_tokens': 32, 'completion_tokens': 118, 'total_tokens': 150}, id='run-82d72624-f8df-4c0d-a976-919feec09a55-0')
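In the chain above, the prompt step fills the `{topic}` and `{question}` placeholders before the messages reach the model. ChatPromptTemplate uses f-string style placeholders by default. The sketch below mimics only that substitution step with `str.format`. `render` is a hypothetical helper, not the LangChain implementation, and it returns plain tuples rather than message objects.

```python
TEMPLATE = [
    ("system", "You are a chatbot that can answer questions about {topic}."),
    ("user", "{question}"),
]


def render(template: list[tuple[str, str]], **variables: str) -> list[tuple[str, str]]:
    """Fill each message's {placeholders}; a missing variable raises KeyError (illustrative sketch)."""
    return [(role, text.format(**variables)) for role, text in template]


print(render(TEMPLATE, topic="Databricks", question="What is Unity Catalog?"))
```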
Invocation (streaming)
for chunk in chat_model.stream("How are you?"):
    print(chunk.content, end="|")
I|'m| an| AI| and| don|'t| have| feelings|,| but| I|'m| here| and| ready| to| assist| you|.| How| can| I| help| you| today|?||
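A common pattern is to print tokens as they arrive and also keep the full reply. The sketch below shows this with the standard library only. `fake_stream` is a hypothetical stand-in for the `.content` strings yielded by `chat_model.stream(...)`, so the code runs without an endpoint.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for the chunk.content values yielded by chat_model.stream(...)."""
    yield from ["I", "'m", " an", " AI", "."]


def collect(chunks: Iterable[str]) -> str:
    """Print each piece as it arrives and return the concatenated text."""
    parts = []
    for piece in chunks:
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full_text = collect(fake_stream())
```

Against a real endpoint, the same helper would be called as `collect(chunk.content for chunk in chat_model.stream("How are you?"))`.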
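Async invocation
import asyncio

countries = ["Japan", "Italy", "Australia"]
futures = [chat_model.ainvoke(f"Where is the capital of {c}?") for c in countries]
# Top-level `await` works in Jupyter and Databricks notebooks; in a plain script,
# wrap this in an async function and run it with asyncio.run(...).
await asyncio.gather(*futures)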
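The async example relies on top-level `await`, which notebooks allow. The sketch below shows the same fan-out pattern as a standalone script using `asyncio.run`. `fake_ainvoke` is a hypothetical stand-in for `chat_model.ainvoke` that simply sleeps, so the effect of running requests concurrently can be observed without an endpoint.

```python
import asyncio
import time


async def fake_ainvoke(prompt: str) -> str:
    """Hypothetical stand-in for chat_model.ainvoke: sleeps to simulate a network round trip."""
    await asyncio.sleep(0.1)
    return f"answer to: {prompt}"


async def ask_all(countries: list[str]) -> list[str]:
    # Create one coroutine per prompt, then run them concurrently.
    coros = [fake_ainvoke(f"Where is the capital of {c}?") for c in countries]
    # gather returns results in the same order as the input coroutines.
    return await asyncio.gather(*coros)


start = time.perf_counter()
results = asyncio.run(ask_all(["Japan", "Italy", "Australia"]))
elapsed = time.perf_counter() - start
print(f"{len(results)} answers in {elapsed:.2f}s")
```

Because the three simulated calls overlap, the total time is close to one call's latency rather than three. Note that `asyncio.run` cannot be called from inside an already-running event loop, such as a notebook cell. There, `await ask_all(...)` would be used instead.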
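Tool calling
ChatDatabricks supports the OpenAI-compatible tool calling API. This API lets you describe tools and their arguments, and have the model return a JSON object naming a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and more generally for getting structured outputs from models.
With ChatDatabricks.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood, these are converted to OpenAI-compatible tool schemas, which look like: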
{
"name": "...",
"description": "...",
"parameters": {...} # JSONSchema
}
These schemas are passed in every model invocation.
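To illustrate the conversion from a Python callable to the `name` / `description` / `parameters` schema, here is a deliberately simplified sketch built on `inspect`. It is not LangChain's converter. `function_to_tool_schema` and its type table are hypothetical. The sketch only handles plain `str` / `int` / `float` / `bool` annotations and falls back to `"string"` for anything else. Parameters without a default are marked as required.

```python
import inspect

# Hypothetical mapping from simple Python annotations to JSON Schema types.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(fn) -> dict:
    """Build an OpenAI-style tool schema from a function's signature and docstring (simplified sketch)."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather in a given location"""
    return f"(hypothetical) sunny in {location}, unit={unit}"


print(function_to_tool_schema(get_weather))
```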
from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = chat_model.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
    "Which city is hotter today and which is bigger: LA or NY?"
)
print(ai_msg.tool_calls)
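`ai_msg.tool_calls` is a list of dicts with `name`, `args`, and `id` keys. The model only requests the calls; your code still has to run them. The sketch below dispatches each requested call to a local Python function by name. `get_weather`, `get_population`, and `sample_calls` are hypothetical placeholders with made-up return values, so the example runs without an endpoint.

```python
# Hypothetical local implementations backing the GetWeather / GetPopulation tools.
def get_weather(location: str) -> str:
    return f"(hypothetical) weather report for {location}"


def get_population(location: str) -> str:
    return f"(hypothetical) population figure for {location}"


TOOLS = {"GetWeather": get_weather, "GetPopulation": get_population}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each requested tool call and pair its output with the call id."""
    results = []
    for call in tool_calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            raise KeyError(f"Unknown tool: {call['name']}")
        results.append({"tool_call_id": call["id"], "output": fn(**call["args"])})
    return results


# Example data in the shape of ai_msg.tool_calls (ids and args are made up).
sample_calls = [
    {"name": "GetWeather", "args": {"location": "Los Angeles, CA"}, "id": "call_1"},
    {"name": "GetPopulation", "args": {"location": "New York, NY"}, "id": "call_2"},
]
print(run_tool_calls(sample_calls))
```

In a real agent loop, each output would typically be wrapped in a `langchain_core.messages.ToolMessage` with the matching `tool_call_id` and sent back to the model.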
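Wrapping a custom model endpoint
Prerequisites:
- An LLM was registered and deployed to a Databricks serving endpoint via MLflow. The endpoint must have an OpenAI-compatible chat input/output format (reference).
- You have "Can Query" permission on the endpoint.
Once the endpoint is ready, the usage pattern is identical to that of Foundation Models.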
chat_model_custom = ChatDatabricks(
    endpoint="YOUR_ENDPOINT_NAME",
    temperature=0.1,
    max_tokens=256,
)

chat_model_custom.invoke("How are you?")
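Wrapping external models
Prerequisite: create a proxy endpoint
First, create a new Databricks serving endpoint that proxies requests to the target external model. Creating an endpoint that proxies an external model should be fairly quick.
This requires registering your OpenAI API key in the Databricks secret manager, as follows: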
# Replace `<scope>` with your scope
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY
For setting up the Databricks CLI and managing secrets, please refer to https://docs.databricks.com/en/security/secrets/secrets.html
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

secret = "secrets/<scope>/openai-api-key"  # replace `<scope>` with your scope
endpoint_name = "my-chat"  # rename this if my-chat already exists
client.create_endpoint(
    name=endpoint_name,
    config={
        "served_entities": [
            {
                "name": "my-chat",
                "external_model": {
                    "name": "gpt-3.5-turbo",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
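The config passed to `create_endpoint` is a nested dict. The API key is given as a `{{secrets/<scope>/<key>}}` reference rather than the key itself. When several external endpoints are needed, that structure can be produced by a small builder. `build_external_chat_config` is a hypothetical helper that reproduces only the fields used in the call above, for the OpenAI provider. It is not an MLflow or Databricks API.

```python
def build_external_chat_config(entity_name: str, model: str, secret_path: str) -> dict:
    """Return a served-entities config for an OpenAI external chat model (hypothetical helper).

    secret_path is e.g. "secrets/<scope>/openai-api-key"; it is wrapped in {{ }} so the
    key is referenced from the secret manager instead of being embedded in the config.
    """
    return {
        "served_entities": [
            {
                "name": entity_name,
                "external_model": {
                    "name": model,
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {"openai_api_key": "{{" + secret_path + "}}"},
                },
            }
        ]
    }


config = build_external_chat_config("my-chat", "gpt-3.5-turbo", "secrets/my-scope/openai-api-key")
print(config)
```

The resulting dict would then be passed as `client.create_endpoint(name=endpoint_name, config=config)`.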
Once the endpoint status has become "Ready", you can query the endpoint in the same way as other types of endpoints.
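Endpoint creation is asynchronous, so a script may need to wait before its first query. The sketch below polls a caller-supplied function until it reports readiness or a timeout expires. Two assumptions are involved. The first is that the endpoint description exposes readiness as `endpoint["state"]["ready"] == "READY"`; please verify this field against your MLflow/Databricks version. The second is that `wait_until_ready` is a hypothetical helper. The `sleep` argument is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    get_endpoint: Callable[[], dict],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_endpoint() until state.ready == "READY" (assumed field) or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = get_endpoint()
        if endpoint.get("state", {}).get("ready") == "READY":
            return endpoint
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint not ready after {timeout_s} seconds")
        sleep(poll_s)
```

With the MLflow client created earlier, usage would look like `wait_until_ready(lambda: client.get_endpoint(endpoint_name))`.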
chat_model_external = ChatDatabricks(
    endpoint=endpoint_name,
    temperature=0.1,
    max_tokens=256,
)
chat_model_external.invoke("How to use Databricks?")
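Function calling on Databricks
Databricks function calling is OpenAI-compatible and is only available during model serving as part of the Foundation Model APIs.
See the Databricks function calling introduction for supported models.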
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-70b-instruct")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
            },
        },
    }
]

# supported tool_choice values: "auto", "required", "none", function name in string format,
# or a dictionary as {"type": "function", "function": {"name": <<tool_name>>}}
model = llm.bind_tools(tools, tool_choice="auto")

messages = [{"role": "user", "content": "What is the current temperature of Chicago?"}]
print(model.invoke(messages))
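The comment in the code above lists two ways to force a specific tool: its name as a string, or the `{"type": "function", "function": {"name": ...}}` dict. The sketch below validates a `tool_choice` value against the bound tool names and expresses a bare name in the dict form, so a typo fails locally instead of at the endpoint. `normalize_tool_choice` is a hypothetical helper. It is not how ChatDatabricks processes `tool_choice` internally.

```python
ALLOWED_MODES = {"auto", "required", "none"}


def normalize_tool_choice(choice, tool_names: list[str]):
    """Validate a tool_choice value and express a bare tool name in dict form (hypothetical helper)."""
    if isinstance(choice, str):
        if choice in ALLOWED_MODES:
            return choice
        if choice in tool_names:
            return {"type": "function", "function": {"name": choice}}
        raise ValueError(f"Unknown tool_choice: {choice!r}")
    if isinstance(choice, dict) and choice.get("type") == "function":
        name = choice.get("function", {}).get("name")
        if name in tool_names:
            return choice
        raise ValueError(f"tool_choice refers to an unbound tool: {name!r}")
    raise TypeError("tool_choice must be a string or a function dict")


names = ["get_current_weather"]
print(normalize_tool_choice("get_current_weather", names))
```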
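See Databricks Unity Catalog to learn how to use UC functions in chains.
API reference
For detailed documentation of all ChatDatabricks features and configurations, head to the API reference: https://python.langchain.com/api_reference/databricks/chat_models/langchain_databricks.chat_models.ChatDatabricks.html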