Serve a StableDiffusion text-to-image model on Kubernetes

Note: The Python files for the Ray Serve application and its client are in the ray-project/serve_config_examples repository and in the Ray documentation.

Step 1: Create a Kubernetes cluster with GPUs

Follow aws-eks-gpu-cluster.md or gcp-gke-gpu-cluster.md to create a Kubernetes cluster with 1 CPU node and 1 GPU node.

Step 2: Install the KubeRay operator

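Follow this document to install the latest stable KubeRay operator from the Helm repository. Note that the YAML file in this example uses serveConfigV2, which is supported starting from KubeRay v0.6.0.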
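If the operator version is pinned in a script, the v0.6.0 requirement can be checked before applying the manifest. The following is a minimal sketch; the helper names `parse_version` and `supports_serve_config_v2` are hypothetical and not part of KubeRay, and the parser assumes plain `vX.Y.Z` tags (pre-release suffixes such as `-rc.0` are rejected rather than guessed at).

```python
import re

# From the text above: serveConfigV2 is supported starting from KubeRay v0.6.0.
MIN_SERVE_CONFIG_V2 = (0, 6, 0)


def parse_version(tag: str) -> tuple:
    """Parse a tag such as 'v1.0.0' or '0.6.0' into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def supports_serve_config_v2(tag: str) -> bool:
    """Return True if the given KubeRay version is at least v0.6.0."""
    return parse_version(tag) >= MIN_SERVE_CONFIG_V2
```

For example, `supports_serve_config_v2("v1.0.0")` is true, while a `v0.5.x` tag fails the check.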

Step 3: Install a RayService

# Step 3.1: Download `ray-service.stable-diffusion.yaml`
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.0.0/ray-operator/config/samples/ray-service.stable-diffusion.yaml

# Step 3.2: Create a RayService
kubectl apply -f ray-service.stable-diffusion.yaml

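This RayService configuration contains some important settings:

The tolerations on the workers allow them to be scheduled on nodes without any taints or on nodes with the specific taint shown below. However, workers are only scheduled on GPU nodes, because nvidia.com/gpu: 1 is set in the Pod's resource configuration.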
```
# Please add the following taints to the GPU node.
tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"
```
The diffusers package is included in runtime_env because the ray-ml image does not include it by default.
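The toleration behavior described above can be illustrated with a small model of how Kubernetes matches tolerations against node taints. This is a simplified sketch, not the scheduler's implementation: the `Taint` and `Toleration` classes and the function names are hypothetical, and resource requests such as nvidia.com/gpu are a separate scheduling filter that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str      # "Equal" or "Exists"
    value: str = ""
    effect: str = ""   # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key with Exists matches every taint key.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations, node_taints) -> bool:
    """Every hard taint on the node must be matched by some toleration."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in node_taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# The worker toleration from the RayService YAML above.
worker = [Toleration("ray.io/node-type", "Equal", "worker", "NoSchedule")]
untainted_node = []
gpu_node = [Taint("ray.io/node-type", "worker", "NoSchedule")]
other_node = [Taint("dedicated", "db", "NoSchedule")]  # hypothetical taint
```

With these values, `can_schedule(worker, untainted_node)` and `can_schedule(worker, gpu_node)` both hold, while `can_schedule(worker, other_node)` does not. This is why the toleration alone does not pin workers to the GPU node; the nvidia.com/gpu: 1 resource request does.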

Step 4: Forward the port of Serve

First, get the service name with this command.

kubectl get services

Then, forward the port to the Serve service.

kubectl port-forward svc/stable-diffusion-serve-svc 8000

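Note that the RayService's Kubernetes service is created only after the Serve applications are ready and running. This may take approximately 1 minute after all Pods in the RayCluster are running.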
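Because the service appears only after the Serve application is ready, a script may want to wait for it before starting the port forward. The sketch below polls `kubectl get service` until it succeeds or a timeout expires. The helper names and the default timeout and interval values are assumptions chosen for illustration, and `kubectl` is assumed to be on the PATH and pointed at the right cluster.

```python
import subprocess
import time
from typing import Callable


def service_exists(name: str) -> bool:
    """Return True if `kubectl get service <name>` exits successfully."""
    result = subprocess.run(
        ["kubectl", "get", "service", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until(check: Callable[[], bool],
               timeout_s: float = 180.0,
               interval_s: float = 5.0) -> bool:
    """Call `check` repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A typical call would be `wait_until(lambda: service_exists("stable-diffusion-serve-svc"))`, followed by the `kubectl port-forward` command above once it returns True.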

Step 5: Send a request to the text-to-image model

# Step 5.1: Download `stable_diffusion_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/stable_diffusion/stable_diffusion_req.py

# Step 5.2: Set your `prompt` in `stable_diffusion_req.py`.

# Step 5.3: Send a request to the Stable Diffusion model.
python stable_diffusion_req.py
# Check output.png

You can refer to the document "Serving a Stable Diffusion Model" for an example output image.
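For readers who want to see roughly what such a client does, here is a minimal standard-library sketch. It is not the contents of `stable_diffusion_req.py`: the `/imagine` route and the `prompt` query parameter are assumptions based on the Ray Serve Stable Diffusion example, so check the downloaded script for the actual route. The function names and the timeout are hypothetical.

```python
import urllib.parse
import urllib.request


def build_url(prompt: str,
              base: str = "http://127.0.0.1:8000",
              route: str = "/imagine") -> str:
    """Build the request URL with the prompt percent-encoded."""
    # Assumption: the Serve app exposes `route` and reads a `prompt` parameter.
    return f"{base}{route}?prompt={urllib.parse.quote(prompt, safe='')}"


def fetch_image(prompt: str,
                out_path: str = "output.png",
                timeout_s: float = 300.0) -> int:
    """Send the prompt, write the response body to `out_path`, return its size."""
    with urllib.request.urlopen(build_url(prompt), timeout=timeout_s) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

With the port forward from Step 4 running, `fetch_image("a cute cat is dancing on the grass.")` would write the generated image to `output.png`. The generous timeout reflects that image generation can take a while on the first request.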