Serve a text summarizer on Kubernetes

Note: The Python files for the Ray Serve application and its client are in the ray-project/serve_config_examples repository.

Step 1: Create a Kubernetes cluster with GPUs

Follow aws-eks-gpu-cluster.md or gcp-gke-gpu-cluster.md to create a Kubernetes cluster with 1 CPU node and 1 GPU node.

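Step 2: Install the KubeRay operator

Follow this document to install the latest stable KubeRay operator from the Helm repository. Note that the YAML file in this example uses serveConfigV2, which KubeRay supports starting from v0.6.0.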

Step 3: Install a RayService

```
# Step 3.1: Download `ray-service.text-summarizer.yaml`
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.0.0/ray-operator/config/samples/ray-service.text-summarizer.yaml

# Step 3.2: Create a RayService
kubectl apply -f ray-service.text-summarizer.yaml
```

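This RayService configuration contains some important settings:

The tolerations for the workers allow them to be scheduled on nodes without any taints or on nodes with the specific taint shown here. However, the workers are only scheduled on GPU nodes, because the Pod's resource configuration sets nvidia.com/gpu: 1.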
```
# Please add the following taints to the GPU node.
tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"
```
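
To make the scheduling rule above concrete, here is a simplified Python model of how a toleration is matched against node taints. It is a sketch of the documented matching rules, not the Kubernetes scheduler's code, and the function names `tolerates` and `can_schedule` are hypothetical. The matching taint could be applied with something like `kubectl taint nodes <gpu-node-name> ray.io/node-type=worker:NoSchedule`, where `<gpu-node-name>` is a placeholder for your GPU node.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint (simplified model)."""
    # An empty toleration effect matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # With Exists, the value is ignored; an empty key matches every key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # With Equal, both the key and the value must match.
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def can_schedule(tolerations: list, node_taints: list) -> bool:
    """A Pod fits a node if every scheduling-blocking taint is tolerated."""
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# The worker toleration from the RayService YAML.
worker_tolerations = [
    {"key": "ray.io/node-type", "operator": "Equal", "value": "worker", "effect": "NoSchedule"}
]
untainted_node = []
worker_tainted_node = [{"key": "ray.io/node-type", "value": "worker", "effect": "NoSchedule"}]
other_tainted_node = [{"key": "ray.io/node-type", "value": "head", "effect": "NoSchedule"}]

if __name__ == "__main__":
    print(can_schedule(worker_tolerations, untainted_node))       # True
    print(can_schedule(worker_tolerations, worker_tainted_node))  # True
    print(can_schedule(worker_tolerations, other_tainted_node))   # False
```

The toleration only permits scheduling on the tainted node; it is the nvidia.com/gpu resource request that restricts the workers to GPU nodes.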

Step 4: Forward the port for Serve

First, get the service name with this command.

```
kubectl get services
```

Then, forward the port to the Serve service.

```
kubectl port-forward svc/text-summarizer-serve-svc 8000
```
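
While the port-forward command is running in another terminal, a quick local check can confirm that the forwarded port accepts connections. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and a successful connection only shows that something is listening on the local port, not that the Serve application is healthy.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8000, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("Forwarded port reachable:", port_is_open())
```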

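Note that the RayService's Kubernetes service is created only after the Serve applications are ready and running. This may take approximately 1 minute after all Pods in the RayCluster are running.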
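
Because the service appears with a delay, it can help to poll for it before port-forwarding. The sketch below shells out to `kubectl get service <name>` and retries until the command succeeds or a deadline passes. The helper names, the 300-second timeout, and the 5-second interval are illustrative assumptions, not values from the KubeRay documentation.

```python
import subprocess
import time


def service_exists(name: str) -> bool:
    """Return True if `kubectl get service <name>` exits successfully."""
    result = subprocess.run(
        ["kubectl", "get", "service", name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def wait_for_service(name, timeout_s=300.0, interval_s=5.0,
                     exists=service_exists, sleep=time.sleep) -> bool:
    """Poll until the service exists; return False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if exists(name):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Service name taken from the port-forward command in this step.
    if wait_for_service("text-summarizer-serve-svc"):
        print("Service is available; you can start port-forwarding.")
    else:
        print("Timed out waiting for the service.")
```

The `exists` and `sleep` parameters are injectable so the loop can be exercised without a cluster.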

Step 5: Send a request to the text_summarizer model

```
# Step 5.1: Download `text_summarizer_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/text_summarizer/text_summarizer_req.py

# Step 5.2: Send a request to the Summarizer model.
python text_summarizer_req.py
# Check the output printed to the console
```
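
For orientation, a client for this kind of request might look roughly like the sketch below. This is not the contents of `text_summarizer_req.py`: the `/summarize` route, the JSON payload shape (a list holding one string), and the helper names are assumptions, so check the downloaded script and the RayService YAML for the real values. It uses only the standard library and expects the port-forward from Step 4 to be active.

```python
import json
import urllib.request

# Assumed endpoint: the forwarded Serve port plus a hypothetical route prefix.
DEFAULT_URL = "http://localhost:8000/summarize"


def build_summarize_request(text: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a POST request carrying the text as JSON (assumed payload shape)."""
    body = json.dumps([text]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, url: str = DEFAULT_URL, timeout_s: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_summarize_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    sample = "Replace this placeholder with the passage you want summarized."
    print(summarize(sample))
```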

Step 6: Delete your service

```
# path: ray-operator/config/samples/
kubectl delete -f ray-service.text-summarizer.yaml
```

Step 7: Uninstall your KubeRay operator

Follow this document to uninstall the latest stable KubeRay operator from the Helm repository.