RayService high availability

RayService provides high availability (HA) so that the service keeps handling requests when the Ray head Pod fails.

Prerequisites

Use RayService with KubeRay 1.0.0 or later.
Enable GCS fault tolerance in the RayService.

Quickstart

Step 1: Create a Kubernetes cluster with Kind

kind create cluster --image=kindest/node:v1.26.0

Step 2: Install the KubeRay operator

Follow this document to install the latest stable KubeRay operator from the Helm repository.

Step 3: Install a RayService with GCS fault tolerance

curl -LO https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-service.high-availability.yaml
kubectl apply -f ray-service.high-availability.yaml

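The ray-service.high-availability.yaml file contains several Kubernetes objects:

Redis: Redis is required to make the GCS fault tolerant. See GCS fault tolerance for more details.
RayService: This RayService custom resource includes a 3-node RayCluster and a simple Ray Serve application.
Ray Pod: This Pod sends requests to the RayService.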

Step 4: Verify the Kubernetes Serve service

Check the output of the following commands to verify that the Kubernetes Serve service started successfully:

# Step 4.1: KubeRay creates the K8s service `rayservice-ha-serve-svc` after the Ray Serve applications are ready.
kubectl describe svc rayservice-ha-serve-svc

# Step 4.2: `rayservice-ha-serve-svc` should have 3 endpoints, including the Ray head and two Ray workers.
# Endpoints:         10.244.0.29:8000,10.244.0.30:8000,10.244.0.32:8000
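
The endpoint check in Step 4.2 can also be scripted. The following is a minimal sketch, not part of the KubeRay samples: the helper names `count_ready_addresses` and `fetch_endpoints` are hypothetical, and it assumes that `kubectl get endpoints rayservice-ha-serve-svc -o json` returns a standard v1 Endpoints object for the service. The example at the bottom runs offline against a dictionary shaped like the output shown above.

```python
import json
import subprocess


def count_ready_addresses(endpoints: dict) -> int:
    """Count ready addresses across all subsets of a v1 Endpoints object."""
    return sum(
        len(subset.get("addresses") or [])
        for subset in endpoints.get("subsets") or []
    )


def fetch_endpoints(name: str = "rayservice-ha-serve-svc") -> dict:
    """Fetch the Endpoints object with kubectl (needs access to the cluster)."""
    completed = subprocess.run(
        ["kubectl", "get", "endpoints", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


# Offline example shaped like the endpoints listed in Step 4.2.
sample = {
    "subsets": [
        {
            "addresses": [
                {"ip": "10.244.0.29"},
                {"ip": "10.244.0.30"},
                {"ip": "10.244.0.32"},
            ],
            "ports": [{"port": 8000}],
        }
    ]
}
endpoint_count = count_ready_addresses(sample)  # 3 for this sample
```

Against a live cluster, `count_ready_addresses(fetch_endpoints())` should return 3 once the Ray head and both workers are serving.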

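Step 5: Verify the Serve applications

In the ray-service.high-availability.yaml file, the serveConfigV2 parameter specifies num_replicas: 2 and max_replicas_per_node: 1 for each Ray Serve deployment. In addition, the YAML sets the rayStartParams parameter to num-cpus: "0" to ensure that the system doesn't schedule any Ray Serve replicas on the Ray head Pod.

In summary, each Ray Serve deployment has two replicas, and each Ray node can host at most one of those replicas. In addition, Ray Serve replicas can't be scheduled on the Ray head Pod. As a result, each worker node should have exactly one Ray Serve replica for each Ray Serve deployment.

For Ray Serve, the Ray head node always has an HTTPProxyActor, whether or not it has a Ray Serve replica. Ray worker nodes have HTTPProxyActors only when they have Ray Serve replicas. Therefore, the rayservice-ha-serve-svc service from the previous step has 3 endpoints.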

# Port forward the Ray Dashboard.
kubectl port-forward svc/rayservice-ha-head-svc 8265:8265
# Visit ${YOUR_IP}:8265 in your browser for the Dashboard (e.g. 127.0.0.1:8265)
# Check:
# (1) Both head and worker nodes have HTTPProxyActors.
# (2) Only worker nodes have Ray Serve replicas.
# (3) Each worker node has one Ray Serve replica for each Ray Serve deployment.
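
The placement reasoning in this step can be checked with a small model. This is a toy sketch under stated assumptions, not Ray's scheduler: `place_replicas` is a hypothetical helper that spreads the replicas of one deployment over worker nodes only (the head is excluded because it advertises zero CPUs) and enforces the per-node cap. With two workers, two replicas, and a cap of one, the cap alone forces one replica per worker, whatever strategy the real scheduler uses.

```python
def place_replicas(num_workers: int, num_replicas: int, max_replicas_per_node: int) -> list:
    """Return the number of replicas of one deployment placed on each worker."""
    placement = [0] * num_workers
    for _ in range(num_replicas):
        # Only workers that are still under the per-node cap can take a replica.
        candidates = [i for i, count in enumerate(placement) if count < max_replicas_per_node]
        if not candidates:
            raise RuntimeError("not enough worker capacity for all replicas")
        # Pick the least-loaded candidate (an assumption of this toy model).
        target = min(candidates, key=lambda i: placement[i])
        placement[target] += 1
    return placement


# Values from the sample YAML: 2 workers, num_replicas: 2, max_replicas_per_node: 1.
placement = place_replicas(num_workers=2, num_replicas=2, max_replicas_per_node=1)

# The head always runs a proxy; a worker runs one only if it hosts a replica.
proxy_endpoints = 1 + sum(1 for count in placement if count > 0)

print(placement)        # [1, 1]
print(proxy_endpoints)  # 3
```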

Step 6: Send requests to the RayService

# Log into the separate Ray Pod.
kubectl exec -it ray-pod -- bash

# Send requests to the RayService.
python3 samples/query.py

# This script sends the same request to the RayService consecutively, ensuring at most one in-flight request at a time.
# The request is equivalent to `curl -X POST -H 'Content-Type: application/json' localhost:8000/fruit/ -d '["PEAR", 12]'`.

# [Example output]
# req_index : 2197, num_fail: 0
# response: 12
# req_index : 2198, num_fail: 0
# response: 12
# req_index : 2199, num_fail: 0
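
For readers without access to the sample Pod, the behavior described in the comments above can be approximated as follows. This is an illustrative sketch and not the actual contents of samples/query.py: the names `post_fruit_order` and `run_sequential` are hypothetical, and the URL is taken from the equivalent curl command, so it would need to point at an address where the Serve application is reachable. Each request completes (or fails) before the next one starts, so at most one request is in flight.

```python
import json
import urllib.request
from typing import Callable


def post_fruit_order(url: str = "http://localhost:8000/fruit/") -> str:
    """Send the same JSON body as the curl command above and return the reply."""
    body = json.dumps(["PEAR", 12]).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def run_sequential(send: Callable[[], str], num_requests: int) -> int:
    """Call `send` one request at a time and return the number of failures."""
    num_fail = 0
    for req_index in range(num_requests):
        try:
            response = send()
        except Exception as exc:  # count any failure and keep going
            num_fail += 1
            response = f"failed: {exc}"
        print(f"req_index : {req_index}, num_fail: {num_fail}")
        print(f"response: {response}")
    return num_fail


# Offline demonstration with a stub sender; pass post_fruit_order to hit a real service.
failures = run_sequential(lambda: "12", num_requests=3)
```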

Step 7: Delete the Ray head Pod

# Step 7.1: Delete the Ray head Pod.
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl delete pod $HEAD_POD

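In this example, query.py ensures that at most one request is in flight at any given time. In addition, the Ray head Pod has no Ray Serve replicas. A request can fail only when it is being handled by the HTTPProxyActor on the Ray head Pod. Failures are therefore very unlikely while the Ray head Pod is being deleted and recovered. You can implement retry logic in your Ray scripts to handle these failures.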
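
One way to add such retry logic is a small wrapper with exponential backoff. This is a minimal sketch under assumptions, not code from the KubeRay samples: `call_with_retries` and its parameters are hypothetical names, and `send` stands for whatever function issues a single request. Retrying is only safe when repeating the request has no unwanted side effects.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 0.2 s, 0.4 s, 0.8 s, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Stub sender that fails once, as if the proxy on the head Pod disappeared mid-request.
attempts = {"count": 0}


def flaky_send() -> str:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("proxy on the head Pod went away")
    return "12"


result = call_with_retries(flaky_send, sleep=lambda _seconds: None)
print(result, attempts["count"])  # 12 2
```

The `sleep` parameter is injectable only so that the demonstration and tests run without waiting; in a real client the default `time.sleep` would be used.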

# [Expected output]: The `num_fail` is highly likely to be 0.
req_index : 32503, num_fail: 0
response: 12
req_index : 32504, num_fail: 0
response: 12

Step 8: Cleanup

kind delete cluster