KubeRay Autoscaling

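This guide explains how to configure the Ray Autoscaler on Kubernetes. The Ray Autoscaler is a Ray cluster process that automatically scales a cluster up and down based on resource demand. It does this by adjusting the number of nodes (Ray Pods) in the cluster according to the resources that tasks, actors, or placement groups require.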

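The Autoscaler scales on logical resource requests, which are declared in @ray.remote and shown in ray status, not on physical machine utilization. If you launch an actor, task, or placement group and resources are insufficient, the Autoscaler queues the request. It adjusts the number of nodes to meet the queued demand and, over time, removes idle nodes that have no tasks, actors, or objects.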
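
The idea of scaling on logical requests rather than machine utilization can be illustrated with a small toy model. This is only a sketch: the `Node` class and `place_requests` function are hypothetical names invented for illustration, and the greedy placement is an assumption, not Ray's actual scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A node described only by its logical CPU count (hypothetical model)."""
    total_cpus: float
    used_cpus: float = 0.0

    def free_cpus(self) -> float:
        return self.total_cpus - self.used_cpus


def place_requests(nodes, requests):
    """Greedily place logical CPU requests; return the requests left queued."""
    queued = []
    for cpus in requests:
        # Pick the first node with enough free logical CPUs, if any.
        target = next((n for n in nodes if n.free_cpus() >= cpus), None)
        if target is None:
            queued.append(cpus)  # unmet demand, the signal that drives scale-up
        else:
            target.used_cpus += cpus
    return queued


nodes = [Node(total_cpus=1)]
pending = place_requests(nodes, [1, 1, 1])
print(pending)  # [1, 1]
```

In this model the two queued requests are what an autoscaler would try to satisfy by adding nodes, regardless of how busy the physical machine is.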

When to use autoscaling?

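Autoscaling can reduce workload costs, but it adds node launch overhead and can be tricky to configure. If you're new to Ray, we recommend starting with a non-autoscaling cluster.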

Ray Autoscaler V2 alpha with KubeRay (Ray 2.10.0)

Ray 2.10 makes the alpha version of Ray Autoscaler V2 available with KubeRay. It improves observability and stability. See the corresponding section for details.

Overview

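The following diagram illustrates the integration of the Ray Autoscaler with the KubeRay operator. Although the diagram depicts the Ray Autoscaler as a separate entity for clarity, in the actual implementation it is a sidecar container within the Ray head Pod.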

3 levels of autoscaling in KubeRay

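Ray actors/tasks: Some Ray libraries, such as Ray Serve, can automatically adjust the number of Serve replicas (that is, Ray actors) based on the incoming request volume.
Ray nodes: The Ray Autoscaler automatically adjusts the number of Ray nodes (that is, Ray Pods) based on the resource demand of Ray actors and tasks.
Kubernetes nodes: If the Kubernetes cluster lacks sufficient resources for the new Ray Pods that the Ray Autoscaler creates, the Kubernetes Autoscaler can provision a new Kubernetes node. You must configure the Kubernetes Autoscaler yourself.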

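The Autoscaler scales up the cluster through the following sequence of events:
1. A user submits a Ray workload.
2. The Ray head container aggregates the workload's resource requirements and communicates them to the Ray Autoscaler sidecar.
3. The Autoscaler decides to add a Ray worker Pod to satisfy the workload's resource requirements.
4. The Autoscaler requests an additional worker Pod by incrementing the RayCluster CR's replicas field.
5. The KubeRay operator creates a Ray worker Pod to match the new replicas specification.
6. The Ray scheduler places the user's workload on the new worker Pod.

The Autoscaler also scales down the cluster by removing idle worker Pods. When it finds an idle worker Pod, it decrements the replicas field of the RayCluster CR and adds the identified Pod to the CR's workersToDelete field. The KubeRay operator then deletes the Pods listed in workersToDelete.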
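
The scale-up and scale-down edits to the RayCluster CR described above can be sketched on a plain Python dict. This is a simplified model, not the Autoscaler's code: the helper names (`find_group`, `scale_up`, `scale_down`) are hypothetical, the dict contains only the fields needed for the illustration, and the clamping to `minReplicas`/`maxReplicas` and the placement of `workersToDelete` under `scaleStrategy` are assumptions about the CR layout that you should check against the KubeRay API reference.

```python
import copy


def find_group(cluster, group_name):
    """Return the worker group dict with the given name."""
    for group in cluster["spec"]["workerGroupSpecs"]:
        if group["groupName"] == group_name:
            return group
    raise KeyError(group_name)


def scale_up(cluster, group_name, count=1):
    """Return a copy of the CR with replicas raised, capped at maxReplicas."""
    patched = copy.deepcopy(cluster)
    group = find_group(patched, group_name)
    group["replicas"] = min(group["replicas"] + count, group["maxReplicas"])
    return patched


def scale_down(cluster, group_name, idle_pods):
    """Return a copy of the CR with idle Pods marked for deletion."""
    patched = copy.deepcopy(cluster)
    group = find_group(patched, group_name)
    # Never go below minReplicas (assumed lower bound).
    removable = list(idle_pods)[: max(group["replicas"] - group["minReplicas"], 0)]
    group["replicas"] -= len(removable)
    strategy = group.setdefault("scaleStrategy", {})
    strategy.setdefault("workersToDelete", []).extend(removable)
    return patched


# Illustrative values only, not copied from a real manifest.
cluster = {
    "spec": {
        "workerGroupSpecs": [
            {"groupName": "small-group", "replicas": 0, "minReplicas": 0, "maxReplicas": 10}
        ]
    }
}

after_up = scale_up(scale_up(cluster, "small-group"), "small-group")
after_down = scale_down(after_up, "small-group", ["worker-yyyyy"])
print(find_group(after_up, "small-group")["replicas"])    # 2
print(find_group(after_down, "small-group")["replicas"])  # 1
```

In the real system the KubeRay operator, not this code, reacts to the changed fields by creating or deleting Pods.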

Quickstart

Step 1: Create a Kubernetes cluster with Kind

kind create cluster --image=kindest/node:v1.26.0

Step 2: Install the KubeRay operator

Follow this document to install the latest stable KubeRay operator from the Helm repository.

Step 3: Create a RayCluster custom resource with autoscaling enabled

kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.1.1/ray-operator/config/samples/ray-cluster.autoscaler.yaml
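
As a rough guide to what an autoscaling-enabled RayCluster looks like, the sketch below mirrors a few relevant fields as a Python dict and runs a basic sanity check. The field values are illustrative assumptions rather than a copy of ray-cluster.autoscaler.yaml, and `check_autoscaling` is a hypothetical helper, so consult the sample file and the KubeRay API reference for the authoritative content.

```python
# Illustrative subset of a RayCluster manifest; values are assumptions.
manifest = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "raycluster-autoscaler"},
    "spec": {
        "enableInTreeAutoscaling": True,                  # turns on the Autoscaler sidecar
        "autoscalerOptions": {"idleTimeoutSeconds": 60},  # idle time before scale-down
        "headGroupSpec": {"rayStartParams": {"num-cpus": "0"}},
        "workerGroupSpecs": [
            {"groupName": "small-group", "replicas": 0, "minReplicas": 0, "maxReplicas": 10}
        ],
    },
}


def check_autoscaling(manifest):
    """Return a list of human-readable problems; empty means the basics look fine."""
    problems = []
    spec = manifest.get("spec", {})
    if not spec.get("enableInTreeAutoscaling", False):
        problems.append("enableInTreeAutoscaling is not true")
    for group in spec.get("workerGroupSpecs", []):
        lo, hi = group.get("minReplicas", 0), group.get("maxReplicas", 0)
        replicas = group.get("replicas", lo)
        if not lo <= replicas <= hi:
            problems.append(
                f"{group.get('groupName')}: replicas {replicas} outside [{lo}, {hi}]"
            )
    return problems


problems = check_autoscaling(manifest)
print(problems)  # []
```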

Step 4: Verify the Kubernetes cluster status

# Step 4.1: List all Ray Pods in the `default` namespace.
kubectl get pods -l=ray.io/is-ray-node=yes

# [Example output]
# NAME                               READY   STATUS    RESTARTS   AGE
# raycluster-autoscaler-head-6zc2t   2/2     Running   0          107s

# Step 4.2: Check the ConfigMap in the `default` namespace.
kubectl get configmaps

# [Example output]
# NAME                  DATA   AGE
# ray-example           2      21s
# ...

The RayCluster has one head Pod and zero worker Pods. The head Pod has two containers: a Ray head container and a Ray Autoscaler sidecar container. In addition, ray-cluster.autoscaler.yaml includes a ConfigMap named ray-example that holds two Python scripts: detached_actor.py and terminate_detached_actor.py.

detached_actor.py is a Python script that creates a detached actor which requires 1 CPU.

import ray
import sys

@ray.remote(num_cpus=1)
class Actor:
  pass

ray.init(namespace="default_namespace")
Actor.options(name=sys.argv[1], lifetime="detached").remote()

terminate_detached_actor.py is a Python script that terminates a detached actor.

import ray
import sys

ray.init(namespace="default_namespace")
detached_actor = ray.get_actor(sys.argv[1])
ray.kill(detached_actor)

Step 5: Trigger RayCluster scale-up by creating detached actors

# Step 5.1: Create a detached actor "actor1" which requires 1 CPU.
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- python3 /home/ray/samples/detached_actor.py actor1

# Step 5.2: The Ray Autoscaler creates a new worker Pod.
kubectl get pods -l=ray.io/is-ray-node=yes

# [Example output]
# NAME                                             READY   STATUS    RESTARTS   AGE
# raycluster-autoscaler-head-xxxxx                 2/2     Running   0          xxm
# raycluster-autoscaler-worker-small-group-yyyyy   1/1     Running   0          xxm

# Step 5.3: Create a detached actor "actor2" which requires 1 CPU.
kubectl exec -it $HEAD_POD -- python3 /home/ray/samples/detached_actor.py actor2
kubectl get pods -l=ray.io/is-ray-node=yes

# [Example output]
# NAME                                             READY   STATUS    RESTARTS   AGE
# raycluster-autoscaler-head-xxxxx                 2/2     Running   0          xxm
# raycluster-autoscaler-worker-small-group-yyyyy   1/1     Running   0          xxm
# raycluster-autoscaler-worker-small-group-zzzzz   1/1     Running   0          xxm

# Step 5.4: List all actors in the Ray cluster.
kubectl exec -it $HEAD_POD -- ray list actors


# [Example output]
# ======= List: 2023-09-06 13:26:49.228594 ========
# Stats:
# ------------------------------
# Total: 2

# Table:
# ------------------------------
#     ACTOR_ID  CLASS_NAME    STATE    JOB_ID    NAME    ...
#  0  xxxxxxxx  Actor         ALIVE    02000000  actor1  ...
#  1  xxxxxxxx  Actor         ALIVE    03000000  actor2  ...

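The Ray Autoscaler creates a new worker Pod for each new detached actor. This is because the rayStartParams field of the Ray head specifies num-cpus: "0", which prevents the Ray scheduler from placing any Ray actors or tasks on the Ray head Pod. In addition, each Ray worker Pod has a capacity of 1 CPU, so the Autoscaler creates a new worker Pod to satisfy the 1 CPU that each detached actor requires.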
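
The CPU accounting above can be reproduced with a small first-fit model. This is a sketch under simplifying assumptions: `workers_needed` is a hypothetical helper, CPUs are the only resource considered, and first-fit packing stands in for whatever the Autoscaler actually does.

```python
def workers_needed(actor_cpus, worker_cpus=1, head_cpus=0):
    """Return (worker_count, infeasible_requests) for a list of actor CPU needs."""
    free = [head_cpus]  # index 0 models the head Pod, which has 0 logical CPUs
    infeasible = []
    for cpus in actor_cpus:
        # A request larger than any single Pod can never be placed,
        # no matter how many workers are added.
        if cpus > max(worker_cpus, head_cpus):
            infeasible.append(cpus)
            continue
        for i, available in enumerate(free):
            if available >= cpus:
                free[i] -= cpus
                break
        else:
            # No existing Pod fits: model the launch of one more worker Pod.
            free.append(worker_cpus - cpus)
    return len(free) - 1, infeasible


print(workers_needed([1, 1]))  # (2, [])
print(workers_needed([2]))     # (0, [2])
```

With the head at 0 CPUs and workers at 1 CPU each, two 1-CPU actors need two workers, while a 2-CPU actor does not fit on any Pod in this model.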

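Detached actors aren't required to trigger cluster scale-up. Normal actors and tasks can also initiate it. Detached actors persist even after the job's driver process exits, so the Autoscaler doesn't scale the cluster down when the detached_actor.py process exits, which makes this tutorial more convenient.
In this RayCluster custom resource, each Ray worker Pod has only 1 logical CPU from the perspective of the Ray Autoscaler. Therefore, if you create a detached actor with @ray.remote(num_cpus=2), the Autoscaler doesn't create a new worker Pod, because no worker Pod in this cluster can offer more than 1 CPU.
(Advanced) The Ray Autoscaler also offers a Python SDK that lets advanced users, such as Ray maintainers, request resources directly from the Autoscaler. Most users don't need to use the SDK.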
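
For readers curious about the SDK mentioned above, here is a minimal sketch that assumes it refers to `ray.autoscaler.sdk.request_resources`. The helpers `cpu_bundles` and `request_capacity` are hypothetical names for illustration, and `request_capacity` only works when called from a process that can connect to a running Ray cluster, for example inside the head Pod.

```python
def cpu_bundles(count, cpus_each=1):
    """Build `count` resource bundles that each ask for `cpus_each` CPUs."""
    return [{"CPU": cpus_each} for _ in range(count)]


def request_capacity(count, cpus_each=1):
    """Ask the Autoscaler to provision room for the given bundles.

    Ray is imported lazily so that this module can be loaded without Ray installed.
    """
    import ray
    from ray.autoscaler.sdk import request_resources

    # Connect to the already running cluster rather than starting a new one.
    ray.init(address="auto", ignore_reinit_error=True)
    request_resources(bundles=cpu_bundles(count, cpus_each))


print(cpu_bundles(2))  # [{'CPU': 1}, {'CPU': 1}]
```

Calling `request_capacity(2)` on this tutorial's cluster would be expected to ask for capacity equivalent to two 1-CPU workers, but the exact behavior depends on the Ray version.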

Step 6: Trigger RayCluster scale-down by terminating detached actors

# Step 6.1: Terminate the detached actor "actor1".
kubectl exec -it $HEAD_POD -- python3 /home/ray/samples/terminate_detached_actor.py actor1

# Step 6.2: A worker Pod will be deleted after `idleTimeoutSeconds` (default 60 seconds).
kubectl get pods -l=ray.io/is-ray-node=yes

# [Example output]
# NAME                                             READY   STATUS    RESTARTS   AGE
# raycluster-autoscaler-head-xxxxx                 2/2     Running   0          xxm
# raycluster-autoscaler-worker-small-group-zzzzz   1/1     Running   0          xxm

# Step 6.3: Terminate the detached actor "actor2".
kubectl exec -it $HEAD_POD -- python3 /home/ray/samples/terminate_detached_actor.py actor2

# Step 6.4: A worker Pod will be deleted after `idleTimeoutSeconds` (default 60 seconds).
kubectl get pods -l=ray.io/is-ray-node=yes

# [Example output]
# NAME                                             READY   STATUS    RESTARTS   AGE
# raycluster-autoscaler-head-xxxxx                 2/2     Running   0          xxm
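
The idle-timeout behavior in this step can be modeled in a few lines. This is a toy sketch: `idle_workers` is a hypothetical helper, the Pod names and timestamps are made up, and the use of `>=` for the comparison is an assumption rather than documented Autoscaler behavior.

```python
def idle_workers(last_active, now, idle_timeout_seconds=60):
    """Return the Pods whose idle time has reached the timeout.

    `last_active` maps a Pod name to the time (in seconds) it last held
    a task, actor, or object.
    """
    return sorted(
        pod for pod, seen in last_active.items() if now - seen >= idle_timeout_seconds
    )


last_active = {
    "worker-small-group-yyyyy": 100.0,  # its actor was terminated at t=100
    "worker-small-group-zzzzz": 170.0,  # still busy until t=170
}
print(idle_workers(last_active, now=165.0))  # ['worker-small-group-yyyyy']
```

This matches the order seen in the example output, where the worker that hosted actor1 disappears first and the other worker remains until actor2 is also terminated.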

Step 7: Ray Autoscaler observability

# Method 1: "ray status"
kubectl exec $HEAD_POD -it -c ray-head -- ray status

# [Example output]:
# ======== Autoscaler status: 2023-09-06 13:42:46.372683 ========
# Node status
# ---------------------------------------------------------------
# Healthy:
#  1 head-group
# Pending:
#  (no pending nodes)
# Recent failures:
#  (no failures)

# Resources
# ---------------------------------------------------------------
# Usage:
#  0B/1.86GiB memory
#  0B/514.69MiB object_store_memory

# Demands:
#  (no resource demands)

# Method 2: "kubectl logs"
kubectl logs $HEAD_POD -c autoscaler | tail -n 20

# [Example output]:
# 2023-09-06 13:43:22,029 INFO autoscaler.py:421 --
# ======== Autoscaler status: 2023-09-06 13:43:22.028870 ========
# Node status
# ---------------------------------------------------------------
# Healthy:
#  1 head-group
# Pending:
#  (no pending nodes)
# Recent failures:
#  (no failures)

# Resources
# ---------------------------------------------------------------
# Usage:
#  0B/1.86GiB memory
#  0B/514.69MiB object_store_memory

# Demands:
#  (no resource demands)
# 2023-09-06 13:43:22,029 INFO autoscaler.py:464 -- The autoscaler took 0.036 seconds to complete the update iteration.
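
If you want to consume this status text programmatically, a small parser is one option. The sketch below is an assumption-laden example: `parse_node_status` is a hypothetical helper, it only understands "count name" lines like the ones in the example output above, and the status format may differ between Ray versions.

```python
def parse_node_status(text):
    """Extract node counts from the Healthy and Pending sections of the status text."""
    sections = {"Healthy": {}, "Pending": {}}
    current = None
    for raw in text.splitlines():
        # Tolerate the leading "# " used in the example output above.
        line = raw.lstrip("# ").rstrip()
        if line.endswith(":"):
            # A new section starts; track it only if it is one we care about.
            current = line[:-1] if line[:-1] in sections else None
            continue
        if current and line and not line.startswith("("):
            count, _, name = line.partition(" ")
            if count.isdigit():
                sections[current][name.strip()] = int(count)
    return sections


sample = """\
======== Autoscaler status: 2023-09-06 13:42:46.372683 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 (no pending nodes)
Recent failures:
 (no failures)
"""
status = parse_node_status(sample)
print(status)  # {'Healthy': {'head-group': 1}, 'Pending': {}}
```

Lines that don't start with a number, such as "(no pending nodes)", are ignored by this sketch.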

Step 8: Clean up the Kubernetes cluster

# Delete RayCluster and ConfigMap
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.1.1/ray-operator/config/samples/ray-cluster.autoscaler.yaml

# Uninstall the KubeRay operator
helm uninstall kuberay-operator