Specify container commands for Ray head/worker Pods

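KubeRay generates a `ray start` command for each Ray Pod. Sometimes you may want to run certain commands before or after the `ray start` command, or define the container's command yourself. This document shows you how to do that.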

Part 1: Specify a custom container command, optionally including the generated `ray start` command

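Starting with KubeRay v1.1.0, if you add the annotation `ray.io/overwrite-container-cmd: "true"` to a RayCluster, KubeRay respects the container `command` and `args` that you provide and doesn't include any generated commands, including the `ulimit` command and the `ray start` command. KubeRay stores the generated `ray start` command in the environment variable `KUBERAY_GEN_RAY_START_CMD`.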

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    # If this annotation is set to "true", KubeRay will respect the container `command` and `args`.
    ray.io/overwrite-container-cmd: "true"
  ...
spec:
  headGroupSpec:
    rayStartParams: {}
    # Pod template
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.8.0
          # Because the annotation "ray.io/overwrite-container-cmd" is set to "true",
          # KubeRay will overwrite the generated container command with `command` and
          # `args` in the following. Hence, you need to specify the `ulimit` command
          # by yourself to avoid Ray scalability issues.
          command: ["/bin/bash", "-lc", "--"]
          # Starting from v1.1.0, KubeRay injects the environment variable `KUBERAY_GEN_RAY_START_CMD`
          # into the Ray container. This variable can be used to retrieve the generated Ray start command.
          # Note that this environment variable does not include the `ulimit` command.
          args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD"]
          ...

The preceding YAML example is part of ray-cluster.overwrite-command.yaml.

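- `metadata.annotations.ray.io/overwrite-container-cmd: "true"`: This annotation tells KubeRay to respect the container `command` and `args` that you provide, without including any generated commands. If you set the annotation to "false" or don't set it at all, see Part 2 for the default behavior.
- `ulimit -n 65536`: This command is necessary to avoid Ray scalability issues caused by running out of file descriptors. If you don't set the annotation, KubeRay automatically injects the `ulimit` command into the container.
- `$KUBERAY_GEN_RAY_START_CMD`: Starting with KubeRay v1.1.0, KubeRay injects the environment variable `KUBERAY_GEN_RAY_START_CMD` into the Ray container of both head and worker Pods. It stores the `ray start` command that KubeRay generated. Note that this environment variable doesn't include the `ulimit` command.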
```
# Example of the environment variable `KUBERAY_GEN_RAY_START_CMD` in the head Pod.
ray start --head  --dashboard-host=0.0.0.0  --num-cpus=1  --block  --metrics-export-port=8080  --memory=2147483648
```

The head Pod's command/args look like the following:

Command:
  /bin/bash
  -lc
  --
Args:
  ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD
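To see why the generated command still runs when the args string only references `$KUBERAY_GEN_RAY_START_CMD`, the following sketch reproduces the expansion locally. It makes two assumptions: the variable holds a harmless stand-in (an `echo` rather than a real `ray start`, because the real value exists only inside a KubeRay-managed container), and bash is invoked with `-c` instead of `-lc` so that local login profiles don't affect the output.

```shell
# Stand-in for the value KubeRay injects; hypothetical, chosen so nothing is actually started.
export KUBERAY_GEN_RAY_START_CMD='echo ray start --head --block'

# Same shape as the container's command/args: bash receives ONE string,
# expands the variable itself, and runs the result as a command.
# Single quotes keep the outer shell from expanding the variable first.
out=$(/bin/bash -c -- 'echo head; $KUBERAY_GEN_RAY_START_CMD')

printf '%s\n' "$out"
```

Because the unquoted variable is expanded through word splitting, any quoting inside the stored command would not be re-parsed by the shell; this sketch only covers the simple flag-style form shown in this section.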

Part 2: Execute commands before the generated `ray start` command

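If you only want to run commands before the generated command, you don't need to set the annotation `ray.io/overwrite-container-cmd: "true"`. Some users use this approach to set up environment variables that `ray start` uses.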

# https://github.com/ray-project/kuberay/ray-operator/config/samples/ray-cluster.head-command.yaml
    rayStartParams:
        ...
    #pod template
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.8.0
          resources:
            ...
          ports:
            ...
          # `command` and `args` will become a part of `spec.containers.0.args` in the head Pod.
          command: ["echo 123"]
          args: ["456"]

`spec.containers.0.command`: KubeRay hard-codes the container's command as `["/bin/bash", "-lc", "--"]`.
`spec.containers.0.args` contains two parts:
- User-specified command: a string that concatenates `headGroupSpec.template.spec.containers.0.command` and `headGroupSpec.template.spec.containers.0.args`.
- `ray start` command: KubeRay creates this command based on the `rayStartParams` specified in the RayCluster. The command looks like `ulimit -n 65536; ray start ...`.

In summary, `spec.containers.0.args` is `$(user-specified command) && $(ray start command)`.
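The composition described above can be sketched in a few lines of shell. This only illustrates the string shape, not KubeRay's actual implementation: the variable names are hypothetical, the generated command is a shortened stand-in, and the exact whitespace KubeRay emits may differ.

```shell
# User-specified values from the Pod template (taken from the sample in this section).
user_command='echo 123'
user_args='456'

# Hypothetical, shortened stand-in for the command KubeRay derives from rayStartParams.
generated='ulimit -n 65536; ray start --head --block'

# Join command and args with a space, then chain the generated command with &&.
user_part="${user_command} ${user_args}"
final_args="${user_part} && ${generated}"

printf '%s\n' "$final_args"
```

The resulting single string is what `/bin/bash -lc --` would receive as its command string.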

Example

# Prerequisite: There is a KubeRay operator in the Kubernetes cluster.

# Download `ray-cluster.head-command.yaml`
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.0.0/ray-operator/config/samples/ray-cluster.head-command.yaml

# Create a RayCluster
kubectl apply -f ray-cluster.head-command.yaml

# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head

# Check `spec.containers.0.command` and `spec.containers.0.args`.
kubectl describe pod ${RAYCLUSTER_HEAD_POD}

# Command:
#   /bin/bash
#   -lc
#   --
# Args:
#    echo 123  456  && ulimit -n 65536; ray start --head  --dashboard-host=0.0.0.0  --num-cpus=1  --block  --metrics-export-port=8080  --memory=2147483648
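One consequence of the args string shown above follows from ordinary shell list semantics: in `A && B; C`, the `&&` binds only `A` and `B`, so `C` runs whether or not `A` succeeds. The sketch below illustrates this with placeholder commands (`false` and `echo`) standing in for the user command, the `ulimit` step, and `ray start`. It doesn't run Ray, and it is an inference from the string format rather than documented KubeRay behavior.

```shell
# `false` plays a failing user command; the echoes stand in for `ulimit` and `ray start`.
out=$(/bin/sh -c 'false && echo ulimit-step; echo ray-start-step')

# Only the step after `;` runs, because `&&` short-circuits just the middle step.
printf '%s\n' "$out"
```

If a failing setup step should prevent the rest from running, the user command would likely need to exit the shell explicitly, for example with `|| exit 1`.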
