
Develop Ray Serve Python scripts on RayCluster

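In this tutorial, you will learn how to debug Ray Serve scripts against a RayCluster, which gives you better observability and faster iteration than developing the script directly with a RayService. Many RayService issues are related to the Ray Serve Python scripts, so it is important to verify that the scripts are correct before deploying them to a RayService. This tutorial shows how to develop a Ray Serve Python script for a MobileNet image classifier on a RayCluster. You can deploy and serve the classifier on a local Kind cluster without a GPU. For more details, see ray-service.mobilenet.yaml and mobilenet-rayservice.md.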

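Step 1: Install the KubeRay operator

Follow the KubeRay operator installation guide to install the latest stable version of the KubeRay operator from the Helm repository.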

Step 2: Create a RayCluster CR

helm install raycluster kuberay/ray-cluster --version 1.0.0

Step 3: Log in to the head Pod

export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- bash

Step 4: Prepare your Ray Serve Python scripts and run the Ray Serve application

# Execute the following command in the head Pod
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples

# Try to launch the Ray Serve application
serve run mobilenet.mobilenet:app
# [Error message]
#     from tensorflow.keras.preprocessing import image
# ModuleNotFoundError: No module named 'tensorflow'

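serve run mobilenet.mobilenet:app: The first mobilenet is the name of the directory inside serve_config_examples/, the second mobilenet is the name of the Python file inside the mobilenet/ directory, and app is the name of the variable in that Python file that represents the Ray Serve application. For more details, see the "import_path" section in rayservice-troubleshooting.md.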
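The directory, file, and variable parts of the import path can be illustrated with a small standard-library sketch. It is not Ray Serve's own loader; the helper `resolve_import_path` is a hypothetical name, and the temporary directory only imitates the repository layout with a placeholder string in place of a real application.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_import_path(import_path: str):
    """Split 'package.module:attribute' and return the named attribute.

    Hypothetical helper for illustration only, not part of Ray Serve.
    """
    module_name, _, attr_name = import_path.partition(":")
    if not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {import_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Build a throwaway directory that imitates the repository layout:
#   <root>/mobilenet/mobilenet.py, defining a module-level variable `app`.
root = Path(tempfile.mkdtemp())
(root / "mobilenet").mkdir()
(root / "mobilenet" / "mobilenet.py").write_text('app = "placeholder application"\n')

# The command is run from the repository root, so that directory must be importable.
sys.path.insert(0, str(root))

app = resolve_import_path("mobilenet.mobilenet:app")
print(app)
```

If the directory, the file, or the variable is renamed, the corresponding part of the import path has to change with it.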

Step 5: Change the Ray image from `rayproject/ray:${RAY_VERSION}` to `rayproject/ray-ml:${RAY_VERSION}`

# Uninstall RayCluster
helm uninstall raycluster

# Install the RayCluster CR with the Ray image `rayproject/ray-ml:${RAY_VERSION}`
helm install raycluster kuberay/ray-cluster --version 1.0.0 --set image.repository=rayproject/ray-ml

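The error message in Step 4 indicates that the Ray image rayproject/ray:${RAY_VERSION} does not include the TensorFlow package. Because TensorFlow is large, this tutorial uses an image that already bundles TensorFlow rather than installing it through a runtime environment. This step therefore changes the Ray image from rayproject/ray:${RAY_VERSION} to rayproject/ray-ml:${RAY_VERSION}.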
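Before launching the application, it can save an iteration to check inside the head Pod whether the image provides the modules the script imports. The sketch below uses only the standard library; the helper `missing_modules` and the list of required modules are assumptions made for illustration, not something taken from the example repository.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list of top-level modules the classifier script imports.
required = ["tensorflow", "starlette"]
missing = missing_modules(required)
print("missing:", missing if missing else "none")
```

A module reported as missing has to come either from a different image, as in this step, or from a runtime environment.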

Step 6: Repeat Step 3 and Step 4

# Repeat Step 3 and Step 4 to log in to the new head Pod and run the Ray Serve application.
# You should successfully launch the Ray Serve application this time.
serve run mobilenet.mobilenet:app

# [Example output]
# (ServeReplica:default_ImageClassifier pid=139, ip=10.244.0.8) Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
#     8192/14536120 [..............................] - ETA: 0s)
#  4202496/14536120 [=======>......................] - ETA: 0s)
# 12902400/14536120 [=========================>....] - ETA: 0s)
# 14536120/14536120 [==============================] - 0s 0us/step
# 2023-07-17 14:04:43,737 SUCC scripts.py:424 -- Deployed Serve app successfully.

Step 7: Submit a request to the Ray Serve application

# (On your local machine) Forward the serve port of the head Pod
kubectl port-forward $HEAD_POD 8000

# Clone the repository on your local machine
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples/mobilenet

# Prepare a sample image file. `stable_diffusion_example.png` is a cat image generated by the Stable Diffusion model.
curl -O https://raw.githubusercontent.com/ray-project/kuberay/master/docs/images/stable_diffusion_example.png

# Update `image_path` in `mobilenet_req.py` to the path of `stable_diffusion_example.png`
# Send a request to the Ray Serve application.
python3 mobilenet_req.py

# [Error message]
# Unexpected error, traceback: ray::ServeReplica:default_ImageClassifier.handle_request() (pid=139, ip=10.244.0.8)
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/utils.py", line 254, in wrap_to_ray_error
#     raise exception
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/replica.py", line 550, in invoke_single
#     result = await method_to_call(*args, **kwargs)
#   File "./mobilenet/mobilenet.py", line 24, in __call__
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/starlette/requests.py", line 256, in _get_form
#     ), "The `python-multipart` library must be installed to use form parsing."
# AssertionError: The `python-multipart` library must be installed to use form parsing..

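python-multipart is required by Starlette's request-parsing method Request.form() (in starlette.requests), which the classifier calls to read the uploaded image. Because the package is not installed, the replica raises the error above when it receives the request.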
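To see why a form parser is involved at all, it helps to look at what a file upload looks like on the wire. The following standard-library sketch encodes one file field as multipart/form-data. It is not the code in mobilenet_req.py; the field name "image", the helper `build_multipart_body`, and the fake payload bytes are assumptions for illustration.

```python
import uuid


def build_multipart_body(field_name: str, filename: str, payload: bytes):
    """Encode one file field as multipart/form-data; return (content_type, body)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + payload + tail


# The field name and the payload are placeholders, not values from the repository.
content_type, body = build_multipart_body(
    "image", "stable_diffusion_example.png", b"\x89PNG fake bytes"
)
print(content_type)
print(len(body), "bytes")
```

The server has to split a body like this at the boundary markers to recover the file, and Starlette delegates that work to python-multipart.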

Step 8: Restart the Ray Serve application with a runtime environment

# In the head Pod, stop the Ray Serve application
serve shutdown

# Check the Ray Serve application status
serve status
# [Example output]
# There are no applications running on this cluster.

# Launch the Ray Serve application with runtime environment.
serve run mobilenet.mobilenet:app --runtime-env-json='{"pip": ["python-multipart==0.0.6"]}'

# (On your local machine) Submit a request to the Ray Serve application again, and you should get the correct prediction.
python3 mobilenet_req.py
# [Example output]
# {"prediction": ["n02123159", "tiger_cat", 0.2994779646396637]}
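Hand-writing JSON inside shell quotes is easy to get wrong once the runtime environment grows. One possible approach, sketched here with the standard library only, is to build the value as a Python dictionary and let `json.dumps` and `shlex.quote` handle serialization and quoting. The sketch only prints the command; it does not run it.

```python
import json
import shlex

# Same runtime environment as in the command above.
runtime_env = {"pip": ["python-multipart==0.0.6"]}
runtime_env_json = json.dumps(runtime_env)

command = [
    "serve",
    "run",
    "mobilenet.mobilenet:app",
    f"--runtime-env-json={runtime_env_json}",
]

# Quote each argument so the line can be pasted into a shell unchanged.
print(" ".join(shlex.quote(part) for part in command))
```

Adding another package then means appending to the `pip` list rather than editing a quoted string.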

Step 9: Create a RayService YAML file

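The previous steps showed that the Ray Serve application launches successfully with the Ray image rayproject/ray-ml:${RAY_VERSION} and the runtime environment python-multipart==0.0.6. You can therefore create a RayService YAML file that uses the same Ray image and runtime environment. For more details, see ray-service.mobilenet.yaml and mobilenet-rayservice.md.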
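As a rough sketch of what carries over, the settings found during debugging can be collected into a Serve application entry. The field names below follow the Ray Serve config file format as understood here, and the application name is a hypothetical placeholder; treat ray-service.mobilenet.yaml as the authoritative version. The sketch does not cover how the script itself is made available to the cluster.

```python
import json

# Settings established in the earlier steps.
ray_image = "rayproject/ray-ml:${RAY_VERSION}"
application = {
    "name": "mobilenet",  # hypothetical application name
    "import_path": "mobilenet.mobilenet:app",
    "runtime_env": {"pip": ["python-multipart==0.0.6"]},
}
serve_config = {"applications": [application]}

# JSON is also valid YAML, so this output can serve as a starting point.
print("image:", ray_image)
print(json.dumps(serve_config, indent=2))
```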