Develop Ray Serve Python scripts on RayCluster

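In this tutorial, you will learn how to debug Ray Serve scripts effectively against a RayCluster, which gives better observability and faster iteration than developing the script directly with a RayService. Many RayService issues are related to the Ray Serve Python scripts, so it is important to make sure the scripts are correct before deploying them to a RayService. This tutorial shows how to develop a Ray Serve Python script for a MobileNet image classifier on a RayCluster. You can deploy and serve the classifier on a local Kind cluster without a GPU. For more details, see ray-service.mobilenet.yaml and mobilenet-rayservice.md.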

Step 1: Install a KubeRay cluster

Follow the KubeRay operator installation document to install the latest stable version of the KubeRay operator from the Helm repository.

Step 2: Create a RayCluster CR

helm install raycluster kuberay/ray-cluster --version 1.0.0

Step 3: Log in to the head Pod

export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- bash

Step 4: Prepare your Ray Serve Python scripts and run the Ray Serve application

# Execute the following command in the head Pod
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples

# Try to launch the Ray Serve application
serve run mobilenet.mobilenet:app
# [Error message]
#     from tensorflow.keras.preprocessing import image
# ModuleNotFoundError: No module named 'tensorflow'
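  • serve run mobilenet.mobilenet:app: The first mobilenet is the name of the directory inside serve_config_examples/, the second mobilenet is the name of the Python file inside the mobilenet/ directory, and app is the name of the variable in that Python file that represents the Ray Serve application. For more details, see the "import_path" section in rayservice-troubleshooting.md.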
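To make the three parts of the import path concrete, here is a minimal Python sketch that splits an import path along the lines described above. The helper names `describe_import_path` and `resolve_import_path` are hypothetical and exist only for illustration; this is not Ray Serve's own resolution code.

```python
import importlib


def describe_import_path(import_path: str) -> dict:
    """Split '<directory>.<file>:<variable>' into its three parts."""
    module_name, sep, variable = import_path.partition(":")
    if not sep or not module_name or not variable:
        raise ValueError(f"expected '<module>:<variable>', got {import_path!r}")
    # Everything before the last dot is the package (directory) path.
    directory, _, file_stem = module_name.rpartition(".")
    return {
        "directory": directory,
        "python_file": f"{file_stem}.py",
        "variable": variable,
    }


def resolve_import_path(import_path: str):
    """Import the module part and return the attribute named after ':'."""
    module_name, _, variable = import_path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, variable)


print(describe_import_path("mobilenet.mobilenet:app"))
```

`resolve_import_path` only works when the module is importable from the current working directory or `sys.path`, which is why the `serve run` command above is executed from inside `serve_config_examples/`. It can be tried on a standard-library stand-in such as `resolve_import_path("json:dumps")`.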

Step 5: Change the Ray image from rayproject/ray:${RAY_VERSION} to rayproject/ray-ml:${RAY_VERSION}

# Uninstall RayCluster
helm uninstall raycluster

# Install the RayCluster CR with the Ray image `rayproject/ray-ml:${RAY_VERSION}`
helm install raycluster kuberay/ray-cluster --version 1.0.0 --set image.repository=rayproject/ray-ml

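The error message in Step 4 indicates that the Ray image rayproject/ray:${RAY_VERSION} does not include the TensorFlow package. Because TensorFlow is large, we use an image that already ships with TensorFlow instead of installing it through runtime environments. In this step, we change the Ray image from rayproject/ray:${RAY_VERSION} to rayproject/ray-ml:${RAY_VERSION}.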
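A quick way to catch a missing package before launching the application is to check whether the modules the script imports can be found in the image. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list of modules to check is something you would fill in from your own script.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted
            # name is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Run inside the head Pod before `serve run`; an empty list means every
# listed module can be located.
print(missing_modules(["tensorflow"]))
```

This only confirms that a module can be located, not that it imports cleanly, so it complements rather than replaces running the application itself.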

Step 6: Repeat Step 3 and Step 4

# Repeat Step 3 and Step 4 to log in to the new head Pod and run the Ray Serve application.
# You should successfully launch the Ray Serve application this time.
serve run mobilenet.mobilenet:app

# [Example output]
# (ServeReplica:default_ImageClassifier pid=139, ip=10.244.0.8) Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
#     8192/14536120 [..............................] - ETA: 0s)
#  4202496/14536120 [=======>......................] - ETA: 0s)
# 12902400/14536120 [=========================>....] - ETA: 0s)
# 14536120/14536120 [==============================] - 0s 0us/step
# 2023-07-17 14:04:43,737 SUCC scripts.py:424 -- Deployed Serve app successfully.

Step 7: Submit a request to the Ray Serve application

# (On your local machine) Forward the serve port of the head Pod
kubectl port-forward $HEAD_POD 8000

# Clone the repository on your local machine
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples/mobilenet

# Prepare a sample image file. `stable_diffusion_example.png` is a cat image generated by the Stable Diffusion model.
curl -O https://raw.githubusercontent.com/ray-project/kuberay/master/docs/images/stable_diffusion_example.png

# Update `image_path` in `mobilenet_req.py` to the path of `stable_diffusion_example.png`
# Send a request to the Ray Serve application.
python3 mobilenet_req.py

# [Error message]
# Unexpected error, traceback: ray::ServeReplica:default_ImageClassifier.handle_request() (pid=139, ip=10.244.0.8)
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/utils.py", line 254, in wrap_to_ray_error
#     raise exception
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/replica.py", line 550, in invoke_single
#     result = await method_to_call(*args, **kwargs)
#   File "./mobilenet/mobilenet.py", line 24, in __call__
#   File "/home/ray/anaconda3/lib/python3.7/site-packages/starlette/requests.py", line 256, in _get_form
#     ), "The `python-multipart` library must be installed to use form parsing."
# AssertionError: The `python-multipart` library must be installed to use form parsing..

python-multipart is required by the request-parsing function starlette.requests.form(), so the Ray Serve application reports this error when we send it a request.
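For context, form parsing comes into play when a client uploads a file as multipart/form-data. The sketch below builds such a request with the standard library only. It is not the content of `mobilenet_req.py`; the field name `image` and the helper names are assumptions made for illustration.

```python
import urllib.request
import uuid


def build_multipart_body(field_name: str, filename: str, payload: bytes):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_image(url: str, image_path: str, field_name: str = "image") -> str:
    """Upload the image as a form field and return the response text."""
    # The field name "image" is an assumption, not taken from the tutorial.
    with open(image_path, "rb") as f:
        body, content_type = build_multipart_body(field_name, image_path, f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the port-forward from this step running, a call might look like `post_image("http://127.0.0.1:8000/", "stable_diffusion_example.png")`, assuming the application is served at the root route. The server has to decode a body of this shape, which is the work that depends on python-multipart.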

Step 8: Restart the Ray Serve application with a runtime environment

# In the head Pod, stop the Ray Serve application
serve shutdown

# Check the Ray Serve application status
serve status
# [Example output]
# There are no applications running on this cluster.

# Launch the Ray Serve application with runtime environment.
serve run mobilenet.mobilenet:app --runtime-env-json='{"pip": ["python-multipart==0.0.6"]}'

# (On your local machine) Submit a request to the Ray Serve application again, and you should get the correct prediction.
python3 mobilenet_req.py
# [Example output]
# {"prediction": ["n02123159", "tiger_cat", 0.2994779646396637]}
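Hand-writing the JSON for `--runtime-env-json` gets error-prone as the package list grows. As a minimal sketch, the command can be assembled in Python so that the JSON and the shell quoting are generated rather than typed; the helper name `serve_run_command` is hypothetical.

```python
import json
import shlex


def serve_run_command(import_path: str, pip_packages) -> str:
    """Build a `serve run` command line with a pip runtime environment."""
    runtime_env = {"pip": list(pip_packages)}
    # shlex.quote wraps the JSON in single quotes so the shell passes it
    # through unchanged.
    quoted_json = shlex.quote(json.dumps(runtime_env))
    return f"serve run {import_path} --runtime-env-json={quoted_json}"


print(serve_run_command("mobilenet.mobilenet:app", ["python-multipart==0.0.6"]))
```

For the single package used in this step, the printed command is identical to the one shown above.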

Step 9: Create a RayService YAML file

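In the previous steps, we found that the Ray Serve application launches successfully with the Ray image rayproject/ray-ml:${RAY_VERSION} and the runtime environment python-multipart==0.0.6. We can therefore create a RayService YAML file that uses the same Ray image and runtime environment. For more details, see ray-service.mobilenet.yaml and mobilenet-rayservice.md.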
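As a rough illustration of how the findings carry over, the sketch below collects the import path and the pip requirement into a Serve application config and prints it as JSON. The field names follow the Serve config schema as I understand it, the application name `mobilenet` and the helper name are assumptions, and how the script reaches the cluster (for example a working directory) is deliberately left out. The Ray image is set in the Pod template of the RayService, not in this config. Treat ray-service.mobilenet.yaml as the authoritative version.

```python
import json


def serve_application_config(name: str, import_path: str, pip_packages) -> dict:
    """Collect the settings verified on the RayCluster into one config dict."""
    return {
        "applications": [
            {
                "name": name,  # assumed application name
                "import_path": import_path,
                "runtime_env": {"pip": list(pip_packages)},
            }
        ]
    }


config = serve_application_config(
    "mobilenet", "mobilenet.mobilenet:app", ["python-multipart==0.0.6"]
)
print(json.dumps(config, indent=2))
```

Keeping these values in one place makes it easier to check that the RayService uses exactly the combination that was tested on the RayCluster.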