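Develop Ray Serve Python scripts on RayCluster
In this tutorial, you will learn how to debug Ray Serve scripts against a RayCluster, which gives better observability and faster iteration than developing the script directly with a RayService. Many RayService issues are related to the Ray Serve Python script, so it is important to make sure the script is correct before deploying it to a RayService. This tutorial shows how to develop a Ray Serve Python script for a MobileNet image classifier on a RayCluster. You can deploy and serve the classifier on a local Kind cluster without a GPU. See ray-service.mobilenet.yaml and mobilenet-rayservice.md for more details.
Step 1: Install the KubeRay operator
Follow this document to install the latest stable KubeRay operator from the Helm repository.
Step 2: Create a RayCluster CR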
helm install raycluster kuberay/ray-cluster --version 1.0.0
Step 3: Log in to the head Pod
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- bash
Step 4: Prepare your Ray Serve Python script and run the Ray Serve application
# Execute the following command in the head Pod
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples
# Try to launch the Ray Serve application
serve run mobilenet.mobilenet:app
# [Error message]
# from tensorflow.keras.preprocessing import image
# ModuleNotFoundError: No module named 'tensorflow'
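In the command serve run mobilenet.mobilenet:app, the first mobilenet is the name of the directory inside serve_config_examples/, the second mobilenet is the name of the Python file inside the mobilenet/ directory, and app is the name of the variable in that Python file that represents the Ray Serve application. See the "import_path" section in rayservice-troubleshooting.md for more details.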
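The naming rule above can be sketched in a few lines of Python. This is only an illustration of how the three parts of the import path map to a directory, a file, and a variable. The helper names `split_import_path` and `describe_import_path` are hypothetical, the sketch only handles the `module:variable` form used in this tutorial, and it is not how Ray Serve itself resolves the path.

```python
def split_import_path(import_path):
    """Split 'package.module:variable' into (module_path, variable)."""
    module_path, sep, variable = import_path.partition(":")
    if not sep or not module_path or not variable:
        raise ValueError(f"expected 'module:variable', got {import_path!r}")
    return module_path, variable


def describe_import_path(import_path):
    """Map an import path to the file and variable it is expected to point at."""
    module_path, variable = split_import_path(import_path)
    # 'mobilenet.mobilenet' -> 'mobilenet/mobilenet.py', relative to the
    # directory in which `serve run` is executed.
    file_path = module_path.replace(".", "/") + ".py"
    return {"module": module_path, "file": file_path, "variable": variable}


print(describe_import_path("mobilenet.mobilenet:app"))
```

Running it from the serve_config_examples directory and checking that the reported file exists is a quick sanity check before calling serve run.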
Step 5: Change the Ray image from rayproject/ray:${RAY_VERSION} to rayproject/ray-ml:${RAY_VERSION}
# Uninstall RayCluster
helm uninstall raycluster
# Install the RayCluster CR with the Ray image `rayproject/ray-ml:${RAY_VERSION}`
helm install raycluster kuberay/ray-cluster --version 1.0.0 --set image.repository=rayproject/ray-ml
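The error message in Step 4 indicates that the Ray image rayproject/ray:${RAY_VERSION} does not include the TensorFlow package. Because TensorFlow is large, we use an image that already contains TensorFlow instead of installing it through a runtime environment. In this step, we change the Ray image from rayproject/ray:${RAY_VERSION} to rayproject/ray-ml:${RAY_VERSION}.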
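Before launching the application, it can help to confirm inside the head Pod that the image really provides the packages the script imports. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and it only checks top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# With rayproject/ray this list is expected to contain 'tensorflow';
# with rayproject/ray-ml it should be empty.
print(missing_modules(["tensorflow"]))
```

An empty list means the import in mobilenet.py should no longer fail with ModuleNotFoundError.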
Step 6: Repeat Step 3 and Step 4
# Repeat Step 3 and Step 4 to log in to the new head Pod and run the Ray Serve application.
# You should successfully launch the Ray Serve application this time.
serve run mobilenet.mobilenet:app
# [Example output]
# (ServeReplica:default_ImageClassifier pid=139, ip=10.244.0.8) Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
# 8192/14536120 [..............................] - ETA: 0s)
# 4202496/14536120 [=======>......................] - ETA: 0s)
# 12902400/14536120 [=========================>....] - ETA: 0s)
# 14536120/14536120 [==============================] - 0s 0us/step
# 2023-07-17 14:04:43,737 SUCC scripts.py:424 -- Deployed Serve app successfully.
Step 7: Submit a request to the Ray Serve application
# (On your local machine) Forward the serve port of the head Pod
kubectl port-forward $HEAD_POD 8000
# Clone the repository on your local machine
git clone https://github.com/ray-project/serve_config_examples.git
cd serve_config_examples/mobilenet
# Prepare a sample image file. `stable_diffusion_example.png` is a cat image generated by the Stable Diffusion model.
curl -O https://raw.githubusercontent.com/ray-project/kuberay/master/docs/images/stable_diffusion_example.png
# Update `image_path` in `mobilenet_req.py` to the path of `stable_diffusion_example.png`
# Send a request to the Ray Serve application.
python3 mobilenet_req.py
# [Error message]
# Unexpected error, traceback: ray::ServeReplica:default_ImageClassifier.handle_request() (pid=139, ip=10.244.0.8)
# File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/utils.py", line 254, in wrap_to_ray_error
# raise exception
# File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/_private/replica.py", line 550, in invoke_single
# result = await method_to_call(*args, **kwargs)
# File "./mobilenet/mobilenet.py", line 24, in __call__
# File "/home/ray/anaconda3/lib/python3.7/site-packages/starlette/requests.py", line 256, in _get_form
# ), "The `python-multipart` library must be installed to use form parsing."
# AssertionError: The `python-multipart` library must be installed to use form parsing..
python-multipart is required by the request parsing function starlette.requests.form(), so the error is reported when we send a request to the Ray Serve application.
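To see why form parsing is involved at all, it helps to look at what a file upload looks like on the wire. The sketch below builds a multipart/form-data body by hand with the standard library and posts it. It is not the contents of mobilenet_req.py: the helper names, the form field name `image`, and the URL are assumptions made for illustration.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_image(url, image_path, field_name="image"):
    """POST an image file as a form upload and return the response text."""
    with open(image_path, "rb") as f:
        payload = f.read()
    body, content_type = build_multipart(field_name, "image.png", payload)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example call (requires the port-forward from Step 7 to be active):
# print(post_image("http://127.0.0.1:8000/", "stable_diffusion_example.png"))
```

The server has to decode this boundary-delimited body to recover the file, and Starlette delegates that work to python-multipart.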
Step 8: Restart the Ray Serve application with a runtime environment
# In the head Pod, stop the Ray Serve application
serve shutdown
# Check the Ray Serve application status
serve status
# [Example output]
# There are no applications running on this cluster.
# Launch the Ray Serve application with runtime environment.
serve run mobilenet.mobilenet:app --runtime-env-json='{"pip": ["python-multipart==0.0.6"]}'
# (On your local machine) Submit a request to the Ray Serve application again, and you should get the correct prediction.
python3 mobilenet_req.py
# [Example output]
# {"prediction": ["n02123159", "tiger_cat", 0.2994779646396637]}
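Hand-writing JSON inside shell quotes is easy to get wrong. As a small sketch, the value for --runtime-env-json can be generated with the standard library instead. The helper name `serve_run_command` is hypothetical, and it only assembles the command string without running it.

```python
import json
import shlex


def serve_run_command(import_path, pip_packages):
    """Build a `serve run` command line with a pip-based runtime environment."""
    runtime_env = {"pip": list(pip_packages)}
    # shlex.quote wraps the JSON in single quotes so the shell passes it unchanged.
    return "serve run {} --runtime-env-json={}".format(
        shlex.quote(import_path), shlex.quote(json.dumps(runtime_env))
    )


print(serve_run_command("mobilenet.mobilenet:app", ["python-multipart==0.0.6"]))
```

The printed line matches the command used in this step and can be pasted into the head Pod shell.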
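Step 9: Create a RayService YAML file
In the previous steps, we found that the Ray Serve application launches successfully with the Ray image rayproject/ray-ml:${RAY_VERSION} and the runtime environment python-multipart==0.0.6. We can therefore create a RayService YAML file that uses the same Ray image and runtime environment. See ray-service.mobilenet.yaml and mobilenet-rayservice.md for more details.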
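As a rough sketch of how the findings carry over, the snippet below assembles a Serve application entry from the import path and pip requirement used in this tutorial. The field names follow the Ray Serve config schema (applications, name, import_path, runtime_env), but the application name `mobilenet` and the helper `serve_application_entry` are assumptions. A real RayService also needs a way to ship the script to the cluster, which is not shown here, so treat ray-service.mobilenet.yaml as the authoritative example.

```python
import json


def serve_application_entry(name, import_path, pip_packages):
    """Build one entry of the `applications` list in a Serve config."""
    return {
        "name": name,
        "import_path": import_path,
        "runtime_env": {"pip": list(pip_packages)},
    }


config = {
    "applications": [
        serve_application_entry(
            "mobilenet",  # assumed application name
            "mobilenet.mobilenet:app",
            ["python-multipart==0.0.6"],
        )
    ]
}
print(json.dumps(config, indent=2))
```

The Ray image itself is set separately in the RayCluster portion of the RayService spec rather than in the Serve config.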