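Using Modin with Ray on Kubernetes
This example uses a RayJob on Kubernetes to run a modified version of the "Using Modin with the NYC Taxi Dataset" example from the official Modin repository.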
Step 1: Install the KubeRay operator
Follow steps 1 and 2 of the RayCluster Quickstart guide to install the KubeRay operator.
Step 2: Run the Modin example with RayJob
Create a RayJob that runs the Modin example with the following command:
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-job.modin.yaml
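Before reading the logs, it can help to confirm that the job has reached a terminal state. The following is a minimal sketch, not part of the original example. It assumes that the sample manifest names the RayJob `rayjob-sample` (as the log selector used later suggests) and that the installed KubeRay version reports the job state in `.status.jobStatus`. The helper name `wait_for_rayjob` and its default attempt count and interval are hypothetical.

```shell
# Hypothetical helper: poll a RayJob until it reaches a terminal state.
# Assumes kubectl points at the cluster and that the RayJob resource
# exposes its state in .status.jobStatus.
wait_for_rayjob() {
  name="$1"
  attempts="${2:-60}"   # number of polls before giving up
  interval="${3:-5}"    # seconds to wait between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status="$(kubectl get rayjob "$name" -o jsonpath='{.status.jobStatus}')"
    case "$status" in
      SUCCEEDED)
        echo "RayJob $name succeeded"
        return 0
        ;;
      FAILED|STOPPED)
        echo "RayJob $name ended with status $status" >&2
        return 1
        ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "Timed out waiting for RayJob $name" >&2
  return 2
}
```

Calling `wait_for_rayjob rayjob-sample` would then return once the job succeeds, fails, or the polling budget runs out.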
Step 3: Check the output
Run the following command to check the output:
kubectl logs -l=job-name=rayjob-sample
# [Example output]
# 2024-07-05 10:01:00,945 INFO worker.py:1446 -- Using address 10.244.0.4:6379 set in the environment variable RAY_ADDRESS
# 2024-07-05 10:01:00,945 INFO worker.py:1586 -- Connecting to existing Ray cluster at address: 10.244.0.4:6379...
# 2024-07-05 10:01:00,948 INFO worker.py:1762 -- Connected to Ray cluster. View the dashboard at 10.244.0.4:8265
# Modin Engine: Ray
# FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
# Time to compute isnull: 0.065887747972738
# Time to compute rounded_trip_distance: 0.34410698304418474
# 2024-07-05 10:01:23,069 SUCC cli.py:60 -- -----------------------------------
# 2024-07-05 10:01:23,069 SUCC cli.py:61 -- Job 'rayjob-sample-zt8wj' succeeded
# 2024-07-05 10:01:23,069 SUCC cli.py:62 -- -----------------------------------
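To check the result from a script rather than by eye, the logs can be searched for the success line shown above. This is a minimal sketch that assumes the job CLI prints a line of the form `Job '<id>' succeeded`, as in the example output. The helper name `rayjob_log_succeeded` is hypothetical.

```shell
# Hypothetical helper: read job logs on stdin and return success only if
# they contain a line of the form "Job '<id>' succeeded".
rayjob_log_succeeded() {
  grep -Eq "Job '[^']+' succeeded"
}

# Usage against the cluster (assumes the RayJob is named rayjob-sample):
#   kubectl logs -l=job-name=rayjob-sample | rayjob_log_succeeded \
#     && echo "Modin example finished successfully"
```

When a label selector is given, `kubectl logs` prints only the last 10 lines by default. That is enough to capture the success line, and adding `--tail=-1` shows the full log.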