Folder Sync with the CLI
This example shows how to use the clearml-data folder sync function.
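The clearml-data folder sync mode is useful when a user has a single source of truth (i.e. a folder) that is updated from time to time. When the source of truth is updated, the user can call clearml-data sync, and the changes (files added, modified, or removed) are reflected in ClearML.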
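To make the idea of "changes" concrete, here is a minimal sketch of how a folder sync could classify files as removed, added, or modified by comparing content hashes. It is an illustration only, not ClearML's implementation: the function names hash_folder and diff_snapshots are hypothetical, and SHA-256 is an assumption based on the "SHA2 hash" wording in the CLI output.

```python
import hashlib
from pathlib import Path


def hash_folder(folder):
    """Map each file's path (relative to folder) to the SHA-256 of its content."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(previous, current):
    """Return (removed, added, modified) relative paths between two snapshots."""
    removed = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    # A file counts as modified when it exists in both snapshots with a different hash.
    modified = sorted(
        name for name in set(previous) & set(current)
        if previous[name] != current[name]
    )
    return removed, added, modified
```

Taking one snapshot before editing the folder and another afterwards, then passing both to diff_snapshots, yields the same three categories that the sync summary line reports.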
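Prerequisites
- First, make sure that you have cloned the clearml repository. It contains all the files needed for this example.
- Open a terminal and change directory to the examples folder of the cloned repository: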
cd clearml/examples/reporting
Syncing a Folder
Create a dataset and sync the data_samples folder from the repository to ClearML:
clearml-data sync --project datasets --name sync_folder --folder data_samples
Expected response:
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0d8f5f3e5ebd4f849bfb218021be1ede
Syncing dataset id 0d8f5f3e5ebd4f849bfb218021be1ede to local folder data_samples
Generating SHA2 hash for 5 files
Hash generation completed
Sync completed: 0 files removed, 5 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (5 files, total 222.17 KB) to https://files.community.clear.ml
Upload completed (222.17 KB)
2021-05-04 09:57:56,809 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 09:57:57,581 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
As the response shows, the clearml-data sync command creates the dataset, uploads the files, and then closes (finalizes) the dataset.
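The same create, sync, upload, and finalize sequence can also be sketched with the ClearML Python SDK. This is a hedged sketch rather than a description of what the CLI does internally: the wrapper function sync_folder_to_clearml and its parameter names are hypothetical, and running it assumes the clearml package is installed and configured against a ClearML server.

```python
def sync_folder_to_clearml(folder, project, name, parent_ids=None):
    """Create a dataset version, sync a local folder into it, upload, and finalize."""
    # Imported inside the function so the sketch can be loaded
    # even where the clearml package is not installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=parent_ids,  # optional list of parent dataset IDs
    )
    dataset.sync_folder(local_path=folder)  # register added / modified / removed files
    dataset.upload()                        # upload the pending file changes
    dataset.finalize()                      # close the dataset version
    return dataset.id


# Example call (requires a configured ClearML environment):
# dataset_id = sync_folder_to_clearml("data_samples", "datasets", "sync_folder")
```

Returning the dataset ID is a design choice for this sketch: a later version of the dataset can then name this one as its parent.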
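Modifying the Synced Folder
Modify the data folder:
- Add another line to one of the files in the data_samples folder.
- Add a file to the data_samples folder. Run echo "data data data" > data_samples/new_data.txt (this creates the file new_data.txt and places it in the data_samples folder).
- Repeat the process of creating a new dataset, this time with the previous dataset as its parent, and sync the folder. The --parents value is the ID of the previous dataset; substitute the ID that was printed when you created it.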
clearml-data sync --project datasets --name second_ds --parents a1ddc8b0711b4178828f6c6e6e994b7c --folder data_samples
Expected response:
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0992dd6bae6144388e0f2ef131d9724a
Syncing dataset id 0992dd6bae6144388e0f2ef131d9724a to local folder data_samples
Generating SHA2 hash for 6 files
Hash generation completed
Sync completed: 0 files removed, 2 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (2 files, total 742 bytes) to https://files.community.clear.ml
Upload completed (742 bytes)
2021-05-04 10:05:42,353 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 10:05:43,106 - clearml.Task - INFO - Finished uploading
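Dataset closed and finalized
The response shows that 2 files were added or modified, as expected: the file that was edited and the new new_data.txt. All 6 files in the folder were hashed, but only the 2 changed files were uploaded.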
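When the command is run from a script, the dataset ID and the sync summary can be pulled out of the response text. The following is a minimal sketch, assuming the response contains lines formatted exactly like the ones shown above; the line format may differ between clearml versions, and parse_sync_output is a hypothetical helper, not part of ClearML.

```python
import re

# Patterns modeled on the response lines shown in this example.
ID_PATTERN = re.compile(r"New dataset created id=([0-9a-f]{32})")
SYNC_PATTERN = re.compile(r"Sync completed: (\d+) files removed, (\d+) added / modified")


def parse_sync_output(text):
    """Extract the dataset ID and the removed / added-or-modified counts."""
    id_match = ID_PATTERN.search(text)
    sync_match = SYNC_PATTERN.search(text)
    if id_match is None or sync_match is None:
        raise ValueError("output does not look like a clearml-data sync response")
    return {
        "dataset_id": id_match.group(1),
        "removed": int(sync_match.group(1)),
        "added_or_modified": int(sync_match.group(2)),
    }


sample = """New dataset created id=0992dd6bae6144388e0f2ef131d9724a
Sync completed: 0 files removed, 2 added / modified
"""
result = parse_sync_output(sample)
print(result["dataset_id"], result["removed"], result["added_or_modified"])
# prints: 0992dd6bae6144388e0f2ef131d9724a 0 2
```

The extracted ID is the value that a following clearml-data sync call would pass to --parents.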