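Folder Sync with CLI

This example shows how to use the clearml-data folder sync function.

The clearml-data folder sync mode is useful when there is a single source of truth (a folder) that is updated from time to time. When that folder is updated, call clearml-data sync, and the changes (files added, modified, or removed) are reflected in ClearML.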
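To make "added, modified, or removed" concrete, here is a minimal Python sketch that compares two snapshots of a folder by content hash. It is an illustration only, not ClearML's implementation: the helper names snapshot and diff_snapshots are hypothetical, and SHA-256 is an assumption made because the CLI output refers to SHA2 hashes.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(folder: Path) -> Dict[str, str]:
    """Map each file path (relative to folder) to the SHA-256 of its content."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, modified, or removed between two snapshots."""
    return {
        # Present now, absent before.
        "added": sorted(new.keys() - old.keys()),
        # Present in both, but the content hash changed.
        "modified": sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
        # Present before, absent now.
        "removed": sorted(old.keys() - new.keys()),
    }
```

Taking a snapshot before and after editing the folder and passing both to diff_snapshots yields the three change lists that a sync would need to record.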

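Prerequisites

  1. First, make sure that you have cloned the clearml repository. It contains all the needed files.

  2. Open a terminal and change directory to the examples folder of the cloned repository: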

    cd clearml/examples/reporting

Syncing a Folder

Create a dataset and sync the data_samples folder from the repository to ClearML:

clearml-data sync --project datasets --name sync_folder --folder data_samples

Expected response:

clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0d8f5f3e5ebd4f849bfb218021be1ede
Syncing dataset id 0d8f5f3e5ebd4f849bfb218021be1ede to local folder data_samples
Generating SHA2 hash for 5 files
Hash generation completed
Sync completed: 0 files removed, 5 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (5 files, total 222.17 KB) to https://files.community.clear.ml
Upload completed (222.17 KB)
2021-05-04 09:57:56,809 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 09:57:57,581 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized

As the output shows, the clearml-data sync command creates the dataset, uploads the files, and then closes the dataset.
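The same create, sync, upload, and close sequence can also be sketched with the ClearML Python SDK. This is a hedged sketch rather than a description of what the CLI runs internally: the wrapper name sync_folder_to_dataset and its parameters are hypothetical, and running it assumes the clearml package is installed and configured against a ClearML server.

```python
from typing import List, Optional


def sync_folder_to_dataset(
    project: str,
    name: str,
    folder: str,
    parent_ids: Optional[List[str]] = None,
) -> str:
    """Create a dataset version, sync a local folder into it, upload, and finalize."""
    # Imported lazily so this module loads even where clearml is not installed.
    from clearml import Dataset

    # Create a new dataset; parent_ids (if given) make it a child version.
    dataset = Dataset.create(
        dataset_project=project,
        dataset_name=name,
        parent_datasets=parent_ids,
    )
    # Register files added, modified, or removed in the local folder.
    dataset.sync_folder(local_path=folder)
    # Upload the changed files, then close the dataset.
    dataset.upload()
    dataset.finalize()
    return dataset.id


# Example call (assumes a configured ClearML server):
# sync_folder_to_dataset("datasets", "sync_folder", "data_samples")
```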

Modifying Synced Folder

Modify the data folder:

  1. Add another line to one of the files in the data_samples folder.

  2. Add a file to the data_samples folder.
    Run echo "data data data" > data_samples/new_data.txt (this creates the file new_data.txt and puts it in the data_samples folder).

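  3. Repeat the process of creating a new dataset, this time with the previous dataset as its parent, and sync the folder. Pass the ID of the dataset created in the previous section (the value printed after "New dataset created id=") to --parents; the ID in the command below is only an example.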

    clearml-data sync --project datasets --name second_ds --parents a1ddc8b0711b4178828f6c6e6e994b7c --folder data_samples

    Expected response:

    clearml-data - Dataset Management & Versioning CLI
    Creating a new dataset:
    New dataset created id=0992dd6bae6144388e0f2ef131d9724a
    Syncing dataset id 0992dd6bae6144388e0f2ef131d9724a to local folder data_samples
    Generating SHA2 hash for 6 files
    Hash generation completed
    Sync completed: 0 files removed, 2 added / modified
    Finalizing dataset
    Pending uploads, starting dataset upload to https://files.community.clear.ml
    Uploading compressed dataset changes (2 files, total 742 bytes) to https://files.community.clear.ml
    Upload completed (742 bytes)
    2021-05-04 10:05:42,353 - clearml.Task - INFO - Waiting to finish uploads
    2021-05-04 10:05:43,106 - clearml.Task - INFO - Finished uploading
    Dataset closed and finalized

    Two files were added or modified, as expected: the edited file and new_data.txt.
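If the parent dataset ID is no longer on screen, it can be looked up by project and name before building the sync command. The following is a sketch under assumptions: find_dataset_id and build_sync_command are hypothetical helpers, the lookup assumes Dataset.get resolves the project and name to the dataset created earlier, and only the CLI flags shown in this example are used.

```python
from typing import List, Optional


def find_dataset_id(project: str, name: str) -> str:
    """Look up a dataset by project and name and return its ID."""
    # Imported lazily so this module loads even where clearml is not installed.
    from clearml import Dataset

    return Dataset.get(dataset_project=project, dataset_name=name).id


def build_sync_command(
    project: str, name: str, folder: str, parent_id: Optional[str] = None
) -> List[str]:
    """Build the clearml-data sync argument list, optionally with a parent."""
    command = [
        "clearml-data", "sync",
        "--project", project,
        "--name", name,
        "--folder", folder,
    ]
    if parent_id:
        # Make the new dataset a child of the given parent dataset.
        command += ["--parents", parent_id]
    return command


# Example (assumes a configured ClearML server and the clearml-data CLI on PATH):
# import subprocess
# parent = find_dataset_id("datasets", "sync_folder")
# subprocess.run(build_sync_command("datasets", "second_ds", "data_samples", parent), check=True)
```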