Experiment tracking is essential in machine learning because it enables data scientists and researchers to manage and reproduce their experiments effectively. By tracking every aspect of an experiment, such as hyperparameters, model architecture, and training data, it becomes easier to understand and interpret the results. Experiment tracking also allows for better collaboration and knowledge sharing among team members, as it provides a centralized repository of experiments and their associated metadata. Moreover, tracking experiments helps with debugging and troubleshooting, since it makes it possible to identify the specific settings or conditions that led to successful or unsuccessful outcomes. Overall, experiment tracking plays a crucial role in ensuring transparency, reproducibility, and continuous improvement in machine learning workflows.
Now let's see how we can get all of these benefits for free with Weights & Biases and PyTorch Tabular.
Importing the Library¶
from pytorch_tabular import TabularModel
from pytorch_tabular.models import (
CategoryEmbeddingModelConfig,
FTTransformerConfig,
TabNetModelConfig,
GANDALFConfig,
)
from pytorch_tabular.config import (
DataConfig,
OptimizerConfig,
TrainerConfig,
ExperimentConfig,
)
from pytorch_tabular.models.common.heads import LinearHeadConfig
Common Configuration¶
data_config = DataConfig(
target=[
target_col
], # target should always be a list. Multi-target is supported for regression; multi-task classification is not implemented yet.
continuous_cols=num_col_names,
categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate
batch_size=1024,
max_epochs=100,
early_stopping="valid_loss", # Monitor valid_loss for early stopping
early_stopping_mode="min", # Set the mode to min because lower valid_loss is better
early_stopping_patience=5, # Number of epochs of degrading performance to wait before terminating
checkpoints="valid_loss", # Save the best checkpoint, monitoring valid_loss
load_best=True, # After training, load the best checkpoint
)
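The early-stopping settings above follow the usual pattern: track the best monitored metric and stop once it has failed to improve for `patience` consecutive epochs. A minimal, self-contained sketch of that logic (illustrative only, not Lightning's actual implementation):

```python
def should_stop(losses, patience=5, mode="min"):
    """Return True once the monitored metric fails to improve for `patience` epochs."""
    best = losses[0]
    stale = 0
    for loss in losses[1:]:
        improved = loss < best if mode == "min" else loss > best
        if improved:
            best = loss
            stale = 0  # reset the counter on any improvement
        else:
            stale += 1
            if stale >= patience:
                return True
    return False

# Loss improves, then plateaus for 5 straight epochs -> triggers the stop
print(should_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75]))
```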
optimizer_config = OptimizerConfig()
head_config = LinearHeadConfig(
layers="", # No additional layers in the head, just a mapping layer to output_dim
dropout=0.1,
initialization="kaiming",
).__dict__ # Convert to dict to pass to the model config (OmegaConf doesn't accept objects)
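The `.__dict__` step above is plain Python attribute access on the config object: OmegaConf can merge dictionaries but not arbitrary objects, so the config is flattened into a dict of its fields first. A minimal illustration with a stdlib dataclass (the `HeadConfig` class here is a made-up stand-in, not the real `LinearHeadConfig`):

```python
from dataclasses import dataclass


@dataclass
class HeadConfig:  # hypothetical stand-in for LinearHeadConfig
    layers: str = ""
    dropout: float = 0.1
    initialization: str = "kaiming"


cfg = HeadConfig(dropout=0.1)
# __dict__ exposes the instance's fields as a plain dict
cfg_dict = cfg.__dict__
print(cfg_dict)  # {'layers': '', 'dropout': 0.1, 'initialization': 'kaiming'}
```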
EXP_PROJECT_NAME = "pytorch-tabular-covertype"
Category Embedding Model¶
model_config = CategoryEmbeddingModelConfig(
task="classification",
layers="1024-512-512", # Number of nodes in each layer
activation="LeakyReLU", # Activation between each layer
learning_rate=1e-3,
head="LinearHead", # Linear Head
head_config=head_config, # Linear Head Config
)
experiment_config = ExperimentConfig(
project_name=EXP_PROJECT_NAME,
run_name="CategoryEmbeddingModel",
exp_watch="gradients",
log_target="wandb",
log_logits=True,
)
tabular_model = TabularModel(
data_config=data_config,
model_config=model_config,
optimizer_config=optimizer_config,
trainer_config=trainer_config,
experiment_config=experiment_config,
verbose=False,
suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)
FT Transformer¶
model_config = FTTransformerConfig(
task="classification",
num_attn_blocks=3,
num_heads=4,
learning_rate=1e-3,
head="LinearHead", # Linear Head
head_config=head_config, # Linear Head Config
)
experiment_config = ExperimentConfig(
project_name=EXP_PROJECT_NAME,
run_name="FTTransformer",
exp_watch="gradients",
log_target="wandb",
log_logits=True,
)
tabular_model = TabularModel(
data_config=data_config,
model_config=model_config,
optimizer_config=optimizer_config,
trainer_config=trainer_config,
experiment_config=experiment_config,
verbose=False,
suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)
GANDALF¶
model_config = GANDALFConfig(
task="classification",
gflu_stages=10,
learning_rate=1e-3,
head="LinearHead", # Linear Head
head_config=head_config, # Linear Head Config
)
experiment_config = ExperimentConfig(
project_name=EXP_PROJECT_NAME,
run_name="GANDALF",
exp_watch="gradients",
log_target="wandb",
log_logits=True,
)
tabular_model = TabularModel(
data_config=data_config,
model_config=model_config,
optimizer_config=optimizer_config,
trainer_config=trainer_config,
experiment_config=experiment_config,
verbose=False,
suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)
TabNet Model¶
model_config = TabNetModelConfig(
task="classification",
learning_rate=1e-5,
n_d=16,
n_a=16,
n_steps=4,
head="LinearHead", # Linear Head
head_config=head_config, # Linear Head Config
)
experiment_config = ExperimentConfig(
project_name=EXP_PROJECT_NAME,
run_name="TabNet",
exp_watch="gradients",
log_target="wandb",
log_logits=True,
)
tabular_model = TabularModel(
data_config=data_config,
model_config=model_config,
optimizer_config=optimizer_config,
trainer_config=trainer_config,
experiment_config=experiment_config,
verbose=False,
suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)
Accessing the Experiments¶
We can access the runs @ https://wandb.ai/manujosephv/pytorch-tabular-covertype/
We can also inspect the gradient flow through each component of the model for debugging.