Built on PyTorch; makes distributed training and mixed-precision training easier and improves model training efficiency. https://github.com/huggingface/accelerate
import torch
import torch.nn.functional as F
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import DistributedDataParallelKwargs
# Initialize the Accelerator (find_unused_parameters is forwarded to DDP)
kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[kwargs])
# Remove all .to(device) calls; devices are assigned automatically.
# Hand the existing model, optimizer, and dataloader to prepare()
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
# Training loop
for epoch in range(10):
    for source, targets in train_dataloader:
        optimizer.zero_grad()
        output = model(source)
        loss = F.cross_entropy(output, targets)
        # Use accelerator.backward() instead of loss.backward()
        accelerator.backward(loss)
        optimizer.step()
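The description above also mentions mixed precision, and get_logger is imported but not shown in use. A minimal sketch, assuming a recent Accelerate version where mixed_precision is an Accelerator argument (older releases used fp16=True); the log message is only illustrative:

# Enable automatic mixed precision ("fp16", "bf16", or "no")
accelerator = Accelerator(mixed_precision="fp16")
logger = get_logger(__name__)
# Log once from the main process instead of once per GPU
logger.info("Starting training", main_process_only=True)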
Distributed launch:
# The generated config is written to ~/.cache/huggingface/accelerate
accelerate config
accelerate launch path_to_script.py --args_for_the_script
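accelerate config is interactive; the same settings can also be passed directly on the command line. A hedged example (flag availability depends on the installed Accelerate version, so treat this as a sketch rather than the canonical invocation):

accelerate launch --multi_gpu --num_processes 2 --mixed_precision fp16 path_to_script.py --args_for_the_script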
Saving the model:
accelerator.wait_for_everyone()                        # wait until every process has finished training
unwrapped_model = accelerator.unwrap_model(model)      # strip the wrapper added by prepare()
accelerator.save(unwrapped_model.state_dict(), path)
Loading the model:
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.load_state_dict(torch.load(path))
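Besides saving a raw state_dict, Accelerate can checkpoint the whole prepared training state (model, optimizer, RNG state) with save_state / load_state. A minimal sketch; the checkpoint directory name is just an example:

# Save everything tracked by the accelerator to a directory
accelerator.save_state("checkpoints/step_1000")
# ... later, restore it in one call
accelerator.load_state("checkpoints/step_1000")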