modelscope

mirror of https://github.com/modelscope/modelscope.git synced 2025-12-21 10:39:24 +01:00

Commit Graph

Author

SHA1

Message

Date

yuze.zyz

a0bc5549a1

Trainer supports parallel_groups

Design doc: https://yuque.alibaba-inc.com/suluyan.sly/yh1rvu/yx0owblyebpa2b3l?singleDoc#flU3s

1. Add a parallel_group field to the trainer to support DP, TP, and PP.
2. Move the construction of common hooks (except the optimizer/lrscheduler hooks) into the trainer's init method to support the after_init stage.
    The after_init stage supports initializing DP, TP, and PP.
    https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
3. Add before_eval/after_eval stages to support model wrapping.
    This resolves the ordering problem between apex amp and DDP wrapping.
    https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
4. Exporter supports lazy importing.
    https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48122780
5. Fold all megatron imports into the megatron hook.
    https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48099986
6. Add a compile method to TorchModel, Pipeline, and Trainer to support torch 2.0.
    https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=46869415
7. Fix bug: the lrscheduler builder does not support torch 2.0.
8. Add callbacks for the trainer.
    https://aone.alibaba-inc.com/v2/workitem#viewIdentifier=1c46ee8637e0c978f115b6f7&openWorkitemIdentifier=48210342

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11849932

2023-03-09 21:33:35 +08:00

yuze.zyz

7181e667f6

Refactor hooks

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11651547

2023-02-28 13:51:01 +08:00

laiyin.lyc

09d2296f36

[to #44847108] add sparsity hook (pst algorithm)

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10198228

    * [to #44847108] add sparsity hook (pst algorithm)

2022-10-11 16:05:20 +08:00

3 Commits