References Summary

papers notes

Published: 2021-03-02 17:42

Points
Work on Pipeline Parallelism
Work on GPU memory -> CPU memory
Models

Points

item	Source	Reference
The parameter count of SOTA neural networks roughly doubles every 2.4 years	Ref 10	Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
Larger DNN models achieve better accuracy on complex tasks	Ref 16,47,55,58,60
Because of convergence concerns, industry rarely uses asynchronous training	Ref 47	Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, and Yangqing Jia. 2019. Characterizing Deep Learning Training Workloads on Alibaba-PAI. arXiv preprint arXiv:1910.05930 (2019).
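
To make the 2.4-year doubling figure from the first row concrete, here is a minimal sketch that turns a doubling period into a growth factor. The function names are hypothetical, and the only input taken from the notes is the 2.4-year period; it assumes clean exponential growth, which is an idealization.

```python
import math

def growth_factor(years, doubling_period_years=2.4):
    """Multiplicative growth after `years`, assuming size doubles every
    `doubling_period_years` (2.4 is the figure quoted in the table)."""
    return 2.0 ** (years / doubling_period_years)

def years_to_grow(factor, doubling_period_years=2.4):
    """Years needed to grow by `factor` under the same assumption."""
    return doubling_period_years * math.log2(factor)

if __name__ == "__main__":
    # Five doubling periods fit into 12 years, so the factor is about 32x.
    print(f"growth over 12 years: {growth_factor(12):.1f}x")
    print(f"years for 1000x growth: {years_to_grow(1000):.1f}")
```

Under this assumption a 1000x increase takes roughly ten doubling periods, which is why memory-saving techniques like the ones listed further down matter for keeping up with model size.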

Work on Pipeline Parallelism

Papers	Illustration
NIPS’19 GPipe	Splits the global batch into multiple micro-batches and injects them into the pipeline to improve efficiency
PPoPP’21 DAPPLE
SOSP’19 PipeDream
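
The micro-batch idea noted for GPipe can be sketched in plain Python. This is an illustrative model of the forward pass only, not GPipe's actual implementation: the function names are hypothetical, every stage is assumed to take one clock tick per micro-batch, and the backward pass is ignored.

```python
def split_into_micro_batches(batch, num_micro_batches):
    """Split a list of samples into equally sized micro-batches."""
    if len(batch) % num_micro_batches != 0:
        raise ValueError("batch size must be divisible by num_micro_batches")
    size = len(batch) // num_micro_batches
    return [batch[i:i + size] for i in range(0, len(batch), size)]

def forward_schedule(num_stages, num_micro_batches):
    """Return, for each clock tick, the (stage, micro_batch) pairs that run.

    At tick t, stage s processes micro-batch t - s, if that index exists.
    """
    ticks = []
    for t in range(num_stages + num_micro_batches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_micro_batches]
        ticks.append(active)
    return ticks

def bubble_fraction(num_stages, num_micro_batches):
    """Fraction of stage-time slots left idle while the pipeline fills and drains."""
    return (num_stages - 1) / (num_micro_batches + num_stages - 1)

if __name__ == "__main__":
    print(split_into_micro_batches(list(range(8)), 4))
    for tick, active in enumerate(forward_schedule(num_stages=2, num_micro_batches=2)):
        print(tick, active)
    print(bubble_fraction(num_stages=4, num_micro_batches=13))
```

With a single micro-batch, only one of the four stages is busy at any time. Raising the number of micro-batches shrinks the idle fraction, which is the efficiency gain the table entry refers to.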

Work on GPU memory -> CPU memory

Papers	Illustration
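arXiv’21 ZeRO-Offload	Builds on ZeRO by adding offloading; supports multi-node, multi-GPU training
ASPLOS’20 SwapAdvisor	Enlarges the search space by jointly considering scheduling and memory allocation to produce a swap plan. Does not apply to multi-GPU setups; only suits static graphs
CoRR’19 Megatron-LM	Uses model parallelism; can train models with tens of billions of parameters
arXiv’20 L2L	Supports training very deep Transformer networks by keeping only one Transformer block in GPU memory at a time
SC’20 ZeRO	Strengthens data parallelism by eliminating data redundancy across GPUs, so larger models can be trained
EuroSys’18 TensorFlow’s swap extension	Swapping work; does not use the known dataflow information of the DNN to optimize swapping
arXiv’18 TFLMS, MICRO’16 vDNN	Swapping work; only swaps activation tensors selected by topological order
PPoPP’18 SuperNeurons	Swapping work; only swaps the data of convolution operations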
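
As a rough illustration of what "eliminating data redundancy across GPUs" buys in the ZeRO row, the sketch below estimates per-GPU memory for model states under data parallelism. The byte counts are assumptions, not numbers from these notes: mixed-precision Adam with 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. The function name and the stage numbering (0 = plain data parallelism, 1 to 3 = progressively partitioning optimizer states, gradients, and parameters) are likewise illustrative, and activations and buffers are ignored.

```python
def model_state_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate per-GPU bytes for model states (assumed 2 + 2 + 12 bytes/param).

    stage 0: everything replicated on every GPU (plain data parallelism)
    stage 1: optimizer states partitioned across GPUs
    stage 2: gradients partitioned as well
    stage 3: parameters partitioned as well
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2 * num_params   # fp16 weights (assumption)
    grads = 2 * num_params    # fp16 gradients (assumption)
    optim = 12 * num_params   # fp32 weights, momentum, variance (assumption)
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim

if __name__ == "__main__":
    for stage in range(4):
        gb = model_state_bytes_per_gpu(1e9, 8, stage) / 1e9
        print(f"stage {stage}: {gb:.2f} GB per GPU")
```

Under these assumptions the replicated baseline costs 16 bytes per parameter on every GPU, while full partitioning divides that by the number of GPUs. Offloading approaches such as ZeRO-Offload and the swapping work in the table attack the same limit by moving data to CPU memory instead.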

Models

Model	Illustration
Wide ResNet	A widened ResNet; the model is very large

©2021 zzqq2199