GPipe startup

  1. Open-Source Repository
  2. Software version requirements
  3. Startup
    1. docker(pytorch)
    2. in docker(gpipe)
    3. run code
  4. Results
    1. ResNet101
  5. Question

Open-Source Repository

Software version requirements

travis.yml

Startup

docker(pytorch)

  1. docker pull pytorch/pytorch:1.4-cuda10.1-cudnn7-devel
  2. nvidia-docker run -itd --name=gpipe --net=host -v=/data:/data pytorch/pytorch:1.4-cuda10.1-cudnn7-devel bash
  3. docker exec -it gpipe bash

in docker(gpipe)

  1. pip install torchgpipe

run code

  1. git clone https://github.com/kakaobrain/torchgpipe
  2. cd torchgpipe
  3. cd benchmarks/resnet101-speed
  4. vim main.py # reduce the batch size; otherwise it will not run on a 1080Ti
  5. python main.py pipeline-4

Results

ResNet101

balance-type    throughput  Mem_GPU0  Mem_GPU1  Mem_GPU2  Mem_GPU3  balance-value
by_time         ~65         2157      3445      2357      2847      [66,99,111,94]
by_size         ~61         2107      6485      2397      2847      [46,105,125,94]
maximize speed  ~65         2091      5049      2195      2505      [44,92,124,110]
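
As a quick sanity check on the table above, the following Python sketch recomputes a few derived figures from the reported numbers. It is not part of the torchgpipe benchmark; the dictionary layout and the names `results` and `summarize` are illustrative assumptions. Throughput and memory values are copied as reported, with no unit assumed, and each balance entry is read as the number of layers placed on that GPU.

```python
# Figures copied from the results table; the structure and names are illustrative.
results = {
    "by_time": {"throughput": 65, "mem": [2157, 3445, 2357, 2847],
                "balance": [66, 99, 111, 94]},
    "by_size": {"throughput": 61, "mem": [2107, 6485, 2397, 2847],
                "balance": [46, 105, 125, 94]},
    "maximize speed": {"throughput": 65, "mem": [2091, 5049, 2195, 2505],
                       "balance": [44, 92, 124, 110]},
}


def summarize(row):
    """Return the layer count, total memory, and peak per-GPU memory of one row."""
    return {
        "layers": sum(row["balance"]),
        "total_mem": sum(row["mem"]),
        "peak_mem": max(row["mem"]),
    }


summary = {name: summarize(row) for name, row in results.items()}
for name, s in summary.items():
    print(f"{name:15s} layers={s['layers']} "
          f"total_mem={s['total_mem']} peak_mem={s['peak_mem']}")
```

Every balance sums to 370 layers, so the three runs partition the same sequential model. In these measurements by_time has the lowest total and peak memory, while by_size has the highest peak, on GPU 1.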

Question

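  1. What does the chunks parameter mean, and what does it do? The chunks argument specifies the number of micro-batches: each mini-batch is split into that many micro-batches, which then pass through the GPU partitions one after another as a pipeline.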
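
To make the role of chunks more concrete, here is a small pure-Python sketch. It assumes the mini-batch is split the way `torch.chunk` splits a tensor (pieces of size ceil(batch / chunks), with a smaller last piece if needed), and that micro-batch i reaches partition j at clock tick i + j, which is the usual GPipe pipeline schedule. The function names and the batch size of 128 are hypothetical; none of this is torchgpipe's API.

```python
import math


def micro_batch_sizes(batch_size, chunks):
    """Sizes of the micro-batches, mimicking torch.chunk on the batch dimension."""
    size = math.ceil(batch_size / chunks)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(size, remaining))
        remaining -= sizes[-1]
    return sizes


def pipeline_schedule(num_micro_batches, num_partitions):
    """For each clock tick, list the (micro_batch, partition) pairs that run together."""
    schedule = []
    for tick in range(num_micro_batches + num_partitions - 1):
        tasks = [(tick - j, j) for j in range(num_partitions)
                 if 0 <= tick - j < num_micro_batches]
        schedule.append(tasks)
    return schedule


sizes = micro_batch_sizes(128, 8)            # hypothetical batch size, chunks=8
schedule = pipeline_schedule(len(sizes), 4)  # 4 partitions, one per GPU
for tick, tasks in enumerate(schedule):
    print(tick, tasks)
```

With 8 micro-batches and 4 partitions the forward pass takes 11 ticks, and all four GPUs are busy only from tick 3 through tick 7. Raising chunks shrinks the idle fraction at the start and end of the pipeline, but it also makes each micro-batch smaller.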
