signSGD

Paper title: signSGD

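Training large neural networks requires multiple GPUs. In this setting, aggregating the gradients across workers becomes a communication bottleneck, and compressing the gradients before transmission lowers that cost. signSGD addresses the problem by taking the sign of each minibatch stochastic gradient, so that every worker compresses each gradient coordinate to 1 bit before sending it. Because the sign of a stochastic gradient is only a biased approximation of the true gradient, the method was harder to analyze than standard SGD.

The proposed update rule aggregates the workers' sign vectors by majority vote. This majority vote makes communication between workers efficient, and a variance reduction effect has also been proven for it. signSGD is effective on problems with l1 geometry.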
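The majority-vote update described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the rule x ← x − η · sign(Σ_m sign(g_m)), where g_m is the stochastic gradient of worker m and η is the step size. The function names (`majority_vote_step`, `demo`), the toy objective, and the noise level are all hypothetical choices made for this example.

```python
import random


def sign(v):
    """Return +1, -1, or 0 for a scalar."""
    return (v > 0) - (v < 0)


def majority_vote_step(x, worker_grads, lr):
    """One signSGD-with-majority-vote update (illustrative sketch).

    x            : list of floats, current parameters
    worker_grads : one stochastic gradient per worker, each a list like x
    lr           : step size (hypothetical name)
    """
    # Each worker transmits only the sign of every coordinate (1 bit each).
    worker_signs = [[sign(g_i) for g_i in g] for g in worker_grads]
    # The server sums the votes per coordinate and keeps the sign of the sum.
    # With an even number of workers a tie yields 0, i.e. no move.
    votes = [sign(sum(col)) for col in zip(*worker_signs)]
    # All workers apply the same sign-only update.
    return [x_i - lr * v_i for x_i, v_i in zip(x, votes)]


def demo(steps=300, workers=5, lr=0.01, seed=0):
    """Toy problem (assumed): minimize 0.5 * ||x||^2 with noisy gradients."""
    rng = random.Random(seed)
    x = [1.0, -2.0, 0.5]
    for _ in range(steps):
        # The true gradient of 0.5 * ||x||^2 is x; add Gaussian noise per worker.
        grads = [[x_i + rng.gauss(0.0, 0.1) for x_i in x] for _ in range(workers)]
        x = majority_vote_step(x, grads, lr)
    return x


if __name__ == "__main__":
    print(demo())
```

Every coordinate moves by exactly `lr` per step regardless of the gradient magnitude, so in this sketch the step size alone controls how far the parameters travel and how tightly they settle around the minimum.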
