Accelerating Randomly Projected Gradient with Variance Reduction

Parallel training methods based on stochastic gradient descent (SGD) for deep learning have attracted considerable attention because of their excellent scalability. In these methods, workers and a server exchange gradient vectors (or parameter vectors) with each other. However, the communication between the workers and the server takes a long time because the dimension of the gradient vectors is extremely high. Since transmission is limited by network bandwidth, message compression is necessary. Although message compression algorithms have been proposed for parallel training, there are concerns about the stability and performance of compressed training. In general, compressed training algorithms generate random message vectors, and their variance hinders training. In this paper, we propose a novel compressed training algorithm that uses random projection with a variance reduction trick: we introduce an average gradient vector that reduces the variance. We test the proposed compression method with AlexNet and ResNet-20 models on CIFAR-100 data. The proposed algorithm achieves almost the same performance as the original SGD while using messages compressed by a factor of 16.
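
The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general idea as described above: compress gradients by random projection, and reduce the variance of the reconstruction by projecting only the deviation from a running average gradient that both worker and server maintain. The function names, the Gaussian projection, and the 0.9/0.1 averaging rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

def make_projection(dim, compressed_dim, seed):
    """Random Gaussian projection shared by worker and server via a common seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((compressed_dim, dim)) / np.sqrt(compressed_dim)

def compress(gradient, avg_gradient, proj):
    """Worker side: project only the deviation from the running average gradient."""
    return proj @ (gradient - avg_gradient)

def decompress(message, avg_gradient, proj):
    """Server side: reconstruct the deviation and add the shared average back."""
    return proj.T @ message + avg_gradient

if __name__ == "__main__":
    dim, compressed_dim = 4096, 256              # 16x smaller messages, as in the abstract
    proj = make_projection(dim, compressed_dim, seed=0)
    avg_grad = np.zeros(dim)                     # running average, identical on both sides
    rng = np.random.default_rng(1)
    for _ in range(100):
        grad = rng.standard_normal(dim) + 1.0    # toy gradient with a nonzero mean
        msg = compress(grad, avg_grad, proj)     # 256 floats sent instead of 4096
        recovered = decompress(msg, avg_grad, proj)
        # Both sides can update the average from the reconstructed gradient,
        # so no extra communication is needed to keep it synchronized.
        avg_grad = 0.9 * avg_grad + 0.1 * recovered
    err = np.linalg.norm(recovered - grad) / np.linalg.norm(grad)
    print(f"relative reconstruction error: {err:.3f}")
```

In this sketch, the projection noise scales with the size of the projected vector, so subtracting a well-tracked average gradient before projecting shrinks the error that compression introduces.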
Publisher
IEEE
Issue Date
2020-02
Language
English
Citation
IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 531-534
ISSN
2375-933X
DOI
10.1109/BigComp48618.2020.00-11
URI
http://hdl.handle.net/10203/288836
Appears in Collection
RIMS Conference Papers