Unreasonable effectiveness of noise: training deep neural networks by optimizing over hyperparameter paths
Hyperparameter optimization remains an important problem in the training of deep architectures. Despite many recent advances, most approaches are intrinsically linked to sampling points in hyperparameter space or to greedy search.
We show that, at negligible additional computational cost, results can be improved by sampling nonlocal paths instead of points in hyperparameter space. To this end, we interpret hyperparameters as controlling the level of correlated noise in training, which can be mapped to an effective temperature. We then perform training in the joint hyperparameter/model-parameter space, with the optimal training protocol corresponding to a path in this space.
We observe faster training and improved resistance to overfitting, and show a systematic decrease in absolute validation error, improving over benchmark results.
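The core idea can be illustrated with a minimal toy sketch (not the paper's actual method): hyperparameters set the level of noise injected into gradient updates, i.e. an effective temperature, and a training protocol is a path through this noise level over time rather than a single fixed point. Below, a quadratic toy loss is trained with Langevin-style noisy gradient descent; `noise_path` is a hypothetical name for a schedule mapping training progress to noise level. An annealed path (temperature driven to zero) ends at a lower loss than a fixed hyperparameter point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
def grad(w):
    return w

def train(noise_path, steps=500, lr=0.1, dim=10):
    """Noisy gradient descent where the injected-noise scale follows a path.

    noise_path(t) returns the noise level ("effective temperature") at
    fraction t in [0, 1] of training -- a path in hyperparameter space
    rather than a single fixed point.
    """
    w = rng.normal(size=dim) * 5.0
    for step in range(steps):
        t = step / steps
        sigma = noise_path(t)
        w = w - lr * grad(w) + sigma * rng.normal(size=w.shape)
    return 0.5 * np.dot(w, w)  # final loss value

# A fixed hyperparameter "point": constant noise level throughout training.
loss_point = train(lambda t: 0.3)

# A hyperparameter "path": anneal the effective temperature to zero.
loss_path = train(lambda t: 0.3 * (1.0 - t))

print(f"fixed point: {loss_point:.4f}, annealed path: {loss_path:.4f}")
```

On this convex toy problem the annealed path trivially wins because residual noise sets the floor on the final loss; the abstract's claim is that an analogous effect holds for deep networks, where the noise path also shapes generalization.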