horde-ad-0.2.0.0: Higher Order Reverse Derivatives Efficiently - Automatic Differentiation
Safe Haskell: None
Language: GHC2024

TestMnistRNNS

Description

Tests of MnistRnnShaped2 recurrent neural networks using a few different optimization pipelines.

Not LSTM. The network doesn't train without Adam, regardless of whether mini-batches are used. It does train with Adam, but only after very careful tweaking of the initialization. Training is extremely sensitive to the initial parameters, more than to anything else. Probably the gradient vanishes when parameters are initialized from a probability distribution that doesn't have the right variance; see https://stats.stackexchange.com/questions/301285/what-is-vanishing-gradient. Regularization or normalization might help as well.
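
As an illustration of the variance point above, here is a minimal, self-contained sketch of Glorot/Xavier-style uniform initialization in plain Haskell. It is not the initialization code used by this test module or by the horde-ad API; it only shows the general technique of scaling the sampling interval by the layer's fan-in and fan-out, which keeps activation and gradient variance roughly constant across layers and helps avoid vanishing gradients. It assumes only the standard random package.

-- A sketch of Glorot/Xavier-style uniform initialization (hypothetical
-- helper names; not part of horde-ad). Weights are drawn from
-- U(-limit, limit) with limit = sqrt (6 / (fanIn + fanOut)).
import System.Random (StdGen, mkStdGen, randomR)

-- | Draw @n@ samples uniformly from [-limit, limit].
uniformSamples :: Int -> Double -> StdGen -> ([Double], StdGen)
uniformSamples n limit g0 = go n g0 []
  where
    go 0 g acc = (reverse acc, g)
    go k g acc =
      let (x, g') = randomR (-limit, limit) g
      in go (k - 1) g' (x : acc)

-- | A (fanOut x fanIn) weight matrix, as a list of rows,
--   initialized with Glorot-uniform scaling.
glorotUniform :: Int -> Int -> StdGen -> ([[Double]], StdGen)
glorotUniform fanIn fanOut g0 = go fanOut g0 []
  where
    limit = sqrt (6 / fromIntegral (fanIn + fanOut))
    go 0 g acc = (reverse acc, g)
    go k g acc =
      let (row, g') = uniformSamples fanIn limit g
      in go (k - 1) g' (row : acc)

main :: IO ()
main = do
  -- e.g. a recurrent layer with 28 inputs and 128 hidden units
  let (w, _) = glorotUniform 28 128 (mkStdGen 44)
  print (length w, length (head w))  -- (128, 28): fanOut rows of fanIn weights

Swapping such a variance-scaled distribution for a naive fixed-range one is the kind of "careful tweaking of initialization" the paragraph above refers to.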

Documentation