Presentation Title

Mean Field Analysis of Recurrent Neural Networks

Presenter Information

Sarang Mittal

Faculty Mentor

Maria Spiropolu

Start Date

November 18, 2017, 9:59 AM

End Date

November 18, 2017, 11:00 AM

Location

BSC-Ursa Minor 142

Session

Poster 1

Type of Presentation

Poster

Subject Area

Physical and Mathematical Sciences

Abstract

We analyze the dynamics of recurrent neural networks whose initial weights and biases are randomly distributed. Using mean field approximations, we develop a theoretical basis for length scales that limit the depth of signal propagation in a one-layer recurrent network with long input or prediction sequences. For specific hyperparameter choices, these length scales diverge. Building on the conclusions of Schoenholz et al. (1), we argue that random recurrent networks can be trained only if the input signal can propagate through them. This suggests that recurrent networks can be trained on arbitrarily long sequences provided they are initialized sufficiently close to criticality. We have begun searching for empirical evidence in support of these conclusions, and provide preliminary results.
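The length-scale argument above can be illustrated with a minimal sketch of the mean field recursions used in the Schoenholz et al. framework. The code below is an assumption-laden illustration, not the presenters' implementation: it assumes a tanh unit with weights drawn from N(0, σ_w²/N) and biases from N(0, σ_b²), iterates the variance map q ← σ_w² E[tanh(h)²] + σ_b² to its fixed point q*, and evaluates the slope χ = σ_w² E[tanh′(h)²] of the correlation map. χ < 1 gives an ordered phase with a finite propagation depth, χ > 1 a chaotic phase, and the depth scale ξ = −1/log χ diverges at criticality (χ = 1). All parameter values are illustrative choices.

```python
import numpy as np

# Probabilists' Gauss-Hermite nodes/weights for Gaussian expectations.
_Z, _W = np.polynomial.hermite_e.hermegauss(201)

def gauss_expect(f, var):
    """E[f(h)] for h ~ N(0, var), via Gauss-Hermite quadrature."""
    return np.sum(_W * f(np.sqrt(var) * _Z)) / np.sqrt(2.0 * np.pi)

def fixed_point_variance(sigma_w, sigma_b, steps=200):
    """Iterate the variance map q <- sigma_w^2 E[tanh(h)^2] + sigma_b^2
    to its fixed point q*."""
    q = 1.0
    for _ in range(steps):
        q = sigma_w**2 * gauss_expect(lambda h: np.tanh(h) ** 2, q) + sigma_b**2
    return q

def chi(sigma_w, sigma_b):
    """Slope of the correlation map at c = 1: chi = sigma_w^2 E[tanh'(h)^2],
    evaluated at the variance fixed point q*.  chi < 1: ordered phase;
    chi > 1: chaotic phase; chi = 1: criticality."""
    q_star = fixed_point_variance(sigma_w, sigma_b)
    return sigma_w**2 * gauss_expect(
        lambda h: (1.0 - np.tanh(h) ** 2) ** 2, q_star
    )

def depth_scale(sigma_w, sigma_b):
    """Depth scale xi = -1 / log(chi): the number of steps over which
    perturbations decay (or grow); diverges as chi -> 1."""
    return abs(1.0 / np.log(chi(sigma_w, sigma_b)))
```

Sweeping σ_w across the critical value at fixed σ_b shows the depth scale growing without bound near the phase boundary, which is the behavior the abstract ties to trainability on long sequences.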
