This work extends the space of probabilistic models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space.
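
The invertibility and exact log-likelihood mentioned above can be illustrated with a single affine coupling layer, the building block usually associated with real NVP. This is a minimal sketch, not the authors' implementation: `scale_and_shift` is a hypothetical stand-in for the learned scale and translation networks, and the input is assumed to have even length so it splits into two equal halves.

```python
import math


def scale_and_shift(x_a):
    # Hypothetical stand-in for the learned networks s(.) and t(.);
    # a real model would use trained neural networks here.
    s = [math.tanh(0.5 * v) for v in x_a]
    t = [0.3 * v + 0.1 for v in x_a]
    return s, t


def coupling_forward(x):
    """Map x -> y and return the log-determinant of the Jacobian."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = scale_and_shift(x_a)
    # The first half passes through unchanged; the second half is
    # scaled and shifted by functions of the first half only.
    y_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    # The Jacobian is triangular, so its log-determinant is sum(s).
    log_det = sum(s)
    return x_a + y_b, log_det


def coupling_inverse(y):
    """Exactly invert coupling_forward without inverting s or t."""
    if len(y) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    s, t = scale_and_shift(y_a)
    x_b = [(yb - ti) * math.exp(-si) for yb, si, ti in zip(y_b, s, t)]
    return y_a + x_b
```

Because the first half is copied unchanged, the inverse can recompute the same scale and shift values, which is why sampling and latent inference are both exact in this construction.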

This work explores the utility of using Recurrent Neural Networks to model student learning, and shows that the learned model can be used for intelligent curriculum design and allows straightforward interpretation and discovery of structure in student tasks.

This work introduces a method to stabilize Generative Adversarial Networks by defining the generator objective with respect to an unrolled optimization of the discriminator, and shows how this technique solves the common problem of mode collapse, stabilizes training of GANs with complex recurrent generators, and increases diversity and coverage of the data distribution by the generator.

This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
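
The first-order Taylor expansion referred to above can be written out for a toy model. This is a minimal sketch under strong assumptions: the "network" `f` is a hypothetical two-parameter function chosen so the gradient can be written by hand, and it says nothing about the infinite-width limit itself. It only shows what the linearized model looks like.

```python
import math


def f(x, theta):
    # Hypothetical toy "network": one tanh unit with input weight w
    # and output weight v.
    w, v = theta
    return v * math.tanh(w * x)


def grad_f(x, theta):
    # Analytic gradient of f with respect to (w, v).
    w, v = theta
    h = math.tanh(w * x)
    return (v * (1.0 - h * h) * x, h)


def f_lin(x, theta, theta0):
    # First-order Taylor expansion of f around the initial parameters:
    # f_lin(x; theta) = f(x; theta0) + grad f(x; theta0) . (theta - theta0)
    g = grad_f(x, theta0)
    delta = [t - t0 for t, t0 in zip(theta, theta0)]
    return f(x, theta0) + sum(gi * di for gi, di in zip(g, delta))
```

The linearized model `f_lin` is linear in `theta` but still nonlinear in `x`, and it agrees with `f` exactly at `theta0` and to second order in the parameter change nearby.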

The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.

We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing…

The presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks, and a mean field theory for backpropagation is developed that shows that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively.

This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.
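
For context, the kind of continuous relaxation being modified can be sketched for a single Bernoulli variable. This is an illustrative sketch of the standard logistic (binary Concrete) relaxation, not the modification this work introduces; the function names and the `temperature` parameter name are assumptions made for the example.

```python
import math


def stable_sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def logistic_noise_logit(prob, u):
    # Reparameterized pre-activation: logit(prob) plus logistic noise
    # derived from a uniform draw u in (0, 1).
    return math.log(prob / (1.0 - prob)) + math.log(u / (1.0 - u))


def hard_sample(prob, u):
    # Discrete Bernoulli sample: threshold the pre-activation at zero.
    return 1.0 if logistic_noise_logit(prob, u) > 0 else 0.0


def relaxed_sample(prob, u, temperature):
    # Continuous relaxation: a sigmoid replaces the hard threshold.
    # A smaller temperature gives a tighter (closer to discrete) relaxation.
    return stable_sigmoid(logistic_noise_logit(prob, u) / temperature)
```

Here `temperature` plays the role of the tightness that is normally tuned by hand: as it shrinks, `relaxed_sample` approaches `hard_sample` for the same noise draw `u`.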

This work experimentally characterizes the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error, studies how this relationship varies with the training algorithm, model, and data set, and finds extremely large variation between workloads.

This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.
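
The forward half of this procedure can be sketched in a few lines. This is a minimal sketch assuming a Gaussian diffusion kernel and a hand-picked noise schedule `betas`; both are assumptions of the example, and the learned reverse process is not shown.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Gradually destroy structure in x0 by repeated small noising steps.

    Each step applies x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
    with standard Gaussian noise. Returns every intermediate state.
    """
    x = list(x0)
    trajectory = [list(x)]
    for beta in betas:
        x = [
            math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for xi in x
        ]
        trajectory.append(list(x))
    return trajectory


# Usage: start from highly structured data (every value equal to 5.0)
# and diffuse it for 200 small steps.
rng = random.Random(0)
trajectory = forward_diffusion([5.0] * 2000, [0.05] * 200, rng)
```

With enough small steps the final state is close to a standard Gaussian regardless of the starting data, which is what makes the endpoint tractable to sample from before the reverse process is applied.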