Johannes Kepler University of Linz
Title: Modern Hopfield Networks
Abstract: Associative memories are among the earliest artificial neural models, dating back to the 1960s and 1970s. The best known are Hopfield networks, presented by John Hopfield in 1982. Recently, modern Hopfield networks have been introduced, which tremendously increase the storage capacity and converge extremely fast. We generalize the energy function of modern Hopfield networks to continuous patterns and propose a new update rule. The new Hopfield network has exponential storage capacity. Its update rule ensures global convergence to energy minima and converges in one update step with exponentially low error. The new Hopfield network has three types of energy minima (fixed points of the update): (1) a global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points that each store a single pattern. Surprisingly, the transformer attention mechanism is equal to the update rule of our new modern Hopfield network with continuous states. Transformer and BERT models tend to operate in the global averaging regime in their first layers and in metastable states in higher layers. We provide a new PyTorch layer called “Hopfield”, which allows equipping deep learning architectures with modern Hopfield networks as a powerful new concept comprising pooling, memory, and attention. The layer serves applications such as multiple instance learning, set-based and permutation-invariant learning, associative learning, and many more. We show several tasks on which integrating the new Hopfield layer into a deep learning architecture increased performance.
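To make the update rule in the abstract concrete, here is a minimal sketch in plain Python. It assumes the commonly stated form of the rule, new_state = X softmax(beta * X^T state), where the stored patterns are the columns of X and beta is an inverse temperature; transformer attention corresponds to beta = 1/sqrt(d_k). The function names, the toy patterns, and the beta values are illustrative assumptions. This is not the “Hopfield” PyTorch layer mentioned in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_update(patterns, state, beta):
    """One update step: softmax-weighted average of the stored patterns.

    patterns: list of stored patterns (each a list of floats)
    state:    current state / query (list of floats)
    beta:     inverse temperature (assumed hyperparameter)
    """
    # Similarity of the state to every stored pattern, scaled by beta.
    scores = [beta * sum(p * s for p, s in zip(pattern, state))
              for pattern in patterns]
    weights = softmax(scores)
    # New state is the weighted average of the stored patterns.
    return [sum(w * pattern[d] for w, pattern in zip(weights, patterns))
            for d in range(len(state))]


# Toy data (hypothetical): three stored patterns and a noisy copy of the first.
patterns = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, 1.0],
]
noisy_query = [0.9, 0.8, -1.1, -0.7]

# Large beta: the update lands (almost exactly) on a single stored pattern.
retrieved = hopfield_update(patterns, noisy_query, beta=8.0)

# beta = 0: all patterns get equal weight, i.e. the global averaging regime.
averaged = hopfield_update(patterns, noisy_query, beta=0.0)

print(retrieved)
print(averaged)
```

In this toy setting a large beta recovers the first stored pattern in a single step, while beta = 0 returns the mean of all patterns. Intermediate values of beta can yield averages over subsets of similar patterns, which is the metastable regime described in the abstract.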
Sepp Hochreiter heads the Institute for Machine Learning, the LIT AI Lab, and the AUDI.JKU deep learning center at the Johannes Kepler University of Linz and is director of the Institute of Advanced Research in Artificial Intelligence (IARAI). He is regarded as a pioneer of deep learning, as he discovered the fundamental deep learning problem: deep neural networks are hard to train because they suffer from the now famous problem of vanishing or exploding gradients. He is best known for inventing the long short-term memory (LSTM) in his 1991 diploma thesis, which was later published in 1997. LSTMs have become one of the best-performing techniques in speech and language processing and are used in Google’s Android, Apple’s iOS, Google Translate, Amazon’s Alexa, and Facebook’s translation. Currently, Sepp Hochreiter is advancing the theoretical foundation of deep learning and investigating new algorithms for deep learning and reinforcement learning. His current research projects include deep learning for climate change, smart cities, drug design, text and language analysis, vision, and in particular autonomous driving.