Papers
[2002.05645] Training Large Neural Networks with Constant Memory using a New Execution Algorithm
[1812.04948] A Style-Based Generator Architecture for Generative Adversarial Networks
[2107.06917] A Field Guide to Federated Optimization
[1511.05641] Net2Net: Accelerating Learning via Knowledge Transfer
[2104.03113] Scaling Scaling Laws with Board Games
[2105.12806] A Universal Law of Robustness via Isoperimetry
[2006.10621] On the Predictability of Pruning Across Scales
[2106.05237] Knowledge distillation: A good teacher is patient and consistent
[2009.06807] The Radicalization Risks of GPT-3 and Advanced Neural Language Models
[1712.02950] CycleGAN, a Master of Steganography
[2010.03660] Fast Stencil-Code Computation on a Wafer-Scale Processor
[2205.05131] UL2: Unifying Language Learning Paradigms
[2203.15556] Training Compute-Optimal Large Language Models
[2203.03466] Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
[2108.07686] An Empirical Exploration in Quality Filtering of Text Data
[2110.05457] Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World
[1910.07113] Solving Rubik’s Cube with a Robot Hand
[1901.08652] Learning Agile and Dynamic Motor Skills for Legged Robots
[2202.06009] Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
[1608.05343] Decoupled Neural Interfaces using Synthetic Gradients
[1809.02942] Cellular automata as convolutional neural networks
[2102.02579] Regenerating Soft Robots through Neural Cellular Automata
[2103.08737] Growing 3D Artefacts and Functional Machines with Neural Cellular Automata
[2105.07299] Texture Generation with Neural Cellular Automata
[2201.12360] Variational Neural Cellular Automata
[2111.13545] 𝜇NCA: Texture Generation with Ultra-Compact Neural Cellular Automata
[1904.11455] Ray Interference: a Source of Plateaus in Deep Reinforcement Learning
[1609.09106] HyperNetworks
[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
[2103.10948] The Shape of Learning Curves: a Review
[2106.10207] Distributed Deep Learning In Open Collaborations
[2004.08366] DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications
[1812.06162] An Empirical Model of Large-Batch Training
[1802.08864] One Big Net For Everything
[2102.01293] Scaling Laws for Transfer
[1906.01820] Risks from Learned Optimization in Advanced Machine Learning Systems
[2205.06175] A Generalist Agent
[2005.14165] Language Models are Few-Shot Learners
[2106.08254] BEiT: BERT Pre-Training of Image Transformers
[2112.09332] WebGPT: Browser-assisted question-answering with human feedback
[2202.08137] A data-driven approach for learning to control computers
[2107.12808] Open-Ended Learning Leads to Generally Capable Agents
[2105.12196] From Motor Control to Team Play in Simulated Humanoid Football
[2009.01719] Grounded Language Learning Fast and Slow
[2012.05672] Imitating Interactive Intelligence
[2110.15349] Learning to Ground Multi-Agent Communication with Autoencoders
[2110.08176] Collaborating with Humans without Human Data
[2201.01816] Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria
[2103.04000] Off-Belief Learning
[2104.07219] Multitasking Inhibits Semantic Drift
[2004.02967] Evolving Normalization-Activation Layers
[2007.03898] NVAE: A Deep Hierarchical Variational Autoencoder
[2011.10650] Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
[1512.03385] Deep Residual Learning for Image Recognition
[2105.04663] GSPMD: General and Scalable Parallelization for ML Computation Graphs
[2107.06925] Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
[2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
[2102.07988] TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
[2104.04657] Meta-Learning Bidirectional Update Rules
[2012.14905] Meta Learning Backpropagation And Improving It
[2003.03384] AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
[1706.03762] Attention Is All You Need
[1610.06258] Using Fast Weights to Attend to the Recent Past