Papers of the day

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

Comments

Greg Yang: 1/ Why do wide, random neural networks form Gaussian processes, *regardless of architecture*? Let me give an overview in case you are too lazy to check out the paper https://arxiv.org/abs/1910.12478 or the code https://github.com/thegregyang/GP4A. The proof has two parts… https://t.co/cKtfpRGMQd

9 replies, 1076 likes
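To make the claim in the tweet above concrete, here is a minimal numerical sketch, not taken from the paper or the GP4A repo: sample many wide, randomly initialized ReLU MLPs and check that the output at a fixed input is approximately Gaussian, with a variance that stabilizes as width grows. The function names and the 1/sqrt(fan_in) scaling below are illustrative assumptions, not the paper's notation.

```python
# Illustrative sketch (my own, not the paper's construction): outputs of wide
# random ReLU MLPs at a fixed input, over many random initializations.
import numpy as np

def random_mlp_output(x, width, depth, rng):
    """One forward pass of a randomly initialized ReLU MLP with
    1/sqrt(fan_in) weight scaling; returns a scalar output."""
    h = x
    for _ in range(depth):
        W = rng.standard_normal((width, h.shape[0])) / np.sqrt(h.shape[0])
        h = np.maximum(W @ h, 0.0)          # ReLU
    v = rng.standard_normal(width) / np.sqrt(width)
    return v @ h

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                  # a fixed input
for width in (64, 256, 1024):
    outs = np.array([random_mlp_output(x, width, depth=3, rng=rng)
                     for _ in range(2000)])
    # As width grows, the distribution over random inits approaches a fixed
    # Gaussian: mean near 0, variance converging to the NNGP kernel value.
    print(f"width={width}: mean={outs.mean():.3f}, var={outs.var():.3f}")
```

The same experiment with the forward pass swapped for a GRU, transformer block, or batchnorm network illustrates the architecture-independence the thread is about.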


Greg Yang: 1/ I can't teach you how to dougie, but I can teach you how to compute the Gaussian Process corresponding to an infinite-width neural network of ANY architecture, feedforward or recurrent, e.g. resnet, GRU, transformers, etc. ... RT plz 💪 http://arxiv.org/abs/1910.12478 https://t.co/TgCBmf1OcA

4 replies, 398 likes
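For the simplest feedforward case, the infinite-width kernel can be written down in closed form. Below is a hedged sketch of the well-known NNGP recursion for a fully connected ReLU network (the arc-cosine kernel from earlier NNGP work); the paper and the GP4A repo go far beyond this to RNNs, transformers, batchnorm, etc. The function below is my own illustration, not their code.

```python
# Closed-form NNGP kernel recursion for an infinitely wide ReLU MLP
# (illustrative only; not the GP4A API).
import numpy as np

def relu_nngp_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.0):
    """NNGP kernel K(x1, x2) of an infinitely wide depth-`depth` ReLU MLP."""
    k12 = sigma_b2 + sigma_w2 * np.dot(x1, x2) / len(x1)
    k11 = sigma_b2 + sigma_w2 * np.dot(x1, x1) / len(x1)
    k22 = sigma_b2 + sigma_w2 * np.dot(x2, x2) / len(x2)
    for _ in range(depth - 1):
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        # E[relu(u) relu(v)] for jointly Gaussian (u, v): the arc-cosine formula.
        k12 = sigma_b2 + sigma_w2 / (2 * np.pi) * np.sqrt(k11 * k22) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        k11 = sigma_b2 + sigma_w2 / 2 * k11
        k22 = sigma_b2 + sigma_w2 / 2 * k22
    return k12

x1, x2 = np.ones(4), np.array([1.0, 1.0, -1.0, -1.0])
print(relu_nngp_kernel(x1, x2, depth=3))
```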


Greg Yang: RNNs and batchnorm will be coming soon, but you can already play with them here: https://github.com/thegregyang/GP4A. The general theory for this is based on tensor programs: https://arxiv.org/abs/1902.04760 https://arxiv.org/abs/1910.12478. Give Neural Tangents a try and let us know what you think!

1 replies, 250 likes
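As a usage sketch, the Neural Tangents library mentioned above exposes analytic NNGP and NTK kernels through its stax layer API. The snippet below assumes `pip install neural-tangents`; exact signatures may vary across versions, so treat it as an approximate example rather than the library's documented interface.

```python
# Querying infinite-width kernels with neural-tangents (version-dependent sketch).
import jax
import jax.numpy as jnp
from neural_tangents import stax

# An infinitely wide 2-hidden-layer ReLU MLP; the width argument of Dense only
# matters for finite-width sampling, not for the analytic kernels.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
x1 = jax.random.normal(key1, (3, 8))   # 3 inputs of dimension 8
x2 = jax.random.normal(key2, (5, 8))   # 5 more inputs

nngp = kernel_fn(x1, x2, 'nngp')   # 3x5 NNGP kernel (random-init / Bayesian GP)
ntk  = kernel_fn(x1, x2, 'ntk')    # 3x5 Neural Tangent Kernel
print(nngp.shape, ntk.shape)
```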


Greg Yang: 1/ Neural networks are Gaussian Processes --- the Poster Edition from #NeurIPS2019 last week. In case you missed it, here’s a twitter version of the poster presentation, following the format of @colinraffel; and here’s the previous tweet thread https://twitter.com/TheGregYang/status/1202608248534077440?s=20 https://t.co/lHJgH43gqa

1 replies, 209 likes


Greg Yang: 1/2 A wide NN w/ random weights is a GP, aka the neural network–Gaussian process (NNGP) correspondence: https://arxiv.org/abs/1910.12478. @G_Naveh @HSompolinsky et al. show it also occurs when *training the NN w/ weight decay & grad noise* https://arxiv.org/abs/2004.01190 Neat! https://t.co/anDCgS0M5v

4 replies, 173 likes


Microsoft Research: Explore the open-source implementations of the Gaussian Process kernels of simple RNN, GRU, transformer, and batchnorm+ReLU network on GitHub: https://github.com/thegregyang/GP4A

0 replies, 47 likes
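The repo computes these kernels in closed form; as an independent sanity check (not the repo's code), one can also estimate the simple-RNN NNGP kernel by Monte Carlo: run a wide, randomly initialized tanh RNN on two input sequences with shared weights, record the scalar readouts, and take the covariance over many random initializations. Names and scalings below are my own assumptions.

```python
# Finite-width Monte Carlo estimate of the simple-RNN NNGP kernel (sketch only).
import numpy as np

def rnn_readouts(seqs, width, rng):
    """One random tanh RNN (shared weights), applied to each input sequence."""
    d = seqs[0].shape[1]
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    U = rng.standard_normal((width, d)) / np.sqrt(d)
    v = rng.standard_normal(width) / np.sqrt(width)
    outs = []
    for xs in seqs:
        h = np.zeros(width)
        for x in xs:
            h = np.tanh(W @ h + U @ x)
        outs.append(v @ h)
    return outs

rng = np.random.default_rng(1)
seq_a = rng.standard_normal((5, 3))   # two length-5 sequences of 3-dim inputs
seq_b = rng.standard_normal((5, 3))
samples = np.array([rnn_readouts([seq_a, seq_b], 512, rng) for _ in range(1000)])
print(np.cov(samples.T))              # empirical 2x2 NNGP kernel estimate
```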


Greg Yang: Hit me up @NeurIPSConf if you wanna learn more about wide neural networks, and come to my poster session on Wednesday, 5pm to 7pm, East Exhibition Hall B+C, poster #242 https://whova.com/webapp/event/program/839448/ https://t.co/YUXBuYMU2N

0 replies, 28 likes


Greg Yang: @vincefort @PhilippMarkolin @_hylandSL well it's funny you mention this, since GPT-3 is just a transformer and you can play with such a kernel on colab right here: https://colab.research.google.com/github/thegregyang/GP4A/blob/master/colab/Transformer.ipynb :) The NTK version is here: https://colab.research.google.com/github/thegregyang/NTK4A/blob/master/colab/Transformer-NTK.ipynb. Papers: GP4A https://arxiv.org/abs/1910.12478, NTK4A https://arxiv.org/abs/2006.14548

2 replies, 26 likes


Andrey Kurenkov 🤖 @ Neurips: This Twitter thread by @TheGregYang, as well as the associated poster (which I stopped by today, hope you don't mind the not-so-great pic 😅), is a great example of communicating tricky math with both depth and accessible, concise clarity! We should all strive for this! :) https://t.co/ZJ1J8Hqdvb

2 replies, 25 likes


Greg Yang: 2/ This paper is the 2nd in the *tensor programs* series, following https://arxiv.org/abs/1910.12478, which proves the architectural universality of the NNGP correspondence. This series aims to systematically scale up theoretical insights from toy cases to the neural networks used in practice.

1 replies, 12 likes


Greg Yang: Pairs best with the paper https://arxiv.org/abs/1910.12478 and previous discussion https://twitter.com/TheGregYang/status/1189174848611745792?s=20

1 replies, 5 likes


Greg Yang: @andrewgwils 1/2 This prior for DNNs has been studied recently (extending Neal's work) in the limit of infinite width: https://arxiv.org/abs/1711.00165 https://arxiv.org/abs/1810.05148 https://arxiv.org/abs/1804.11271. In particular, https://arxiv.org/abs/1910.12478 shows this prior is a GP for *any* DNN architecture.

1 replies, 4 likes


Greg Yang: 5/ So it remains to calculate the NNGP kernel and NT kernel for any given architecture. The first is described in http://arxiv.org/abs/1910.12478 and in this thread https://twitter.com/TheGregYang/status/1202608248534077440?s=20

1 replies, 4 likes
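For context on the second object, the NT kernel is the infinite-width limit of the finite-width "empirical NTK": the Gram matrix of parameter gradients, NTK(x, x') = ⟨∂f(x)/∂θ, ∂f(x')/∂θ⟩. Here is a small, self-contained sketch of that finite-width quantity in JAX; the network and helper names are illustrative assumptions, not from the paper or either repo.

```python
# Empirical (finite-width) NTK of a small ReLU MLP via autodiff (sketch only).
import jax
import jax.numpy as jnp

def init_params(key, sizes=(8, 256, 256, 1)):
    """Random weight matrices with 1/sqrt(fan_in) scaling."""
    params = []
    for din, dout, k in zip(sizes[:-1], sizes[1:],
                            jax.random.split(key, len(sizes) - 1)):
        params.append(jax.random.normal(k, (dout, din)) / jnp.sqrt(din))
    return params

def f(params, x):
    """Scalar-output ReLU MLP."""
    h = x
    for W in params[:-1]:
        h = jnp.maximum(W @ h, 0.0)
    return (params[-1] @ h)[0]

def empirical_ntk(params, x1, x2):
    """Inner product of parameter gradients at x1 and x2."""
    g1 = jax.grad(f)(params, x1)
    g2 = jax.grad(f)(params, x2)
    leaves1 = jax.tree_util.tree_leaves(g1)
    leaves2 = jax.tree_util.tree_leaves(g2)
    return sum(jnp.vdot(a, b) for a, b in zip(leaves1, leaves2))

params = init_params(jax.random.PRNGKey(0))
x1, x2 = jnp.ones(8), jnp.arange(8.0)
print(empirical_ntk(params, x1, x2))
```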


Greg Yang: @sschoenholz @stormtroper1721 @alanyttian Thanks for the ping, Sam! Here is, for example, a thread on why all NNs look like Gaussian Processes at initialization: https://twitter.com/TheGregYang/status/1202608248534077440?s=19

0 replies, 3 likes


Nicole Radziwill: this is super cool. thanks @BruceTedesco for RTing it

0 replies, 3 likes


Charles 🎉 Frye: @FeiziSoheil Strong recommendation for covering the work of @yasamanbb, @jaschasd, @TheGregYang, and others on the Gaussian process approach to understanding DNNs. Tensor Programs: https://arxiv.org/abs/1910.12478 Extension to Attention: https://arxiv.org/abs/2006.10540

1 replies, 3 likes


Matios Berhe: I’m not skilled enough to know why this makes me nervous cc:@paulportesi

0 replies, 1 likes


Kevin Yang 楊凱筌: Another poster I'm really excited to see. I'm basically a sucker for anything that has GPs and NNs together.

0 replies, 1 likes


Hacker News: Wide Neural Networks of Any Architecture Are Gaussian Processes: https://arxiv.org/abs/1910.12478 Comments: https://news.ycombinator.com/item?id=21651113

0 replies, 1 likes


Sham Kakade: cool stuff from @TheGregYang: tensors, neural nets, GPs, and kernels! Looks like we can derive a corresponding kernel/GP in a fairly general sense. Very curious about broader empirical comparisons to neural nets, which (potentially) draw strength from the non-linear regime!

0 replies, 1 likes


Content

Found on Dec 05 2019 at https://arxiv.org/pdf/1910.12478.pdf

PDF content of a computer science paper: Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes