Generating Diverse High-Fidelity Images with VQ-VAE-2


Aäron van den Oord: VQVAE-2 finally out! Powerful autoregressive models in a hierarchical compressed latent space. No modes were collapsed in the creation of these samples ;) arXiv: With @catamorphist and @vinyals More samples and details 👇 [thread]

Oriol Vinyals: Surprising how simple ideas can yield such a good generative model! -Mean Squared Error loss on pixels -Non-autoregressive image decoder -Discrete latents w/ straight through estimator w/ @catamorphist & @avdnoord VQ-VAE-2: Code:
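
The three ingredients listed in the tweet above (a mean squared error loss, discrete latents, and a straight-through estimator) can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the toy codebook, the 2-D vectors, the identity "decoder", and the function names `quantize`, `mse`, `mse_grad`, and `straight_through_grad` are all made up for the example.

```python
# Toy sketch of vector quantization with a straight-through estimator.
# All names and numbers are illustrative assumptions, not the paper's code.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(z_e, codebook):
    """Return (index, code vector) of the codebook entry nearest to z_e."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(z_e, codebook[k]))
    return index, codebook[index]

def mse(prediction, target):
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

def mse_grad(prediction, target):
    """Gradient of mse() with respect to each element of prediction."""
    return [2.0 * (p - t) / len(target) for p, t in zip(prediction, target)]

def straight_through_grad(grad_wrt_z_q):
    """The nearest-neighbour lookup has no useful gradient, so the gradient
    arriving at z_q is copied unchanged onto the encoder output z_e."""
    return list(grad_wrt_z_q)

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
z_e = [0.9, 1.2]                      # continuous encoder output
index, z_q = quantize(z_e, codebook)  # discrete latent and its embedding

# Assumption: an identity "decoder", so the reconstruction is z_q itself.
target = [1.0, 0.5]
loss = mse(z_q, target)
grad_z_q = mse_grad(z_q, target)
grad_z_e = straight_through_grad(grad_z_q)
```

In a real model the decoder would be a network and the gradients would come from an autodiff framework; the point here is only that the forward pass uses the quantized vector while the backward pass treats quantization as the identity.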

Oriol Vinyals: Great post by Prof. David McAllester on why discrete representations matter, based on our findings in VQ-VAE2. "Vector quantization seems to be a minimal-bias way for symbols to enter into deep models."

Ben Poole: Big hierarchical VQ-VAEs with autoregressive priors do amazing things. Awesome work from @catamorphist @avdnoord @OriolVinyalsML:

roadrunner01: Generating Diverse High-Fidelity Images with VQ-VAE-2 pdf: abs:

Gene Kogan: For most of the creatives/non-scientists out there, this may seem like just another BigGAN/StyleGAN, but this has important advantages: It's likelihood-based (can be evaluated formally), samples much faster, and should be superior in generator diversity. Really good stuff

Xander Steenbrugge: Generative Modelling space on fire! After Google's #BigGan and Nvidia's #StyleGAN we now finally have autoencoder based models that generate samples of equal/better? quality! The sample diversity is especially striking given that mode collapse has always been an issue for GANs.

Kyle McDonald: VAE-style networks have surpassed the quality of BigGAN and StyleGAN. i always knew they had it in them 🎉

Max Jaderberg: Insanely good samples from the latest incarnation of the VQVAE generative model

François Fleuret: Besides the quantitative evidence that they are more robust to mode collapse than GANs, their roots in "classical" density estimation make VAEs more promising as a generic tool. We have had "good enough classifiers" since 2015; maybe we are also good for density models...

Danilo J. Rezende: Great results on generative modelling from @catamorphist, @avdnoord and @OriolVinyalsML !

Simon Kornblith: @carlesgelada @timnitGebru @ylecun Here's my favorite example of this, from The left are samples from VQ-VAE2; the right are from BigGAN; both are trained on ImageNet. Isn't it kind of obvious what's going to happen if these algorithms are trained on a face dataset?

Daisuke Okanohara: VQ-VAE-2 improves VQ-VAE by using 1) hierarchical latent variables and 2) a prior distribution that matches the marginal posterior using an auto-regressive model with self-attention, achieving diverse and high-fidelity image generation.
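
The two points in the tweet above suggest a two-stage sampling order: draw coarse top-level codes autoregressively, then draw fine bottom-level codes conditioned on them. The sketch below only illustrates that order. It is a toy under loud assumptions: `toy_conditional` is a hand-written rule standing in for a learned autoregressive network (it has no self-attention), and `NUM_CODES`, the sequence lengths, and `sample_sequence` are hypothetical names and values chosen for the example.

```python
import random

NUM_CODES = 4  # toy codebook size (assumption)

def toy_conditional(context):
    """Stand-in for a learned autoregressive network: returns unnormalized
    weights over the next code given the context codes seen so far."""
    weights = [1.0] * NUM_CODES
    for c in context:
        weights[c] += 1.0  # arbitrary rule: favor codes already in the context
    return weights

def sample_sequence(length, rng, conditioning=None):
    """Sample codes one at a time, each conditioned on the previous codes
    and, if given, on the spatially aligned code from the coarser level."""
    codes = []
    for i in range(length):
        context = list(codes)
        if conditioning is not None:
            # Map fine position i to its coarse position.
            context.append(conditioning[i * len(conditioning) // length])
        weights = toy_conditional(context)
        codes.append(rng.choices(range(NUM_CODES), weights=weights)[0])
    return codes

rng = random.Random(0)
top_codes = sample_sequence(4, rng)                              # coarse level
bottom_codes = sample_sequence(16, rng, conditioning=top_codes)  # fine level
```

A decoder would then map both code sequences back to pixels in a single non-autoregressive pass; that step is omitted here.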

d00d: VAE based image generation with quality comparable to GAN generated images, but more variety and faster sampling...

Alex Nichol: The VQ-VAE-2 paper is hilariously vague. E.g. "It consists of a few residual blocks followed by a number of strided transposed convolutions". (Paper:

René Schulte: Impressive new step for generated images. The below photos are all synthesized by an AI 👌 Instead of a GAN they use a Vector Quantized Variational AutoEncoder (VQ-VAE) which makes it easier to handle and much faster. 🚀 #AI #DeepLearning #ML #DNN

Kaixhin: I love this mix between very general (autoregressively-decoded discrete sequences), general (hierarchical structure) and specific (local spatial structure) priors to model complex distributions in the real world 🌏

Alex J. Champandard: 2/ At this stage, we know it's possible to generate HD images with many/most techniques. NVIDIA built StyleGAN, OpenAI developed GLOW, DeepMind created VQVAE, etc. Everyone has their favorite! 🐕 .

Alex J. Champandard: 6/ The idea of working in a smaller and coarser space is not new. It's what made GANs scale to 1024x1024 in the first place (progressive growing) and it's the idea that helped VQVAE catch up.

Kyle Kastner: @selimonder This paper (along with a few others recently such as , are exploiting the multi-scale structure inherent in audio and images. That kind of structure is much harder to get *easily* in language - dependency which may not be local

Seth Stafford: THIS: "The ... shift from symbolic logic ... to distributed vector representations is ... viewed as [a] cornerstone of ... deep learning ... I ... believe ... logical symbolic reasoning is necessary for AGI. Vector quantization seems ... a minimal-bias way for symbols to enter ... deep models."

JFPuget Wash Hands Social Distancing Wear Mask: A clear example of two different ML algorithms. The one on the left has much more diverse outputs (and better quality ones). Not sure, though, that it is less biased.

