
another day, another VAE 

working on a keras/tf adaptation of this paper: arxiv.org/pdf/1911.05343.pdf on character data (words from the CMU dict), and I ended up with a very good reconstruction loss and a bad (I think?) KL loss, like 0.08? the space seems smooth when I do interpolations:

moses
mosess
mosees
midsets
maddets
madderon
middleton
middleton
middletown
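(for anyone curious, the interpolation above is just a linear walk between two encoded latent vectors, decoding each step back to characters. a minimal numpy sketch — the 2-d vectors and step count are made up; the real version would encode "moses" and "middletown" and run each row through the decoder:)

```python
import numpy as np

def interpolate(z_start, z_end, steps):
    """Evenly spaced points on the line between two latent vectors.
    Decoding each row would give a word sequence like the list above."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z_start + a * z_end for a in alphas])

# hypothetical 2-d latents standing in for encoded "moses" / "middletown"
z_a = np.array([0.0, 0.0])
z_b = np.array([1.0, 2.0])
path = interpolate(z_a, z_b, steps=5)  # shape (5, 2)
```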

but sampling from a normal distribution is kinda garbage:

manina
kal
agruh
aar
urosh
'louic
cseq
gb
zani
ias
nsny
huinea
a's
om
ntioo
gante

another day, another VAE 

(I would expect those samples to look more like plausible made-up words, at least as plausible as a markov chain or something similar trained on the same dataset. but it doesn't look like there's been a model collapse, since the model still performs well otherwise? it's also possible I got the math wrong somewhere? I guess the next step is to visualize the latent space and see if it actually looks like a normal distribution)
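(a cheap sanity check before doing a full visualization: encode a batch of words and compare per-dimension mean and std against the N(0, I) prior. sketch with a fake encoder output — in the real version z_batch would come from something like encoder.predict(x); the shift and dimensions here are invented to mimic the suspected failure mode:)

```python
import numpy as np

def latent_stats(z_batch):
    """Per-dimension mean and std of encoded latents.
    If the KL term is doing its job, means should sit near 0 and
    stds near 1; a shifted mean would explain bad prior samples."""
    return z_batch.mean(axis=0), z_batch.std(axis=0)

# stand-in for encoder output on a few thousand words: deliberately
# shifted away from the origin
rng = np.random.default_rng(0)
z_batch = rng.normal(loc=0.5, scale=1.0, size=(5000, 8))
mu, sigma = latent_stats(z_batch)
```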


another day, another VAE (nonsense words) 

I got very helpful advice today on this, which is that the distribution the VAE learns might not be centered at zero. After averaging together the latent vectors from a few thousand items in the dataset and using *that* as the center of the distribution, I get much better results when sampling!
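(in sketch form — numpy stand-ins for the encoder; the real version would encode a few thousand dictionary words and decode the samples — the trick is just:)

```python
import numpy as np

def sample_around_data_center(z_data, n_samples, rng):
    """Instead of sampling z ~ N(0, I), average the encoded latents
    from the dataset and add unit-normal noise around that mean."""
    center = z_data.mean(axis=0)
    return center + rng.standard_normal((n_samples, z_data.shape[1]))

rng = np.random.default_rng(1)
# stand-in for encoded latents of a few thousand words, off-center
z_data = rng.normal(loc=2.0, scale=1.0, size=(3000, 4))
samples = sample_around_data_center(z_data, 10, rng)
```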


another day, another VAE 

@aparrish funnily enough we've had very similar problems when training generative text models - making one good at interpolation or sampling around a point or style transfer seems to work well enough, but optimizing for multiple things at once? weirdly finicky

another day, another VAE 

@halcy that's good to know, and definitely reflects my experience as well. I should have been suspicious about this paper when I saw they didn't show any samples from the latent space, I guess! but to be fair this is the first text VAE architecture I've tried that didn't need loss annealing or weird custom training loops
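(for anyone unfamiliar, the "loss annealing" mentioned above usually means warming up the weight on the KL term so the decoder can't just ignore the latent early in training. a minimal linear schedule sketch — the warmup length is arbitrary:)

```python
def kl_weight(epoch, warmup_epochs=10):
    """Linear KL warm-up: weight ramps 0 -> 1 over warmup_epochs,
    then stays at 1. Used as:
    total_loss = recon_loss + kl_weight(epoch) * kl_loss"""
    return min(1.0, epoch / warmup_epochs)
```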

Friend Camp