boop, here's the github repository, since it's not immediately clear how to get back to the repo from the notebook (and again I want to thank the upstream researchers for making their code available!)

@halcy it's binder! it's like colab except, uh, it's still jupyter notebook and you can host it yourself and it's not owned by google, highly recommended

if you're curious about this poetry model, the code is now online and you can play around with it here: (this will launch a cloud-hosted anonymous jupyter notebook—if you're not familiar with jupyter notebooks, just click on the cells with code in them and hit shift+enter, starting from the top)

@jonbro it's not really usable in any sense at this point! but yes, education contexts was exactly what I had in mind.

(if you don't know, p5.js is a creative coding framework based on Processing and written in JavaScript)

last week at the p5.js contributors' conference, I made a very (extremely) rough prototype of what I think a p5.js-specific notebook environment might look like, and wrote up a bit of the research and principles behind it:

the main trade-off in training a VAE is between reconstruction fidelity and the structure of the latent space, and the epoch 3 version of this model was much "better" at reconstructing arbitrary inputs. at epoch 12, it reconstructs inputs so they look more like the training data, which I kind of prefer? here's Smash Mouth again with reconstructions from this model
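the trade-off shows up directly in the usual VAE objective (the negative evidence lower bound): a reconstruction term plus a KL term that pulls each encoded posterior toward the prior. a minimal sketch, assuming a diagonal Gaussian posterior and a standard normal prior; the `beta` weight is borrowed from beta-VAE to make the trade-off explicit and is an assumption, not something this thread says the model used:

```python
import math


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_loss(recon_nll, mu, logvar, beta=1.0):
    """Negative ELBO: reconstruction negative log-likelihood plus a beta-weighted KL term.

    beta < 1 favors reconstruction fidelity; beta > 1 pushes posteriors toward
    the prior, which tends to give a smoother, more structured latent space.
    """
    return recon_nll + beta * gaussian_kl(mu, logvar)


# a posterior that exactly matches the prior costs nothing in KL
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

the reconstruction term rewards faithful copies of the input; the KL term rewards codes that look like draws from the prior, and the two pull in opposite directions.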

picking a random point in the latent space, then greedily decoding from several randomly selected nearby points (essentially generating variations on a line):
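"nearby" here presumably means small random perturbations of the same latent vector. a minimal sketch of that sampling step in plain Python, assuming a standard normal prior and Gaussian noise with a hand-picked `scale`; `nearby_points`, `dim`, and `scale` are hypothetical names and values for illustration, and the decoder itself isn't shown:

```python
import random


def nearby_points(center, n, scale=0.1, seed=None):
    """Draw n latent vectors from an isotropic Gaussian centered on `center`."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0.0, scale) for c in center] for _ in range(n)]


dim = 8  # hypothetical latent size; the thread doesn't state the real one
prior_rng = random.Random(0)
center = [prior_rng.gauss(0.0, 1.0) for _ in range(dim)]  # random point from the N(0, I) prior
variations = nearby_points(center, n=5, scale=0.1, seed=1)
# each vector in `variations` would then be greedily decoded into a line (decoder not shown)
```

a smaller `scale` should keep the variations closer to the original line; a larger one lets them drift further.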

@halcy there's a bit more structural variety with beam search, e.g.:

And rush'd from the city of his son,
And rush'd from the city of his son,
And rush'd from the palace of his son,
And rubs on his former tower,
And tossed with a hundred pounds,
And tossed with a hundred years,
Plucked with a hundred years,
Lambs on his country's command,
Drown'd by a country's throne,
Raging from the city of his father's throne,
And rear'd by the chief's son,
And rush'd from the city of his son,
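for comparison: greedy decoding commits to the single most probable token at every step, while beam search keeps the `beam_width` best partial sequences and picks the best complete one, so it can find lines that start with a less likely word. a minimal sketch over a hypothetical toy next-token table (`TABLE` and `toy_next_probs` are made up; a real decoder would also condition on the latent vector):

```python
import math


def beam_search(next_probs, beam_width, max_len, eos="<eos>"):
    """Return the best (tokens, log_prob) found by keeping `beam_width` prefixes per step."""
    beams = [((), 0.0)]  # (token tuple, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, p in next_probs(prefix).items():
                item = (prefix + (token,), score + math.log(p))
                # completed sequences leave the beam; the rest compete for the next step
                (finished if token == eos else candidates).append(item)
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # sequences that hit max_len without <eos> still count
    return max(finished, key=lambda c: c[1])


# hypothetical toy decoder: next-token probabilities keyed by the prefix so far
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}


def toy_next_probs(prefix):
    return TABLE.get(prefix, {"<eos>": 1.0})  # end every sequence after two tokens


greedy = beam_search(toy_next_probs, beam_width=1, max_len=5)
beam = beam_search(toy_next_probs, beam_width=2, max_len=5)
print(greedy[0], beam[0])  # ('a', 'x', '<eos>') ('b', 'x', '<eos>')
```

with `beam_width=1` this reduces to greedy decoding, which here settles for a sequence with probability 0.3 instead of the 0.36 one the beam finds.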

@halcy not bad:

The sun is passing by one,
The living stream is one,
The living stream is one,
The living creature of a man,
The passing flag of a creature of man,
The passing foot of beauty is a foe,
From out of footsteps of a man of man,
From him of warlike and a man's eyes,
From heaven's mighty hand of the foe,
From heaven of footsteps of a foe,
The sunbeam is a foe,
The sun is passing by one,

@halcy I'm trying to work through it because it is a cool idea. what are latents_a, latents_b, latents_c in that code?

@halcy yeah that was the whole point of training this model and it's working GREAT, here's a greedy decoding of a linear interpolation between two random samples:

Sorrow than more, merrily
Sorrow of human fancy, unto aid
Sorrow of beauty, never beheld me
Good things of beauty, never slumbering
Good things of love! No longer to show
And sweet of love! No longer to behold
And love the world is lovely, my soul
And hear the world of beauty, never to me!
And thou art thou art thou, my soul!

(for reference, I've never before managed to train a sequence VAE where greedy decoding of samples produced anything other than stuff like "the heart of the love the heart of hearts the love heart")
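the interpolation itself is simple: take evenly spaced points on the straight line between the two latent vectors and decode each one. a minimal sketch with plain lists; `lerp` and `interpolate` are hypothetical helper names, the nine lines above would correspond to something like `steps=9` (a guess), and the decoder isn't shown:

```python
def lerp(a, b, t):
    """Point a fraction t of the way from vector a to vector b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate(a, b, steps):
    """`steps` evenly spaced vectors from a to b, including both endpoints."""
    if steps < 2:
        raise ValueError("need at least 2 steps to include both endpoints")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]


path = interpolate([0.0, 0.0], [1.0, 2.0], steps=5)
print(path[2])  # [0.5, 1.0]
```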

greedy decoding from random samples at epoch 12, just at the right point before the model started to collapse. these are actually sorta... breathtaking?
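"greedy decoding from random samples" means: draw z from the prior, then repeatedly take the argmax token until the end-of-line token. a minimal sketch, assuming a decoder exposed as a function from (z, prefix) to next-token probabilities; `toy_next_probs` is a hypothetical stand-in, not the real model:

```python
import random


def sample_prior(dim, rng):
    """Draw z ~ N(0, I), the usual VAE prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def greedy_decode(next_probs, z, max_len=20, eos="<eos>"):
    """Append the single most probable next token at each step until <eos> or max_len."""
    tokens = []
    for _ in range(max_len):
        probs = next_probs(z, tokens)
        token = max(probs, key=probs.get)
        if token == eos:
            break
        tokens.append(token)
    return tokens


def toy_next_probs(z, prefix):
    # hypothetical stand-in for a trained decoder: z[0] picks the favored word,
    # and every line ends after three tokens
    if len(prefix) >= 3:
        return {"<eos>": 1.0}
    word = "heart" if z[0] > 0 else "sorrow"
    return {word: 0.7, "love": 0.2, "<eos>": 0.1}


z = sample_prior(8, random.Random(0))
print(greedy_decode(toy_next_probs, z))
```

the toy also shows why greedy decoding so often loops: if one word stays the argmax, it gets picked every time, which is the "the heart of the love the heart" problem in miniature.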

(you can see the subword embeddings at work here—it's learning "diverged" as "diver"+"ged" I think, hence the made-up words "freged" and "alterged" in the interpolation)
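a rough illustration of how subword pieces can recombine into non-words. the thread doesn't say which tokenizer the pre-trained embeddings use (BPE-style vocabularies are common), so this sketch swaps in greedy longest-match segmentation (the WordPiece-style approach) over a tiny hypothetical vocabulary `VOCAB`:

```python
def segment(word, vocab):
    """Split `word` by repeatedly taking the longest prefix found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # no match: fall back to a single character
            i += 1
    return pieces


VOCAB = {"diver", "fre", "alter", "ged", "two", "roads"}  # hypothetical vocabulary

print(segment("diverged", VOCAB))  # ['diver', 'ged']
print("".join(["fre", "ged"]))  # freged
```

if the decoder emits "fre" where the original line had "diver", the shared "ged" piece turns it into "freged".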

this model (after only 3 epochs!) is also much better at generating grammatical & semantically appropriate interpolations between lines:

Two roads diverged in a yellow wood,
Two roads freged in a yellow wood,
Two roads alterged in a yellow wood,
As are lodged with a green tower,
As we weigh them in a trees street,
And which alter them of the midnight.
And that has made the music of.
And that has made all the smallest.
And that has made all the difference.

(the variational autoencoder works by squeezing sequences of arbitrary length down to fixed-length vectors, then trying to reconstruct the sequences on the other side. this one is maybe a bit TOO good at reconstructing the input, or at least at guessing semantically similar words, so I might retrain with a smaller latent vector!)
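a minimal sketch of the "squeeze down to a fixed-length vector" step, assuming mean pooling in place of whatever recurrent encoder the real model uses (the thread doesn't describe it); `encode`, the toy embeddings, and the fixed `logvar` are all hypothetical. in a real model a learned layer would map the pooled state to `mu` and `logvar`, and that layer's output size is the "latent vector" size:

```python
import math
import random


def encode(embeddings):
    """Mean-pool a variable-length list of token vectors into one fixed-length vector."""
    dim = len(embeddings[0])
    return [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dim)]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, 1) (the VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


short_line = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]                  # 2 tokens, 3-dim embeddings
long_line = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # 3 tokens

mu = encode(long_line)  # fixed length (3) no matter how many tokens went in
logvar = [-2.0] * len(mu)  # hypothetical; a real encoder would predict this too
z = reparameterize(mu, logvar, random.Random(0))
```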

(uh, this is that Smash Mouth song, if you didn't guess)

now training a variational autoencoder neural network on the gutenberg poetry corpus, but with pre-trained subword embeddings. it's MUCH better at reconstructing inputs now! here's a reconstruction of a little number you might know
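using pre-trained embeddings usually comes down to initializing the model's embedding table from a lookup of pre-computed vectors. a minimal sketch, assuming the vectors are available as a plain dict from subword to list of floats (the thread doesn't name the embedding set or the framework; `build_embedding_matrix` and the tiny table are hypothetical):

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, rng, scale=0.1):
    """One row per vocab entry: the pre-trained vector if available, else small random values."""
    matrix = []
    missing = 0
    for piece in vocab:
        if piece in pretrained:
            matrix.append(list(pretrained[piece]))  # copy so the source table isn't aliased
        else:
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
            missing += 1
    return matrix, missing


# hypothetical 2-dimensional "pre-trained" table, just for illustration
pretrained = {"diver": [0.3, -0.1], "ged": [0.0, 0.8]}
matrix, missing = build_embedding_matrix(["diver", "ged", "smash"], pretrained, 2, random.Random(0))
print(missing)  # 1
```

whether to keep these rows frozen or fine-tune them during training is a separate choice the thread doesn't specify.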

