another change I made was having it train on bitmaps of random words, sampled in proportion to each word's frequency in a reference corpus (in this case, spaCy's unigram probabilities). the idea was that this would help it learn higher-frequency letter combinations and generate words that mostly replicate the "look" of English in use (rather than words in a word list). the drawback is that it looks like half the latent space is trying to spell out "the"
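a minimal sketch of the frequency-weighted sampling idea, not the actual training code. the `LOG_PROBS` table and its values are made up for illustration, and I'm assuming the probabilities arrive as log-probabilities that need exponentiating before they can be used as weights:

```python
import math
import random

# hypothetical log-probabilities standing in for a real unigram table;
# these numbers are invented for the example
LOG_PROBS = {"the": -3.5, "of": -4.5, "and": -4.6, "quartz": -13.0}

def sample_words(log_probs, k, rng=None):
    """Draw k words with replacement, weighted by unigram probability."""
    rng = rng or random.Random()
    words = list(log_probs)
    # convert log-probabilities back to plain weights
    weights = [math.exp(log_probs[w]) for w in words]
    return rng.choices(words, weights=weights, k=k)

# each sampled word would then be rendered to a bitmap for training
batch = sample_words(LOG_PROBS, 8, random.Random(0))
```

sampling like this also shows where the drawback comes from: a handful of very common words end up making up most of every batch.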
definitely bit off more than I could chew when it comes to making something that I feel is conceptually sound with this. the instant temptation is to go full "alien artifact" (and include GAN-generated body horror imagery or whatever), or at least make page layouts that resemble those of typical novels. but then the project feels like it's "about" layout, or "about" books as artifacts, which aren't topics that I personally care to spend time making arguments about at the moment
had an inkling to train a separate model for words with initial capitals, so I can introduce some structure (like sentences and paragraphs). the drawback here is that it won't share a latent space with the lower-case model, so interpolations won't work across the two. (also training a separate model for words with final punctuation)
it does full justification and indentation now! shown here zigzagging through interpolations of the latent space. (the capitalized words and end-of-sentence words are separately trained models, which is why they don't look like the surrounding words)
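a rough sketch of how full justification with an indent can work when every word is a bitmap of known width. this is a guess at the general technique, not the code behind the images; `justify_line`, `min_gap`, and the pixel values are all hypothetical:

```python
def justify_line(widths, line_width, indent=0, min_gap=4):
    """Return the x offset of each word so the line fills line_width."""
    if not widths:
        return []
    gaps = len(widths) - 1
    free = line_width - indent - sum(widths)
    if gaps == 0 or free < gaps * min_gap:
        # single word or overfull line: fall back to left alignment
        gap = min_gap
    else:
        # spread the leftover space evenly between the words
        gap = free / gaps
    xs, x = [], indent
    for w in widths:
        xs.append(x)
        x += w + gap
    return xs

# first line of a paragraph: three word bitmaps, 20px indent, 200px measure
offsets = justify_line([30, 20, 50], 200, indent=20)
```

the last line of a paragraph would normally skip the stretching and just use `min_gap`, which here means calling it with a `line_width` too small to trigger the even spread, or handling that case separately.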