computer-generated "recipes" that I made as an example in the workshop I'm teaching. the instructions are composed of random transitive verbs plus random direct objects from Bob Brown's _The Complete Book of Cheese_ gutenberg.org/ebooks/14293
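
a tiny sketch of how instructions like these can be assembled; the verb and object lists below are stand-ins for ones actually extracted from the Gutenberg text, and the numbered-step format is just a guess:

```python
import random

# stand-in lists; the real ones would come from the Project Gutenberg text of
# Bob Brown's _The Complete Book of Cheese_ (presumably by pulling out
# transitive verbs and the noun phrases that follow them)
verbs = ["grate", "crumble", "sprinkle", "melt", "toast"]
objects = ["the Cheddar", "a ripe Stilton", "the rarebit", "a wheel of Brie"]

def recipe(steps=5):
    """Compose numbered instructions from random verb + object pairs."""
    return "\n".join(
        f"{i + 1}. {random.choice(verbs).capitalize()} {random.choice(objects)}."
        for i in range(steps)
    )

print(recipe())
```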

doesn't do so well at the inverse task, i.e., generating with the probabilities of any token containing a vowel letter OTHER than 'E' zeroed out

getting a language model to write lipograms by simply zeroing out the probability of any token in the vocabulary that has a particular letter in it (in this case, 'E')
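
a minimal sketch of the idea with HuggingFace transformers, using distilgpt2 as a stand-in for whichever model was actually used; the prompt is made up, and `bad_words_ids` is the stock generation option for forcing those token scores to -inf:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# every vocabulary entry whose surface form contains the letter 'e'
vocab = tokenizer.convert_ids_to_tokens(list(range(len(tokenizer))))
banned = [[i] for i, tok in enumerate(vocab) if "e" in tok.lower()]
# (for the "inverse task" above, ban tokens containing a, i, o, or u instead)

prompt = tokenizer("A story about a boat:", return_tensors="pt")
out = model.generate(
    **prompt, max_new_tokens=40, do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    bad_words_ids=banned,  # these tokens get their logits set to -inf each step
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```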

logit biasing, markov chain style. here I'm doing it with phonetics—basically I check the possible outcomes for each context, and then artificially boost the probability of predictions that have certain phonetic characteristics. (in this case, more /k/ and /b/ sounds)
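
roughly what this setup might look like with a word-level chain and the CMU pronouncing dictionary (via the `pronouncing` package); the corpus path, chain order, and boost factor here are all placeholders:

```python
import random
from collections import Counter, defaultdict

import pronouncing  # wrapper around the CMU pronouncing dictionary

def build_chain(words, order=2):
    """Map each length-`order` context to a Counter of following words."""
    chain = defaultdict(Counter)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])][words[i + order]] += 1
    return chain

def has_sound(word, phones=("K", "B")):
    """True if any CMUdict pronunciation of `word` contains one of `phones`."""
    for pron in pronouncing.phones_for_word(word.lower()):
        if any(p.rstrip("012") in phones for p in pron.split()):
            return True
    return False

def generate(chain, seed, n=30, boost=5.0):
    out = list(seed)
    for _ in range(n):
        counts = chain.get(tuple(out[-len(seed):]))
        if not counts:
            break
        # "logit biasing, markov chain style": multiply the counts of
        # phonetically favored candidates before sampling
        candidates, weights = zip(*(
            (w, c * (boost if has_sound(w) else 1.0))
            for w, c in counts.items()
        ))
        out.append(random.choices(candidates, weights)[0])
    return " ".join(out)

corpus = open("frankenstein.txt").read().split()  # placeholder path
chain = build_chain(corpus, order=2)
print(generate(chain, seed=corpus[:2]))
```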

I like having this extra setting to fiddle with! but based on my limited testing, the temperature doesn't really matter once the ngram order gets past a certain point, since most ngrams only have one or two possible continuations. like... with word 3-grams, it's pretty difficult to tell a temperature of 0.35 apart from 2.5

generating with a markov chain using softmax sampling w/temperature (a la neural networks). this is an order 3 character model, and you can really see the difference between low temperature (instantly starts repeating itself) and high temperature (draws from wacky corners of the distribution) (if you've generated text with a markov chain before, it's probably using what amounts to a temperature of 1.0)
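
a sketch of softmax-with-temperature sampling over an order-3 character model, assuming the counts live in a plain dict of Counters; the training-text path is a placeholder, and the two temperatures just illustrate the low/high contrast described above:

```python
import math
import random
from collections import Counter, defaultdict

def char_ngram_model(text, order=3):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def sample_with_temperature(counts, temperature=1.0):
    """Softmax over log-counts, a la neural network sampling.

    temperature=1.0 reproduces ordinary count-proportional Markov sampling;
    lower values sharpen toward the most frequent continuation, higher
    values flatten the distribution."""
    chars = list(counts)
    logits = [math.log(counts[c]) / temperature for c in chars]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # random.choices() normalizes
    return random.choices(chars, weights)[0]

def generate(model, seed, n=300, temperature=1.0):
    out = seed
    order = len(seed)
    for _ in range(n):
        counts = model.get(out[-order:])
        if not counts:
            break
        out += sample_with_temperature(counts, temperature)
    return out

text = open("frankenstein.txt").read()  # placeholder path for the training text
model = char_ngram_model(text, order=3)
print(generate(model, seed=text[:3], temperature=0.35))
print(generate(model, seed=text[:3], temperature=2.5))
```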

here it is working on an oov ngram ("you ate books" is not an ngram that appears in Frankenstein. all of this is trained on Frankenstein, I guess I forgot to mention that)

another way to find similar ngram contexts: each context has an embedding derived from the sum of positional encoding (they're not just for transformers!) multiplied by "word vectors" (actually just truncated SVD of the transpose of the context matrix). then load 'em up in a nearest neighbor index

(this is cool because I can use it even on ngrams that *don't* occur in the source text, though all of the words themselves need to be in the vocabulary)
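
a rough reconstruction of that recipe with numpy/scipy/scikit-learn; the embedding width, the exact positional-encoding formula, and the tokenization are guesses, the corpus path is a placeholder, and the query is the "you ate books" example from above:

```python
import re
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

ORDER, DIMS = 3, 64
words = re.findall(r"[a-z']+", open("frankenstein.txt").read().lower())  # placeholder
vocab = sorted(set(words))
tok_idx = {w: i for i, w in enumerate(vocab)}

# context matrix: one row per distinct ngram, one column per following token
contexts = sorted({tuple(words[i:i + ORDER]) for i in range(len(words) - ORDER)})
ctx_idx = {c: i for i, c in enumerate(contexts)}
M = lil_matrix((len(contexts), len(vocab)))
for i in range(len(words) - ORDER):
    M[ctx_idx[tuple(words[i:i + ORDER])], tok_idx[words[i + ORDER]]] += 1
M = M.tocsr()

# "word vectors": truncated SVD of the transpose of the context matrix
word_vecs = TruncatedSVD(n_components=DIMS).fit_transform(M.T)

# sinusoidal positional encoding -- not just for transformers!
pos = np.arange(ORDER)[:, None]
dim = np.arange(DIMS)[None, :]
angles = pos / np.power(10000.0, (2 * (dim // 2)) / DIMS)
pe = np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))

def embed(ngram):
    """Sum over positions of positional encoding * word vector; works even for
    ngrams that never occur in the text, as long as every word is in the vocab."""
    return sum(pe[j] * word_vecs[tok_idx[w]] for j, w in enumerate(ngram))

index = NearestNeighbors(n_neighbors=5).fit(np.vstack([embed(c) for c in contexts]))
_, hits = index.kneighbors([embed(("you", "ate", "books"))])
print([contexts[j] for j in hits[0]])
```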

poking at the edges of markov chain text generation... here I'm using truncated SVD to find similar ngrams, based on the tokens that follow them. (the goal is to add variety to the generation process by plucking possible next tokens from those following similar ngrams)
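
a rough sketch of this with scikit-learn's TruncatedSVD and a cosine nearest-neighbor index; the corpus path, embedding size, and neighbor count are placeholders:

```python
import re
from scipy.sparse import lil_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

ORDER = 3
words = re.findall(r"[a-z']+", open("frankenstein.txt").read().lower())  # placeholder

# rows: distinct ngram contexts; columns: counts of the token that follows them
contexts = sorted({tuple(words[i:i + ORDER]) for i in range(len(words) - ORDER)})
ctx_idx = {c: i for i, c in enumerate(contexts)}
vocab = sorted(set(words))
tok_idx = {w: i for i, w in enumerate(vocab)}
M = lil_matrix((len(contexts), len(vocab)))
for i in range(len(words) - ORDER):
    M[ctx_idx[tuple(words[i:i + ORDER])], tok_idx[words[i + ORDER]]] += 1
M = M.tocsr()

# embed each context by what tends to follow it
emb = TruncatedSVD(n_components=64).fit_transform(M)
nn = NearestNeighbors(n_neighbors=6, metric="cosine").fit(emb)

def similar_contexts(ngram):
    """Contexts whose continuations look like this ngram's continuations."""
    _, hits = nn.kneighbors(emb[ctx_idx[ngram]].reshape(1, -1))
    return [contexts[j] for j in hits[0][1:]]  # skip the ngram itself

def next_token_pool(ngram):
    """Candidate next tokens plucked from this ngram *and* its neighbors."""
    pool = set()
    for c in [ngram] + similar_contexts(ngram):
        pool.update(vocab[j] for j in M[ctx_idx[c]].indices)
    return pool

print(next_token_pool(("the", "old", "man")))
```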

someday I should develop a poetics where the success condition is something other than "yeahhh now it's giving me a good headache" but... today is not that day

hey all, here's a new computer thing I made! it's called the Nonsense Laboratory, and it's a series of weird little tools for manipulating the spelling and sound of words with machine learning: artsexperiments.withgoogle.com

it's part of a series of projects launched yesterday showing the outcomes of the Artists + Machine Intelligence grant program, which you can learn more about here: experiments.withgoogle.com/ami

thinking about the subset of users now forever trapped in an endless webinar

today I learned that Tracery Writer doesn't quote or filter HTML tags

more fun with distilbert! this technique: (1) forward pass of the model to get the transformer hidden state (2) add random noise to the hidden state (3) predict tokens from the modified hidden state (each successive line uses a higher noise intensity)
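
a sketch of those three steps with HuggingFace's DistilBertForMaskedLM; the example sentence, noise distribution, and intensity schedule are made up, and the masked-LM head submodules (vocab_transform, vocab_layer_norm, vocab_projector, activation) are named as in the HuggingFace implementation:

```python
import torch
from transformers import AutoTokenizer, DistilBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased")
model.eval()

inputs = tokenizer("this is a sentence about the moon and the sea",
                   return_tensors="pt")  # example sentence is a placeholder

with torch.no_grad():
    # (1) forward pass of the base transformer to its final hidden state
    hidden = model.distilbert(**inputs).last_hidden_state
    for intensity in [0.0, 0.5, 1.0, 2.0, 4.0]:
        # (2) add gaussian noise, increasing the intensity for each line
        noisy = hidden + torch.randn_like(hidden) * intensity
        # (3) run the masked-LM head on the modified hidden state
        logits = model.vocab_projector(
            model.vocab_layer_norm(model.activation(model.vocab_transform(noisy))))
        ids = logits.argmax(dim=-1)[0]
        print(intensity, tokenizer.decode(ids))
```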

generating 26 little poems about the alphabet by boosting the probability of tokens containing each letter during DistilGPT2 generation
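
a sketch of the boosting trick as a LogitsProcessor for distilgpt2; the prompt, bonus size, and sample length are guesses:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class LetterBoost(LogitsProcessor):
    """Add a fixed bonus to the logit of every token containing `letter`."""
    def __init__(self, tokenizer, letter, bonus=4.0):
        vocab = tokenizer.convert_ids_to_tokens(list(range(len(tokenizer))))
        self.ids = torch.tensor(
            [i for i, tok in enumerate(vocab) if letter in tok.lower()])
        self.bonus = bonus

    def __call__(self, input_ids, scores):
        scores[:, self.ids] += self.bonus
        return scores

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
prompt = tokenizer("A poem:", return_tensors="pt")

for letter in "abcdefghijklmnopqrstuvwxyz":
    out = model.generate(
        **prompt, max_new_tokens=24, do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        logits_processor=LogitsProcessorList([LetterBoost(tokenizer, letter)]),
    )
    print(letter.upper(), tokenizer.decode(out[0], skip_special_tokens=True))
```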

conditional dcgan progress 

I sorta gave up on having the same model produce different fonts—it just didn't work and the samples across classes weren't similar for the same latent variable (which was the effect I was going for in the first place). HOWEVER, I am super pleased with the samples from the model I'm training on Garamond italics...

conditional dcgan progress 

this is so tantalizingly close to what I want—I'm training the GAN on images of words, conditioned on labels for different text styles (italics, all caps, title case, etc)—you can clearly see many of the different styles in this sample (trained on about 100k images). I managed to avoid mode collapse, but the GAN unfortunately fails to converge (after 200k images, the generator just makes white noise)
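
the post doesn't include code, but here's roughly what the conditioning side of a DCGAN generator can look like in PyTorch: the style label gets embedded and concatenated to the latent vector before the transposed convolutions (the image size, channel counts, and number of style classes below are made up):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """DCGAN-style generator conditioned on a text-style label
    (italics, all caps, title case, ...)."""
    def __init__(self, n_styles=8, latent_dim=100, label_dim=16, channels=1):
        super().__init__()
        self.label_emb = nn.Embedding(n_styles, label_dim)
        self.net = nn.Sequential(
            # project latent + label embedding up to a 4x4 feature map
            nn.ConvTranspose2d(latent_dim + label_dim, 256, 4, 1, 0, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, channels, 4, 2, 1, bias=False),
            nn.Tanh(),  # 32x32 output
        )

    def forward(self, z, labels):
        # the hope: the same z with different labels gives the "same" word
        # rendered in different styles
        cond = torch.cat([z, self.label_emb(labels)], dim=1)
        return self.net(cond[:, :, None, None])

z = torch.randn(4, 100)
labels = torch.tensor([0, 1, 2, 3])
imgs = ConditionalGenerator()(z, labels)  # -> (4, 1, 32, 32)
```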
