another way to find similar ngram contexts: each context has an embedding derived from the sum of positional encoding (they're not just for transformers!) multiplied by "word vectors" (actually just truncated SVD of the transpose of the context matrix). then load 'em up in a nearest neighbor index
(this is cool because I can use it even on ngrams that *don't* occur in the source text, though all of the words themselves need to be in the vocabulary)
generating with a markov chain using softmax sampling w/temperature (a la neural networks). this is an order 3 character model, and you can really see the difference between low temperature (instantly starts repeating itself) and high temperature (draws from wacky corners of the distribution) (if you've generated text with a markov chain before, it's probably using what amounts to a temperature of 1.0)
I like having this extra setting to fiddle with! but based on my limited testing, the temperature doesn't really matter once the length of the ngram hits a certain limit, since most ngrams only have one or two possible continuations. like... with word 3-grams, it's pretty difficult to distinguish 0.35 from 2.5
logit biasing, markov chain style. here I'm doing it with phonetics—basically I check the possible outcomes for each context, and then artificially boost the probability of predictions that have certain phonetic characteristics. (in this case, more /k/ and /b/ sounds)
@aparrish i love seeing some great comp ling stuff on here! what are you getting after here? (i'm familiar with most of the techniques but am curious about the bigger picture)
Hometown is adapted from Mastodon, a decentralized social network with no ads, no corporate surveillance, and ethical design.