computer-generated "recipes" that I made as an example in the workshop I'm teaching. the instructions are composed of random transitive verbs plus random direct objects from Bob Brown's _The Complete Book of Cheese_
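A minimal sketch of the verb-plus-object idea, in case it helps to see it spelled out. The word lists are hypothetical stand-ins (the direct objects in the post come from the cheese book, which isn't reproduced here), and `make_recipe` is just an illustrative name, not the workshop code.

```python
import random

# Hypothetical word lists: the post draws its direct objects from
# Bob Brown's The Complete Book of Cheese, which is not reproduced here.
VERBS = ["fold", "grate", "simmer", "whisk", "press"]
OBJECTS = ["the rind", "a mild cheddar", "the curds", "a wheel of brie"]

def make_recipe(steps, rng):
    """Return `steps` numbered instructions, each a random verb plus a random object."""
    lines = []
    for i in range(1, steps + 1):
        verb = rng.choice(VERBS)
        obj = rng.choice(OBJECTS)
        lines.append(f"{i}. {verb.capitalize()} {obj}.")
    return lines

# Seeded generator so the example is repeatable.
recipe = make_recipe(4, random.Random(0))
print("\n".join(recipe))
```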

oh my gods. they literally have no shame about this.

GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code for Codex/Copilot, regardless of license.

doesn't do so well at the inverse task, i.e., generating with the probabilities of any token containing a vowel letter OTHER than 'E' zeroed out

getting a language model to write lipograms by simply zeroing out the probability of any token in the vocabulary that has a particular letter in it (in this case, 'E')
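Roughly what the masking step could look like, as a sketch only. The vocabulary and logits below are made up and stand in for a real model's output; `mask_tokens` and `softmax` are illustrative helpers, not any library's API. Setting a logit to negative infinity before the softmax is one way to get a probability of exactly zero.

```python
import math
import random

def mask_tokens(logits, vocab, banned_letters):
    """Set the logit of any token containing a banned letter to -inf."""
    banned = set(banned_letters.lower())
    return [
        -math.inf if banned & set(tok.lower()) else logit
        for tok, logit in zip(vocab, logits)
    ]

def softmax(logits):
    """Convert logits to probabilities; -inf logits become exactly 0.
    Assumes at least one token survives the mask."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and made-up logits standing in for a real model's output.
vocab = ["the", "a", "cat", "tree", "dog", "sleeps", "sits"]
logits = [3.0, 2.0, 1.5, 1.0, 1.2, 0.8, 0.5]

probs = softmax(mask_tokens(logits, vocab, "e"))
token = random.Random(0).choices(vocab, weights=probs, k=1)[0]
print(token, probs)
```

The same function covers the inverse task by passing `"aiou"` as the banned letters, which with this toy vocabulary leaves only "the", "tree", and "sleeps".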

#Github #Copilot gives an idea why #Microsoft paid so much for GitHub. They were after data: tons of food for their AI, millions of contributors who now 'work' for MS for free.
You publish your code under GPLv3, even AGPLv3? So what? The AI learns from your code and uses it to generate code that is possibly proprietary. Does #GPL forbid this practice? (I don't think so)

That's the M$ way to break copyright law.

It's time for alternatives like @codeberg .

the university of milan has released over four hundred meows for non-commercial and research purposes (via

Lately I've been reading a lot of children's picture books, over and over. I thought "Goodnight Moon" was pretty spooky, but I had trouble finding anyone writing about that online. @redoak jokingly suggested that I become the conspiracy theorist blogger I want to see in the world, so... I did it. Here's a totally serious take on why "Goodnight Moon" is an esoteric text, from me, a serious scholar of esotericism (aka podcast listener):

logit biasing, markov chain style. here I'm doing it with phonetics—basically I check the possible outcomes for each context, and then artificially boost the probability of predictions that have certain phonetic characteristics. (in this case, more /k/ and /b/ sounds)
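A sketch of how the boosting might work on a word-level bigram chain. Everything here is an assumption for illustration: the tiny `PHONES` table is hand-written (a real run would probably use a pronouncing dictionary), the corpus is invented, and multiplying counts by a fixed `boost` per matching phoneme is just one plausible way to do the biasing.

```python
import random
from collections import Counter, defaultdict

# Hypothetical mini pronunciation table (ARPAbet-like symbols).
PHONES = {
    "the": ["DH", "AH"], "black": ["B", "L", "AE", "K"],
    "cat": ["K", "AE", "T"], "sat": ["S", "AE", "T"],
    "big": ["B", "IH", "G"], "dog": ["D", "AO", "G"],
    "ran": ["R", "AE", "N"],
}

def build_model(words):
    """Map each word to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def boosted_weights(counts, targets=("K", "B"), boost=5.0):
    """Multiply each continuation's count by `boost` once per target phoneme it contains."""
    weights = {}
    for word, count in counts.items():
        hits = sum(p in targets for p in PHONES.get(word, []))
        weights[word] = count * (boost ** hits)
    return weights

# Invented corpus; "the" is followed once each by cat, dog, big, black.
corpus = "the cat sat the dog ran the big dog sat the black cat ran".split()
model = build_model(corpus)
weights = boosted_weights(model["the"])
rng = random.Random(0)
next_word = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
print(weights, next_word)
```

With these toy numbers, "black" (one /b/ and one /k/) ends up weighted 25 to "dog"'s 1, so the chain leans hard toward the target sounds without ever producing a continuation the context didn't already allow.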

(tomorrow I'm going to see if stealing alternatives from similar ngrams helps... but I am beginning to more viscerally understand why the solution to language modeling that really caught on is just... More Training Data)

I like having this extra setting to fiddle with! but based on my limited testing, the temperature doesn't really matter once the length of the ngram hits a certain limit, since most ngrams only have one or two possible continuations. like... with word 3-grams, it's pretty difficult to distinguish 0.35 from 2.5
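To make the temperature point concrete, here is a small sketch with made-up continuation counts. It assumes temperature is applied as a softmax over log counts divided by T, which is one common choice and may not match the actual implementation; `apply_temperature` is an illustrative name.

```python
import math

def apply_temperature(counts, temperature):
    """Turn continuation counts into probabilities, rescaled by temperature.

    Each probability is proportional to count ** (1 / temperature),
    i.e. a softmax over log(count) / temperature.
    """
    scaled = {w: math.exp(math.log(c) / temperature) for w, c in counts.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

# Hypothetical contexts: one with a single continuation, one with two.
single = {"moon": 3}
double = {"moon": 3, "room": 1}
for t in (0.35, 2.5):
    print(t, apply_temperature(single, t), apply_temperature(double, t))
```

When a context has only one continuation, the distribution is the same at every temperature. With two, the temperature shifts the ratio between them, but the output is still one of the same two words, which fits with the settings being hard to tell apart by reading the text.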

