anyone out there working on variations of copyleft-ish licenses that include provisions for machine learning models trained on the content in question? (i'd personally like to have, e.g., MIT and CC-BY licenses that include a clause like "any model trained on this must also include the attribution")
@arivigo this is great, thank you! I agree that code generated from a model like Codex could be used in a way that infringes on copyright (even though OpenAI/GitHub seem to claim that it cannot). what I'm proposing is slightly different, I think, which is an explicit requirement in the license that if you train a model using my work in the data set, you must attribute me when distributing or deploying the model
@aparrish I'd say licenses already cover that by referring to copying the source or any part of a work of yours.
OK, this isn't legal advice (IANAL and such...) but what I would suggest you to do is to explicitly add in the "training a model" reference to cover your bases. The worst it could happen, IMO, is that it'd be redundant with the general need to attribute your work by using parts of it.
Hometown is adapted from Mastodon, a decentralized social network with no ads, no corporate surveillance, and ethical design.