anyone out there working on variations of copyleft-ish licenses that include provisions for machine learning models trained on the content in question? (i'd personally like to have, e.g., MIT and CC-BY licenses that include a clause like "any model trained on this must also include the attribution")
@xuv interesting! yeah this is totally relevant. but i think there's still a need for a license for individual works (since e.g. my blog posts might end up in whatever database)
@clacke (I think it's important to phrase this differently—the "machines" don't "train on" anything. people train models on particular data for particular purposes.) the "going legal theory" that models aren't derivative is obvious nonsense in my mind... the idea behind a license that explicitly requires attribution on trained models would be to change the practice of exploiting labor in this way through public shaming & at least a vague threat of legal action
@aparrish Is there any reason you can't add an extra clause to the MIT/BSD license with that requirement? Those licenses are pretty easily extensible
@aparrish (I'm a bit confused because the examples you gave aren't generally considered copyleft, or at least wouldn't be by the FSF)
@n yeah I'm using "copyleft-ish" in a sorta umbrella way (probably inappropriately), not sure how to encompass everything I want to encompass otherwise
@n of course I *could*, but I'm wondering if people are already working on the problem of wording, publicity, legality, etc.
@aparrish I work on copyright as my job. I highly recommend you read my blog post on GitHub Copilot I wrote some weeks ago (link below!)
@arivigo “No, just kidding, the blame would all be yours.” Made me laugh, again. I agree with you, and that little joke just twists it all around. Perfect. 😆
@arivigo this is great, thank you! I agree that code generated from a model like Codex could be used in a way that infringes on copyright (even though OpenAI/GitHub seem to claim that it cannot). what I'm proposing is slightly different, I think, which is an explicit requirement in the license that if you train a model using my work in the data set, you must attribute me when distributing or deploying the model
@aparrish I'd say licenses already cover that by referring to copying the source or any part of a work of yours.
OK, this isn't legal advice (IANAL and such...) but what I would suggest you to do is to explicitly add in the "training a model" reference to cover your bases. The worst it could happen, IMO, is that it'd be redundant with the general need to attribute your work by using parts of it.
Hometown is adapted from Mastodon, a decentralized social network with no ads, no corporate surveillance, and ethical design.