the delicious irony: creators of industrial language models are now worried about no longer being able to use the web as their "commons" (i.e. other people's labor that they appropriate and commercialize) because their own outputs are "polluting" it (via mailchi.mp/jack-clark/import-a)

the year is 2025. openai lobbies congress to force websites that publish user-generated content to guarantee that content is free of synthetic text (and to mark it as such w/metadata in the html). google lobbies for compulsory no-cost licensing of all content published to the web, unless the site owner follows [proprietary standard that costs millions to implement]. facebook pays below-poverty wages to thousands of contractors in locked, device-free rooms to type sentences, any sentences, as LM fodder

@aparrish i am also thinking about the feedback loop from penalizing "complex text". the penalty is now so embedded in editors and SEO tools everywhere (since complexity makes for less "engagement") that it leads to a collective downward spiral of ever more simplified expression

@aparrish Do… do they not see how they are the industrial polluters in their own metaphor?

@vortex_egg @aparrish I think "this is equivalent to environmental collapse" means they do?

@aparrish

"Here's how to tag your website as containing machine-generated text."

@aparrish

Meanwhile, every niche community of experts having real conversations on forums moved to Discord, and the forums were shut down.

Forums of voluntary experts are massively underrated: they are deep wells of knowledge conveyed in the highly trustworthy form of human conversation. Yeah, you can fake identities and such, but compared to other mediums, faked conversations and beliefs stick out like a sore thumb.

@aparrish Let’s make all web content automatically AGPL and watch Google squirm as they realize that they, too, must release everything into the commons. If my content is part of their program (that’s what machine learning is), then their program must be free.

I bet that there’s GPL content via that route in their programs.

I want commons that takes back what’s taken out.

@ArneBab @aparrish It seems the current legal consensus is that hiding the source data in a machine learning network erases authorship?

@clacke @aparrish in the EU not really — there’s an allowance for research at universities; the rest is somewhere between a gray area and copyright infringement. But it’s almost impossible to prove.

@clacke @aparrish @ArneBab yeah this, the ML model publishers argue that training a model doesn't require permission from the authors of the original content. See GitHub Copilot.

@aparrish So basically they admit to being parasitic on a data-host they are consequently destroying...

@aparrish

So the future of the web is a permanent white noise?

@aparrish this has already happened for some languages

i think there's more automatically translated Belarusian on the web than normal Belarusian

because Google rates unique content higher, some SEO people publish automatic translations as unique (albeit useless) content. created by robots for robots, these texts make it harder for people to find anything useful

i feel a kind of Schadenfreude since English speakers will finally encounter what we have been encountering for years

@aparrish They are kind of getting ahead of themselves here. Even GPT-3 (which cost millions to train) sucks. It's hard to imagine it polluting the web.
