Bad information drives out good

Jan 24, 2023

An argument that GPT-3 is making the future of the web look bleaker by the day:

People running websites need a constant supply of fresh web content to keep them at the top of the search engine results and to keep their visitors coming back for more.
With GPT-3 and related models, generating fresh content has become a lot easier and cheaper: you no longer need to employ copy-writers when an algorithm can churn out reams of novel content for (approximately) no cost.
Determining whether information is accurate or not is an extra cost: it requires some kind of research or fact-checking, or at least some common sense. So inaccurate or unreliable content will always be cheaper to generate than accurate content. For any model, “accuracy” is just one more constraint to satisfy, if we choose to include it¹.
So the web is gradually filling up with unverified and inaccurate (but fresh! always fresh!) content. This applies to individual websites as well as to social media sites.
And then other people come along and scrape the fresh content to use as training data for the “next-gen text-gen” models. Some argue that we’re already running out of training data, so presumably new content will be much sought after.

The result? The average accuracy of information in web content will trend down. Bad information will tend to drive out good.
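To make the feedback loop concrete, here is a toy sketch of the argument above. Everything in it is an assumption made up for illustration: the function name, the growth rate, the share of human-written content, and the "fidelity" with which a model reproduces the accuracy of its training data. None of the numbers are measurements. It only shows that if generated content is even slightly less accurate than the pool it was trained on, the average drifts down round after round.

```python
def simulate_average_accuracy(rounds, human_share=0.2, human_accuracy=0.9,
                              fidelity=0.95, initial_accuracy=0.9, growth=0.5):
    """Toy model of average web accuracy over successive model generations.

    All parameters are hypothetical:
      human_share      fraction of each round's new content written by humans
      human_accuracy   accuracy of that human-written content
      fidelity         how much of the pool's accuracy generated text retains
      growth           new content per round, as a fraction of the current pool
    """
    size = 1.0
    average = initial_accuracy
    history = [average]
    for _ in range(rounds):
        new_size = size * growth
        # Generated text is trained on the current pool and loses a little accuracy.
        generated_accuracy = average * fidelity
        new_accuracy = (human_share * human_accuracy
                        + (1 - human_share) * generated_accuracy)
        # The pool's new average is the size-weighted mix of old and new content.
        average = (average * size + new_accuracy * new_size) / (size + new_size)
        size += new_size
        history.append(average)
    return history


if __name__ == "__main__":
    for generation, accuracy in enumerate(simulate_average_accuracy(10)):
        print(f"generation {generation}: average accuracy {accuracy:.3f}")
```

With these made-up defaults the average falls every round and levels off above a floor set by the human-written share. Set `human_share` to 0 and there is no floor at all.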

Gresham’s law is usually stated as “bad money drives out good”: if two currencies have the same face value but one has greater intrinsic worth, then people will tend to hold on to the more valuable one and spend the one with the lower commodity value, so it is the bad money that stays in circulation. In the equivalent web-content case, consider two websites with the same face value (similar search engine ranking, similar user engagement). One is written by humans and has a higher intrinsic worth, in that it is more reliable. The other site has lower-value content that no one really wants. But it is this bad information that tends to proliferate, and GPT-3 will only accelerate the process.

Maybe search engines will start ranking sites by accuracy rather than freshness, but that sounds expensive and unlikely. I can’t imagine any government legislating for accuracy, and besides, the unintended consequences of such censorship would be horrendous. I think we’ll have to rely ever more heavily on known, trusted sources as the wider richness of the web becomes less and less reliable. And that makes me sad.

¹ Even without automation, writing copy without checking facts is, and always has been, cheaper than writing quality content.