Context: We could run out of data to train AI language programs
We should be less worried about running out of data to train language models and more concerned about finding ourselves without anything interesting to say.
The profusion of large language models does not occur in a vacuum. Although our cognitive (as much as linguistic) reflex is to think of complex technological systems as relatively self-contained artefacts and entities, we do well to contemplate the larger form and flow of information, energy and communication of which they are only ever functional microcosms.
The uptake of these large language models occurs synchronously with, or soon after, the emergence of social media as a near-ubiquitous medium of linguistic transmission. This is not necessarily a causal relationship, but I think we can view the former in light of insights derived from the latter.
The arrival of a tidal wave of memetic tropes, idioms of political or commercial influence and abbreviated mnemonics in language indicates that the artefacts, entities and systems which tend to percolate to ascendance are precisely those that are most probable.
What is most probable in generative language production? Precisely those banalities and null statements that most easily combine with others.
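This tendency can be made concrete with a toy sketch (the corpus and all names here are hypothetical, chosen only for illustration): a bigram model decoded greedily always follows the highest-frequency path, which is by construction the most common, and therefore most generic, phrasing.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: a few stock phrases dominate one idiosyncratic one.
corpus = [
    "thanks so much for sharing",
    "thanks so much for this",
    "thanks so much for sharing",
    "a strange flock of meanings",
]

# Build a bigram model: counts of each next word given the current word.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        bigrams[a][b] += 1

def greedy_continue(word, steps=4):
    """Always pick the single most probable next word."""
    out = [word]
    for _ in range(steps):
        if word not in bigrams:
            break
        word = bigrams[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

# The most probable path reproduces the most generic phrase in the corpus.
print(greedy_continue("thanks"))  # → thanks so much for sharing
```

The rarer, stranger sequence survives in the model but is never selected by maximum-probability decoding, which is the point: what is most combinable is what wins.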