Context: DeepMind tells Google it has no idea how to make AI less toxic
The raw data is indeed pervasively, if intermittently, toxic. Filtering venomous misanthropy out of the dataset is one solution, but doing so reliably requires a language model that does something substantively more than generate probabilistic sequences of linguistic artefacts sans comprehension.
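To make the point concrete, here is a minimal sketch of dataset-level toxicity filtering. The `toxicity_score` heuristic and the term list are hypothetical stand-ins for a real classifier; the crude keyword matching is deliberate, since it illustrates exactly the gap the paragraph describes: surface pattern matching without comprehension.

```python
# Hypothetical keyword-based toxicity filter (illustrative only).
TOXIC_TERMS = {"hate", "vermin", "worthless"}  # assumed example list

def toxicity_score(text: str) -> float:
    """Fraction of tokens matching a toxic-term list (no semantics)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,!?") in TOXIC_TERMS)
    return hits / len(tokens)

def filter_corpus(corpus: list[str], threshold: float = 0.1) -> list[str]:
    """Keep only documents scoring below the toxicity threshold."""
    return [doc for doc in corpus if toxicity_score(doc) < threshold]

corpus = [
    "The weather is lovely today.",
    "They are worthless vermin.",   # flagged by the keyword match
    "I hate missing the bus.",      # benign use of 'hate'
]
print(filter_corpus(corpus))  # → ['The weather is lovely today.']
```

Note that the benign sentence "I hate missing the bus." is also discarded: a filter that cannot understand meaning can only over- or under-remove, which is the crux of the problem.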
John Searle’s Chinese Room argument illuminates the inability of AI to understand anything: successfully manipulating the syntactic rules of a language is not tantamount to understanding what anything within that language actually means.
It is probably worth noting that the combinatorial sophistication of language, as a semantic property of experience and understanding, is many orders of magnitude more complex than the probabilistic frameworks with which it is modelled.
It is also worth reflecting that the core biases of cognition, culture and communication represent patterns of dissonance, a kind of constructive entropy, that are (for better and for worse) irreducible: they are the facts and heuristics by and through which information systems and data symmetries in our world recursively self-propagate.
Solving the data problem requires solving problems in the world, and solving problems in the world is complicated by the enigma that semantic dissonance is endemic.