# Culture

The artificial intelligence behind grammar checkers is changing us

Cristina Mazzon
August 2024 - 4 minutes

Not long ago, my eye fell on an article reported by the Accademia della Crusca pointing out that the grammar check tool in Google Docs was, for some users, correcting the correct form qual è into the infamous qual’è.

After an initial moment of disappointment, I found myself intrigued: while it is true that languages are constantly changing, it is equally true that change is usually imperceptible, whereas now it is happening right at our fingertips.

It is certainly nothing new that breaking the rule becomes the rule: when William Webb Ellis picked up the ball during a football match and ran with it, he invented the game of rugby. The same can happen with the languages we speak.

But what role does artificial intelligence play in language? In the case of Google, a dedicated page explains that ‘spelling suggestions are enhanced by machine learning. Language-understanding models use billions of common phrases and sentences to automatically acquire real-world knowledge, but this also means they can reflect human cognitive biases’.

Google’s machine learning thus statistically analyses language usage based on what people write, and learns to distinguish the forms that are most likely to be correct. It follows that if the people making a mistake outnumber those who do not, the game is over: Google’s spell check will consider the most frequently used form to be correct (as I write this article, the future tense will consider is being flagged as an error, perhaps because the conditional would consider or a simple present consider is far more common than the future indicative).
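To make the mechanism concrete, here is a minimal Python sketch of frequency-based ‘correction’. The toy corpus and its counts are invented for illustration, and this is of course not Google’s actual algorithm – only the statistical principle described above.

```python
from collections import Counter

# Toy corpus standing in for the billions of phrases a real model sees.
# The frequencies are invented: here the mistake outnumbers the correct form.
corpus = ["qual è"] * 40 + ["qual'è"] * 60

counts = Counter(corpus)

def suggest(form: str) -> str:
    """Suggest whichever variant of the phrase is most frequent.

    A purely statistical 'checker': if the people making a mistake
    outnumber those who do not, the mistake wins.
    """
    variants = {"qual è", "qual'è"}
    if form in variants:
        return counts.most_common(1)[0][0]
    return form

print(suggest("qual è"))  # -> "qual'è": frequency has overruled grammar
```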

If we delve further into the world of artificial intelligence, we discover that Natural Language Processing (NLP) is the branch of computer science dedicated to enabling computers to understand written and spoken words much as the human brain does.

In the old days, treebanks – corpora of parsed text annotated with the syntactic or semantic structure of each sentence – were specially prepared by people. These were then fed to machines so that they could learn the correct syntax and grammar of the language. Today, as machine learning advances, so do the algorithms used (one of the most efficient is Grammarly’s). Applications, programmes and computers can learn complex grammatical forms almost entirely on their own, requiring less and less help from humans or from the large hand-built databases that demand a great deal of programmers’ time and energy.
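For the curious, this is roughly what a hand-annotated treebank entry looks like. The bracketed notation below follows the style of the Penn Treebank and is read here with the NLTK library; the sentence and its labels are a toy example of mine, not taken from a real treebank.

```python
import nltk

# A hand-annotated parse in Penn-Treebank-style bracketed notation:
# humans once produced thousands of trees like this so that machines
# could learn the syntax of a language from them.
annotated = "(S (NP (DT The) (NN checker)) (VP (VBZ flags) (NP (DT the) (NN verb))))"

tree = nltk.Tree.fromstring(annotated)
tree.pretty_print()  # draws the syntactic structure as ASCII art
print(tree.pos())    # [('The', 'DT'), ('checker', 'NN'), ('flags', 'VBZ'), ...]
```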

It is fascinating how some algorithms approach learning in much the same way as children. According to research from Stanford University, Google’s Natural Language Processing system BERT not only plays ‘Mad Libs’, a popular children’s game in which the aim is to fill the blanks with the right words, but also manages to derive grammatical rules from them, just as a developing human learns to formulate complete sentences without spending hours poring over grammar manuals. This process – learning a formal syntactic structure in an environment where that structure is hidden – is called Grammar Induction, and the most authoritative studies on it date back to the 1960s.
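Anyone can watch BERT play its game of ‘Mad Libs’ in a few lines of Python. The sketch below uses the Hugging Face transformers library and the public bert-base-uncased checkpoint – one convenient way to try it, not necessarily the setup used in the Stanford research.

```python
from transformers import pipeline

# Mask a word and let BERT fill in the blank, Mad Libs style.
fill = pipeline("fill-mask", model="bert-base-uncased")

for guess in fill("The children [MASK] playing in the garden."):
    print(f"{guess['token_str']:>8}  score={guess['score']:.3f}")

# The top guesses ("are", "were") agree in number with "children":
# the model has picked up subject-verb agreement without ever being
# given an explicit grammatical rule.
```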

Let’s take some more tangible examples. In the image below, you can see some spelling mistakes I made that were not corrected by Google.

But look at how Alessandro Manzoni, by contrast, is branded with the scarlet letter of the grammar check.

There is no doubt as to which of us – myself or the author of ‘I promessi sposi’ – is the more authoritative expert on the Italian language. But the latter is guilty of not keeping up with the times – and the mistakes – of the 21st century.

With the increasing use of writing tools such as Google Docs, languages seem to be losing their rigidity and composure, evolving instead more rapidly towards simplified, slang-inflected forms.

All this has significant ethical implications: languages are becoming much more democratic. And this is not only the case for Italian. Think of English, the lingua franca of our time: how many calques, borrowings and creole languages can arise now that anyone can play with words and teach machines innovative grammatical forms, which the machines will in turn signal back to us as correct?

With machine learning, we loosen the brake usually applied to erroneous forms in the name of linguistic continuity, favouring instead the spread of regional, dialectal and ever more creative varieties of the world’s spoken languages – varieties that in colonial times were all too often suppressed, limiting the expression of entire peoples.

No errors are reported in Zora Neale Hurston’s English, which was once considered ‘broken English’, but is now perfectly understandable and accepted by the most popular online writing programme.

I conclude these reflections with a question: Treccani does not recognise the lemma ‘reschedulare’, but Google does.

Who is right, if the term is understood and used by most?