
Morning Knowledge /5. Language

By Fabio Todesco
Using a machine learning technique, Dirk Hovy captures dialectal variations in our online language. Do dialects still play a role in the construction of our identity?


A new machine learning technique makes it possible to capture variation in languages and dialects, and how that variation evolves, by analyzing what people write on social media.

In two recent studies, Dirk Hovy, a computational sociolinguist and Associate Professor in Bocconi's Department of Marketing, uses an innovative method that processes large amounts of social media data to capture gradual differences between language varieties. The method produces a clear visual reference (a map) that can serve as input for further qualitative studies. It also has direct applications in user profiling, for instance inferring where a social media user is located.

The algorithm uses a neural network to learn patterns from data. At the start, it knows nothing about European languages: it simply observes linguistic similarities in the geotagged data and places them all in a three-dimensional space. Each of the three dimensions is then read as an amount of red, green, or blue, so every point is displayed as a mixture of these three colors. The values 0.5, 0.5 and 0.5, for example, correspond to a medium gray. The result is a map that clearly distinguishes the different languages spoken in Europe.
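The color step described above can be sketched in a few lines of Python. This is not Hovy's code: the three-dimensional coordinates are invented for illustration, and rescaling each dimension to the range 0 to 1 with min-max scaling is our own assumption about how positions could be turned into color intensities. The function names `min_max_scale` and `to_rgb` are hypothetical.

```python
# Hypothetical sketch: turn 3-D location coordinates into map colors.
# Assumption: each dimension is rescaled independently to [0, 1] with
# min-max scaling; the original studies may normalize differently.

def min_max_scale(points):
    """Rescale each of the three dimensions independently to [0, 1]."""
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    scaled = []
    for p in points:
        # A dimension with no spread gets the neutral midpoint 0.5.
        scaled.append(tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.5
            for v, lo, hi in zip(p, lows, highs)
        ))
    return scaled


def to_rgb(point):
    """Read (x, y, z) in [0, 1] as (red, green, blue); return a hex color."""
    r, g, b = (round(c * 255) for c in point)
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical coordinates for three geotagged locations.
locations = {
    "A": (-2.0, 1.0, 0.0),
    "B": (0.0, 3.0, 2.0),
    "C": (2.0, 5.0, 4.0),
}
scaled = min_max_scale(list(locations.values()))
colors = dict(zip(locations, map(to_rgb, scaled)))
print(colors)  # {'A': '#000000', 'B': '#808080', 'C': '#ffffff'}
```

Location B sits exactly halfway along every dimension, so it scales to 0.5, 0.5 and 0.5 and is drawn in medium gray (`#808080`), matching the example in the text. Locations whose coordinates are close together receive similar colors, which is what makes regional patterns visible on the map.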

Another study applies the same technique to German dialects. Its findings directly contradict the common perception that dialects are disappearing from modern life. Although dialects no longer distinguish individual towns, the study shows that they are becoming more entrenched at a larger regional level, even on anonymous social media platforms, where people have little reason to mark their origin.

In 2020, do you think that dialects still play a role in the construction of our identity?

The Colors of Our Online Language
