How Machine Learning Uses Electoral Campaigns to Understand MPs' Language
The growing number of publications that use text as their main source of data goes hand in hand with the development of new methods of analysis based on machine learning tools. In their latest paper, published in Political Analysis, Massimo Morelli (Bocconi Department of Economics), Moritz Osnabrügge (Durham University), and Elliott Ash (ETH Zurich) develop a way to analyze parliamentary speeches that relies on already coded text from electoral campaign materials.
This new "cross-domain" method offers several advantages, such as lower costs and the reliance on a body of text already coded by expert researchers. The authors apply the method to parliamentary speeches in New Zealand, finding that female MPs discuss welfare-related topics significantly more often than their male counterparts.
Most existing studies based on text analysis use so-called "within-domain" supervised learning. In this approach, the software is trained on a hand-coded subsample of text so that it can extend an experienced researcher's coding scheme to larger bodies of text. Unfortunately, this approach has two main limitations: the subsample on which the machine is trained must be of the same kind as the larger body (hence the name within-domain), and a human coder is still required to perform the initial annotations on the subsample. In particular, human coding can be expensive and may strain the investigator's budget.
To overcome these limitations, many scholars have used so-called unsupervised topic models, in which an algorithm estimates a probability distribution over the topics that a body of text is expected to be about. This method does not require hand coding. However, the authors argue that even if it can deliver interesting results, it has limitations of its own. In particular, the results of these models are not easy to interpret, they do not work on multilingual corpora, and they can be more sensitive to unobserved perturbations in the data.
To address the problems of both traditional coding and unsupervised topic models, the authors have implemented a supervised learning method that is trained on the Manifesto Project (a repository of coded and analyzed text from electoral policy statements and speeches in several countries) and then applied to a different domain: parliamentary speeches. The Manifesto Project provides a large body of multilingual text that experienced researchers have already coded on several political dimensions, such as economic planning, the environment, and many others. In this way, Morelli and co-authors avoid the need for new hand coding in the study of parliamentary speeches. For now, the method can be applied only to corpora that are linguistically close to the original coded text. Nonetheless, it delivers promising results when compared with traditionally analyzed text.
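The cross-domain idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentences, the topic labels, and the class name `NaiveBayesTopicClassifier` are invented for the example, and a simple naive Bayes model stands in for whatever classifier the authors actually use, which this article does not specify. The model is trained on labeled "manifesto" sentences (the source domain) and then applied to unlabeled "speech" sentences (the target domain).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # topic -> word frequencies
        self.doc_counts = Counter(labels)        # topic -> number of training texts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in the source domain carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_topic, best_score = None, -math.inf
        for topic in sorted(self.doc_counts):
            total = sum(self.word_counts[topic].values())
            score = math.log(self.doc_counts[topic] / self.n_docs)  # log prior
            for t in tokens:
                # Smoothed log likelihood of the word under this topic.
                score += math.log(
                    (self.word_counts[topic][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Source domain: hypothetical hand-coded manifesto sentences (not real data).
manifesto_texts = [
    "We will raise pensions and expand child benefits for families",
    "Our party will invest in public housing and social welfare",
    "We will strengthen trade agreements and foreign alliances",
    "Our diplomats will defend the nation in foreign affairs and treaties",
]
manifesto_labels = ["welfare", "welfare", "external_relations", "external_relations"]

# Target domain: hypothetical, unlabeled parliamentary speech sentences.
speeches = [
    "The minister must explain why pensions and benefits for families were cut",
    "This treaty weakens our foreign alliances",
]

clf = NaiveBayesTopicClassifier().fit(manifesto_texts, manifesto_labels)
predictions = [clf.predict(s) for s in speeches]
for speech, topic in zip(speeches, predictions):
    print(f"{topic}: {speech}")
```

No speech is hand-coded at any point: all labels come from the source domain. The sketch also hints at why the two corpora need to be linguistically close, since words that never appear in the training text contribute nothing to the prediction.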
Finally, the authors illustrate two applications of the cross-domain method. First, they find that after the 1993 electoral reform in New Zealand, parliamentary speeches paid increasing attention to issues connected to political authority, such as political stability and party competence. This is likely because the transition toward a proportional electoral system encouraged the creation of new parties and the formation of alliances, and changed the overall experience of political stability in the country. Second, they study how the gender of parliamentarians is related to debate participation on certain topics. In particular, they find that women speak significantly more about welfare, whereas men are more concerned with external relations and foreign policy.
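Once each speech has a predicted topic, a comparison like the gender analysis above starts from topic shares per group. The sketch below shows that first step only. The records are made up for illustration and are not the paper's data, and the function name `topic_shares_by_group` is hypothetical. Judging whether a difference is statistically significant would need a proper test or regression, which is not shown here.

```python
from collections import Counter, defaultdict


def topic_shares_by_group(records):
    """Return {group: {topic: share of that group's speeches on the topic}}."""
    counts = defaultdict(Counter)
    for group, topic in records:
        counts[group][topic] += 1
    return {
        group: {
            topic: n / sum(topic_counts.values())
            for topic, n in topic_counts.items()
        }
        for group, topic_counts in counts.items()
    }


# Made-up (speaker group, predicted topic) pairs, for illustration only.
records = [
    ("women", "welfare"),
    ("women", "welfare"),
    ("women", "external_relations"),
    ("women", "economy"),
    ("men", "welfare"),
    ("men", "external_relations"),
    ("men", "external_relations"),
    ("men", "economy"),
]

shares = topic_shares_by_group(records)
for group, topic_shares in shares.items():
    print(group, topic_shares)
```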
Moritz Osnabrügge, Elliott Ash, Massimo Morelli, "Cross-Domain Topic Classification for Political Texts." Political Analysis, Early View, DOI: https://doi.org/10.1017/pan.2021.37.