Tracking terrorists on Facebook and the Dark Web
Among its many applications, machine learning is particularly valuable for measuring things that were previously unmeasurable, such as the tone of a text or the colours of a picture. This yields new data on phenomena that could not be studied because they had never been quantified. One example is terrorist recruitment, which I study in my paper "Terrorism Financing, Recruitment and Attacks", published in Econometrica in 2022.
Measuring the recruitment of terrorists, or of members of criminal organisations in general, is inherently difficult because recruitment is, by its nature, hidden from view. At the same time, one of the many channels through which terrorist groups recruit is online fora (such as Facebook and Reddit). Machine learning algorithms can therefore help detect recruitment: they can be built to evaluate the content of each message automatically and determine whether it contains elements of terrorist recruitment.
To measure recruitment, I gathered data from various online fora operating in Pakistan that disseminate content supportive of jihadism. The Artificial Intelligence Lab at the University of Arizona developed a dataset containing more than four million messages exchanged between 2003 and 2012 on six different fora operating in Pakistan. The database also includes more than 2.5 million messages transmitted on platforms in the dark web, an alternative internet network that requires specific software to access and navigate. Extremists and terrorist groups have routinely used these platforms to spread the concept of war against the unfaithful (Jihad). Access to such a large number of text messages is important because they make it possible to quantify, and thus measure, terrorist recruitment. Nevertheless, determining which of those messages are intended to recruit terrorists is a difficult task.
Without a dedicated algorithm, determining whether a message concerns terrorist recruitment would be prohibitively expensive: judges and investigators would need to read and analyse each of the four million messages. To lower this cost, it is crucial to use a machine learning algorithm that helps categorise each message as either neutral or as having recruitment intent. This can be achieved through Natural Language Processing, a set of data science techniques for analysing the content of texts, including the contextual nuances of their language. The algorithm relies on supervised learning, which means that it must be trained on a set of already classified data before it can accurately extract the information contained in each message and categorise it. The training data come from the initial work of two judges, who independently and manually reviewed a random sample of messages and marked all those showing an intent to recruit violent extremists. This labelled sample is used to teach the algorithm how to recognise conversations containing recruitment material. Once trained, the algorithm can be applied to all the remaining messages, de facto replicating the work of several judges.
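To make the supervised-learning workflow concrete, here is a minimal sketch in Python. It is not the classifier used in the paper, which is not specified here: it assumes a simple multinomial Naive Bayes model with add-one smoothing as a stand-in, and the example messages, labels and function names (`tokenize`, `train`, `classify`) are hypothetical placeholders chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_messages):
    """Count words per label from (text, label) pairs marked by human judges."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training messages
    vocab = set()
    for text, label in labelled_messages:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for token in tokenize(text):
            if token in model["vocab"]:  # skip words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled sample standing in for the judges' training data.
TRAINING = [
    ("join our group and fight with us", "recruitment"),
    ("we need new brothers to join the cause", "recruitment"),
    ("come and join the fight", "recruitment"),
    ("the cricket match was great yesterday", "neutral"),
    ("what is the weather in lahore today", "neutral"),
    ("i enjoyed the food at the wedding", "neutral"),
]

model = train(TRAINING)

# Apply the trained model to messages that no judge has read.
for message in ["will you join us in the fight", "the weather was great at the match"]:
    print(message, "->", classify(model, message))
```

In practice the labelled sample would contain thousands of messages, and part of it would be held out to check how often the algorithm agrees with the judges before it is applied to the full corpus.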
With this method, I was able to construct a measure of terrorist recruitment, which can be used to understand the determinants of terrorist attacks and can assist national security agencies. My research shows that the effect of terrorism financing on attacks increases strongly and significantly with recruitment: financing translates into more attacks when recruitment is higher. This way of classifying written texts also has broad applications for future studies, as it can be used in any situation where experts would otherwise be needed to assess third-party material.