Learning How to Apply Data Science Tools in Conflict and Peacebuilding Studies
By: Yared Lemma
The question above has been a key feature in my writings and conversations with others over the past year, regarding what I want to do as a Peace Fellow at the Duke-UNC Rotary Peace Center and beyond. Most of the time, when people hear what I am trying to do, they ask follow-up questions such as what I mean and what it will take to achieve it.
In addressing these conversations, I go into the nitty-gritty of the story behind my goals, which have influenced me since my childhood. I grew up in a poverty- and conflict-stricken country, a position that has led me to think repeatedly about the causes of these problems and their solutions. Over time, I have had the opportunity to learn that some of the development policies implemented by the previous regime in my country were controversial and could be a source of internal conflict on top of many other causes. Most importantly, news from international media sources was a wake-up call for me, such as when I learned about the growing number of people at risk of severe food insecurity and famine in Ethiopia, Yemen, South Sudan, Somalia, and Syria associated with the prevalence of conflict.
Thanks to the Rotary Peace Fellowship, I have been attending programs that are a stepping-stone to achieve my aspiration of engaging in conflict resolution and peacebuilding endeavors. The Applied Field Experience (AFE) is one of the great opportunities that the program provides us to shape our career trajectory and make new connections for us to reintegrate smoothly into the work environment after graduation.
Rotary signature in one of the parks I visited for hiking in Boulder, CO
Looking for answers
Therefore, my simple answer to the original question was to take the AFE as a major entry point. I trusted that I could achieve this by working with organizations that would not only leverage my expertise in the knowledge management field but also help me build new skills in the application of conflict resolution and peacebuilding interventions. Hence, I searched for technology companies in the field throughout my AFE search. Fortunately, I got several acceptance letters from different organizations and was able to choose one with the best challenges and learning opportunities.
The organization is Kimetrica, a social enterprise firm dedicated to increasing the effectiveness of spending in the social sector through enhanced decision-making approaches. Kimetrica helps international organizations, governments and nonprofits to increase the impact and efficiency of their social investments, enhance accountability, manage critical risks, and build donor and taxpayer confidence. Their services extend from major project evaluations to provision of early warning services, and monitoring the political, social and economic conditions in fragile states.
At Kimetrica, I am working in the field, assisting with data management, data analysis and data science services in their Data Lab in Denver, CO, primarily on the project funded by the Defense Advanced Research Projects Agency (DARPA). The Data Lab team consists of data scientists, GIS experts and subject matter specialists who develop tools to inform policy decisions. Their work ranges from modeling the social impacts of climate risk to sophisticated deep learning algorithms for malnutrition detection.
I am assigned to the World Modelers initiative of the DARPA project. World Modelers aims to develop quantitative models to generate key output variables for population, conflict, household economics, water, markets, health, and humanitarian operational responses that impact food security. These models are expected to serve as inputs into an iterative process of developing scenario-based decision-support tools, which will evolve into a web-based interactive tool.
As part of the World Modelers Initiative, I am investigating opportunities to model the effect of conflict on migration as well as the effect that conflict has on natural resources. If time permits, I will also engage in identifying the costs and results of indicators of interventions designed to reduce or mitigate conflict (e.g., conflict prevention, peacebuilding, and peacekeeping) and in developing a better understanding of the variables and the theoretical foundation for designing peace interventions.
Quick Learning Curve
When I first arrived, I had to learn fast! Kimetrica is one of the leading technology companies in the non-profit sector. Learning the sophisticated knowledge and productivity platforms that the company uses, along with a new field of study, was intriguing. My notebook quickly filled with new tools and concepts in the first few days of my work. I knew that I had to develop skills around these new topics during the three-month period of my field experience. I wanted to find a way to keep these ideas on my “radar,” so I developed “wordclouds” such as the one shown below using the Python programming language (which I was learning for the first time).
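As a rough illustration of the idea behind a word cloud, the sketch below counts how often each word appears in a block of notes, since those counts are what determine the size of each word in the picture. This is only a minimal sketch: the sample notes, the short stop-word list and the function name `word_frequencies` are all made up for illustration and are not the ones I used.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is"}

def word_frequencies(text, top_n=10):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up sample notes, for illustration only.
notes = (
    "Gravity model of migration. Conflict and migration data. "
    "Machine learning model for conflict."
)
top_words = word_frequencies(notes, top_n=3)
print(top_words)
```

A plotting package can then draw the more frequent words in larger type. The third-party `wordcloud` package is one option, if it is installed.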
New skills, tools and concepts
At the moment, I am less than halfway through my summer internship, and I am already facing a steep and rewarding learning curve. Below, I would like to briefly discuss my work in a way that shows some of the usefulness of the tools I am learning here.
I am using machine learning techniques to study the effect of conflict on migration. This is based on the Gravity Model of migration that I am developing using the Python programming language. Machine learning is a means of building mathematical models to help understand big data. In this case, the Gravity Model of migration is a mathematical representation of the flow of refugees due to a range of push and pull factors at the country level. The original formulation of the model defined migration flows from country i to country j as a function of the population at the origin (i), the population at the destination (j), and the distance between the two (d_ij). This basic form of the gravity model was then extended to include economic variables such as wages and unemployment.
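To make the basic form concrete, here is a minimal sketch of a gravity-style calculation. It assumes the common textbook version, in which the flow rises with the two populations and falls with distance, each raised to a power. The constant `g`, the exponents `a`, `b` and `c`, and the input numbers are hypothetical values chosen for illustration, not estimates from my model.

```python
import math

def gravity_flow(pop_origin, pop_dest, distance, g=1.0, a=1.0, b=1.0, c=1.0):
    """Basic gravity prediction: g * P_i^a * P_j^b / d_ij^c (assumed form)."""
    return g * pop_origin**a * pop_dest**b / distance**c

def log_gravity_flow(pop_origin, pop_dest, distance, g=1.0, a=1.0, b=1.0, c=1.0):
    """The same model in log form, which is linear in the logged variables."""
    return (
        math.log(g)
        + a * math.log(pop_origin)
        + b * math.log(pop_dest)
        - c * math.log(distance)
    )

# Made-up inputs for illustration only (not real data).
flow = gravity_flow(2_000_000, 5_000_000, 800.0, g=1e-6, c=2.0)
log_flow = log_gravity_flow(2_000_000, 5_000_000, 800.0, g=1e-6, c=2.0)
print(flow, log_flow)
```

Taking logs turns the multiplicative formula into a sum of terms, which is why variables in a gravity model are usually entered in log form and the exponents can be estimated as regression coefficients.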
Following the above brief description of the model, here are some of the dependent and independent variables I am considering in the new model I am helping to develop.
|Variable (log form)
|Total number of refugees moving from country of origin (i) to destination (j)
|Total number of conflict events
|Approximate distance between the origin and destination countries
|Percentage of education spending
|Percentage of health spending
|Percentage of social spending
|Commitment to reduce income inequality index
|Per capita income (i,j)
|GDP per capita (constant 2010 US$)
|Corruption perception index
|Varieties of Democracy Project
|Rule of Law (i,j)
|Rule of Law Index
|World Justice Project
Exploratory analysis (sample)
Figure 1: Refugee Flows
The above chord diagram summarizes the dependent variable, which is the number of refugees flowing from country i to country j, for the top 20 refugee source countries. Source countries are drawn with the same arc color, while different arc colors represent destination countries. Accordingly, the top refugee sources in 2016 were Syria, Afghanistan, South Sudan and Somalia, while top destinations included Turkey, Pakistan, Lebanon, Iran and Jordan.
Before testing the model, the explanatory variables above have to be examined alongside the dependent variable using descriptive analysis, to indicate whether the hypothesis is relevant. As a sample, I have included graphs showing the correlation between the number of refugees from the origin country and other variables, such as conflict incidence and the corruption index, by country income group.
Conflict events and number of refugees by level of income
Number of refugees and corruption perception index by income group
As can be seen in the above figures, there is a clear relationship between conflict events and refugee numbers as well as between the corruption index and refugee numbers: the former is positive while the latter is negative. Moreover, the figures show that high and upper-middle-income countries have fewer refugees, along with few conflict events and a low level of corruption. In other words, when the number of conflict events and the level of corruption are high, there will be more refugees, especially in low and lower-middle-income countries.
Conclusion and next steps
The exploratory analysis revealed that most of the selected variables are relevant to the model to be tested. A quick machine learning estimate gives the existing model a prediction score of just over 20% (using a “within-sample” forecast) and close to 30% (using an “out-of-sample” forecast). A within-sample forecast uses a subset of the available data to estimate the model, forecasts values outside of that estimation sample, and compares them to the corresponding known or actual outcomes. This is done to assess the ability of the model to forecast known values. An out-of-sample forecast instead uses all available data in the sample to estimate the model. A score of 20 to 30% may not seem like much, but for policy makers looking for help in deciding where to focus more attention, even this modest initial success is meaningful.
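The difference between scoring a model on the data used to estimate it and scoring it on held-back observations with known outcomes can be sketched as follows. This is a simplified illustration, not my actual model: it uses a one-variable linear regression, randomly generated data, and R-squared as the score, all of which are assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys, intercept, slope):
    """One minus the ratio of squared prediction errors to the variation in ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic data for illustration only (assumed true slope of 0.5 plus noise).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 2.0) for x in xs]

# Estimate on the first 150 observations and hold back the last 50.
train_x, held_x = xs[:150], xs[150:]
train_y, held_y = ys[:150], ys[150:]

intercept, slope = fit_line(train_x, train_y)
score_estimation = r_squared(train_x, train_y, intercept, slope)
score_held_back = r_squared(held_x, held_y, intercept, slope)
print(score_estimation, score_held_back)
```

The score on the held-back observations is the more honest guide to how the model would perform on new cases, because those observations played no part in the estimation.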
The next phase of my work is to include additional variables and “dummies” as per the literature and try to improve the model estimation. Moreover, I will consider using other machine learning models such as the “random forest” to see if better estimation can be obtained. I will also focus on the intervention side of the model and try to develop a theory of change for conflict and peacebuilding interventions, which makes my learning journey more exciting than ever.