(Borrowed from Nations Online)

Introduction

The Yemeni Civil War and the humanitarian crisis that parallels it are a pressing issue that deserves more attention. According to the World Bank’s poverty indicators, Yemen is widely considered one of the poorest nations in the world. Yemen lacks the federal infrastructure, economic stability, and critical resources needed to sustainably support its nearly 29 million inhabitants. Over the last half of the past decade, increased violence due to the Yemeni Civil War has greatly weakened whatever external support was in place for the Yemeni people. We introduce our project, The Effects of the Yemeni Civil War on Civilian Life, to assess the trends and correlations between violent conflict and indicators of life stability.

In this project, we investigate trends in living conditions in Yemen through key indicators, such as fatalities, GDP per capita, refugee population, and others, to assess how quality of life has changed since the start of the Yemeni Civil War. We merge scraped data from the World Bank, the Armed Conflict Location and Event Data Project, the United Nations, Wikipedia, and a few additional sources to see how these indicators changed over the years leading up to the war and how they have shifted since the conflict began.

Guiding Questions:

How has GDP per capita changed within Yemen over the years leading up to the Yemeni Civil War, and since its start?

What region of Yemen has been hit the hardest by violence and conflict due to the Yemeni Civil War? Are there other indicators that are significantly different in these regions?

How has the Civil War affected Yemeni citizens who are likely not involved in daily conflict, such as children?

Data Scraping and Wrangling Tribulations

We scraped our GDP data from indexmundi.com, which contains data from the CIA. Our refugee data and some of the fatality data come from a World Bank dataset containing over 1000 metrics for Yemen since 1960. The remaining fatality data come from a Yemen Data Project database, which contains information on all of the air raids that took place in Yemen between March 2015 and July 2019. Lastly, we obtained the populations of the governorates by scraping a Wikipedia page.

For our World Bank dataset, we read in a csv file, but we encountered a few large structural problems in the data: each variable and its corresponding metric occupied a row rather than a column. Furthermore, the dataset included entries from 1976 until 2020, but filled the rows with blanks or placeholder characters for years with no observations. This table required a lot of wrangling with pivot functions to transpose the csv into a tidy and practical format.
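A minimal sketch of that reshaping, assuming a wide export with one row per indicator and one column per year. The toy data frame `raw`, its column names, its values, and the `".."` placeholder for missing observations are illustrative assumptions rather than the actual World Bank file:

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw export: one row per indicator, one column per year.
# All values are made up; ".." marks a year without an observation (assumed).
raw <- tibble(
  indicator_code = c("SP.POP.TOTL", "VC.BTL.DETH"),
  `2014` = c("100", ".."),
  `2015` = c("110", "5")
)

tidy <- raw %>%
  # Stack the year columns into (year, value) pairs.
  pivot_longer(cols = -indicator_code, names_to = "year", values_to = "value") %>%
  # Turn placeholders into NA, then convert both columns to numbers.
  mutate(
    value = as.numeric(na_if(value, "..")),
    year = as.integer(year)
  ) %>%
  # Spread the indicators back out so that each one becomes its own column.
  pivot_wider(names_from = indicator_code, values_from = value)
```

The result has one row per year and one column per indicator, which is the shape the plotting code later in this post expects.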

Another large piece of our tidying efforts focused on Yemen’s individual Governorate data from Wikipedia. Since the English spelling of the Governorates varies, this involved using string operations to change many of the names of the governorates to fit the governorate names from the air raids dataset. Once we tidied this data, we merged it with a subset of the Air Raids data, which we produced by summarizing the total number of Air Raids deaths in each Governorate.
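As a rough illustration of this step, the sketch below harmonizes a few governorate spellings, totals the air raid deaths per governorate, and joins the two tables. The table names `wiki_pop` and `air_raids`, the `fatalities` column, the spellings in `name_map`, and all values are hypothetical, and `recode()` is only one of several ways to do the renaming:

```r
library(dplyr)

# Hypothetical spellings and made-up values; the real tables are larger.
wiki_pop <- tibble(
  Governorate = c("Sa'dah", "Ta'izz", "Al Hudaydah"),
  `Division Population` = c(100, 300, 200)
)
air_raids <- tibble(
  Governorate = c("Saada", "Saada", "Taiz", "Hudaydah"),
  fatalities = c(4, 6, 3, 2)
)

# Map each Wikipedia spelling to the spelling used in the air raid data.
name_map <- c("Sa'dah" = "Saada", "Ta'izz" = "Taiz", "Al Hudaydah" = "Hudaydah")

wiki_clean <- wiki_pop %>%
  mutate(Governorate = recode(Governorate, !!!name_map))

# Collapse the raid-level rows to one total per governorate.
deaths_by_gov <- air_raids %>%
  group_by(Governorate) %>%
  summarize(total_fatalities = sum(fatalities), .groups = "drop")

# Keep every governorate from the population table, even one with no raids.
gov_pop_deaths <- left_join(wiki_clean, deaths_by_gov, by = "Governorate")
```

A left join is used here so that a governorate with no recorded air raids stays in the table with a missing total instead of disappearing.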

Finally, once all our data was separated into tidy csv files, we merged as much information as we could, using year as the key for the joined data. This was a little challenging because our data tables had different numbers of rows, so we had to consider which type of merge would best preserve the important information.
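The trade-off between merge types can be seen on two toy tables that cover different years. The table names and values below are made up for illustration and are not our actual data:

```r
library(dplyr)

# Toy tables with different year coverage (all values are made up).
gdp <- tibble(year = 2013:2016, gdp_per_capita = c(40, 38, 30, 28))
refugees <- tibble(year = 2015:2017, refugees = c(5, 7, 9))

# inner_join keeps only the years present in both tables.
both <- inner_join(gdp, refugees, by = "year")

# full_join keeps every year from either table and fills the gaps with NA.
all_years <- full_join(gdp, refugees, by = "year")
```

Here `both` covers only 2015 and 2016, while `all_years` covers 2013 through 2017 with NA wherever one table has no observation. A full join loses nothing, but every later plot or summary then has to handle the missing values.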

New Tricks in R

• Highlighting all chunks and pressing “Command + I” (or “Ctrl + I” on Windows) automatically reformats your code so that it is properly indented. We wish we had picked up this trick earlier, as we spent lots of time trying to make our code look well presented!

• This chunk counts the number of NA values in each column of a given data table. There are other methods that use the summary() function, but this is an interesting alternative.

# Returns a named vector with the count of NA values in each column
Number_of_empty_columns <- function(GDP){GDP %>% is.na() %>% colSums()}

• We learned how to use “%” units on the y-axis for one of our plots. This took a bit of searching - up until now, simple numerical values were sufficient on our other plots.

GDP Per Capita

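library(tidyverse)  # ggplot2, dplyr, and the %>% pipe
library(plotly)     # ggplotly() for interactive plots

GDP_Per_Capita <- GDP %>%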
  ggplot(aes(x = Year, y = GDP_Per_Capita)) +
  geom_line(color = "blue") +
  geom_point(color = "blue") +
  coord_cartesian(xlim = c(2000.5, 2020)) +
  labs(title = "GDP Per Capita in 21st century Yemen",
       y = "GDP Per Capita (in Purchasing Power Parity Dollars)",
       caption = "The 2011 and 2014 GDP decreases correspond to the start of the revolution and the civil war, respectively.")

ggplotly(GDP_Per_Capita)

This plot shows GDP per capita in purchasing power parity (PPP) dollars for each year since 2000. From 2000 to 2010, we see steady growth in GDP per capita, which signals a slowly growing economy. This period corresponds to an era of relative government stability and decreased conflict across the nation. In 2011, we observe a sharp drop in GDP per capita. This coincides with increased violence in Yemen: from January 2011 to February 2012, a violent revolution that formed part of the Arab Spring occurred in Yemen, shifting the balance of power in the region. From 2012 to 2014, new leadership attempted to stabilize the country with import and export investments to stimulate economic growth, and GDP per capita rose slightly over this period. In 2014, we see GDP per capita drop significantly again. In September 2014, a group of rebels captured the Yemeni capital city, marking the start of the Yemeni Civil War and bringing devastation throughout the country. By combining this information with other plots of the frequency and severity of violent conflict, we will show how GDP per capita in PPP dollars is influenced by violent intervention in Yemen.

Total Deaths in Different Age Groups

Deaths_by_agegroup <- World_Bank %>%
  ggplot(mapping = aes(x = year)) +
  geom_point(mapping = aes(x = year, y = SH.DTH.1519, color = "Citizens Aged 15-19")) +
  geom_point(mapping = aes(x = year, y = SH.DTH.2024, color = "Citizens Aged 20-24")) +
  geom_point(mapping = aes(x = year, y = SH.DTH.1014, color = "Citizens Aged 10-14")) +
  geom_point(mapping = aes(x = year, y = SH.DTH.0509, color = "Citizens Aged 5-9")) +
  geom_line(mapping = aes(x = year, y = SH.DTH.1519, color = "Citizens Aged 15-19"))+
  geom_line(mapping = aes(x = year, y = SH.DTH.2024, color = "Citizens Aged 20-24"))+
  geom_line(mapping = aes(x = year, y = SH.DTH.1014, color = "Citizens Aged 10-14"))+
  geom_line(mapping = aes(x = year, y = SH.DTH.0509, color = "Citizens Aged 5-9"))+
  coord_cartesian(xlim = c(2005,2020)) +
  labs(title = "Number of Deaths per Age Group by Year",
       x = "Year", y = "Number of Deaths per Age Group", color = "Age Group")
  
ggplotly(Deaths_by_agegroup)

In this plot, we illustrate the total number of deaths per year for age groups ranging from 5 to 24 years. We observe a large increase in deaths in every age group as the civil war intensifies. For example, in the 20-24 age group, the number of deaths in 2018 is almost three times the number in 2011. This offers a clear depiction of how severely the crisis impacts not just those of military age, but Yemeni society as a whole. Overall, this plot is very similar to what we would expect given the historical context of shifting power and greatly increased conflict introduced with the previous illustration. We see a small spike in 2012 after the Arab Spring revolution, and a much larger spike in 2015 at the start of the civil war. We also see another spike in 2018, which indicates an increase in the intensity of the conflict. Note that the total death toll since 2015 is roughly 100,000 people for these ages.

Battle Deaths

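Battle_Deaths <- World_Bank %>%
  # Keep rows 33 to 43 of the table (the years shown in the plot)
  filter(row_number() > 32 & row_number() < 44) %>%
  group_by(year) %>%
  # Battle deaths per 1,000 people, to match the units of the crude death rate
  mutate(btl_deaths_per1k = VC.BTL.DETH/(SP.POP.TOTL/1000)) %>%
  # Battle deaths as a percentage of all deaths (SP.DYN.CDRT.IN is deaths per 1,000 people)
  summarize(Pct_of_Battle_Deaths = (100*btl_deaths_per1k/SP.DYN.CDRT.IN))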

Battle_Deaths%>%
  ggplot(mapping = aes(x = year, y = Pct_of_Battle_Deaths)) +
  geom_col(fill = "red") +
  geom_text(mapping = aes(label = formatC(Pct_of_Battle_Deaths)), position = position_nudge(y = 0.2)) +
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Percent of Battle Deaths to Total Deaths since 2008", 
       x = "Year", y = "Percent of Battle Deaths to Total Deaths")

This plot visualizes battle deaths as a percentage of total deaths in Yemen since 2008. From 2008 to 2010, we see the pattern of a relatively peaceful country, with battle deaths making up less than 0.2% of total deaths. After that, we see a large increase, with battle deaths accounting for over 4% of total deaths in 2015. Though 4% doesn’t necessarily seem like a huge number, it is over 20 times larger than the rate in 2010. The rate and lethality of conflict have risen immensely in recent years, and this violence bears a high price in human lives.
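The percentage in this plot divides battle deaths per 1,000 people by the crude death rate, which is also expressed per 1,000 people. A small worked sketch of that arithmetic follows; the function name `battle_death_pct` and the input values are made up for illustration and are not taken from our dataset:

```r
# Share of all deaths that are battle deaths, in percent.
# crude_death_rate is deaths per 1,000 people, as in SP.DYN.CDRT.IN.
battle_death_pct <- function(battle_deaths, population, crude_death_rate) {
  battle_deaths_per_1k <- battle_deaths / (population / 1000)
  100 * battle_deaths_per_1k / crude_death_rate
}

# Made-up inputs: 1,000 battle deaths, 25 million people, 8 deaths per 1,000.
# That is 0.04 battle deaths per 1,000 people, or 0.5% of all deaths.
battle_death_pct(1000, 25e6, 8)
```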

Refugees

Refugee_Population<- World_Bank %>%
  ggplot(aes(x = year, y = SM.POP.REFG.OR)) +
  geom_col(fill = "green") +
  coord_cartesian(xlim = c(2005,2020)) +
  labs(title = "Refugees Originating from Yemen by Year since 2005",
       x = "Year", y = "Refugee Population")

ggplotly(Refugee_Population)

In this visualization, we illustrate the refugee population by year over the last 15 years. The World Bank indicator plotted here, SM.POP.REFG.OR, counts refugees by country or territory of origin, so it measures people who have fled Yemen rather than refugees hosted within Yemen. From 2005 to 2014, the refugee population rises relatively slowly, approximately doubling over the 10-year period. In 2015, however, we see a severe spike: the refugee population climbs to 15,901, which equates to a 606.21% increase in one year. The following years continue this growth trend up to our most recent figure of 36,518 refugees in 2019, a 2,762.4% increase since 2013. Multiple sources that provided background information on our datasets also touched on the idea that Yemen is close to its total capacity for hosting asylum seekers, as it does not have the resources to sustain more refugees. This huge jump in the refugee population illustrates the displacement of families since the start of the civil war. Further investigation into other metrics for Yemeni families would likely point to mass displacement of communities, which in turn likely contributes to the catastrophic drop in GDP per capita visualized in a previous illustration.
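For readers who want to check percentage figures like these, a percent increase is the change divided by the starting value. The helper `pct_increase` and the numbers below are hypothetical and are not the actual refugee counts:

```r
# Percent increase from an old value to a new value.
pct_increase <- function(old, new) {
  100 * (new - old) / old
}

# Made-up values: growing from 2,000 to 15,000 is a 650% increase,
# which is the same as the new value being 7.5 times the old one.
pct_increase(2000, 15000)
```

Note the difference between an increase of 650% and a new value that is 650% of the old one; the two are easy to confuse when reading growth figures.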

Effect of Crisis on Different Areas of Yemen

Governorates_Prop <- Governorates_Pop_Deaths_Area %>%
  mutate(prop_deaths_pop = total_fatalities/`Division Population`)

Gov_pop_deaths_plot <- Governorates_Prop %>%
  ggplot(aes(x = `Division Population`, y = total_fatalities, color = Governorate)) +
  geom_point()+
  labs(title = "Total Air Raid Fatalities vs Population separated by Governorate", 
       x = "Governorate Population (in millions)", y = "Total Air Raid Fatalities")

ggplotly(Gov_pop_deaths_plot)

This plot shows the total deaths from air raids in Yemen between March 2015 and July 2019. The governorates that have been hit hardest are Saada, Taiz, Hudaydah, Hajja, Sanaa, and the Capital. All of these governorates contain or are part of large cities, so this is where we would expect the conflict to be worst. Interestingly, though, a couple of governorates have zero air raid fatalities: Hadramawt and Mahrah. Mahrah has a very low population, so that isn’t entirely surprising, but Hadramawt has over 1 million inhabitants. Since these two governorates are on the east side of the country, it seems plausible to infer that most of the civil war’s conflict is in the western portion of the country. Overall, this plot gives us an idea of where the most conflict occurs, but only a rough sense of its severity relative to each governorate’s population.

Gov_prop_deaths_plot <- Governorates_Prop %>%
  ggplot(aes(x = `Division Population`, y = prop_deaths_pop, color = Governorate)) +
  geom_point()+
  labs(title = "Proportion of Air Raid Deaths to Total Population for all Governorates", 
       x = "Governorate Population (in millions)", y = "Proportion of Air Raid Deaths to Total Population")

ggplotly(Gov_prop_deaths_plot)

This plot shows the proportion of air raid deaths to total population for each governorate of Yemen. Compared to the last plot, it gives us a better picture of the severity in each governorate relative to the others. We can see that some of the same governorates from the last plot also have a high proportion of air raid deaths: Saada, the Capital, Hajja, and Sanaa. Interestingly, though, the governorates of Marib and Jawf both have higher proportions of air raid deaths than the three most populous governorates. Notice that the proportion of air raid deaths in Saada is over 400% of the proportion in any other region of Yemen. Though Marib and Jawf do not stand out for a particularly high number of deaths in comparison to other governorates, the conflict and resulting fatalities are worse relative to total population in these smaller governorates.

Conclusion

Overall, we observe that the civil war in Yemen has had a devastating impact on notable quality-of-life indicators for the Yemeni people. We’ve seen that Yemen’s GDP per capita has taken a large hit since the start of the conflict, and it will likely continue to fall as a result of the mass displacement of families into refugee camps, increased death rates among younger populations, and widespread insecurity across many of Yemen’s governorates. We’ve also observed the horrifying number of deaths and displaced people that this war has caused, especially in the western portion of Yemen. With more investment in gathering statistics across various databases, there is potential to explore how individual cities’ economic output is affected by constant conflict in the area. Through this project, we highlighted indicators that illustrate a clear and under-discussed reality: Yemen’s people are suffering. We hope that, by staying informed and sharing statistical models with others, viewers will bring attention to this underrepresented issue and advocate for a solution that will help improve the lives of the Yemeni people.