1 Introduction

Climate change can affect the intensity and frequency of precipitation. Increasing fluctuations in precipitation can be problematic for economic stability and growth, because precipitation affects many economic indicators, such as agricultural yields and GDP. The impacts of climate change on precipitation will likely be more drastic in the more arid regions of the world, including many countries in Africa. I first wanted to investigate whether precipitation can be used to predict GDP in African countries. Once a relationship was established, I wanted to see whether the driest countries were also the poorest countries and whether this relationship has become more pronounced over the past five decades. To answer these questions, I used annual precipitation data (measured in inches) and land area data (in sq km) from the World Bank, and GDP Per Capita data from the Gapminder data set [1, 2].

1.1 Countries Used In Study

The countries used in the case study are: Algeria, Angola, Botswana, Chad, Congo, Egypt, Ethiopia, Kenya, Libya, Mali, Mauritania, Mozambique, Namibia, Niger, Nigeria, South Africa, South Sudan, Sudan, Tanzania, and Zambia.

2 Methods

2.1 Upload Packages

library(tidyverse)
library(gapminder)
library(hablar) # for convert

#Cite all loaded packages (citations will automatically be added to end of document)
knitr::write_bib(c(.packages()), "packages.bib") 

2.2 Uploading Data

First, upload the Gapminder data set to get GDP Per Capita data for all African Countries.

# Upload gapminder data
Africa_gpm <- gapminder %>% 
  filter(continent == "Africa") 

Next, upload precipitation data from the World Bank for 20 large African countries [1].

# Ethiopia
Ethiopia  <- read_csv("~/R_DataScience/data/ETH_precipitation.csv") %>% 
  select(c(1,2)) #upload precipitation data for Ethiopia
Ethiopia = Ethiopia[-1,] # remove the first row
Ethiopia <- add_column(Ethiopia, "Ethiopia") #add column for country name (spelling must match the gapminder country name for the merge)
colnames(Ethiopia) = c("Year", "Precipitation", "Country") # rename columns


# Algeria
Algeria  <- read_csv("~/R_DataScience/data/DZA_precipitation.csv") %>% 
  select(c(1,2))
Algeria = Algeria[-1,]
Algeria <- add_column(Algeria, "Algeria")
colnames(Algeria) = c("Year", "Precipitation", "Country")


# Chad
Chad  <- read_csv("~/R_DataScience/data/chad_precipitation.csv") %>% 
  select(c(1,2))
Chad = Chad[-1,]
Chad <- add_column(Chad, "Chad")
colnames(Chad) = c("Year", "Precipitation", "Country")


# Congo
Congo  <- read_csv("~/R_DataScience/data/congo_precipitation.csv") %>% 
  select(c(1,2))
Congo = Congo[-1,]
Congo <- add_column(Congo, "Congo")
colnames(Congo) = c("Year", "Precipitation", "Country")

#  Egypt
Egypt  <- read_csv("~/R_DataScience/data/egypt_precipitation.csv") %>% 
  select(c(1,2))
Egypt = Egypt[-1,]
Egypt <- add_column(Egypt, "Egypt")
colnames(Egypt) = c("Year", "Precipitation", "Country")


# Kenya
Kenya  <- read_csv("~/R_DataScience/data/kenya_precipitation.csv") %>% 
  select(c(1,2))
Kenya = Kenya[-1,]
Kenya <- add_column(Kenya, "Kenya")
colnames(Kenya) = c("Year", "Precipitation", "Country")


# Tanzania
Tanzania  <- read_csv("~/R_DataScience/data/tanzania_precipitation.csv") %>%
  select(c(1,2))
Tanzania = Tanzania[-1,]
Tanzania <- add_column(Tanzania, "Tanzania")
colnames(Tanzania) = c("Year", "Precipitation", "Country")



# Angola
Angola  <- read_csv("~/R_DataScience/data/angola_precipitation.csv") %>% 
  select(c(1,2))
Angola = Angola[-1,]
Angola <- add_column(Angola, "Angola")
colnames(Angola) = c("Year", "Precipitation", "Country")


# Namibia
Namibia  <- read_csv("~/R_DataScience/data/namibia_precipitation.csv") %>%
  select(c(1,2))
Namibia = Namibia[-1,]
Namibia <- add_column(Namibia, "Namibia")
colnames(Namibia) = c("Year", "Precipitation", "Country")

# Libya
Libya  <- read_csv("~/R_DataScience/data/libya_precipitation.csv") %>%
  select(c(1,2))
Libya = Libya[-1,]
Libya <- add_column(Libya, "Libya")
colnames(Libya) = c("Year", "Precipitation", "Country")

# South Africa
South_Africa  <- read_csv("~/R_DataScience/data/south_precipitation.csv") %>%
  select(c(1,2))
South_Africa = South_Africa[-1,]
South_Africa <- add_column(South_Africa, "South Africa")
colnames(South_Africa) = c("Year", "Precipitation", "Country")

# Niger
Niger  <- read_csv("~/R_DataScience/data/niger_precipitation.csv") %>%
  select(c(1,2))
Niger = Niger[-1,]
Niger <- add_column(Niger, "Niger")
colnames(Niger) = c("Year", "Precipitation", "Country")

# Mali
Mali  <- read_csv("~/R_DataScience/data/mali_precipitation.csv") %>%
  select(c(1,2))
Mali = Mali[-1,]
Mali <- add_column(Mali, "Mali")
colnames(Mali) = c("Year", "Precipitation", "Country")

# Mauritania
Mauritania  <- read_csv("~/R_DataScience/data/mauritania_precipitation.csv") %>%
  select(c(1,2))
Mauritania = Mauritania[-1,]
Mauritania <- add_column(Mauritania, "Mauritania")
colnames(Mauritania) = c("Year", "Precipitation", "Country")

# Nigeria
Nigeria  <- read_csv("~/R_DataScience/data/nigeria_precipitation.csv") %>%
  select(c(1,2))
Nigeria = Nigeria[-1,]
Nigeria <- add_column(Nigeria, "Nigeria")
colnames(Nigeria) = c("Year", "Precipitation", "Country")

# Botswana 
Botswana  <- read_csv("~/R_DataScience/data/botswanan_precipitation.csv") %>%
  select(c(1,2))
Botswana = Botswana[-1,]
Botswana <- add_column(Botswana, "Botswana")
colnames(Botswana) = c("Year", "Precipitation", "Country")


# Sudan
Sudan  <- read_csv("~/R_DataScience/data/sudan_precipitation.csv") %>%
  select(c(1,2))
Sudan = Sudan[-1,]
Sudan <- add_column(Sudan, "Sudan")
colnames(Sudan) = c("Year", "Precipitation", "Country")


# Zambia
Zambia  <- read_csv("~/R_DataScience/data/zambia_precipitation.csv") %>%
  select(c(1,2))
Zambia = Zambia[-1,]
Zambia <- add_column(Zambia, "Zambia")
colnames(Zambia) = c("Year", "Precipitation", "Country")

# Mozambique
Mozambique  <- read_csv("~/R_DataScience/data/mozambique_precipitation.csv") %>%
  select(c(1,2))
Mozambique = Mozambique[-1,]
Mozambique <- add_column(Mozambique, "Mozambique")
colnames(Mozambique) = c("Year", "Precipitation", "Country")

# South Sudan
South_Sudan  <- read_csv("~/R_DataScience/data/south_sudan_precipitation.csv") %>%
  select(c(1,2))
South_Sudan = South_Sudan[-1,]
South_Sudan <- add_column(South_Sudan, "South Sudan")
colnames(South_Sudan) = c("Year", "Precipitation", "Country")

To get the precipitation data from the website, navigate to the time series data and download the data as a csv for each country. Select observed data as the collection type, precipitation as the variable, annual as the aggregation, country as the area type, the specific country, and then the historic references for the time period.
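The twenty import blocks above repeat the same four steps, so they could be collapsed into one helper. The following is a minimal sketch, not the code used for this report: `read_precip`, `make_demo_file`, and the two demo countries are hypothetical, and it uses base R's `read.csv` in place of `read_csv` so that it runs without extra packages. It assumes, as the blocks above do, that the first row after the header is not a data row.

```r
# Hypothetical helper: read one country's file and tidy it the same way
# as the per-country blocks (first two columns, first row dropped).
read_precip <- function(path, country) {
  raw <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
  out <- raw[-1, c(1, 2)]                    # drop first row, keep columns 1 and 2
  names(out) <- c("Year", "Precipitation")   # rename columns
  out$Country <- country                     # add column for country name
  out
}

# Made-up demo files standing in for the downloaded csv files.
make_demo_file <- function(values) {
  path <- tempfile(fileext = ".csv")
  writeLines(c("Category,Annual Mean", "units,inches", values), path)
  path
}
files <- c(Demoland    = make_demo_file(c("1952,30.1", "1957,28.4")),
           Exampleland = make_demo_file(c("1952,12.0", "1957,11.5")))

# Read every file and stack the results into one data frame.
precip_demo <- do.call(rbind, Map(read_precip, files, names(files)))
precip_demo$Precipitation <- as.numeric(precip_demo$Precipitation)
```

With the real data, `files` would be a named vector mapping each country name to its csv path, which keeps the country label and the file in one place and makes a misspelled country name easier to spot.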

Then, upload land area data from the World Bank for the 20 large African countries [2]. The land area was collected in 1961, but I assumed that each country's area remained the same from 1952 to 2007.

land_area  <- read_csv("~/R_DataScience/data/land_area.csv") %>%
  select(c(1,6)) # select specific columns 
land_area = land_area[-c(1:3),] # remove the first three rows
colnames(land_area) = c("Country", "area_of_country") # name the remaining columns

Make one large precipitation data frame

#merge all precipitation data together
precipitation <- rbind(Algeria, Angola, Nigeria, Niger, South_Africa, South_Sudan, Zambia, Tanzania, Sudan, Namibia, Mozambique, Mauritania, Mali, Libya, Kenya, Ethiopia, Egypt, Congo, Chad, Botswana) 

precipitation <- precipitation %>% convert(num("Precipitation")) # convert precipitation values to numbers
colnames(precipitation) <- c( "Year", "precip", "Country") 

Make one large data frame with GDP Per Capita, precipitation, and land area

# merge precipitation and GDP Per Capita
gpm_and_precipitation <- merge(Africa_gpm, precipitation, by.x = c("year", "country"), by.y = c("Year", "Country"))

# merge precipitation/GDP Per Capita with land area
precip_gpm_area <- merge(gpm_and_precipitation, land_area,by.x = "country", by.y = "Country" )

# add column for precipitation per sq km
precip_gpm_area <- precip_gpm_area %>% 
  mutate("precip_per_sq_km" = (precip/area_of_country))
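Because `merge` keeps only rows whose keys match exactly in both data frames, a country whose name is spelled differently in the two sources is dropped without any warning. A minimal sketch of a check that could be run before merging is shown below; the data frames and their values are made up for illustration, and `unmatched` is a hypothetical name.

```r
# Made-up stand-ins for Africa_gpm and precipitation.
gdp_demo <- data.frame(country   = c("Kenya", "Mali"),
                       year      = c(1952, 1952),
                       gdpPercap = c(850, 450))
precip_demo <- data.frame(Country = c("Kenya", "Mali", "Ethopia"), # misspelled on purpose
                          Year    = c(1952, 1952, 1952),
                          precip  = c(25, 13, 32))

# Country names in the precipitation data with no match in the GDP data.
unmatched <- setdiff(unique(precip_demo$Country), unique(gdp_demo$country))
print(unmatched)

# The merge silently drops the unmatched country.
merged <- merge(gdp_demo, precip_demo,
                by.x = c("year", "country"), by.y = c("Year", "Country"))
```

An empty `unmatched` vector would indicate that every country in the precipitation data has a counterpart in the GDP data.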

3 Graphics

3.1 Average Annual Precipitation

Calculate the average annual precipitation

average_precipitation <- precip_gpm_area %>% 
  select(country, precip)  %>%  # select only the relevant columns
  group_by(country) %>% # group rows by country
  summarize(mean_precip = mean(precip)) # find the mean precipitation for each group (country)

Create a bar graph to show the mean

ggplot(average_precipitation, aes(x=country, y= mean_precip, fill = country)) + 
  geom_bar(stat = "identity") + 
  theme_bw() +
  labs(x = "Country",
       y = "Average Annual Precipitation (inches)",
       caption = "Figure 1: The Average Annual Precipitation (inches) from 1952 to 2007 was graphed for each  of the 20 \n countries in Africa ",
       fill = "Country") + 
  theme(axis.text.x=element_blank(),
        plot.caption  = element_text(hjust = 0),
        axis.title.y= element_text( size = 15),
        )       

3.2 Average Annual GDP Per Capita

Calculate the average annual GDP Per Capita

average_gdpPercap <- precip_gpm_area %>% 
  select(country, gdpPercap)  %>% 
  group_by(country) %>% 
  summarize(mean_gdpPercap = mean(gdpPercap))

Create a bar graph to show the mean GDP per Capita

ggplot(average_gdpPercap, aes(x=country, y= mean_gdpPercap, fill = country)) + 
  geom_bar(stat = "identity") +
  theme_bw() +
  labs(x = "Country",
       y = "Average Annual GDP Per Capita",
       caption = "Figure 2: The Average Annual GDP Per Capita from 1952 to 2007 was graphed for each  of the 20 \n countries in Africa ",
       fill = "Country") + 
  theme(axis.text.x=element_blank(),
        plot.caption  = element_text(hjust = 0),
        axis.title.y= element_text( size = 12))       

3.3 Change in Precipitation from 1952 to 2007

Calculate the change in precipitation from 1952 to 2007

difference_precipitation<- precip_gpm_area  %>% 
  group_by(country) %>% 
  arrange(year) %>%
  filter(year %in% c(1952, 2007)) %>% 
  mutate(diff_between_precipitation = precip_per_sq_km - lag(precip_per_sq_km)) %>%  #calculate the change from 1952 to 2007 (Precipitation in 2007 - Precipitation in 1952)
  filter(year == 2007) #the 1952 rows have no earlier value to subtract (shows as NA), so filter out these rows 
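To make the direction of the subtraction concrete, here is a small base R sketch on made-up numbers (the country labels and values are hypothetical). It computes the last value minus the first value for each country, which is the same quantity the `lag()` step above produces once only 1952 and 2007 remain.

```r
# Made-up data: two countries, two years each.
demo <- data.frame(country = c("A", "A", "B", "B"),
                   year    = c(1952, 2007, 1952, 2007),
                   precip  = c(10, 12, 20, 15))

# Sort so that 1952 comes before 2007 within each country.
demo <- demo[order(demo$country, demo$year), ]

# Change = value in 2007 minus value in 1952, per country.
change <- tapply(demo$precip, demo$country, function(x) x[length(x)] - x[1])
print(change)
```

A positive value means the country was wetter in 2007 than in 1952, and a negative value means it was drier.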

Create a bar graph to show the difference in Precipitation

ggplot(difference_precipitation, aes(x=country, y= diff_between_precipitation, fill = country)) + 
  geom_bar(stat = "identity") +
  theme_bw() +
  labs(x = "Country",
       y = "Change In Precipitation (inches per sq. km) from 1952 to 2007",
       caption = "Figure 3: The change in Precipitation per sq. km from 1952 to 2007 was graphed for each  of the 20 \n countries in Africa ",
       fill = "Country") + 
  theme(axis.text.x=element_blank(),
        plot.caption  = element_text(hjust = 0),
        axis.title.y= element_text( size = 10))

3.4 Change in GDP Per Capita from 1952 to 2007

Calculate the change in GDP Per Capita from 1952 to 2007

difference_GDP<- precip_gpm_area  %>% 
  group_by(country) %>% 
  arrange(year) %>%
  filter(year %in% c(1952, 2007)) %>% 
  mutate(diff_between_GDP = gdpPercap- lag(gdpPercap)) %>% 
  filter(year == 2007) 

Create a bar graph to show difference in GDP Per capita

ggplot(difference_GDP, aes(x=country, y= diff_between_GDP, fill = country)) + 
  geom_bar(stat = "identity") +
  theme_bw() +
  labs(x = "Country",
       y = "Change In GDP Per Capita from 1952 to 2007",
       caption = "Figure 4: The change in GDP Per Capita from 1952 to 2007 was graphed for each of the 20 \n countries in Africa ",
       fill = "Country") + 
  theme(axis.text.x=element_blank(),
        plot.caption  = element_text(hjust = 0),
        axis.title.y= element_text( size = 12))

3.5 Association Between Precipitation and GDP Per Capita

Create a graphic to show the relationship between precipitation and GDP Per Capita with linear regression line.

ggplot(precip_gpm_area, aes( x = precip_per_sq_km, y = gdpPercap, color = country))  +
  geom_point() +
  facet_wrap(~year, nrow = 2, scales = "free_y") + # create a faceted graph (faceted by year)
  theme_bw() +
  geom_smooth(method = "lm", se = F, aes(group = year), color = "black")  + # add a linear regression line to each graph
  labs(x = "Precipitation (inches per sq km)",
       y = "GDP Per Capita",
       caption = "Figure 5: The relationship between precipitation per sq. km and the GDP Per Capita of 20 countries from across \n Africa was plotted in a faceted graph using 5 year intervals from 1952 to 2007",
       color = "Country") +
  theme(plot.caption  = element_text(size= 10, hjust = 0),
        axis.text.x = element_text(angle = 75, hjust=1, size= 5),
        axis.title.y= element_text( size = 18),
        axis.title.x= element_text( size = 18)) 

4 Statistical Analysis

4.1 Calculating Linear Regressions

Linear regression of Precipitation per sq km versus GDP Per Capita from 1952 to 2007:

overall_summary <- summary(lm(gdpPercap ~ precip_per_sq_km, data = precip_gpm_area)) # perform a linear regression analyzing the relationship between precipitation per sq km (as the predictor variable) and GDP Per Capita (as the dependent variable) 

# format the results of the linear regression as a tibble
overall_tibble <- tibble(
  "year" = "1952 to 2007",
  "intercept" = overall_summary$coefficients[1,1], #intercept
  "slope"     = overall_summary$coefficients[2,1], #slope
  "p_value"   = overall_summary$coefficients[2,4], #p-value
  "r_squared"  = overall_summary$r.squared)         #R2) 

Linear regression of Precipitation inches per sq km versus GDP Per Capita in 1952:

data_1952 <- precip_gpm_area %>% 
  filter(year == 1952)

summary_1952 <- summary(lm(gdpPercap ~ precip_per_sq_km, data = data_1952))

tibble_1952 <- tibble(
  "year" = "1952",
  "intercept" = summary_1952$coefficients[1,1], #intercept
 "slope"     = summary_1952$coefficients[2,1], #slope
 "p_value"   = summary_1952$coefficients[2,4], #p-value
 "r_squared"  = summary_1952$r.squared)         #R2)

Linear regression of Precipitation inches per sq km versus GDP Per Capita in 2007:

data_2007 <- precip_gpm_area %>% 
  filter(year == 2007)

summary_2007 <- summary(lm(gdpPercap ~ precip_per_sq_km, data = data_2007))

tibble_2007 <- tibble(
  "year" = "2007",
  "intercept" = summary_2007$coefficients[1,1], #intercept
  "slope"     = summary_2007$coefficients[2,1], #slope
  "p_value"   = summary_2007$coefficients[2,4], #p-value
  "r_squared"  = summary_2007$r.squared)         #R2)

Combine the linear regressions into one tibble:

comparative <- rbind (overall_tibble, tibble_1952, tibble_2007)
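The three regressions above differ only in the rows they use, so the fit-and-extract step could be wrapped in one function. The sketch below is an illustration under stated assumptions rather than the code behind the reported results: `fit_row` is a hypothetical helper, the `toy` data are made up, and a base R data frame stands in for the tibble.

```r
# Hypothetical helper: fit gdpPercap ~ precip_per_sq_km and return one row
# with the same statistics collected in the tibbles above.
fit_row <- function(data, label) {
  s <- summary(lm(gdpPercap ~ precip_per_sq_km, data = data))
  data.frame(year      = label,
             intercept = s$coefficients[1, 1],  # intercept
             slope     = s$coefficients[2, 1],  # slope
             p_value   = s$coefficients[2, 4],  # p-value of the slope
             r_squared = s$r.squared)           # R squared
}

# Made-up data with the same column names as precip_gpm_area.
toy <- data.frame(
  year             = rep(c(1952, 2007), each = 4),
  precip_per_sq_km = c(1, 2, 3, 4, 1, 2, 3, 4) * 1e-5,
  gdpPercap        = c(900, 800, 850, 600, 5000, 4200, 3900, 2500)
)

# One row for all years, then one row per year of interest.
rows <- rbind(fit_row(toy, "1952 to 2007"),
              fit_row(toy[toy$year == 1952, ], "1952"),
              fit_row(toy[toy$year == 2007, ], "2007"))
print(rows)
```

Writing the extraction once also makes it straightforward to add rows for the intermediate years if the trend over time is of interest.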

4.2 Linear Regression Results

knitr::kable(comparative, align = "ccccc", caption = "Figure 6: Statistical Analysis of the Relationship Between Precipitation (in/(km^2)) and GDP Per Capita", col.names = c("Year", "Intercept", "Slope", "P-Value", "R Squared"))
Figure 6: Statistical Analysis of the Relationship Between Precipitation (in/(km^2)) and GDP Per Capita

Year           Intercept   Slope         P-Value     R Squared
1952 to 2007   4297.122    -2268303.6    0.0000036   0.1072089
1952           2026.064    -733095.9     0.2380914   0.0978708
2007           5600.646    -2767922.7    0.1924864   0.1181022

5 Conclusion

In the end, precipitation was not found to be a good predictor of GDP Per Capita: just over 10% of the variance in GDP Per Capita was explained by variance in precipitation per sq km (R squared = 0.107 for 1952 to 2007). What is interesting is that the driest countries were not found to be the poorest countries. The negative relationship between precipitation and GDP Per Capita may indicate that many countries are turning away from agriculture as a primary source of income or that the driest countries have had to develop and adapt the most. The p-value was lower and the R squared value higher in 2007 than in 1952 (p = 0.19 versus 0.24; R squared = 0.118 versus 0.098), which hints that the negative relationship may have strengthened over time, although neither single-year regression is statistically significant at the 0.05 level [fig. 6]. If the trend is real, the need for alternative sources of income or more advanced agricultural methods may have become more serious over time. These results suggest that the impact of climate change on countries in Africa, and on countries in general, may be nuanced, and that drier conditions may sometimes spur technological and systemic innovation.

I would be interested in investigating the makeup of the sources of GDP Per Capita for each country and how the breakdown has changed over time with fluctuations in precipitation and temperature.

6 References

  1. “World Bank Climate Change Knowledge Portal.” Climate Change Knowledge Portal, https://climateknowledgeportal.worldbank.org/download-data. Accessed 18 Feb. 2022.

  2. “Land Area (Sq. Km).” Data, https://data.worldbank.org/indicator/AG.LND.TOTL.K2. Accessed 23 Feb. 2022.

  3. Acemoglu, D. “An African Success Story - Botswana.” GSDRC, 4 Sept. 2015, https://gsdrc.org/document-library/an-african-success-story-botswana/. Accessed 26 Feb. 2022.

  4. “Economy of Libya.” Encyclopædia Britannica, Encyclopædia Britannica, Inc., https://www.britannica.com/place/Libya/Economy. Accessed 26 Feb. 2022.

Bryan, Jennifer. 2017. Gapminder: Data from Gapminder. https://CRAN.R-project.org/package=gapminder.
Henry, Lionel, and Hadley Wickham. 2020. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Müller, Kirill, and Hadley Wickham. 2021. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sjoberg, David. 2020. Hablar: Non-Astonishing Results in r. https://davidsjoberg.github.io/.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2019. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2021a. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
———. 2021b. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
———. 2021c. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2021. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2021. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.