Data Final Project

Author

Nadia Omer

Sudanese Livelihood in the Early 2000s

Introduction to the Data

My dataset was retrieved from Open Data for Africa, under the Sudan data portal. The data was collected and compiled by organizations including the United Nations Environment Program (UNEP) and Esri, through its Living Status indicators. The dataset includes information on health, education, and economic conditions in Sudanese states. For my analysis, I will use multiple indicators, including hospitals per capita, infant mortality rate, consumer price index, and literacy rate. I aim to investigate which states have the highest and the lowest quality of life indicators and to determine whether these differences are attributable to historical factors or to more recent changes and events. I also want to examine how economic and social conditions vary between regions in Sudan. I chose this topic and dataset because I am Sudanese but have spent only a small part of my life in Sudan, so I do not have extensive knowledge of its geopolitical and socioeconomic conditions. This project will help me gain a deeper understanding of the country, its regional differences, and the diverse living conditions people experience across different states.

Variable                               Description
infant_mortality_rate_per_1000_live    Infant mortality rate per 1,000 live births
hospitals_per_100000_pop               Hospitals per 100,000 people
inflation_rate                         Inflation rate by year
literacy_category                      Literacy categorized as “high” or “low” by a 70% threshold

Loading the Dataset

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
Warning: package 'leaflet' was built under R version 4.5.3
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
setwd("C:/Users/Administrator/OneDrive - montgomerycollege.edu/DATA 110")
sudan_data <- read_csv("Sudan_data.csv")
Rows: 17 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): States, RegionId, Capital, Governor
dbl (18): population, area_sq_km, pop_dens_per_sq_km, literacy_rate, child_m...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Filtering Data

sudan1 <- sudan_data |>
  filter(if_all(everything(), ~ !is.na(.)))
#keeping only rows with no missing values (drops any row that contains an NA)
names(sudan1) <- tolower(names(sudan1))
#standardizing to lowercase names
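
To illustrate what this filter does, here is a minimal sketch on a made-up three-row data frame (the `toy` data and its values are hypothetical, not taken from Sudan_data.csv). `if_all()` with `!is.na()` keeps a row only when every column is non-missing, so a single NA is enough to drop the row. The `if_any()` version is shown for contrast: it would only drop rows that are missing in every column.

```r
library(dplyr)

# Hypothetical toy data, not from the Sudan dataset
toy <- data.frame(
  state          = c("A", "B", "C"),
  literacy_rate  = c(60, NA, 75),
  inflation_rate = c(12, 14, NA)
)

# Same pattern as the report: keep rows where ALL columns are non-missing
complete_rows <- toy |>
  filter(if_all(everything(), ~ !is.na(.x)))

# For contrast: keep rows where AT LEAST ONE column is non-missing
# (this would only drop rows that are entirely NA)
not_all_na <- toy |>
  filter(if_any(everything(), ~ !is.na(.x)))

nrow(complete_rows)
nrow(not_all_na)
```

In the toy data only state A survives the `if_all()` filter, which mirrors how the real dataset went from 17 rows to 15.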

Significant Indicators

sudan1 |>
  arrange(infant_mortality_rate_per_1000_live)
# A tibble: 15 × 22
   states     regionid capital governor population area_sq_km pop_dens_per_sq_km
   <chr>      <chr>    <chr>   <chr>         <dbl>      <dbl>              <dbl>
 1 Al Gezira  SD-GZ    Wad Me… El-Zube…    3680000      23373             157.  
 2 Sennar     SD-SI    Sinja   Ahmed A…    1323000      37844              35.0 
 3 River Nile SD-NR    Al-Dam… Al Hadi…    1153000     122123               9.44
 4 White Nile SD-NW    Rabak   Yousif …    1781000      30411              58.6 
 5 Khartoum   SD-KH    Kharto… Abdul R…    5428000      22142             245.  
 6 Northern   SD-NO    Dongula Fathi M…     719000     348697               2.06
 7 Kassala    SD-KA    Kassala Mohamma…    1842000      36710              50.2 
 8 North Dar… SD-DN    Al-Fas… Osman M…    2175000     296420               7.34
 9 North Kor… SD-KN    Al-Obe… Ahmed H…    3006000     221900              13.5 
10 El Gadarif SD-GD    Gedarif Karam A…    1388000      75263              18.4 
11 Blue Nile  SD-NB    Al-Dam… Hussein…     856000      45844              18.7 
12 West Darf… SD-DW    Geneina Haider …    1346000      79460              16.9 
13 South Dar… SD-DS    Nyala   Adam Ma…    4213000     127300              33.1 
14 South Kor… SD-KS    Kadugl… Adam Al…    1447000     158355               9.14
15 Red Sea    SD-RS    Port S… Mohmed …    1437000     218887               6.57
# ℹ 15 more variables: literacy_rate <dbl>, child_mort_per_1000_live <dbl>,
#   infant_mortality_rate_per_1000_live <dbl>, hospitals_per_100000_pop <dbl>,
#   beds_per_100000_populations <dbl>, live_birth_females <dbl>,
#   live_birth_males <dbl>, consumer_price_index <dbl>, inflation_rate <dbl>,
#   total_fertility_per_woman <dbl>, private_households_percent <dbl>,
#   average_household_size <dbl>,
#   consumption_of_petrolium_products_million_tonnes <dbl>, latitude <dbl>, …
#ascending order (lowest infant mortality rate first)
sudan2 <- sudan1 |>
  mutate(literacy_category =
           ifelse(literacy_rate>= 70,
                  "High Literacy",
                  "Low Literacy")) |>
  mutate(
    infant_mortality_percent = infant_mortality_rate_per_1000_live / 10)
#new columns: literacy category (70% threshold) and infant mortality as a percentage
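
As a small worked example of the two new columns, the sketch below applies the same 70% threshold and the same division by 10 to a few made-up values (the numbers are hypothetical, for illustration only). Dividing a rate per 1,000 live births by 10 gives a rate per 100, which is a percentage.

```r
# Hypothetical values, for illustration only
literacy_rate <- c(55, 70, 82)
infant_mortality_per_1000 <- c(90, 65, 48)

# Same rule as the report: 70% or higher counts as "High Literacy"
literacy_category <- ifelse(literacy_rate >= 70,
                            "High Literacy",
                            "Low Literacy")

# Deaths per 1,000 live births divided by 10 = deaths per 100 (a percentage)
infant_mortality_percent <- infant_mortality_per_1000 / 10

data.frame(literacy_rate, literacy_category,
           infant_mortality_per_1000, infant_mortality_percent)
```

Note that a literacy rate of exactly 70 falls in the "High Literacy" group because the comparison uses `>=`.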

Linear Regression Model

linear_model <- 
  lm(infant_mortality_rate_per_1000_live ~ literacy_rate + hospitals_per_100000_pop  + beds_per_100000_populations,
                   data = sudan2)
summary(linear_model)

Call:
lm(formula = infant_mortality_rate_per_1000_live ~ literacy_rate + 
    hospitals_per_100000_pop + beds_per_100000_populations, data = sudan2)

Residuals:
    Min      1Q  Median      3Q     Max 
-16.968  -6.473   3.376   5.317  11.407 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 109.4883    12.3102   8.894 2.35e-06 ***
literacy_rate                -1.0786     0.3245  -3.324  0.00679 ** 
hospitals_per_100000_pop    -10.4272     6.5876  -1.583  0.14176    
beds_per_100000_populations   0.2900     0.1570   1.848  0.09168 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.173 on 11 degrees of freedom
Multiple R-squared:  0.6158,    Adjusted R-squared:  0.511 
F-statistic: 5.877 on 3 and 11 DF,  p-value: 0.01202

Linear Equation

y = 109.4883 − 1.0786(literacy rate) − 10.4272(hospitals per 100k) + 0.2900(beds per 100k)

y = predicted infant mortality rate per 1,000 live births

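The adjusted R-squared is 0.511, meaning the model explains about half of the variation in infant mortality across states after adjusting for the number of predictors. The p-value of the overall F-test is 0.01202, which is less than 0.05, so the model as a whole is statistically significant. Among the individual predictors, only literacy rate is significant at the 0.05 level (p = 0.00679); hospitals per 100,000 (p = 0.14176) and beds per 100,000 (p = 0.09168) are not. These results show an association between the predictors and infant mortality, not causation, and with only 15 states they should be interpreted with caution.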
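
To show how the fitted equation is used, here is a minimal sketch that plugs one set of predictor values into it. The coefficients are copied from the model summary; the state described by the three input values is hypothetical and is not a row from the dataset.

```r
# Coefficients copied from the summary(linear_model) output
coefs <- c(intercept = 109.4883,
           literacy  = -1.0786,
           hospitals = -10.4272,
           beds      = 0.2900)

# Hypothetical state, for illustration only
literacy_rate      <- 70   # percent
hospitals_per_100k <- 1
beds_per_100k      <- 50

# Evaluate the linear equation term by term
predicted_imr <- coefs[["intercept"]] +
  coefs[["literacy"]]  * literacy_rate +
  coefs[["hospitals"]] * hospitals_per_100k +
  coefs[["beds"]]      * beds_per_100k

predicted_imr
```

For these inputs the equation gives about 38 infant deaths per 1,000 live births. With the fitted model object available, `predict(linear_model, newdata = ...)` should give the same kind of result, provided the columns in `newdata` use the model's original variable names.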

Map for Standard of Living Indicators

pal <- colorFactor(palette = c( "red", "black"),
                   levels=c("High Literacy","Low Literacy"))
sol_popup <- paste0(
  "<b>Standard of Living Indicators: </b>", "<br>",
  "<b>State: </b>", sudan2$states, "<br>",
  "<b>Governor: </b>", sudan2$governor, "<br>",
  "<b>Population: </b>", sudan2$population, "<br>",
  "<b>Literacy Rate (%): </b>", sudan2$literacy_rate, "<br>",
  "<b>Infant Mortality Rate (%): </b>", sudan2$infant_mortality_percent, "<br>",
  "<b>Hospitals per 100k: </b>", sudan2$hospitals_per_100000_pop, "<br>",
  "<b>Inflation Rate (%): </b>", sudan2$inflation_rate, "<br>")
#creating the color palette for literacy category and the popup text
leaflet(data = sudan2) |>
  setView(lat = 13.4, lng = 30.22, zoom = 4.5) |>
  addProviderTiles("OpenStreetMap.DE") |>
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius =  ~sqrt(literacy_rate)*2,
    color = ~pal(literacy_category),
    fillColor = ~pal(literacy_category),
    fillOpacity = 0.6,
    popup = sol_popup) |>
    addLegend(
      position = "bottomright",
      pal = pal,
      values = ~literacy_category,
      title = "Literacy Rate",
      opacity = 0.7)
#mapping standard of living

Map Plot Description

This map shows several standard of living indicators across the states of Sudan, with each circle placed at a state's geographic location. The colors separate states into high and low literacy categories, which helps show differences in education levels across the country. The size of each circle is based on literacy rate, so you can visually compare how literacy varies between states. When you click on a state, the popup gives more details: population, literacy rate, infant mortality percentage, number of hospitals per 100,000 people, and inflation rate. Overall, this map shows that living conditions are not the same across Sudan, and that education, healthcare access, and economic conditions seem to vary together across regions. The latitude and longitude values come from Claude (Anthropic, 2026).
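
The circle sizes follow the formula `sqrt(literacy_rate) * 2` from the map code. The sketch below applies that formula to a few made-up literacy rates (hypothetical values, chosen so the square roots are whole numbers) to show how the scaling behaves.

```r
# Hypothetical literacy rates (%), for illustration only
literacy_rate <- c(36, 49, 64, 81)

# Same formula as the radius argument in addCircleMarkers()
radius_px <- sqrt(literacy_rate) * 2
radius_px

# Compare how far apart the values are before and after scaling
rate_ratio   <- max(literacy_rate) / min(literacy_rate)
radius_ratio <- max(radius_px) / min(radius_px)
rate_ratio
radius_ratio
```

Because the radius grows with the square root of the literacy rate, the area of each circle is proportional to the rate itself. The trade-off is that differences in radius look smaller than the differences in the underlying rates, so the popup values are the more reliable way to compare states.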

Inflation Plot

sudan_inflation <- ggplot(sudan2,
       aes(x = reorder(states, inflation_rate),
           y = inflation_rate,
           fill = inflation_rate)) +
  geom_col(position = "dodge", width = 0.65) +
  scale_fill_gradient(
    low = "#fde2e4",
    high = "#2596be",
    name = "Inflation Rate")  +
  #bar plot to show inflation rates
    geom_hline(yintercept = mean(sudan2$inflation_rate, na.rm = TRUE),
             color = "#f4a261",
             linetype = "dashed",
             linewidth = 1) +
  annotate("text",
           x = 4.6,
           y = 13.9,
           label = "Overall Average",
           color = "#f4a261",
           size = 3.5) +
  #text to describe line
  coord_flip() +
  labs(
    title = "Inflation Rate Across States in Sudan",
    subtitle = "Gradient shows low to high inflation",
    y = "Inflation Rate (%)",
    x = "States",
    caption = "Source: Open Data for Africa – Sudan Data Portal"
  ) +
  theme_minimal(base_size = 13, base_family = "serif") +
  theme(legend.position = "right")
sudan_inflation

Inflation Plot Description

This plot shows the inflation rate across the states of Sudan, with each bar representing one state. The fill uses a gradient in which lighter shades mark lower inflation and darker shades mark higher inflation, which makes it easy to compare economic conditions across regions. The states are ordered by inflation rate, so you can quickly see which areas are experiencing the highest and lowest price increases. The dashed line marks the overall average inflation rate across all states and provides a benchmark for comparison; because the axes are flipped, it appears as a vertical line. Overall, this graph highlights clear differences in economic stability between states and shows that inflation is not evenly distributed across the country.

Although my graph shows how inflation varies across Sudan, a more in-depth analysis could add more standard of living indicators or more specific data on the economy of Sudan and its states. For instance, GDP could be incorporated as a time series, as that would reveal the recessions Sudan experienced in the early 2000s. GDP grew from 2000 to 2008; then, in 2009, GDP suddenly dropped by 2.8% and GDP per capita fell by 17.6% (Country Economy). These declines were due in part to the global Great Recession of roughly 2007 to 2009 and to the drop in oil prices, which decreased Sudan’s revenue. Sudan was also facing internal conflict in Darfur, which weakened infrastructure, and the recession was made worse by limited economic diversity and investment uncertainty. The GDP decline in 2009 contributed to inflation by reducing export revenue and weakening the currency, which made imports more expensive and increased pressure on government finances. These conditions created inflationary pressure even during a period of slower economic growth.
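
As a small illustration of how the average line works as a benchmark, the sketch below computes the mean inflation rate for a made-up set of states and labels each one as above or below it. The `toy` data frame and its values are hypothetical, not taken from the dataset.

```r
# Hypothetical inflation rates (%), for illustration only
toy <- data.frame(
  state          = c("A", "B", "C", "D"),
  inflation_rate = c(10, 12, 15, 19)
)

# Same calculation as the yintercept of the dashed line in the plot
avg_inflation <- mean(toy$inflation_rate, na.rm = TRUE)

# Label each state relative to the benchmark
toy$vs_average <- ifelse(toy$inflation_rate > avg_inflation, "above", "below")

avg_inflation
toy
```

This is an unweighted mean, so every state counts equally regardless of its population. A population-weighted average would be a different benchmark and could place the line elsewhere.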

References

  • Home - Sudan Data Portal. (n.d.). https://sudan.opendataforafrica.org/
  • Hughes, L. (2024, March 6). World Heritage: The pyramids of Meroe, Sudan. Wanderlust. https://www.wanderlustmagazine.com/inspiration/unesco-world-heritage-sites-meroe-pyramids-sudan/
  • Anthropic. (2026). Claude (May 2026 version) [Large language model].
  • countryeconomy.com. (2017, October 20). Sudan GDP - gross domestic product 2009. https://countryeconomy.com/gdp/sudan?year=2009
  • Tian, F. D., & Almosharaf, H. A. (2014). The causes of Sudan’s recent economic decline. IOSR Journal of Economics and Finance, 2(4), 26–40. https://doi.org/10.9790/5933-0242640