Overview of Tidying and Example Analysis

The data set has a few opportunities for tidying. The column “Country_Code”, which contains a three letter code to indicate the country, is dropped since it both redundant with and less informative than the preceding “Country_Name” column. The column “Series_Name” is dropped because it is too wordy for the purposes here and the “Series_Code” column provides something more lean with which to work. Next, for some records, there are no data for a given year as indicated by a double period (“..”); I replaced all instances of “..” with NA to enable seamless downstream calculations. Next, there is an opportunity to pivot the yearly population data vertically. Lastly, two non-conforming rows containing the source of the data and the date it was last updated are dropped from the data frame.

In terms of graphing, no exact end-point was specified, but I decided to visualize measles immunization rates by country in 2022. The immunization rates span from 81% to 99%; surprisingly, the United States lingers somewhere in the middle at 92%. Notably, the herd immunity threshold for measles is thought to be in the low 90s and several countries are below this rate range.

knitr::opts_chunk$set(echo = TRUE)

#Import libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(dplyr)
library('RCurl')
## 
## Attaching package: 'RCurl'
## 
## The following object is masked from 'package:tidyr':
## 
##     complete
library(tidyverse)
library(ggplot2)
library(knitr)
library(ggpubr)

#Read data 

getdata <- getURL('https://raw.githubusercontent.com/kr0710/Data607/refs/heads/main/world_development_indicators.csv')

df <- read.csv(text = getdata, header = TRUE)

#Drop unneeded columns

df_drop <- df[,-c(2,3)]

colnames(df_drop)[3:12] <- c(2014,2015,2016,2017,2018,2019,2020,2021,2022,2023)

# Drop two rows that do not contain data
df_drop <- df_drop[-c(1084, 1085),]

df_pivot <-
  pivot_longer(df_drop,
               cols = colnames(df_drop)[3:12],
               names_to = 'Year',
               values_to = 'Measure'
  )

df_pivot[df_pivot == ".."] <- NA

#Visualize measles immunization rates in 2022 by country
df_pivot |>
  filter (Year == 2022, Series_Code == "SH.IMM.MEAS") |>
    ggplot(aes(x = Country_Name, y = Measure, fill = Country_Name)) +
      geom_bar(position="dodge", stat="identity") +
      theme(axis.text.x=element_text(angle=90))