library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset
country <- read_csv("country_stat.csv")
## Rows: 10545 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country, continent, region
## dbl (6): year, infant_mortality, life_expectancy, fertility, population, gdp
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(country)
## # A tibble: 6 × 9
## country year infant_mortality life_expectancy fertility population gdp
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Albania 1960 115. 62.9 6.19 1636054 NA
## 2 Algeria 1960 148. 47.5 7.65 11124892 1.38e10
## 3 Angola 1960 208 36.0 7.32 5270844 NA
## 4 Antigua … 1960 NA 63.0 4.43 54681 NA
## 5 Argentina 1960 59.9 65.4 3.11 20619075 1.08e11
## 6 Armenia 1960 NA 66.9 4.55 1867396 NA
## # ℹ 2 more variables: continent <chr>, region <chr>
Check for missing values and visualize the missingness pattern.
summary(country)
## country year infant_mortality life_expectancy
## Length:10545 Min. :1960 Min. : 1.50 Min. :13.20
## Class :character 1st Qu.:1974 1st Qu.: 16.00 1st Qu.:57.50
## Mode :character Median :1988 Median : 41.50 Median :67.54
## Mean :1988 Mean : 55.31 Mean :64.81
## 3rd Qu.:2002 3rd Qu.: 85.10 3rd Qu.:73.00
## Max. :2016 Max. :276.90 Max. :83.90
## NA's :1453
## fertility population gdp continent
## Min. :0.840 Min. :3.124e+04 Min. :4.040e+07 Length:10545
## 1st Qu.:2.200 1st Qu.:1.333e+06 1st Qu.:1.846e+09 Class :character
## Median :3.750 Median :5.009e+06 Median :7.794e+09 Mode :character
## Mean :4.084 Mean :2.701e+07 Mean :1.480e+11
## 3rd Qu.:6.000 3rd Qu.:1.523e+07 3rd Qu.:5.540e+10
## Max. :9.220 Max. :1.376e+09 Max. :1.174e+13
## NA's :187 NA's :185 NA's :2972
## region
## Length:10545
## Class :character
## Mode :character
##
##
##
##
There are 1,453 NAs in infant_mortality, 187 in fertility, 185 in population, and 2,972 in gdp. The remaining columns show no missing values in the summary.
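The counts above were read off the `summary()` output. A minimal sketch of how the same counts could be computed and plotted directly is given here; the `toy` data frame and its values are hypothetical stand-ins for `country`, used only so the snippet runs on its own.

```r
# Hypothetical toy data standing in for `country`; the values are made up.
toy <- data.frame(
  country          = c("A", "B", "C", "D"),
  infant_mortality = c(115, NA, 208, NA),
  fertility        = c(6.19, 7.65, NA, 4.43),
  gdp              = c(NA, 1.38e10, NA, NA)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Proportion of rows that are missing in each column.
na_share <- colMeans(is.na(toy))
print(round(na_share, 2))

# Simple bar chart of the missingness share per variable.
barplot(na_share, ylab = "Share of rows missing", main = "Missingness by variable")
```

Applied to the real data, `colSums(is.na(country))` should reproduce the NA counts reported by `summary()`.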
Determine the number of unique countries and years of observations.
length(unique(country$country))
## [1] 185
diff(range(country$year)) + 1
## [1] 57
There are 185 unique countries and 57 years of observations, from 1960 to 2016.
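Note that `diff(range(...)) + 1` measures the length of the interval between the first and last year, which equals the number of observed years only if no year is skipped. The sketch below illustrates the difference on a hypothetical year vector with a gap; it is not drawn from the dataset.

```r
# Hypothetical year vector with a gap (1962 is absent) and one repeated year.
years <- c(1960, 1961, 1963, 1963, 1964)

# Length of the interval from the first to the last year.
span_years <- diff(range(years)) + 1

# Number of distinct years that actually appear.
distinct_years <- length(unique(years))

print(span_years)
print(distinct_years)
```

For the real data, `length(unique(country$year))` would confirm whether all 57 years in the 1960 to 2016 span are present.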
Create a new variable for GDP per capita.
country_per <- country %>%
  mutate(GDP_per_capita = gdp / population)
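Because gdp and population both contain NAs, the derived column inherits them. The following sketch shows this behaviour on a hypothetical three-row tibble (the names `toy` and `toy_per` and all values are illustrative assumptions, not taken from `country_stat.csv`).

```r
library(dplyr)

# Hypothetical rows; the numbers are illustrative only.
toy <- tibble(
  country    = c("A", "B", "C"),
  gdp        = c(1.0e10, NA, 5.0e9),
  population = c(1e7, 2e6, 2.5e6)
)

# Same mutate() step as above, applied to the toy data.
toy_per <- toy %>%
  mutate(GDP_per_capita = gdp / population)

# Rows with a missing gdp get NA in GDP_per_capita instead of raising an error.
print(toy_per)
```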
Compare the variation in infant mortality rates across continents.
# Box plot of infant mortality by continent
infant_mortality_boxplot <- ggplot(country, aes(x = continent, y = infant_mortality, fill = continent)) +
  geom_boxplot() +
  labs(title = "Infant Mortality Rate by Continent", x = "Continent", y = "Infant Mortality Rate") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")
print(infant_mortality_boxplot)
## Warning: Removed 1453 rows containing non-finite values (`stat_boxplot()`).
The box plot compares the distribution of infant mortality rates across continents. The median line in each box shows the central tendency for that continent, and the box height (the interquartile range) shows the spread. Whisker lengths and outlying points show how extreme the rates get within a continent. Differences between continents on these measures may point to disparities in healthcare and socio-economic conditions. The 1,453 rows with a missing infant_mortality value are dropped from the plot, as the warning indicates.
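The quantities a box plot encodes can also be tabulated, which makes the comparison easier to report. This is a minimal sketch on hypothetical data (the continents "X" and "Y" and their values are invented for illustration); the same pipeline could be applied to `country`.

```r
library(dplyr)

# Hypothetical data: two made-up continents, one missing value.
toy <- tibble(
  continent        = c("X", "X", "X", "Y", "Y", "Y"),
  infant_mortality = c(10, 20, NA, 50, 70, 90)
)

# Median and interquartile range per continent, ignoring NAs,
# plus the number of non-missing observations behind each figure.
by_continent <- toy %>%
  group_by(continent) %>%
  summarise(
    n_obs = sum(!is.na(infant_mortality)),
    med   = median(infant_mortality, na.rm = TRUE),
    iqr   = IQR(infant_mortality, na.rm = TRUE)
  )

print(by_continent)
```

Reporting `n_obs` alongside the statistics makes it visible how many rows each continent loses to missing values.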