For this assignment, I used the Gapminder data set from the dslabs package, which contains global health and demographic data across countries and years. I decided to create a heat map to visualize how life expectancy has changed over time across continents.
First, to create the visualization, I cleaned the data set by removing missing values from my key variables: continent, year, and life expectancy. Second, I explored the data set by viewing the variable names and the values within the continent variable. This allowed me to better understand the structure of the data and identify any labels that needed renaming. I noticed that one of the continent categories was labeled “Oceania,” which may not be obvious to all audiences, so I decided to rename it to “Australia & Oceania.”
Third, I grouped the data by continent and year, then calculated the average life expectancy for each group. This summarizes the data at the continent level rather than by individual country. Fourth, I reshaped the data set into a matrix format so that the rows represent continents and the columns represent years.
Lastly, I passed the matrix to the heatmap() function to create a color-coded visualization of how life expectancy has changed over time across the continents. The color gradient represents the level of life expectancy, with darker colors representing lower life expectancy and brighter/lighter colors representing higher life expectancy.
Loading Packages and Data Set
# Loading the appropriate/required libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)
Warning: package 'dslabs' was built under R version 4.5.3
country year infant_mortality life_expectancy fertility
1 Albania 1960 115.40 62.87 6.19
2 Algeria 1960 148.20 47.50 7.65
3 Angola 1960 208.00 35.98 7.32
4 Antigua and Barbuda 1960 NA 62.97 4.43
5 Argentina 1960 59.87 65.39 3.11
6 Armenia 1960 NA 66.86 4.55
population gdp continent region
1 1636054 NA Europe Southern Europe
2 11124892 13828152297 Africa Northern Africa
3 5270844 NA Africa Middle Africa
4 54681 NA Americas Caribbean
5 20619075 108322326649 Americas South America
6 1867396 NA Asia Western Asia
Cleaning the Data Set and Creating a Summarized Data Set for the Heat Map
# Viewing variable names in dataset
names(gapminder)

# I am creating summarized dataset for the heatmap
gapminder_summary <- gapminder |>
  # Remove missing values from my key variables
  filter(!is.na(continent), !is.na(life_expectancy), !is.na(year)) |>
  # Group data by continent and year
  group_by(continent, year) |>
  # Calculate average life expectancy per group
  summarise(avg_life_exp = mean(life_expectancy), .groups = "drop")

# View first few rows
head(gapminder_summary)
# A tibble: 6 × 3
continent year avg_life_exp
<chr> <int> <dbl>
1 Africa 1960 43.1
2 Africa 1961 43.6
3 Africa 1962 44.1
4 Africa 1963 44.6
5 Africa 1964 45.0
6 Africa 1965 45.5
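The rename of Oceania described in the introduction does not appear in the code shown here. A minimal sketch of one way to do it follows, using base R on a small made-up data frame. The `toy` object and its values are hypothetical, and the sketch assumes `continent` is stored as a factor.

```r
# Toy stand-in for the gapminder data; values are illustrative only
toy <- data.frame(
  continent = factor(c("Africa", "Oceania", "Europe", "Oceania")),
  year = c(1960L, 1960L, 1961L, 1961L)
)

# Rename one factor level in place; all other levels are left unchanged
levels(toy$continent)[levels(toy$continent) == "Oceania"] <- "Australia & Oceania"

# Count rows per continent to confirm the new label is in use
table(toy$continent)
```

With forcats loaded, `fct_recode(continent, "Australia & Oceania" = "Oceania")` inside `mutate()` should give the same result in a pipeline.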
Reshaping the Data into Matrix format
# I am reshaping the data so that rows = continents, columns = years, values = average life expectancy
gapminder_wide <- gapminder_summary |>
  select(continent, year, avg_life_exp) |>
  pivot_wider(names_from = year, values_from = avg_life_exp)

# Convert to matrix format
gapminder_matrix <- data.matrix(gapminder_wide[, -1])

# Assigning row names to be continents
row.names(gapminder_matrix) <- gapminder_wide$continent
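As a cross-check on the group, average, and reshape steps, base R's `tapply()` can produce the same kind of continent-by-year matrix in a single call. This is an alternative sketch on made-up values, not the method used above; the `toy` data frame and the `life_matrix` name are hypothetical.

```r
# Toy stand-in for the gapminder data; values are illustrative only
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Europe", "Europe", "Europe"),
  year = c(1960, 1960, 1961, 1960, 1961, 1961),
  life_expectancy = c(40, 44, 45, 68, 70, NA)
)

# Average life expectancy for every continent/year pair.
# Rows are continents, columns are years; na.rm = TRUE is passed on to mean().
life_matrix <- with(
  toy,
  tapply(life_expectancy, list(continent, year), mean, na.rm = TRUE)
)

life_matrix
```

If the two approaches disagree on the real data, that would point to a problem in either the filtering or the reshaping step.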
Creating the Heat Map of Average Life Expectancy
# Create heatmap of average life expectancy
# viridis() is not attached by library(tidyverse), so it is called through
# viridisLite here (loading the viridis package would also work)
gapminder_heatmap <- heatmap(
  gapminder_matrix,
  Rowv = NA, Colv = NA,
  col = viridisLite::viridis(25, option = "plasma"),
  cexCol = 0.7,   # adjusting column labels
  cexRow = 0.9,   # adjusting row labels
  cex.main = 0.7, # adjusting main title
  scale = "none",
  xlab = "", ylab = "",
  main = "Heatmap of Average Life Expectancy by Continent and Year"
)
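As a quick check on how to read the colors: `heatmap()` hands `col` to `image()`, which assigns the first color to the lowest values and the last color to the highest. The sketch below assumes the viridisLite package is installed and compares the two ends of the plasma palette. The brightness measure (sum of the RGB channels) is a rough proxy chosen only for illustration.

```r
# The same palette call as in the heat map, 25 colors from the plasma scale
pal <- viridisLite::viridis(25, option = "plasma")

# Rough brightness of each color: sum of its red, green and blue channels
rgb_sums <- colSums(col2rgb(pal))

# Compare the color used for the lowest values with the one for the highest
c(lowest = rgb_sums[[1]], highest = rgb_sums[[25]])
```

The first entry should come out much smaller than the last, which means dark cells in the heat map correspond to lower life expectancy and bright cells to higher life expectancy.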
Summary:
This heat map depicts average life expectancy across continents over time (1960-2016). Each row represents a continent, and each column represents a year. The color gradient indicates life expectancy, with darker colors representing lower life expectancy and lighter colors representing higher life expectancy. Overall, the heat map shows that life expectancy has increased on every continent over the period. Europe and the Americas have consistently shown the highest life expectancy, with Europe appearing to fare somewhat better than the Americas. Africa and Asia have both improved, although Africa has the lowest life expectancy overall.
One result I found surprising is that Australia and Oceania appear to have lower life expectancy than Europe and even Asia. However, this likely reflects how the data are grouped and averaged rather than conditions in Australia itself. Oceania includes not only Australia and New Zealand but also several small Pacific island nations with lower life expectancy, and because every country counts equally in the average, these are likely pulling the regional figure down. Therefore, the heat map should be interpreted with caution rather than as a direct reflection of life expectancy on the continent.
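To illustrate why an unweighted average of countries can understate a region dominated by a few populous countries, the sketch below compares `mean()` with `weighted.mean()`. The four rows are made-up numbers, not gapminder values, and population weighting is an alternative summary rather than what the heat map above uses.

```r
# Hypothetical region: two populous countries and two small island nations.
# All values are invented for illustration only.
toy <- data.frame(
  country = c("Large A", "Large B", "Small island C", "Small island D"),
  life_expectancy = c(82, 81, 65, 68),
  population = c(24e6, 4.5e6, 1e5, 2e5)
)

# Every country counts equally
unweighted <- mean(toy$life_expectancy)

# Every person counts equally
weighted <- weighted.mean(toy$life_expectancy, w = toy$population)

c(unweighted = unweighted, weighted = weighted)
```

On the real data, a similar comparison could be made by using `weighted.mean(life_expectancy, population)` inside `summarise()`, after also filtering out rows with a missing population.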