For this exercise, I used the gapminder data from the dslabs package, which records fertility, life expectancy, population, and region for many countries. To show the relationship between fertility and life expectancy, I built a multivariable scatterplot, using color to denote region and point size to indicate population.
## Load packages
# Load packages needed for the analysis
library(tidyverse)
library(dslabs)
library(ggthemes)
ggplot(gap_2011, aes(x = fertility, y = life_expectancy)) +
  geom_point(aes(color = region, size = population), alpha = 0.7) +
  labs(
    title = "Fertility vs Life Expectancy Around the World",
    subtitle = "Countries with lower fertility tend to have higher life expectancy",
    x = "Fertility Rate (Children per Woman)",
    y = "Life Expectancy (Years)",
    color = "Region",
    size = "Population",
    caption = "Data source: DS Labs"
  ) +
  scale_color_brewer(palette = "Dark2") +
  theme_economist()
Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Dark2 is 8
Returning the palette you asked for with that many colors
Warning: Removed 120 rows containing missing values or values outside the scale range
(`geom_point()`).
DISCUSSION
This multivariable chart visualizes the relationship between fertility and life expectancy across countries in 2011. To build it, I first filtered the data to 2011 and removed rows with missing values (NA) in any of the variables the plot needs. I then aggregated the detailed regions into broader regional groups so that the legend is easier to read. The scatterplot maps fertility to the x-axis and life expectancy to the y-axis, with color showing the region group and point size representing population.
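The preparation steps described above are not shown in the rendered code, so here is a minimal sketch of how they might look. The column name `region_group` and the specific region-to-group mapping are assumptions made for illustration; the grouping actually used in this analysis may differ.

```r
library(dplyr)
library(tidyr)
library(dslabs)

# Keep 2011 only and drop rows missing any variable the plot uses
gap_2011 <- gapminder |>
  filter(year == 2011) |>
  drop_na(fertility, life_expectancy, population, region) |>
  # Hypothetical grouping: the original region-to-group mapping is not shown
  mutate(region_group = case_when(
    region %in% c("Western Europe", "Northern Europe", "Southern Europe",
                  "Northern America", "Australia and New Zealand") ~ "West",
    region %in% c("Eastern Asia", "South-Eastern Asia") ~ "East Asia",
    region %in% c("Caribbean", "Central America", "South America") ~ "Latin America",
    continent == "Africa" & region != "Northern Africa" ~ "Sub-Saharan Africa",
    TRUE ~ "Others"
  ))

# Five groups fit within the eight colors that the Dark2 palette provides
count(gap_2011, region_group)
```

With a column like this, the plot would map `color = region_group` instead of `color = region`, which keeps the number of legend entries within the palette's limit.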
Countries with relatively high fertility tend to have shorter life expectancy, while those with low fertility tend to have longer life expectancy. Sub-Saharan African countries cluster toward the lower-right corner of the graph (high fertility, lower life expectancy), whereas Western countries cluster toward the upper-left corner (low fertility, high life expectancy).
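One way to back up the visual impression with a number is to compute the correlation between the two variables for 2011. This is only a sketch: the object names `gap_check` and `r` are illustrative, and the Pearson coefficient captures only the linear part of the relationship.

```r
library(dplyr)
library(dslabs)

# Countries in 2011 with both variables present
gap_check <- gapminder |>
  filter(year == 2011, !is.na(fertility), !is.na(life_expectancy))

# Pearson correlation; a negative value matches the downward trend in the plot
r <- cor(gap_check$fertility, gap_check$life_expectancy)
r
```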