Instructions

For each exercise below, show your code. Once you have finished, enter your answers into the quiz on Canvas and upload the knitted version of this document at the end of the quiz. A few tips:


Set up

The gapminder package provides a data set of country-level demographic information (life expectancy, population, and GDP per capita by year). We will use it to compare countries' demographics and inform funding decisions.

Tidyverse is a collection of R packages for data manipulation and visualization, including dplyr (used here for filtering) and ggplot2 (used here for plotting).

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(gapminder)
data(gapminder)
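
Before plotting, it can help to confirm what the data set contains. The short sketch below uses only base R inspection functions on the gapminder data frame, and assumes the gapminder package is installed.

```r
library(gapminder)

# Column names: country, continent, year, lifeExp, pop, gdpPercap
names(gapminder)

# First and last year covered by the data
range(gapminder$year)

# Number of distinct countries in the data
length(unique(gapminder$country))
```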

The code below filters gapminder down to the rows for the United States and pipes the result into ggplot(). geom_smooth() then draws a smoothed trend line (a loess fit by default here), which gives a cleaner, more readable chart than the raw points alone.

gapminder %>%
  filter(country == "United States") %>%
  ggplot() +
  geom_smooth(mapping = aes(x = year, y = lifeExp), color = "blue") +
  ggtitle("Life expectancy in the US") +
  xlab("Year") +
  ylab("Life expectancy")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Life expectancy in the United States increased consistently from 1952 to 2007, the period covered by the gapminder data.


Q1: Visualize the change in life expectancy (lifeExp) in three countries of your choosing using a line plot (geom_line()).

The code below filters the data to the United States, Afghanistan, and Nigeria and draws a comparative line chart with ggplot. Points are added on top of the lines, with point size representing each country's population.

gapminder %>%
  filter(country %in% c("Afghanistan", "United States", "Nigeria")) %>%
  ggplot(mapping = aes(x = year, y = lifeExp, color = country)) +
  geom_line() +  # line plot, as Q1 asks
  geom_point(mapping = aes(size = pop)) +  # point size encodes population
  ggtitle("Comparing life expectancy") +
  xlab("Year") +
  ylab("Life expectancy")

The data from 1952 to 2007 show a wide gap in life expectancy between the United States, Afghanistan, and Nigeria. In 1952 the U.S. had a life expectancy near 70 years, while Afghanistan and Nigeria were at approximately 30 and 35 years respectively. Over the following 55 years, Afghanistan saw the largest gain, about 15 years, while life expectancy in the U.S. and Nigeria each rose by roughly 10 years. Nigeria is the only one of the three to show a decline, from roughly 1995 to 2005.
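
Values read off a chart are only approximate, so it may be worth checking them against the data. The sketch below is one possible way to do that with dplyr: it computes the first and last recorded life expectancy for each of the three countries and the difference between them. The object name life_change and its column names are illustrative choices, not part of the assignment.

```r
library(dplyr)
library(gapminder)

life_change <- gapminder %>%
  filter(country %in% c("Afghanistan", "United States", "Nigeria")) %>%
  group_by(country) %>%
  summarise(
    first_year = min(year),
    last_year  = max(year),
    start_life = lifeExp[which.min(year)],  # life expectancy in the first year
    end_life   = lifeExp[which.max(year)],  # life expectancy in the last year
    change     = end_life - start_life      # gain over the whole period
  )

life_change
```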

The boxplot below shows life expectancy in 2007 by continent. It gives a more recent regional picture that can later be compared with GDP per capita.

gapminder %>%
  filter(year == 2007) %>%  # keep only the 2007 rows
  ggplot() +
  geom_boxplot(aes(x = continent, y = lifeExp)) +  # one box per continent
  ggtitle("Comparing Life Expectancy in 2007") +
  xlab("Continent") +
  ylab("Life expectancy (years)")


Q2: Use a box and whisker plot (geom_boxplot()) to visualize differences in GDP per capita across continents.

The code for a boxplot differs only slightly from the line chart code above. The geometry is changed to geom_boxplot(), the aesthetics map continent to the x axis and GDP per capita to the y axis, and the labels are updated to match.

gapminder %>%
  filter(year == 2007) %>%  # keep only the year of interest
  ggplot() +
  geom_boxplot(aes(x = continent, y = gdpPercap, fill = continent)) +  # new geometry and aesthetic
  ggtitle("Comparing GDP per capita in 2007") +
  xlab("Continent") +
  ylab("GDP per capita")

The boxplot shows GDP per capita for the five continents in the data set in 2007. Each box spans the middle half of the countries (the 25th to 75th percentiles), and the horizontal line inside it marks the median. The whiskers extend to the most extreme values within 1.5 times the interquartile range of the box, and countries beyond that are drawn as individual points (outliers). GDP per capita is highest in Europe and Oceania, with the Americas and Asia in the middle. In the Americas, the U.S. and Canada stand out as large outliers. In Asia, the upper half of countries is spread over a much wider and higher range of GDP per capita than the lower half.
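
The quantities a boxplot draws can also be computed directly. The sketch below is one way to do so, assuming the conventional 1.5 × IQR rule for flagging high outliers; the object name box_stats and its column names are illustrative.

```r
library(dplyr)
library(gapminder)

box_stats <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    q1  = quantile(gdpPercap, 0.25, names = FALSE),  # bottom of the box
    med = median(gdpPercap),                         # line inside the box
    q3  = quantile(gdpPercap, 0.75, names = FALSE),  # top of the box
    iqr = q3 - q1,                                   # height of the box
    upper_fence     = q3 + 1.5 * iqr,                # limit for the upper whisker
    n_high_outliers = sum(gdpPercap > upper_fence)   # points drawn above the whisker
  )

box_stats
```

Comparing the med column across continents gives the same ordering that the median lines show in the plot.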


Q3: Create a scatterplot (geom_point()) showing the relationship between GDP per capita and life expectancy in 2007.

To quantify the relationship between GDP per capita and life expectancy, we can compute their correlation: the covariance of the two variables (how they vary together around their means) divided by the product of their standard deviations.
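
As a check on the formula, Pearson's correlation is the covariance divided by the product of the two standard deviations. The sketch below computes it by hand and compares it with the built-in cor(); the names x, y, and r_manual are illustrative.

```r
library(gapminder)

x <- gapminder$gdpPercap
y <- gapminder$lifeExp

# Pearson correlation by hand: covariance divided by the product of the SDs
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Should agree with the built-in function up to floating-point error
all.equal(r_manual, cor(x, y))
```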

cor(gapminder$gdpPercap, gapminder$lifeExp)
## [1] 0.5837062

A correlation of 0.58 indicates a moderate positive relationship: as GDP per capita increases, life expectancy tends to increase as well. Note that this value is computed across all years in the data set, not only 2007.
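
Because Q3 asks specifically about 2007, the same correlation could also be computed on that year alone. The sketch below shows one possible way with dplyr; the object name cor_2007 and its column names are illustrative.

```r
library(dplyr)
library(gapminder)

cor_2007 <- gapminder %>%
  filter(year == 2007) %>%            # keep only the 2007 rows
  summarise(
    n_countries = n(),                # number of countries in that year
    r = cor(gdpPercap, lifeExp)       # Pearson correlation for 2007 only
  )

cor_2007
```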

gapminder %>%
  filter(year == 2007) %>%  # Q3 asks for 2007 only
  ggplot() +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent), alpha = 0.5) +
  theme_minimal() +
  theme(legend.title = element_blank()) +
  xlab("Life expectancy in years") +
  ylab("GDP per capita") +
  labs(title = "GDP per Capita and Life Expectancy in 2007", caption = "Source: Gapminder")