Instructions
For each exercise below, show code. Once you’ve completed things,
don’t forget to input everything into the quiz on Canvas and to upload
this document (knitted version please!) at the end of the quiz. A few
tips:
- Don’t forget to knit your document frequently!
- Don’t forget to install packages with install.packages() and load
them using library().
- EXPLAIN WHAT YOUR RESULTS MEAN! Think about the
numbers and explain, in words, what they mean.
- Make sure you label all axes and add a title to your plots. I will
take off points if you fail to do this.
Set up
This code filters the gapminder data down to the rows for the United
States. ggplot() with geom_smooth() then draws a smoothed trend line,
which gives a clearer and more readable line chart.
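library(dplyr)      # filter() and the %>% pipe
library(ggplot2)    # plotting functions
library(gapminder)  # the gapminder data frame
gapminder %>%
  filter(country == "United States") %>%  # keep only the U.S. rows
  ggplot() +
  geom_smooth(mapping = aes(x = year, y = lifeExp), color = "blue") +
  ggtitle("Life expectancy in the US") +
  xlab("Year") +
  ylab("Life expectancy")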
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Life expectancy in the United States increased steadily across the
period covered by the data (1952 to 2007).
Q1: Visualize the change in life expectancy (lifeExp)
in three countries of your choosing using a line plot
(geom_line()).
The code below filters the data to the U.S., Afghanistan, and Nigeria
and uses ggplot to draw a comparative chart: a smoothed trend line
(geom_smooth()) for each country, plus points sized by population.
gapminder %>%
  filter(country %in% c("Afghanistan", "United States", "Nigeria")) %>%
  ggplot() +
  geom_smooth(mapping = aes(x = year, y = lifeExp, color = country)) +
  geom_point(mapping = aes(x = year, y = lifeExp, size = pop)) +
  ggtitle("Comparing life expectancy") +
  xlab("Year") +
  ylab("Life expectancy")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The data, which run from approximately 1950 to 2010, show a wide gap
in life expectancy between the United States, Afghanistan, and Nigeria.
In 1950 the U.S. had a life expectancy near 70 years, while Afghanistan
and Nigeria were at approximately 30 and 35 years respectively. Over
the next 60 years, U.S. life expectancy increased by approximately 10
years. Afghanistan saw the greatest increase, about 15 years, while
Nigeria saw a slower increase of about 5 years. Nigeria was also the
only country with a period of decline: its life expectancy decreased by
a few years between 1995 and 2005.
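Since Q1 asks for geom_line(), the same comparison can also be drawn
with straight line segments between observations instead of a loess
smooth. This is a minimal sketch, assuming the dplyr, ggplot2, and
gapminder packages are installed; the names three_countries and p are
placeholders and not part of the original code.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Keep only the three countries being compared.
three_countries <- gapminder %>%
  filter(country %in% c("Afghanistan", "United States", "Nigeria"))

# geom_line() connects each country's observations in year order;
# mapping color to country draws one line per country.
p <- ggplot(three_countries, aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  ggtitle("Comparing life expectancy") +
  xlab("Year") +
  ylab("Life expectancy")

p
```

Unlike geom_smooth(), geom_line() passes through every observed value,
so short dips in a country's life expectancy are not smoothed away.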
Q2: Use a box and whisker plot (geom_boxplot()) to
visualize differences in GDP per capita across continents.
The code for the boxplot differs little from the line chart code
above. The geom was changed to geom_boxplot(), and the aesthetics were
updated to map continent to the x-axis and fill color and GDP per
capita to the y-axis, for clearer labelling and visualization.
gapminder %>%
  filter(year == 2007) %>% # keep only the year of interest
  ggplot() +
  geom_boxplot(aes(x = continent, y = gdpPercap, fill = continent)) + # new geometry and aesthetic
  ggtitle("Comparing GDP per capita, 2007") +
  xlab("Continent") +
  ylab("GDP per capita")

The boxplot shows GDP per capita for five continents in 2007. Each box
spans the middle half of the countries in that continent (the 25th to
75th percentiles), and the horizontal line inside it marks the median
of the distribution. The whiskers extending from each box cover the
rest of the data, except for outliers, which are drawn as dots. GDP per
capita is highest in Europe and Oceania, with the Americas and Asia in
the middle. The Americas have large outlying values for the U.S. and
Canada, and in Asia the upper half of countries have GDPs per capita
much higher than those in the lower half.
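The pieces of a box and whisker plot can be computed directly. The
sketch below uses base R's boxplot.stats() on a small made-up vector
(the values are illustrative, not gapminder data). It relies on the
common rule that whiskers reach at most 1.5 times the box height, with
points beyond that treated as outliers. geom_boxplot() applies the same
1.5 rule by default, though it computes the box edges with quantile(),
so its edges can differ slightly from the hinges used here.

```r
# Made-up values for illustration only (not gapminder data).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

s <- boxplot.stats(x)

# s$stats holds, in order: lower whisker end, lower hinge (about the
# 25th percentile), median, upper hinge (about the 75th percentile),
# and upper whisker end.
s$stats
# [1] 1 3 5 7 8

# Values more than 1.5 times the box height away from the box are
# reported as outliers and would be drawn as dots.
s$out
# [1] 100
```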
Q3: Create a scatterplot (geom_point()) showing the
relationship between GDP per capita and life expectancy in 2007.
To show the relationship between GDP per capita and life expectancy,
the correlation can be computed by dividing the covariance of the two
variables (their joint variation about their means) by the product of
their standard deviations.
cor(gapminder$gdpPercap, gapminder$lifeExp)
## [1] 0.5837062
A correlation of 0.58, computed here across all years in the data,
indicates a moderate positive relationship: as GDP per capita
increases, life expectancy tends to increase as well.
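Pearson's correlation, which cor() computes by default, is the
covariance divided by the product of the two standard deviations. This
can be checked by hand. A minimal sketch in base R, using made-up
vectors x and y as stand-ins (with gapminder loaded,
gapminder$gdpPercap and gapminder$lifeExp could be substituted):

```r
# Made-up values for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Covariance divided by the product of the standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))

# cor() computes the same quantity (Pearson's r by default).
r_builtin <- cor(x, y)

# TRUE when the two values agree up to floating point tolerance.
all.equal(r_manual, r_builtin)
```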
ggplot(gapminder) +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent), alpha = 0.5) +
  theme_minimal() +
  theme(legend.title = element_blank()) +
  xlab("Life expectancy in years") +
  ylab("GDP per capita") +
  labs(title = "GDP per Capita and Life Expectancy", caption = "Source: Gapminder\n ")

The scatterplot shows a positive relationship between GDP per capita
and life expectancy. Countries with lower GDP per capita, especially
those in Africa, tend to have lower life expectancy, while countries in
Europe and Oceania cluster where high GDP per capita and high life
expectancy meet. We need to be careful not to oversimplify this
relationship, as these two variables are only a few among many factors
related to a region's wealth or life expectancy.
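Because Q3 asks about 2007 specifically, the correlation and
scatterplot can also be restricted to that year. This is a sketch,
assuming the dplyr, ggplot2, and gapminder packages are installed;
gap_2007 and p_2007 are placeholder names introduced for illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Keep only the 2007 observations (one row per country).
gap_2007 <- gapminder %>%
  filter(year == 2007)

# Correlation between GDP per capita and life expectancy in 2007 only.
cor(gap_2007$gdpPercap, gap_2007$lifeExp)

# Same axes and styling as the all-years plot, with 2007 data only.
p_2007 <- ggplot(gap_2007) +
  geom_point(aes(x = lifeExp, y = gdpPercap, color = continent), alpha = 0.5) +
  theme_minimal() +
  theme(legend.title = element_blank()) +
  xlab("Life expectancy in years") +
  ylab("GDP per capita") +
  labs(title = "GDP per Capita and Life Expectancy, 2007", caption = "Source: Gapminder")

p_2007
```

Restricting to one year means each country appears once, so the points
are not repeated measurements of the same country over time.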