Submit this file (after adding your name after “author”) using Canvas. Make sure to label your plots!
Load dslabs and define the dataset gapminder as gapminder <- as_tibble(gapminder).
There is another dataset called gapminder in the package gapminder. The two are different, so make sure you
load only dslabs. To avoid confusion, you can
write dslabs::gapminder to make the package explicit. The
dslabs version has yearly observations, while the gapminder
package's version reports data every 5 years.
# dplyr provides as_tibble() and the %>% pipe used throughout
library(dplyr)
library(dslabs)
gapminder <- as_tibble(gapminder)
To look at the dataset, you can run head(gapminder). If
you just write dslabs::gapminder, R will print the whole
dataset, so please avoid this.
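As a small illustration of why head() is the safer way to peek at the data: a tibble prints only its first rows, while a plain data frame such as dslabs::gapminder prints every row. The sketch below uses a hypothetical 100-row data frame, not the real dataset, and assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical stand-in for a large data frame (not the gapminder data)
df <- data.frame(id = 1:100, value = seq(0.5, 50, by = 0.5))

# head() returns only the first 6 rows, whatever the class
first_rows <- head(df)

# as_tibble() converts to a tibble, whose print method shows only the first rows
tb <- as_tibble(df)
print(tb)
```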
Create a variable gdp_per_cap equal to gdp divided by population.
gapminder <- gapminder %>%
  mutate(gdp_per_cap = gdp / population)
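Because gdp is NA for many country-years (visible in the printed output further down), gdp_per_cap inherits those missing values: arithmetic on NA returns NA. A minimal sketch with hypothetical toy values (not gapminder data), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy rows; the numbers are made up for illustration
toy <- tibble(
  country    = c("A", "B", "C"),
  gdp        = c(1e10, NA, 5e9),
  population = c(1e6, 2e6, 5e5)
)

# Same operation as in the exercise: GDP divided by population
toy <- toy %>%
  mutate(gdp_per_cap = gdp / population)

# Rows with missing gdp get a missing gdp_per_cap
sum(is.na(toy$gdp_per_cap))
```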
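# Median life expectancy per continent and year; the column gets its own name
# so it is not confused with the gapminder_m data frame
gapminder_m <- gapminder %>%
  group_by(continent, year) %>%
  summarize(median_life_exp = median(life_expectancy, na.rm = TRUE))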
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
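The message above is informational: after summarize(), dplyr drops the last grouping variable (year) and keeps the result grouped by continent. Passing .groups = "drop" returns an ungrouped tibble and silences the message. A sketch on hypothetical toy data, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: two continents, two years, two countries each
toy <- tibble(
  continent       = rep(c("Africa", "Europe"), each = 4),
  year            = rep(c(2000, 2000, 2001, 2001), times = 2),
  life_expectancy = c(50, 54, 51, NA, 75, 77, 76, 78)
)

# One median per continent-year; NA values are ignored via na.rm = TRUE
medians <- toy %>%
  group_by(continent, year) %>%
  summarize(median_life_exp = median(life_expectancy, na.rm = TRUE),
            .groups = "drop")

medians
```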
library(ggplot2)
gapminder_no2016 <- gapminder %>%
  filter(year != 2016)
avg_lifeexp <- gapminder_no2016 %>%
  group_by(continent, year) %>%
  mutate(avg_life_exp = mean(life_expectancy, na.rm = TRUE))
ggplot(avg_lifeexp, aes(x = year, y = avg_life_exp, color = continent)) +
  geom_line() +
  labs(
    title = "Average Life Expectancy over Time in each Continent",
    x = "Year",
    y = "Average Life Expectancy"
  )
avg_lifeexp
## # A tibble: 10,360 × 11
## # Groups: continent, year [280]
## country year infant_mortality life_expectancy fertility population gdp
## <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Albania 1960 115. 62.9 6.19 1636054 NA
## 2 Algeria 1960 148. 47.5 7.65 11124892 1.38e10
## 3 Angola 1960 208 36.0 7.32 5270844 NA
## 4 Antigua… 1960 NA 63.0 4.43 54681 NA
## 5 Argenti… 1960 59.9 65.4 3.11 20619075 1.08e11
## 6 Armenia 1960 NA 66.9 4.55 1867396 NA
## 7 Aruba 1960 NA 65.7 4.82 54208 NA
## 8 Austral… 1960 20.3 70.9 3.45 10292328 9.67e10
## 9 Austria 1960 37.3 68.8 2.7 7065525 5.24e10
## 10 Azerbai… 1960 NA 61.3 5.57 3897889 NA
## # ℹ 10,350 more rows
## # ℹ 4 more variables: continent <fct>, region <fct>, gdp_per_cap <dbl>,
## # avg_life_exp <dbl>
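The printed tibble still has 10,360 rows because mutate() attaches the group mean to every country-year instead of collapsing the groups, so geom_line() receives many duplicate points per continent and year. A sketch of the alternative with summarize(), which returns one row per group; the data are hypothetical toy values, avg_by_group is a hypothetical name, and dplyr is assumed to be installed. Both versions should draw the same lines, but the summarized one passes ggplot2 far fewer rows.

```r
library(dplyr)

# Hypothetical toy data: two continents, two years, two countries each
toy <- tibble(
  continent       = rep(c("Asia", "Oceania"), each = 4),
  year            = rep(c(1960, 1960, 1961, 1961), times = 2),
  life_expectancy = c(40, 44, 41, 45, 60, 62, 61, 63)
)

# mutate(): keeps all 8 rows, repeating each group mean
with_mutate <- toy %>%
  group_by(continent, year) %>%
  mutate(avg_life_exp = mean(life_expectancy, na.rm = TRUE))

# summarize(): one row per continent-year (2 continents x 2 years = 4 rows)
avg_by_group <- toy %>%
  group_by(continent, year) %>%
  summarize(avg_life_exp = mean(life_expectancy, na.rm = TRUE),
            .groups = "drop")

c(nrow(with_mutate), nrow(avg_by_group))
```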
Call this graph g.
g <- gapminder %>%
  filter(year == 2010) %>%
  ggplot(aes(x = fertility)) +
  geom_histogram(
    binwidth = 0.5,
    color = "white",
    fill = "#d90502"
  ) +
  labs(
    title = "Histogram of Fertility Rates in 2010",
    x = "Fertility Rate",
    y = "Count"
  ) +
  theme_minimal()
g
# One row of panels per continent, sharing the same x axis
g <- g + facet_grid(rows = vars(continent))
g
What do you see?
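facet_grid(rows = vars(continent)) splits the histogram into one row of panels per continent, all sharing the x axis so the fertility distributions can be compared directly. A sketch on hypothetical toy data (not gapminder values), assuming ggplot2 is installed, that checks the panel count with layer_data():

```r
library(ggplot2)

# Hypothetical toy fertility values for three continents
toy <- data.frame(
  continent = rep(c("Africa", "Americas", "Europe"), each = 3),
  fertility = c(4.5, 5.2, 6.1, 2.1, 2.4, 3.0, 1.4, 1.6, 1.9)
)

g_toy <- ggplot(toy, aes(x = fertility)) +
  geom_histogram(binwidth = 0.5, color = "white", fill = "#d90502") +
  facet_grid(rows = vars(continent))

# layer_data() returns the computed histogram bins, tagged with a PANEL column
n_panels <- length(unique(layer_data(g_toy)$PANEL))
n_panels
```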
scatter_2010 <- gapminder %>%
  filter(year == 2010) %>%
  ggplot(aes(x = gdp_per_cap, y = fertility)) +
  geom_point(
    size = 3,
    alpha = 0.5,
    color = "#009E73"
  ) +
  labs(
    title = "Scatter Plot of Fertility Rate by GDP per Capita in 2010",
    x = "GDP per Capita",
    y = "Fertility Rate"
  ) +
  # theme_minimal() is added to the plot itself, after labs() is closed
  theme_minimal()
scatter_2010
## Warning: Removed 9 rows containing missing values or values outside the scale range
## (`geom_point()`).
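The warning counts rows that geom_point() dropped because gdp_per_cap or fertility is NA. A sketch of counting such rows yourself before plotting, on hypothetical toy data and assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy rows with some missing values
toy <- tibble(
  country     = c("A", "B", "C", "D"),
  gdp_per_cap = c(1500, NA, 30000, 800),
  fertility   = c(4.1, 2.0, NA, 5.3)
)

# Rows missing either plotted variable; ggplot2 removes these with a warning
n_dropped <- toy %>%
  filter(is.na(gdp_per_cap) | is.na(fertility)) %>%
  nrow()

n_dropped
```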
scatter_2015 <- gapminder %>%
  filter(year == 2015) %>%
  ggplot(aes(x = year, y = infant_mortality)) +
  geom_point() +
  labs(
    title = "Scatter Plot of Infant Mortality Rate in 2015",
    x = "Year",
    y = "Infant Mortality"
  ) +
  # theme_minimal() is added to the plot itself, after labs() is closed
  theme_minimal()
scatter_2015
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).
## There are a few outliers at the high end of the infant mortality range.
box_2015 <- gapminder %>%
  filter(year == 2015) %>%
  ggplot(aes(x = year, y = infant_mortality)) +
  geom_boxplot() +
  labs(
    title = "Box Plot of Infant Mortality Rate in 2015",
    x = "Year",
    y = "Infant Mortality"
  ) +
  # theme_minimal() is added to the plot itself, after labs() is closed
  theme_minimal()
box_2015
## Warning: Removed 7 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Yes, there are outliers toward the top of the range, and they are easier to see in the box plot than in the scatter plot.
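The points drawn beyond the whiskers follow ggplot2's default rule for geom_boxplot() (coef = 1.5): a value is flagged when it lies more than 1.5 times the interquartile range below the first quartile or above the third. A base R sketch of that rule on a hypothetical vector:

```r
# Hypothetical values: ten ordinary observations plus one extreme one
x <- c(1:10, 50)

# Quartiles (R's default quantile type 7, which stat_boxplot() also uses)
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Whisker limits at 1.5 * IQR beyond the box
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- x[x < lower | x > upper]
outliers
```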
# group = year draws one box per year; without it ggplot2 pools all years into one box
gapminder_2000s <- gapminder %>%
  filter(year >= 2000 & year <= 2009) %>%
  ggplot(aes(x = year, y = infant_mortality, group = year)) +
  geom_boxplot() +
  labs(
    title = "Infant Mortality by Year from 2000-2009",
    x = "Year",
    y = "Infant Mortality"
  ) +
  # theme_minimal() is added to the plot itself, after labs() is closed
  theme_minimal()
gapminder_2000s
## Warning: Removed 70 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Over time, there are more outliers, and they reach higher levels of infant mortality (toward 150) than in the year 2010 alone (toward 100).
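With a continuous x such as year, geom_boxplot() pools every row into a single box unless a grouping is supplied, which is what the "Continuous x aesthetic" warning points at. Setting group = year (or mapping x = factor(year)) should give one box per year. A sketch on hypothetical toy data, assuming ggplot2 is installed, that counts the computed boxes with layer_data():

```r
library(ggplot2)

# Hypothetical toy infant mortality values for three years
toy <- data.frame(
  year             = rep(2000:2002, each = 4),
  infant_mortality = c(20, 35, 50, 90, 18, 30, 48, 85, 15, 28, 45, 80)
)

# No grouping: all years pooled into one box (ggplot2 warns about this)
pooled <- ggplot(toy, aes(x = year, y = infant_mortality)) +
  geom_boxplot()

# group = year: one box per year on a continuous axis
per_year <- ggplot(toy, aes(x = year, y = infant_mortality, group = year)) +
  geom_boxplot()

# layer_data() has one row per computed box
c(nrow(layer_data(pooled)), nrow(layer_data(per_year)))
```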