This lab uses the dslabs package (for the gapminder dataset) and the tidyverse. Notice that both are loaded above. Remember to add your name to the file.
Boxplots provide a simple way to identify outliers. Here we will see why outliers may be easier to identify in boxplots than in scatterplots.
gapminder %>%
  filter(year == 2015) %>%
  ggplot(aes(x = year, y = infant_mortality)) +
  geom_point()
## Warning: Removed 7 rows containing missing values (`geom_point()`).
gapminder %>%
  filter(year == 2015) %>%
  ggplot(aes(x = "", y = infant_mortality)) +
  geom_boxplot()
## Warning: Removed 7 rows containing non-finite values (`stat_boxplot()`).
gapminder %>%
  filter(year >= 2005 & year <= 2015) %>%
  ggplot(aes(x = "", y = infant_mortality)) +
  geom_boxplot()
## Warning: Removed 77 rows containing non-finite values (`stat_boxplot()`).
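To see why a boxplot flags certain points, it may help to compute the outlier rule by hand. The sketch below uses base R's boxplot.stats on a small made-up vector (the values are hypothetical, not taken from gapminder). It marks as outliers any points lying more than 1.5 times the interquartile range beyond the hinges. geom_boxplot applies the same 1.5 × IQR rule by default, although it may compute the quartiles slightly differently.

```r
# Hypothetical values: seven ordinary observations and one extreme one
x <- c(2, 3, 4, 5, 6, 7, 8, 50)

# boxplot.stats() returns the five-number summary ($stats)
# and the points beyond 1.5 * IQR from the hinges ($out)
stats <- boxplot.stats(x)
outliers <- stats$out

print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(outliers)     # the points a boxplot would draw individually
```

In a scatterplot the same extreme point is just one dot among many, whereas the boxplot draws it separately from the whiskers.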
In line with the exercise in class, create a bar plot that presents the number of flights per carrier and airport of origin. Instead of using both variables in the bar plot as in class, add a facet_wrap to have separate plots for each airport.
flights %>%
  ggplot(aes(x = carrier)) +
  geom_bar() +
  facet_wrap(~ origin)
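The bar heights in each facet are simply counts of rows per carrier within each origin. As a rough illustration, the sketch below tabulates those counts for a tiny made-up data frame (toy_flights is hypothetical and stands in for the real flights table).

```r
# Hypothetical toy data standing in for the flights table
toy_flights <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "UA", "DL"),
  origin  = c("JFK", "LGA", "JFK", "JFK", "EWR", "LGA")
)

# Cross-tabulate carrier by origin: each column corresponds to one facet,
# and each cell to the height of one bar
counts <- table(toy_flights$carrier, toy_flights$origin)
print(counts)
```

Each column of the table matches one panel produced by facet_wrap(~ origin).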
mean_pop <- gapminder %>%
  filter(year == 1960) %>%
  pull(population) %>%
  mean(na.rm = TRUE)
median_pop <- gapminder %>%
  filter(year == 1960) %>%
  pull(population) %>%
  median(na.rm = TRUE)
median_pop
## [1] 3075752
Create a density plot of population in 1960 using geom_density. A density plot is a way of representing the distribution of a numeric variable. Add a vertical line at the value of mean_pop and another at the value of median_pop. Use geom_vline to do so, and wrap mean_pop and median_pop in as.numeric. What do you observe?
gapminder %>%
  filter(year == 1960) %>%
  ggplot(aes(x = population)) +
  geom_density() +
  geom_vline(xintercept = as.numeric(mean_pop), color = "red") +
  geom_vline(xintercept = as.numeric(median_pop), color = "blue")
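When interpreting the two vertical lines, it can help to recall how the mean and median behave on a right-skewed variable. The sketch below uses a small made-up vector (hypothetical values, not gapminder data) in which a single large observation pulls the mean well above the median.

```r
# Hypothetical right-skewed values: most are small, one is very large
x <- c(1, 2, 3, 4, 100)

mean_x <- mean(x)      # pulled upward by the single large value
median_x <- median(x)  # middle value, unaffected by the extreme

print(c(mean = mean_x, median = median_x))
```

A large gap between the two lines in the density plot would suggest a similar pattern, with a few very large values stretching the right tail.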