library(tidyverse)
library(openintro)
Let’s examine the distribution of state populations using the data in countyComplete from the openintro package.
cc <- countyComplete %>%
  select(state, name, pop2010) %>%
  group_by(state) %>%
  summarize(totpop = sum(pop2010),   # total state population
            count = n(),             # number of counties in the state
            avg = mean(pop2010))     # mean county population
cc
## # A tibble: 51 x 4
##                   state   totpop count       avg
##                  <fctr>    <dbl> <int>     <dbl>
##  1              Alabama  4779736    67  71339.34
##  2               Alaska   710231    29  24490.72
##  3              Arizona  6392017    15 426134.47
##  4             Arkansas  2915918    75  38878.91
##  5           California 37253956    58 642309.59
##  6             Colorado  5029196    64  78581.19
##  7          Connecticut  3574097     8 446762.12
##  8             Delaware   897934     3 299311.33
##  9 District of Columbia   601723     1 601723.00
## 10              Florida 18801310    67 280616.57
## # ... with 41 more rows
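As a quick sanity check on the summary, avg should equal totpop divided by count, since the mean county population is the state total over the number of counties. The sketch below retypes the first three printed rows by hand in base R; the name first_rows is only illustrative and is not part of the original analysis.

```r
# First three rows of the printed summary, retyped by hand
first_rows <- data.frame(
  state  = c("Alabama", "Alaska", "Arizona"),
  totpop = c(4779736, 710231, 6392017),
  count  = c(67, 29, 15)
)

# Recompute the per-county average from the total and the county count
first_rows$avg <- first_rows$totpop / first_rows$count

# Rounded to two decimals for comparison with the avg column printed above
round(first_rows$avg, 2)
```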
Let’s look at the distribution of state populations with a histogram and a boxplot.
cc %>% ggplot(aes(x=totpop)) + geom_histogram(bins=5)
Here’s the boxplot.
cc %>% ggplot(aes(x="X",y=totpop)) +
  geom_boxplot() +
  coord_flip()
Note how the one large outlier compresses most of the data into a narrow band. Putting the axis on a log scale avoids this.
cc %>% ggplot(aes(x=totpop)) +
  geom_histogram(bins=5) +
  scale_x_log10()
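To see why the log scale helps, it can be useful to compare the spread of the raw values with the spread of their base-10 logarithms. This is a minimal base R sketch using only the ten totpop values printed in the summary above, not the full 51 rows; the names ratio and log_spread are illustrative.

```r
# totpop for the ten states printed in the summary (not the full data set)
totpop <- c(4779736, 710231, 6392017, 2915918, 37253956,
            5029196, 3574097, 897934, 601723, 18801310)

# On the raw scale the largest value is dozens of times the smallest
ratio <- max(totpop) / min(totpop)

# On the log10 scale the same values span fewer than two units
log_spread <- diff(range(log10(totpop)))

ratio
log_spread
```

Because equal distances on a log axis correspond to equal ratios, the small and mid-sized states are no longer squeezed together by the largest one.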
Let’s display the population of each state as a bar chart.
cc %>% ggplot(aes(x=state,y=totpop)) + geom_col()
Let’s do a coordinate flip so the state names run down the vertical axis.
cc %>% ggplot(aes(x=state,y=totpop)) +
  geom_col() +
  coord_flip()
Better, but still hard to read. Modify the figure height and width in the chunk options (fig.height and fig.width) so that each state label has enough room.
cc %>% ggplot(aes(x=state,y=totpop)) +
  geom_col() +
  coord_flip()
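The rendered page does not show which chunk options were used, so the sizes below are assumptions rather than the original settings. One way to change the figure size, sketched here with knitr's opts_chunk$set(), is to set defaults that apply to every chunk that follows.

```r
library(knitr)

# Assumed sizes in inches; adjust until all 51 state labels are legible.
# These defaults apply to later chunks, not to the chunk that sets them.
opts_chunk$set(fig.height = 8, fig.width = 6)
```

The same options can instead be placed in the header of a single chunk, for example {r, fig.height=8, fig.width=6}, which changes only that chunk's figure.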
Order the states by size.
cc %>% ggplot(aes(x=reorder(state,totpop),y=totpop)) +
  geom_col() +
  coord_flip()
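The ordering works because reorder() rearranges the levels of the state factor according to a second variable, and ggplot2 lays out a discrete axis in level order. The small base R sketch below uses three states from the printed summary to show the effect; the vector names are illustrative.

```r
# Three states from the summary, with their total populations
state  <- factor(c("Alabama", "Alaska", "Arizona"))
totpop <- c(4779736, 710231, 6392017)

# By default, factor levels are alphabetical
default_levels <- levels(state)

# reorder() sorts the levels by totpop, smallest first
ordered_levels <- levels(reorder(state, totpop))

default_levels
ordered_levels
```

With the flipped coordinates, the first level is drawn at the bottom, so the largest state ends up at the top. Passing -totpop instead would reverse the order.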
Try points instead of bars.
cc %>% ggplot(aes(x=reorder(state,totpop),y=totpop)) +
  geom_point() +
  coord_flip()
Exercise: Use the cdc dataset. Restrict yourself to the male population. Your objective is to produce a graphic showing how the average age of people varies across the values of the genhlth variable.