Load the required packages.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: 'openintro'
## The following object is masked from 'package:ggplot2':
## 
##     diamonds
## The following objects are masked from 'package:datasets':
## 
##     cars, trees

Let’s examine the distribution of state populations using the data in countyComplete from the openintro package. The data has one row per county, so we first total the county populations within each state.

cc <- countyComplete %>% 
  select(state,name,pop2010) %>% 
  group_by(state) %>% 
  summarize(totpop = sum(pop2010),
            count = n(),
            avg = mean(pop2010)
            )
cc
## # A tibble: 51 x 4
##                   state   totpop count       avg
##                  <fctr>    <dbl> <int>     <dbl>
##  1              Alabama  4779736    67  71339.34
##  2               Alaska   710231    29  24490.72
##  3              Arizona  6392017    15 426134.47
##  4             Arkansas  2915918    75  38878.91
##  5           California 37253956    58 642309.59
##  6             Colorado  5029196    64  78581.19
##  7          Connecticut  3574097     8 446762.12
##  8             Delaware   897934     3 299311.33
##  9 District of Columbia   601723     1 601723.00
## 10              Florida 18801310    67 280616.57
## # ... with 41 more rows
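
To make the group_by() and summarize() step easier to follow, here is a minimal sketch on a toy data frame. The states "A" and "B", the county names, and the population values are all made up for illustration; only the column names mirror countyComplete.

```r
library(dplyr)

# Hypothetical toy data: two made-up states with made-up county populations.
toy <- data.frame(
  state   = c("A", "A", "A", "B", "B"),
  name    = c("a1", "a2", "a3", "b1", "b2"),
  pop2010 = c(100, 200, 300, 1000, 3000)
)

# Collapse to one row per state: total population, number of counties,
# and mean county population.
toy_summary <- toy %>%
  group_by(state) %>%
  summarize(totpop = sum(pop2010),
            count  = n(),
            avg    = mean(pop2010))
toy_summary
```

Each group collapses to a single row, so avg is always totpop divided by count.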

Let’s look at the distribution of state populations with a histogram and a boxplot.

cc %>% ggplot(aes(x=totpop)) + geom_histogram(bins=5)

Here’s the boxplot.

cc %>% ggplot(aes(x="X",y=totpop)) +
   geom_boxplot() + 
   coord_flip()

Note how the one large outlier compresses most of the data. Put the axis on a log scale to avoid this.

cc %>% ggplot(aes(x=totpop)) + 
  geom_histogram(bins=5) +
  scale_x_log10()
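
As a rough illustration of why the log scale helps, the sketch below compares two totals copied from the table printed earlier, on the raw scale and on the log10 scale. The variable names are made up for this example.

```r
# Two state totals taken from the table printed above.
pops <- c(`District of Columbia` = 601723, California = 37253956)

# On the raw scale the larger value is roughly 62 times the smaller one.
ratio <- unname(pops["California"] / pops["District of Columbia"])
ratio

# On the log10 scale the two values are less than two units apart,
# so the small states are no longer squeezed into a single bin.
log_gap <- unname(diff(log10(pops)))
log_gap
```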

Let’s display the population of each state as a bar chart.

cc %>% ggplot(aes(x=state,y=totpop)) + geom_col()

Let’s do a coordinate flip so the state names are easier to read.

cc %>% ggplot(aes(x=state,y=totpop)) + 
  geom_col() +
  coord_flip()

Better, but still hard to read. Increase the figure height and width in the chunk options (fig.height and fig.width); the plotting code itself is unchanged.

cc %>% ggplot(aes(x=state,y=totpop)) + 
  geom_col() +
  coord_flip()
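
The chunk options do not appear in the rendered output, so here is one possible way to set them. In the .Rmd source the chunk header could read {r, fig.height=8, fig.width=6}; the same defaults can also be set for all later chunks with knitr, as sketched below. The sizes (in inches) are assumptions chosen for illustration, not the values used for the figure above.

```r
# Set default figure dimensions (in inches) for all subsequent chunks.
# The values 8 and 6 are illustrative; adjust until the state labels are legible.
knitr::opts_chunk$set(fig.height = 8, fig.width = 6)
```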

Order the states by size.

cc %>% ggplot(aes(x=reorder(state,totpop),y=totpop)) + 
  geom_col() +
  coord_flip()
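
To see what reorder() does here, this small sketch uses three rows copied from the table printed earlier. It changes the order of the factor levels, which is the order ggplot2 uses to lay out a discrete axis.

```r
# Three rows taken from the table printed earlier.
state  <- factor(c("Alabama", "Alaska", "Arizona"))
totpop <- c(4779736, 710231, 6392017)

# Default level order is alphabetical.
levels(state)

# reorder() sorts the levels by totpop, smallest first.
by_pop <- reorder(state, totpop)
levels(by_pop)
```

After coord_flip() the first level is drawn at the bottom, so the largest state ends up at the top of the chart.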

Try points instead of bars.

cc %>% ggplot(aes(x=reorder(state,totpop),y=totpop)) + 
  geom_point() +
  coord_flip()

Exercise: Use the cdc dataset. Restrict yourself to the male population. Your objective is to produce a graphic showing how the average age varies across the values of the genhlth variable.