Notes Feb 17

Harold Nelson

2/17/2021

Setup

Get the libraries and data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ dplyr   1.0.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(waffle)
load("cdc.Rdata")

Table of genhlth

Make a basic table of genhlth.

Answer

table(cdc$genhlth)
## 
## excellent very good      good      fair      poor 
##      4657      6972      5675      2019       677

Basic Barplot

Make a default barplot of genhlth using ggplot2.

Answer

cdc %>% 
  ggplot(aes(x=genhlth)) + geom_bar()

Improvement

Produce a similar graphic of genhlth with the columns in order by count.

Answer

cdc %>% 
  group_by(genhlth) %>% 
  summarize(count = n()) %>% 
  ggplot(aes(x=reorder(genhlth,-count),y=count)) + 
           geom_col() 

It is possible to do this using geom_bar(), but I’ve always used geom_col() instead. For the geom_bar() approach, see https://stackoverflow.com/questions/56599684/reorder-geom-bar-from-high-to-low-when-using-stat-count
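One way to get the same ordering with geom_bar() is to reorder the factor levels by frequency before plotting. The sketch below uses fct_infreq() from forcats, which is my choice and not necessarily the method in the linked answer. To keep it self-contained, it rebuilds a stand-in data frame (the hypothetical name toy) from the counts in the table above instead of loading cdc.Rdata.

```r
library(ggplot2)
library(forcats)

# Counts copied from the table(cdc$genhlth) output above.
counts <- c(excellent = 4657, "very good" = 6972, good = 5675,
            fair = 2019, poor = 677)

# Stand-in for cdc: one row per respondent, genhlth only (assumption).
toy <- data.frame(
  genhlth = factor(rep(names(counts), counts), levels = names(counts))
)

# fct_infreq() orders the levels from most to least frequent,
# so geom_bar() can do the counting itself.
p <- ggplot(toy, aes(x = fct_infreq(genhlth))) +
  geom_bar() +
  xlab("genhlth")
p
```

With the real data, the same idea would be cdc %>% ggplot(aes(x = fct_infreq(genhlth))) + geom_bar().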

Flip

Put the categorical variable on the y-axis.

Answer

cdc %>% 
  group_by(genhlth) %>% 
  summarize(count = n()) %>% 
  ggplot(aes(x=reorder(genhlth,-count),y=count)) + 
           geom_col() +
           coord_flip()

Cleveland Dotplot

Do this as a Cleveland dotplot.

Answer

cdc %>% 
  group_by(genhlth) %>% 
  summarize(count = n()) %>% 
  ggplot(aes(y=reorder(genhlth,count),x=count)) + 
           geom_point() 

Proportions

We can see the relative sizes of the counts easily, but estimating proportions from these visuals is not easy.
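For reference, the exact proportions can be computed directly. This is a minimal base R sketch that uses the counts printed in the table above rather than loading cdc.Rdata. The names gh and props are made up for illustration.

```r
# Counts copied from the table(cdc$genhlth) output above.
gh <- c(excellent = 4657, "very good" = 6972, good = 5675,
        fair = 2019, poor = 677)

# prop.table() divides each entry by the total.
# With the real data this would be prop.table(table(cdc$genhlth)).
props <- prop.table(gh)
round(props, 3)
```

These are the numbers a reader has to estimate by eye from the bar and dot plots.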

Try a pie chart.

gh_counts = cdc %>% 
  group_by(genhlth) %>% 
  summarize(count = n())

ggplot(gh_counts, aes(x = 1, y = count, fill = genhlth)) +
    geom_col() +
    coord_polar(theta='y')

Clean up the Pie

Use theme_void() and give it a title.

Answer

ggplot(gh_counts, aes(x = 1, y = count, fill = genhlth)) +
    geom_col() +
    coord_polar(theta='y') +
    theme_void() + 
    ggtitle("Distribution of genhlth")

Waffle

Do a waffle instead of a pie.

Answer

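Note that we have left the tidyverse: waffle() comes from the separate waffle package, and here it is given a named numeric vector rather than a data frame.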

# Create a percent variable in gh_counts.
gh_counts <- gh_counts %>% 
  mutate(percent = count/sum(count) * 100) 

# Create a vector of percents with the genhlth values as names.

wpc = gh_counts$percent
names(wpc) = gh_counts$genhlth

waffle(wpc, title = "Distribution of genhlth")
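A waffle chart is built from whole squares, so fractional percents cannot be drawn exactly. One option, sketched here in base R, is to round the percents so that they still sum to 100, using the largest-remainder rule. This step is my addition and is not part of the code above. The counts are copied from the earlier table, and the name squares is made up for illustration.

```r
# Counts copied from the table(cdc$genhlth) output above.
counts <- c(excellent = 4657, "very good" = 6972, good = 5675,
            fair = 2019, poor = 677)
pct <- counts / sum(counts) * 100

# Start from the whole-number part of each percent.
squares <- floor(pct)

# Hand the leftover squares to the categories with the
# largest fractional parts (largest-remainder rounding).
leftover <- 100 - sum(squares)
bump <- order(pct - squares, decreasing = TRUE)[seq_len(leftover)]
squares[bump] <- squares[bump] + 1

squares
sum(squares)  # 100
```

The resulting named vector could then be passed to waffle() in place of wpc.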