Motivation

Thanksgiving is just around the corner. Let’s celebrate the best way we know how: by analyzing some festive data! I hope you have fun making graphics. Happy Thanksgiving!

Dataset/Article

We are using the data behind the FiveThirtyEight article “Here’s What Your Part of America Eats on Thanksgiving”: https://fivethirtyeight.com/features/heres-what-your-part-of-america-eats-on-thanksgiving/

The original data set can be found here: https://raw.githubusercontent.com/fivethirtyeight/data/master/thanksgiving-2015/thanksgiving-2015-poll-data.csv

The survey variables are described here: https://github.com/fivethirtyeight/data/tree/master/thanksgiving-2015

I cleaned up the data, selected a subset of variables, and created binary (0/1) versions of several of them (the columns ending in 01). My dataset can be found here: https://raw.githubusercontent.com/kitadasmalley/FA2020_DataViz/main/data/useThanks.csv

useThanks<-read.csv("https://raw.githubusercontent.com/kitadasmalley/FA2020_DataViz/main/data/useThanks.csv",
                    header=TRUE)

str(useThanks)
## 'data.frame':    1058 obs. of  83 variables:
##  $ id                 : num  4.34e+09 4.34e+09 4.34e+09 4.34e+09 4.34e+09 ...
##  $ celebrate          : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ main               : Factor w/ 9 levels "","Chicken","Ham/Pork",..: 9 9 9 9 7 9 9 9 9 5 ...
##  $ cooked             : Factor w/ 6 levels "","Baked","Fried",..: 2 2 6 2 2 6 2 2 6 2 ...
##  $ stuffing           : Factor w/ 5 levels "","Bread-based",..: 2 2 5 2 2 5 2 5 2 2 ...
##  $ cranberry          : Factor w/ 5 levels "","Canned","Homemade",..: 4 5 3 3 2 3 2 3 2 5 ...
##  $ gravy              : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 3 3 3 3 3 ...
##  $ brussel.sprouts    : Factor w/ 2 levels "","Brussel sprouts": 1 1 2 2 2 2 1 1 2 2 ...
##  $ carrots            : Factor w/ 2 levels "","Carrots": 2 1 2 1 1 2 1 2 1 2 ...
##  $ cauliflower        : Factor w/ 2 levels "","Cauliflower": 1 1 2 1 1 2 1 1 1 1 ...
##  $ corn               : Factor w/ 2 levels "","Corn": 1 2 2 1 1 2 1 1 2 1 ...
##  $ cornbread          : Factor w/ 2 levels "","Cornbread": 1 1 2 2 2 2 1 1 2 1 ...
##  $ fruit.salad        : Factor w/ 2 levels "","Fruit salad": 1 1 1 1 1 2 2 1 1 1 ...
##  $ green.beans        : Factor w/ 2 levels "","Green beans/green bean casserole": 2 2 1 1 1 2 2 1 2 2 ...
##  $ mac.n.cheese       : Factor w/ 2 levels "","Macaroni and cheese": 2 2 1 1 1 2 1 1 1 1 ...
##  $ mashed.potatoes    : Factor w/ 2 levels "","Mashed potatoes": 2 2 2 2 2 2 2 1 2 2 ...
##  $ rolls              : Factor w/ 2 levels "","Rolls/biscuits": 1 2 2 2 2 2 2 1 2 2 ...
##  $ squash             : Factor w/ 2 levels "","Squash": 1 1 1 1 2 2 1 1 2 1 ...
##  $ salad              : Factor w/ 2 levels "","Vegetable salad": 1 2 2 2 2 2 1 1 1 1 ...
##  $ yams.sweet.potato  : Factor w/ 2 levels "","Yams/sweet potato casserole": 2 2 1 2 2 2 2 1 1 2 ...
##  $ apple.pie          : Factor w/ 2 levels "","Apple": 2 2 2 1 2 1 2 1 2 1 ...
##  $ buttermilk.pie     : Factor w/ 2 levels "","Buttermilk": 1 1 1 1 1 1 1 1 2 2 ...
##  $ cherry.pie         : Factor w/ 2 levels "","Cherry": 1 1 2 1 1 1 1 1 1 1 ...
##  $ chocolate.pie      : Factor w/ 2 levels "","Chocolate": 1 2 1 1 1 1 1 2 1 1 ...
##  $ coconut.pie        : Factor w/ 2 levels "","Coconut cream": 1 1 1 1 1 1 1 1 1 1 ...
##  $ keylime.pie        : Factor w/ 2 levels "","Key lime": 1 1 1 1 1 1 1 1 1 1 ...
##  $ peach.pie          : Factor w/ 2 levels "","Peach": 1 1 2 1 1 1 1 1 1 1 ...
##  $ pecan.pie          : Factor w/ 2 levels "","Pecan": 1 1 2 2 1 1 1 1 1 1 ...
##  $ pumpkin.pie        : Factor w/ 2 levels "","Pumpkin": 1 2 2 2 2 1 2 1 2 2 ...
##  $ sweet.potato.pie   : Factor w/ 2 levels "","Sweet Potato": 1 1 2 1 1 2 1 1 2 2 ...
##  $ apple.cobbler      : Factor w/ 2 levels "","Apple cobbler": 1 1 1 1 1 1 1 1 1 1 ...
##  $ blondies           : Factor w/ 2 levels "","Blondies": 1 1 1 1 1 1 1 1 1 1 ...
##  $ brownies           : Factor w/ 2 levels "","Brownies": 1 1 2 1 1 1 1 1 1 1 ...
##  $ carrot.cake        : Factor w/ 2 levels "","Carrot cake": 1 1 2 1 1 1 1 1 1 1 ...
##  $ cheesecake         : Factor w/ 2 levels "","Cheesecake": 2 2 1 1 1 2 1 1 1 1 ...
##  $ cookies            : Factor w/ 2 levels "","Cookies": 2 2 2 1 1 1 2 2 2 1 ...
##  $ fudge              : Factor w/ 2 levels "","Fudge": 1 1 2 1 1 1 1 1 1 1 ...
##  $ ice.cream          : Factor w/ 2 levels "","Ice cream": 2 1 2 1 1 1 1 1 1 1 ...
##  $ peach.cobbler      : Factor w/ 2 levels "","Peach cobbler": 1 1 1 1 1 1 1 1 1 1 ...
##  $ pray               : Factor w/ 3 levels "","No","Yes": 3 3 3 2 2 3 2 2 2 3 ...
##  $ friendsgiving      : Factor w/ 3 levels "","No","Yes": 2 2 3 2 2 3 2 3 2 2 ...
##  $ black.friday       : Factor w/ 3 levels "","No","Yes": 2 3 3 2 2 3 3 3 2 2 ...
##  $ area.live          : Factor w/ 4 levels "","Rural","Suburban",..: 3 2 3 4 4 4 2 2 4 3 ...
##  $ age                : Factor w/ 5 levels "","18 - 29","30 - 44",..: 2 2 2 3 3 2 2 2 3 3 ...
##  $ gender             : Factor w/ 3 levels "","Female","Male": 3 2 3 3 3 3 3 3 3 3 ...
##  $ income             : Factor w/ 12 levels "","$0 to $9,999",..: 11 10 2 8 4 2 9 12 11 9 ...
##  $ DivName            : Factor w/ 10 levels "","East North Central",..: 4 3 5 7 7 7 2 5 4 3 ...
##  $ celebrate01        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ gravy01            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ friendsgiving01    : int  0 0 1 0 0 1 0 1 0 0 ...
##  $ black.friday01     : int  0 1 1 0 0 1 1 1 0 0 ...
##  $ brussel.sprouts01  : int  0 0 1 1 1 1 0 0 1 1 ...
##  $ carrots01          : int  1 0 1 0 0 1 0 1 0 1 ...
##  $ cauliflower01      : int  0 0 1 0 0 1 0 0 0 0 ...
##  $ corn01             : int  0 1 1 0 0 1 0 0 1 0 ...
##  $ cornbread01        : int  0 0 1 1 1 1 0 0 1 0 ...
##  $ fruit.salad01      : int  0 0 0 0 0 1 1 0 0 0 ...
##  $ green.beans01      : int  1 1 0 0 0 1 1 0 1 1 ...
##  $ mac.n.cheese01     : int  1 1 0 0 0 1 0 0 0 0 ...
##  $ mashed.potatoes01  : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ rolls01            : int  0 1 1 1 1 1 1 0 1 1 ...
##  $ squash01           : int  0 0 0 0 1 1 0 0 1 0 ...
##  $ salad01            : int  0 1 1 1 1 1 0 0 0 0 ...
##  $ yams.sweet.potato01: int  1 1 0 1 1 1 1 0 0 1 ...
##  $ apple.pie01        : int  1 1 1 0 1 0 1 0 1 0 ...
##  $ buttermilk.pie01   : int  0 0 0 0 0 0 0 0 1 1 ...
##  $ cherry.pie01       : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ chocolate.pie01    : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ coconut.pie01      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ keylime.pie01      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ peach.pie01        : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ pecan.pie01        : int  0 0 1 1 0 0 0 0 0 0 ...
##  $ pumpkin.pie01      : int  0 1 1 1 1 0 1 0 1 1 ...
##  $ sweet.potato.pie01 : int  0 0 1 0 0 1 0 0 1 1 ...
##  $ apple.cobbler01    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ blondies01         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ brownies01         : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ carrot.cake01      : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ cheesecake01       : int  1 1 0 0 0 1 0 0 0 0 ...
##  $ cookies01          : int  1 1 1 0 0 0 1 1 1 0 ...
##  $ fudge01            : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ ice.cream01        : int  1 0 1 0 0 0 0 0 0 0 ...
##  $ peach.cobbler01    : int  0 0 0 0 0 0 0 0 0 0 ...

Directions

Option 1: Create

Step 1: Use the variables from the dataset to ask a question about Thanksgiving

Step 2: Make your first attempt at creating a graphic
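If you’re not sure where to start, here is one possible first attempt. It is a sketch that rests on a few assumptions. The question (“How common is pumpkin pie in each division?”) is only an example. The rows are made-up toy values shaped like useThanks: a DivName column plus a 0/1 column such as pumpkin.pie01. The names pumpkinByDiv and p1 are illustrative. Because the 01 columns are 0/1, mean() gives the proportion of households serving the item.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows shaped like useThanks (DivName plus one 0/1 column);
# swap in the real useThanks data frame to run this on the survey.
toy <- data.frame(
  DivName       = c("New England", "New England", "Pacific", "Pacific", "Pacific"),
  pumpkin.pie01 = c(1, 1, 0, 1, 0)
)

# Share of households in each division that serve pumpkin pie
pumpkinByDiv <- toy %>%
  group_by(DivName) %>%
  summarise(propPumpkin = mean(pumpkin.pie01))

# A deliberately plain first draft: one bar per division
p1 <- ggplot(pumpkinByDiv, aes(x = DivName, y = propPumpkin)) +
  geom_col() +
  labs(x = "Division", y = "Proportion serving pumpkin pie")

p1
```

A rough draft like this is a fine starting point. Colors, ordering, and labels can come in the later steps.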

Step 3: Color palette
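Here is one way to set a palette, as a sketch. Store your colors in a named vector and pass it to scale_fill_manual(), which matches each name to the fill level with the same name. The hex codes, flavor labels, counts, and object names below are illustrative choices, not part of the assignment.

```r
library(ggplot2)

# Hypothetical palette: one named hex colour per category you plan to show
fallPalette <- c(
  "Apple"   = "#B5651D",
  "Pecan"   = "#6F4E37",
  "Pumpkin" = "#FF7518"
)

# Hypothetical toy data with one row per category (made-up counts)
pies <- data.frame(flavor = c("Apple", "Pecan", "Pumpkin"),
                   n      = c(3, 5, 8))

# scale_fill_manual() looks up each fill level by name in fallPalette
p2 <- ggplot(pies, aes(x = flavor, y = n, fill = flavor)) +
  geom_col() +
  scale_fill_manual(values = fallPalette)

p2
```

Naming the colors, instead of relying on their order, keeps each category tied to the same color even if the factor levels get reordered later.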

Step 4: Brainstorming/Planning

Step 5: Update your plot

Step 6: Final polished plot

Option 2: (CHALLENGE) Recreate

This is pretty tough and requires a fair bit of data wrangling. Here are a few hints to help you along the way.

Hint 1: Group_by and Summarise

  • Find the total number of households who serve each of the menu items, by region
  • Find the total number of households in each region
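Here is a minimal sketch of both counts with dplyr. It assumes toy rows shaped like useThanks, with made-up values and only two menu columns. Since the menu columns are 0/1, sum() counts the households that serve an item, and n() counts every household in the group. The name divCounts is hypothetical.

```r
library(dplyr)

# Hypothetical toy rows shaped like useThanks: DivName plus two 0/1 menu columns
toy <- data.frame(
  DivName       = c("Pacific", "Pacific", "Pacific", "New England", "New England"),
  pumpkin.pie01 = c(1, 0, 1, 1, 1),
  pecan.pie01   = c(0, 0, 1, 0, 1)
)

# sum() of a 0/1 column counts households serving the item;
# n() counts all households in the division
divCounts <- toy %>%
  group_by(DivName) %>%
  summarise(nPumpkin    = sum(pumpkin.pie01),
            nPecan      = sum(pecan.pie01),
            nHouseholds = n())

divCounts
```

With many menu columns, dplyr’s across() (dplyr 1.0 and later) can apply sum() to a whole set of columns at once instead of naming each one.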

Hint 2: Mutate

  • Find the proportion of households who serve each of the menu items, by region
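Here is a sketch of the proportion step. It assumes a small made-up table of counts shaped like the Hint 1 output. The names divCounts, nPumpkin, nHouseholds, and propPumpkin are hypothetical.

```r
library(dplyr)

# Hypothetical per-division counts (made-up values)
divCounts <- data.frame(
  DivName     = c("New England", "Pacific"),
  nPumpkin    = c(2, 2),
  nHouseholds = c(2, 4)
)

# Proportion of households in each division that serve pumpkin pie
divProps <- divCounts %>%
  mutate(propPumpkin = nPumpkin / nHouseholds)

divProps
```

Equivalently, you could skip the counts and take mean(pumpkin.pie01) inside summarise(), since the mean of a 0/1 column is a proportion.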

Hint 3: Census Statistics

To assess which dishes are served “disproportionately” by region, we first need national baselines. We compute each national value as a weighted average of the division-level values, weighting each division by its share of the U.S. population. The population figures come from https://www.hcup-us.ahrq.gov/figures/nis_figure1_2018.jsp

Here is how the population is distributed across divisions:

library(dplyr)  # provides %>% and mutate()

popDiv<-data.frame(DivName=c("East North Central",
                             "East South Central", 
                             "Middle Atlantic", 
                             "Mountain", 
                             "New England", 
                             "Pacific", 
                             "South Atlantic", 
                             "West North Central", 
                             "West South Central"), 
                   pop=c(46798649, 
                         18931477,
                         41601787,
                         23811346,
                         14757573,
                         52833604,
                         63991523,
                         21179519,
                         39500457))%>%
  mutate(popProp=pop/sum(pop))  # sum(pop) is 323405935, the U.S. total

popDiv
##              DivName      pop    popProp
## 1 East North Central 46798649 0.14470560
## 2 East South Central 18931477 0.05853782
## 3    Middle Atlantic 41601787 0.12863644
## 4           Mountain 23811346 0.07362681
## 5        New England 14757573 0.04563173
## 6            Pacific 52833604 0.16336622
## 7     South Atlantic 63991523 0.19786750
## 8 West North Central 21179519 0.06548896
## 9 West South Central 39500457 0.12213894
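Once the division proportions are joined to popDiv, the weighted average takes one line. For each item, the national proportion is the sum over divisions of popProp times that division’s proportion. The sketch below rebuilds popDiv from the numbers above. It pairs popDiv with made-up division proportions for pumpkin pie. The names divProps, propPumpkin, and nationalPumpkin are hypothetical.

```r
library(dplyr)

# Population by division, copied from popDiv above
popDiv <- data.frame(
  DivName = c("East North Central", "East South Central", "Middle Atlantic",
              "Mountain", "New England", "Pacific", "South Atlantic",
              "West North Central", "West South Central"),
  pop = c(46798649, 18931477, 41601787, 23811346, 14757573,
          52833604, 63991523, 21179519, 39500457)
) %>%
  mutate(popProp = pop / sum(pop))

# Hypothetical division-level proportions for one menu item (made-up values)
divProps <- data.frame(
  DivName     = popDiv$DivName,
  propPumpkin = c(0.80, 0.70, 0.75, 0.65, 0.70, 0.60, 0.72, 0.85, 0.68)
)

# National value = population-weighted average of the division proportions
nationalPumpkin <- divProps %>%
  inner_join(popDiv, by = "DivName") %>%
  summarise(natPumpkin = sum(propPumpkin * popProp))

nationalPumpkin
```

inner_join() keeps only the nine named divisions, so survey rows with a blank DivName drop out. Base R’s weighted.mean(prop, pop) gives the same number.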

Hint 5: Differences

  • Take the differences between the region level proportions and the national proportions
  • Find the items that are MOST disproportionate (i.e., have the largest positive difference)
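Here is a sketch of the difference step with dplyr and tidyr. It assumes made-up proportions for three pies in two divisions and a hypothetical named vector, national, holding the national proportions. Reshaping to long format (one row per division and flavor) makes both the subtraction and the “pick the biggest” step straightforward. slice_max() needs dplyr 1.0 or later. The names divDiffs and topFlavor are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical division proportions for three pies (made-up values)
divProps <- data.frame(
  DivName = c("New England", "Pacific"),
  Apple   = c(0.60, 0.40),
  Cherry  = c(0.10, 0.25),
  Pumpkin = c(0.70, 0.65)
)
# Hypothetical national proportions, named by flavor
national <- c(Apple = 0.50, Cherry = 0.15, Pumpkin = 0.72)

# Long format: one row per division x flavor, then subtract the national value
divDiffs <- divProps %>%
  pivot_longer(-DivName, names_to = "flavor", values_to = "prop") %>%
  mutate(diff = prop - unname(national[flavor]))

# Keep the flavor with the largest difference in each division
topFlavor <- divDiffs %>%
  group_by(DivName) %>%
  slice_max(diff, n = 1) %>%
  ungroup()

topFlavor
```

slice_max() keeps ties by default, so a division could end up with two favorites. Add with_ties = FALSE if you want exactly one row per division.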

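Hint 6: Summarise to Region/Division Favorites

Here divPie2 stands for your own division-level summary from the previous hints. It has one row per division, with the favorite pie flavor in favFlavor and the favorite side in favSide. The first row of the output below (" Division") comes from respondents whose DivName is blank. You can drop it with filter(DivName != "") before joining.
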
favorites<-divPie2%>%
  select(DivName, favFlavor, favSide)%>%
  mutate(DivName=paste(DivName, " Division", sep=""))

favorites
## # A tibble: 10 x 3
##    DivName                       favFlavor    favSide     
##    <chr>                         <chr>        <chr>       
##  1 " Division"                   Coconut      Salad       
##  2 "East North Central Division" Pumpkin      Rolls       
##  3 "East South Central Division" Pecan        Mac N Cheese
##  4 "Middle Atlantic Division"    Apple        Squash      
##  5 "Mountain Division"           Pecan        Salad       
##  6 "New England Division"        Apple        Squash      
##  7 "Pacific Division"            Cherry       Salad       
##  8 "South Atlantic Division"     Sweet Potato Mac N Cheese
##  9 "West North Central Division" Pumpkin      Green Beans 
## 10 "West South Central Division" Pecan        Cornbread

Hint 7: Import Map Package

#install.packages("usmap")
library(usmap)
states <- usmap::us_map()

Hint 8: Join to Your Data

  • You will need a way to connect divisions to states. You can use the lookup file I created, available on my GitHub:
fips<-read.csv("https://raw.githubusercontent.com/kitadasmalley/FA2020_DataViz/main/data/stateFIPS.csv", 
               header=TRUE)

geoPie<-fips%>%
  left_join(favorites, by="DivName")

foodStates<-states %>%
  mutate(Name=full)%>%
  left_join(geoPie, by="Name")

Now use ggplot to create your graphic, then polish it. Voila!

  • Hint: geom_polygon
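As a sketch of how geom_polygon works: each value of the group aesthetic is drawn as one closed shape, and fill colors the shape. The toy “map” below uses two made-up squares in place of states. With the real data you would pass foodStates instead. This assumes your foodStates has x, y, and group columns, as us_map() returned in the usmap version used here. If your version differs, check names(foodStates). The names toyMap and pMap are hypothetical.

```r
library(ggplot2)

# Hypothetical toy "map": two unit squares standing in for two states
toyMap <- data.frame(
  x         = c(0, 1, 1, 0,   1, 2, 2, 1),
  y         = c(0, 0, 1, 1,   0, 0, 1, 1),
  group     = rep(c("A", "B"), each = 4),
  favFlavor = rep(c("Apple", "Pecan"), each = 4)
)

# Each group becomes one polygon, filled by its division's favorite flavor
pMap <- ggplot(toyMap, aes(x = x, y = y, group = group, fill = favFlavor)) +
  geom_polygon(color = "white") +  # white borders between shapes
  coord_equal() +                  # keep shapes from being stretched
  theme_void()                     # drop axes, which mean little on a map

pMap
```

Without group, ggplot would try to connect every point into a single polygon, which is why state maps come out scrambled when it is left off.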

DON’T FORGET TO POLISH

This graphic is not complete yet! That’s up to you.