Background

This eCOTS 2018 virtual poster (video) assumes you know:

  1. The difference between R and RStudio
  2. How R packages work, are installed, and are loaded

If you’re new to R, see ModernDive Chapter 2, “Getting Started with Data”.

Payoffs vs costs ratio

The decision of whether to use R can be viewed in terms of a ratio:

\[ \frac{\mbox{Payoffs of using R}}{\mbox{Costs of learning R}} \]

This ratio has increased of late for many reasons, in particular DataCamp (free academic licence).

Increasing ratio with data

Our proposal to increase the ratio: provide data that is

  • Rich, real, and realistic
  • Easily accessible and “tamed” for novices.

A balancing act

In other words, “taming” strikes a balance between two extremes of data:

  • As it exists “in the wild”
  • Completely safe

“Tame” data principles

We propose the following “tame” data principles with novices in mind. All data should have

  1. Clean variable names
  2. ID variables marked
  3. Clean dates
  4. Clean categorical variables
  5. Consistent “tidy” format
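As a rough illustration, the five principles can be applied to a small data frame using base R only. The data, column names, and values below are invented for this sketch and do not come from FiveThirtyEight. Reading "ID variables marked" as "put the identifying column first" is also an assumption, as is the choice of lower-case, underscore-separated names.

```r
# Hypothetical "wild" data, invented for illustration only
raw <- data.frame(
  "Region"       = c("midwest", "midwest", "west"),
  "State"        = c("Ohio", "Iowa", "Utah"),
  "Last Updated" = c("03/01/2016", "03/02/2016", "03/03/2016"),
  "Votes 2012"   = c(10, 20, 30),
  "Votes 2016"   = c(15, 25, 35),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

tame <- raw

# 1. Clean variable names: lower case, underscores instead of spaces
names(tame) <- gsub(" ", "_", tolower(names(tame)))

# 2. ID variables marked: here, simply move the identifying column to the front
tame <- tame[, c("state", setdiff(names(tame), "state"))]

# 3. Clean dates: parse text into Date objects
tame$last_updated <- as.Date(tame$last_updated, format = "%m/%d/%Y")

# 4. Clean categorical variables: store as a factor with readable labels
tame$region <- factor(tame$region,
                      levels = c("midwest", "west"),
                      labels = c("Midwest", "West"))

# 5. Consistent "tidy" format: one row per state and year
tidy <- reshape(
  tame,
  direction = "long",
  varying   = c("votes_2012", "votes_2016"),
  v.names   = "votes",
  timevar   = "year",
  times     = c(2012, 2016),
  idvar     = "state"
)
rownames(tidy) <- NULL

str(tidy)
```

Packages such as janitor, lubridate, and tidyr offer more convenient tools for the same steps. Base R is used here only to keep the sketch free of extra dependencies.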

fivethirtyeight package

The fivethirtyeight R package:

  • Takes FiveThirtyEight’s raw article data from GitHub
  • “Tames” the raw data
  • Makes data, documentation, and original article easily accessible via an R package
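Assuming the package is already installed, one possible way to see what it ships is to list its datasets with base R's `data()` function and then open the help page for one of them. This is a sketch of how one might explore the package, not a workflow prescribed by its authors.

```r
# Assumes the fivethirtyeight package has already been installed
library(fivethirtyeight)

# List the datasets shipped with the package, with their short titles.
# data(package = ...) returns a list whose "results" element is a character matrix.
datasets <- data(package = "fivethirtyeight")$results[, c("Item", "Title")]
head(datasets)

# Open the documentation for one dataset (same as typing ?hate_crimes)
help("hate_crimes", package = "fivethirtyeight")
```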

Example usage

Three data visualizations via:

  1. Base R
  2. ggformula package
  3. ggplot2 package (with the maps package)

Code below available at bit.ly/ecots_2018

# Load fivethirtyeight and other needed packages
library(fivethirtyeight)
library(dplyr)
library(ggformula)


# Ex 1: US Births ---------------------------------------------------
View(US_births_1994_2003)
?US_births_1994_2003

# Use filter command from dplyr package for data wrangling
US_births_1999 <- US_births_1994_2003 %>%
  filter(year == 1999)
View(US_births_1999)

# Plot time series via base R:
plot(x = US_births_1999$date, y = US_births_1999$births, type = "l")


# Ex 2: Hate crimes -------------------------------------------------
View(hate_crimes)
?hate_crimes

# Create scatterplot & regression line via ggformula package
gf_point(hate_crimes_per_100k_splc ~ share_vote_trump, data = hate_crimes) %>%
  gf_lm()


# Ex 3: Campaign stops of last 10 weeks of 2016 US election ---------
View(pres_2016_trail)
?pres_2016_trail

# Create map of Clinton vs Trump campaign stops via ggplot2 package and using
# preloaded map_data of US states in maps package
# (coord_map() also needs the mapproj package to be installed)
library(ggplot2)
library(maps)
ggplot(data = pres_2016_trail, aes(x = lng, y = lat)) +
  facet_wrap(~candidate) +
  geom_point(col = "black", size = 3) + 
  coord_map() + 
  # In ggplot2 >= 3.4.0, line thickness is set with linewidth rather than size
  geom_path(data = map_data("state"), aes(x = long, y = lat, group = group), linewidth = 0.1)

Tips and resources