October 1, 2015

Agenda

Introduction

  • Air Force
    • HQ Air Force Materiel Command, Studies and Analyses Division
    • Operations Research Analyst
  • Air Force Institute of Technology
    • Ph.D. in 2015
    • Adjunct Professor
  • Social Media

GR&A

R packages used…

install.packages("package")

library(package)

tidyr


dplyr

data used…



# EDAWR is not on CRAN; install it from GitHub (requires the devtools package)
devtools::install_github("rstudio/EDAWR")

library(EDAWR)


Data sets: cases, storms, tb, iris, a, b

%>% operator…

learn it, love it, leverage it



filter(data, variable == numeric_value)

or

data %>% filter(variable == numeric_value)
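As a minimal sketch of the two forms above, the snippet below runs both against R's built-in mtcars data set. The data set and the cyl == 4 condition are stand-ins chosen for illustration, not taken from the slides.

```r
library(dplyr)

# Standard call: the data frame is the first argument
standard <- filter(mtcars, cyl == 4)

# Piped call: %>% passes mtcars in as the first argument of filter()
piped <- mtcars %>% filter(cyl == 4)

# Compare the two results
identical(standard, piped)
```

The comparison should report that both calls return the same rows, since the pipe only changes how the first argument is supplied.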

# Nested calls: read from the inside out
arrange(
        summarise(
                filter(data, variable == numeric_value),
                Total = sum(variable)
        ),
        desc(Total)
)

# Intermediate objects: one step per assignment
a <- filter(data, variable == numeric_value)
b <- summarise(a, Total = sum(variable))
c <- arrange(b, desc(Total))

# Pipe: read top to bottom
data %>%
        filter(variable == numeric_value) %>%
        summarise(Total = sum(variable)) %>%
        arrange(desc(Total))
All three give the same result, but the %>% version is quicker to write and easier to read
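A runnable version of the comparison, sketched on the built-in mtcars data set. The columns cyl and mpg are assumptions standing in for the placeholder names on the slide.

```r
library(dplyr)

# Nested calls
nested <- arrange(
        summarise(
                filter(mtcars, cyl == 4),
                Total = sum(mpg)
        ),
        desc(Total)
)

# Intermediate objects
step1 <- filter(mtcars, cyl == 4)
step2 <- summarise(step1, Total = sum(mpg))
stepwise <- arrange(step2, desc(Total))

# Pipe
piped <- mtcars %>%
        filter(cyl == 4) %>%
        summarise(Total = sum(mpg)) %>%
        arrange(desc(Total))

# Check that the three approaches agree
identical(nested, stepwise)
identical(nested, piped)
```

Because summarise() is called without grouping here, the result is a single row and arrange() has nothing to reorder; it is kept only to mirror the three-step pattern.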

Data Wrangling


because…

"Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth."

Jenny Bryan

Up to 80% of data analysis time is spent cleaning and preparing data.

cf. Wickham, 2014 and Dasu and Johnson, 2003

tidyr & dplyr make 95% of your data wrangling tasks much easier!

tidyr

a package that reshapes the layout of dataframes

Primary functions

gather(): transforms data from wide to long


spread(): transforms data from long to wide


separate(): splits a single column into multiple columns


unite(): combines multiple columns into a single column
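The four functions can be sketched on two tiny data frames. Both data frames and all their column names (store, Q1, Q2, date) are hypothetical and made up for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per store, one column per quarter
wide <- data.frame(
        store = c("A", "B"),
        Q1 = c(10, 20),
        Q2 = c(15, 25),
        stringsAsFactors = FALSE
)

# gather(): wide to long, one row per store/quarter pair
long <- gather(wide, quarter, sales, Q1, Q2)

# spread(): long back to wide
wide_again <- spread(long, quarter, sales)

# Hypothetical column that packs year and month into one string
dates <- data.frame(date = c("2015-10", "2015-11"), stringsAsFactors = FALSE)

# separate(): split one column into several
parts <- separate(dates, date, into = c("year", "month"), sep = "-")

# unite(): paste several columns back into one
rejoined <- unite(parts, date, year, month, sep = "-")
```

gather() and spread() undo each other, as do separate() and unite(), so each pair here ends with the layout it started from.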

gather()

Transform data from wide to long:

Function:       gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
Same as:        data %>% gather(key, value, ..., na.rm = FALSE, convert = FALSE)

Arguments:
        data:           data frame
        key:            name of the new column that holds the gathered column names
        value:          name of the new column that holds the gathered values
        ...:            columns to gather; prefix a name with - to exclude it
        na.rm:          if TRUE, drops rows whose value is missing (NA)
        convert:        if TRUE, converts the key column to logical, integer, numeric, complex or
                        factor as appropriate
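A short sketch of how these arguments behave, using a hypothetical data frame of yearly counts per country. The data values and the names counts, year and n are invented for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: yearly counts per country, with one missing value
counts <- data.frame(
        country = c("FR", "DE", "US"),
        `2011` = c(7000, 5800, NA),
        `2012` = c(6900, 6000, 14000),
        check.names = FALSE,
        stringsAsFactors = FALSE
)

# key = year holds the old column names; value = n holds the cell values.
# -country gathers every column except country.
long <- counts %>% gather(year, n, -country)

# na.rm = TRUE drops the US/2011 row, whose value is NA
long_complete <- counts %>% gather(year, n, -country, na.rm = TRUE)

# convert = TRUE turns the year key from character into integer
long_typed <- counts %>% gather(year, n, -country, convert = TRUE)
```

Excluding country with a minus sign is usually shorter than listing every year column, which matters once the wide table has many of them.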
