data.table cheat sheet: https://www.datacamp.com/community/tutorials/data-table-cheat-sheet

The two CSV files used in this training can be found here: https://github.com/rlkirkham/data-table-training

data.table examples

library(data.table)

Read in the data as a data.table. Very important!

data.table queries only work on a data.table object. fread() returns one by default.

Change the file path below to wherever you saved MOVIEDATA.csv.

moviedata <- fread("H:/MOVIEDATA.csv", header=TRUE)
str(moviedata)
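If the data is already in R as a plain data.frame, it can be converted instead of being read in again. A minimal sketch, assuming a made-up data frame `df` that is not part of the training files:

```r
library(data.table)

# Made-up data frame standing in for a table loaded some other way.
df <- data.frame(GENRE = c("Action", "Drama"), OVERALL = c(70, 85))

dt_copy <- as.data.table(df)  # returns a new data.table; df itself is unchanged
is.data.table(df)             # FALSE
is.data.table(dt_copy)        # TRUE

setDT(df)                     # converts df in place, without making a copy
is.data.table(df)             # TRUE
```

as.data.table() is the safer choice when the original data frame is still needed; setDT() avoids the copy for large tables.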
### i operation: filter rows ###
moviedata[GENRE == "Action" | GENRE == "Drama"]
moviedata[GENRE %in% c("Action", "Adventure", "Romance")]
### j operation: compute on or select columns ###
moviedata[GENRE=="Action", sum(OVERALL)]
moviedata[, .(GENRE, OVERALL)]
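The shape of the j result depends on how j is written. A small sketch on a hypothetical `toy` table (column names copy the tutorial, values are invented):

```r
library(data.table)

# Hypothetical toy table; values are invented for illustration.
toy <- data.table(
  GENRE   = c("Action", "Drama", "Action"),
  OVERALL = c(70, 85, 60)
)

as_vector <- toy[, OVERALL]     # bare column name: a plain numeric vector
as_table  <- toy[, .(OVERALL)]  # wrapped in .(): a one-column data.table

# Several named results at once; .N is the number of rows.
summary_row <- toy[, .(total = sum(OVERALL), n = .N)]
```

Naming the results inside .() avoids the default V1, V2 column names.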
### by operation: needs a j operation to group; an i operation alone is not enough ###
moviedata[, sum(OVERALL), by = GENRE]
moviedata[, sum(OVERALL), by = .(GENRE, GENRE2)]
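Groups come back in order of first appearance with by; keyby sorts them instead. A sketch on a hypothetical `toy` table with invented values:

```r
library(data.table)

# Hypothetical toy table; Drama appears before Action on purpose.
toy <- data.table(
  GENRE   = c("Drama", "Action", "Drama", "Action"),
  OVERALL = c(80, 70, 90, 50)
)

# by: groups in order of first appearance (Drama, then Action)
by_res <- toy[, .(total = sum(OVERALL)), by = GENRE]

# keyby: groups sorted by the grouping column (Action, then Drama)
keyby_res <- toy[, .(total = sum(OVERALL)), keyby = GENRE]
```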

All three together: filter rows with i, group with by, then compute j within each group

moviedata[GENRE=="Action", sum(OVERALL, na.rm=TRUE), by= DECADE]
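The na.rm=TRUE in the query above matters whenever a group contains a missing value. A sketch on a hypothetical `toy` table with one invented NA:

```r
library(data.table)

# Hypothetical toy table; one OVERALL score is deliberately missing.
toy <- data.table(
  GENRE   = c("Action", "Action", "Action", "Drama"),
  DECADE  = c(1990, 1990, 2000, 1990),
  OVERALL = c(70, NA, 60, 85)
)

# Without na.rm, the 1990 group sums to NA.
with_na <- toy[GENRE == "Action", sum(OVERALL), by = DECADE]

# With na.rm = TRUE, the missing score is skipped: 1990 gives 70, 2000 gives 60.
without_na <- toy[GENRE == "Action", sum(OVERALL, na.rm = TRUE), by = DECADE]
```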

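Using .SD (Subset of Data: the columns of the current group, excluding the by columns)

# mean() returns NA with a warning for any non-numeric column; the .SDcols section further down restricts the columns
moviedata[, lapply(.SD, mean), by = GENRE]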
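Because .SD is itself a data.table, it can be subset as well as looped over. A sketch on a hypothetical all-numeric `toy` table (values invented):

```r
library(data.table)

# Hypothetical toy table; every non-grouping column is numeric.
toy <- data.table(
  GENRE     = c("Action", "Action", "Drama"),
  PLOT      = c(6, 8, 9),
  CHARACTER = c(5, 7, 10)
)

# Mean of every non-grouping column, per genre.
means <- toy[, lapply(.SD, mean), by = GENRE]

# First row of each genre: .SD[1] subsets the group's data.
first_rows <- toy[, .SD[1], by = GENRE]
```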

Using setkey

setkey(moviedata, GENRE)

Now look at moviedata: setkey() has sorted the rows by GENRE in place, so no reassignment is needed.
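The effect of setkey() can be checked with haskey() and key(). A sketch on a hypothetical unsorted `toy` table (values invented):

```r
library(data.table)

# Hypothetical toy table, deliberately not in alphabetical order.
toy <- data.table(
  GENRE   = c("Drama", "Action", "Comedy"),
  OVERALL = c(85, 70, 55)
)

haskey(toy)         # FALSE: no key set yet
setkey(toy, GENRE)  # sorts toy by GENRE by reference
key(toy)            # "GENRE"
toy$GENRE           # "Action" "Comedy" "Drama"
```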

Using .SDcols to restrict .SD to the named columns

moviedata[, lapply(.SD, mean), by = GENRE, .SDcols = c("PLOT", "CHARACTER", "SOUND")]
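When the numeric columns are not known in advance, the .SDcols vector can be built from the table itself. A sketch on a hypothetical `toy` table; the TITLE column is an invented text column, not necessarily present in MOVIEDATA.csv:

```r
library(data.table)

# Hypothetical toy table; TITLE is an invented character column.
toy <- data.table(
  GENRE = c("Action", "Action", "Drama"),
  TITLE = c("A", "B", "C"),
  PLOT  = c(6, 8, 9),
  SOUND = c(4, 6, 7)
)

# Names of the numeric columns only: "PLOT" and "SOUND".
num_cols <- names(toy)[sapply(toy, is.numeric)]

# TITLE is left out, so mean() never sees a character column.
means <- toy[, lapply(.SD, mean), by = GENRE, .SDcols = num_cols]
```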

Chaining: the second [ ] operates on the result of the first

moviedata[SCORE > 40, .SD, by = GENRE, .SDcols = c("PLOT", "CHARACTER")][, max(PLOT)]
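Chains can be longer than two steps. A sketch that aggregates, sorts, then takes the top row, on a hypothetical `toy` table (values invented):

```r
library(data.table)

# Hypothetical toy table; values are invented for illustration.
toy <- data.table(
  GENRE   = c("Action", "Action", "Drama", "Comedy"),
  OVERALL = c(70, 60, 85, 55)
)

# Step 1: mean score per genre. Step 2: sort descending. Step 3: keep the first row.
top_genre <- toy[, .(avg = mean(OVERALL)), by = GENRE][order(-avg)][1]
```

Each step could also be saved to an intermediate variable; chaining just avoids the extra names.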

Answers for the practical

Change the file path below to wherever you saved flights14.csv.

flights <- fread("H:/data.table training/flights14.csv", header=TRUE)

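# Set a multi-column key (sorts by year, then dep_time, then tailnum)
setkey(flights, year, dep_time, tailnum)

# All flights with carrier code "UA"
flights[carrier == "UA"]

# Number of flights and mean arrival delay per carrier
flights[, .N, by = carrier]
flights[, mean(arr_delay), by = carrier]

# Store the mean delay per carrier and rename the default V1 column
averagedelay <- flights[, mean(arr_delay), by = carrier]
setnames(averagedelay, "V1", "Mean_delay")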
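The rename step can also be folded into the query by naming the result inside .(). A sketch on a hypothetical `toy` table that borrows the flights14 column names (values invented):

```r
library(data.table)

# Hypothetical toy table; values are invented for illustration.
toy <- data.table(
  carrier   = c("UA", "UA", "AA"),
  arr_delay = c(10, 20, -4)
)

# Naming the result in j gives Mean_delay directly, so no setnames() call is needed.
avg_delay <- toy[, .(Mean_delay = mean(arr_delay)), by = carrier]
```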

# Add a total_delay column by reference (no reassignment needed)
flights[, total_delay := arr_delay + dep_delay]
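Because := changes the table in place, take a copy() first if the original needs to stay untouched. A sketch on a hypothetical `toy` table (values invented):

```r
library(data.table)

# Hypothetical toy table; values are invented for illustration.
toy <- data.table(arr_delay = c(5, -3), dep_delay = c(10, 0))

snapshot <- copy(toy)  # independent copy taken before the update

toy[, total_delay := arr_delay + dep_delay]  # updates toy in place

names(toy)       # "arr_delay" "dep_delay" "total_delay"
names(snapshot)  # "arr_delay" "dep_delay"
```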

# With carrier as the key, a bare value in i looks up the matching rows
setkey(flights, carrier)
flights["UA"]
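The same lookup can be done without setting a key by passing the column through the on argument. A sketch on a hypothetical `toy` table (values invented):

```r
library(data.table)

# Hypothetical toy table; values are invented for illustration.
toy <- data.table(
  carrier   = c("UA", "AA", "UA"),
  arr_delay = c(10, -4, 20)
)

# on = names the lookup column, so no setkey() call (and no re-sorting) is needed.
via_on <- toy["UA", on = "carrier"]

# An ordinary i filter selects the same rows.
via_filter <- toy[carrier == "UA"]
```

Using on leaves the row order of the table as it was, which can matter if other code relies on it.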