dplyr: a brief reprise

I feel we’ve learned a lot of dplyr, and hopefully
you’ve been persuaded that this is a good toolkit for the kinds of tasks
we face as bench social scientists: data manipulation before
modelling.
But it’s a little like learning Italian. The lab only provides
vocabulary lists and some exemplary turns of phrase; actually speaking
Italian requires daily conversation. Becoming conversant in
R similarly requires consistent application.
Just to remind us about the dplyr verbs and adverbs:

| Verbs | Result |
|---|---|
| arrange() | change a table’s row order |
| select() | keep only a subset of columns |
| filter() | keep only a subset of rows |
| mutate() | add a column, normally through computation |
| summarise() | generate a new table of summary statistics |
| …and an adverb | |
| group_by() | repeat a function for every group of rows |
| …and some utilities | |
| slice() | return a subset of rows by position |
| rename() | rename a column |
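As a quick refresher, here is a minimal sketch that chains each of these verbs once. It runs on a made-up tibble, not the lab’s NBA file; the columns `team`, `season`, `pts` and `opp_pts` are hypothetical names chosen for illustration.

```r
library(dplyr)

# Toy data (hypothetical columns, not the lab's NBA box scores)
games <- tibble(
  team    = c("BOS", "BOS", "LAL", "LAL", "NYK"),
  season  = c(2021, 2022, 2021, 2022, 2022),
  pts     = c(110, 98, 120, 101, 95),
  opp_pts = c(100, 105, 115, 99, 97)
)

by_team <- games %>%
  filter(season >= 2021) %>%                  # keep a subset of rows
  select(team, pts, opp_pts) %>%              # keep a subset of columns
  mutate(margin = pts - opp_pts) %>%          # add a computed column
  group_by(team) %>%                          # repeat what follows per team
  summarise(
    n_games     = n(),
    mean_margin = mean(margin)
  ) %>%                                       # one row of statistics per team
  arrange(desc(mean_margin)) %>%              # reorder rows
  rename(avg_margin = mean_margin)            # rename a column

# slice() picks rows by position: here, the first row after sorting
top_team <- by_team %>% slice(1)
```

Note that `group_by()` does nothing visible on its own; it only changes how the following `summarise()` behaves, which is why the table calls it an adverb.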
Let’s do some simple exercises on NBA box scores, from the 1948-49 season to the 2022-23 season.
library(tidyverse)
library(magrittr)

# open a connection to the remote .rds file, then read it into t1
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/nba_games_48_22.rds" %>%
  url() %>%
  readRDS()
Generate a table or figure which reports the number of regular season games by season.
The 3pt line was introduced in the 1979-80 season. Report the number of 3pt field goals attempted, and non-3pt field goals attempted, by season, for every season since the 3pt line’s introduction.
Report the 5 longest within-season winning streaks.
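The streak exercise is the trickiest of the three, because a streak is a run of consecutive rows rather than a simple group. One possible approach, sketched below on toy data, is to number each run of identical results and then count rows per run. The columns of `t1` are not shown here, so `season`, `team`, `date` and `win` are hypothetical names that would need to be mapped onto the real ones.

```r
library(dplyr)

# Toy game log: one row per team-game (hypothetical columns)
game_log <- tibble(
  season = rep(2022, 8),
  team   = c(rep("BOS", 5), rep("LAL", 3)),
  date   = as.Date("2022-01-01") + c(0:4, 0:2),
  win    = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

streaks <- game_log %>%
  arrange(season, team, date) %>%             # games in playing order
  group_by(season, team) %>%                  # streaks reset each team-season
  # run_id increases by one whenever the result differs from the previous game
  mutate(run_id = cumsum(win != lag(win, default = first(win)))) %>%
  group_by(season, team, run_id) %>%
  summarise(
    won    = first(win),                      # was this run wins or losses?
    length = n(),                             # number of games in the run
    .groups = "drop"
  ) %>%
  filter(won) %>%                             # keep winning runs only
  arrange(desc(length)) %>%
  slice_head(n = 5)                           # the (up to) five longest
```

Grouping by season and team before computing `run_id` is what keeps a streak from carrying over between seasons or between teams.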