dplyr: a brief reprise
I feel we’ve learned a lot of dplyr, and hopefully you’ve been persuaded that this is a good toolkit for the kinds of tasks we face as bench social scientists: data manipulation before modelling.
But it’s a little like learning Italian: the lab only provides vocabulary lists and some exemplary turns of phrase, whereas speaking Italian requires daily conversation. Becoming conversant in R similarly requires consistent application.
As a reminder, here are the dplyr verbs and adverbs:
Verbs | Result |
---|---|
arrange() | change a table’s row order |
select() | keep only a subset of columns |
filter() | keep only a subset of rows |
mutate() | add a column, normally through computation |
summarise() | generate a new table of summary statistics |
…and an adverb | |
group_by() | repeat a function for every group of rows |
…and some utilities | |
slice() | return a subset of rows by position |
rename() | rename a column |
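To see the verbs working together, here is a minimal sketch on a made-up table. The `games` tibble and its columns (`season`, `team`, `pts`, `opp`) are invented for illustration and are not the column names of the NBA data used in the exercises.

```r
library(dplyr)

# Toy data: hypothetical columns, not the real NBA table
games <- tibble(
  season = c(2021, 2021, 2021, 2022, 2022),
  team   = c("BOS", "BOS", "LAL", "BOS", "LAL"),
  pts    = c(110, 98, 105, 120, 101),
  opp    = c(100, 102, 99, 115, 108)
)

team_summary <- games %>%
  mutate(margin = pts - opp) %>%        # add a computed column
  filter(season >= 2021) %>%            # keep a subset of rows
  select(season, team, margin) %>%      # keep a subset of columns
  group_by(season, team) %>%            # repeat the summary for each group
  summarise(
    n_games     = n(),
    mean_margin = mean(margin),
    .groups     = "drop"
  ) %>%
  arrange(desc(mean_margin)) %>%        # reorder rows, largest margin first
  rename(avg_margin = mean_margin)      # rename a column

# Return the first two rows by position
slice(team_summary, 1:2)
```

Each verb takes a table and returns a table, which is why they chain so naturally with the pipe.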
Let’s do some simple exercises on NBA box scores, from the 1948-49 season through the 2022-23 season.
library(tidyverse)
library(magrittr)
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/nba_games_48_22.rds" %>%
  url() %>%
  readRDS()
Generate a table or figure which reports the number of regular season games by season.
The three-point line was introduced in the 1979-80 season. For every season following the line’s introduction, report the number of three-point field goals attempted and the number of non-three-point field goals attempted.
Report the 5 longest within-season winning sequences.
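As a hint for the winning-sequence exercise, here is one possible approach sketched on toy data. This is not necessarily the intended solution, and the column names (`season`, `team`, `game_no`, `win`) are hypothetical, so check the real names in `t1` first. The idea is that a running count of losses gives every unbroken run of wins its own identifier within a team-season.

```r
library(dplyr)

# Toy data: hypothetical columns, not the real NBA table
results <- tibble(
  season  = 2022,
  team    = rep(c("BOS", "LAL"), each = 6),
  game_no = rep(1:6, times = 2),
  win     = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
              FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

streaks <- results %>%
  arrange(season, team, game_no) %>%
  group_by(season, team) %>%
  # each loss bumps the counter, so consecutive wins share a run_id
  mutate(run_id = cumsum(!win)) %>%
  filter(win) %>%
  group_by(season, team, run_id) %>%
  summarise(streak = n(), .groups = "drop") %>%
  arrange(desc(streak))

# The longest streaks come first; slice() picks them by position
slice(streaks, 1:2)
```

Grouping by season and team before computing `run_id` is what keeps each sequence within a single season.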