dplyr–a brief reprise

I feel we’ve learned a lot of dplyr, and hopefully you’ve been persuaded that this is a good toolkit for the kinds of tasks we face as bench social scientists–data manipulation before modelling.

But it’s a little like learning Italian: the lab only provides vocabulary lists and some exemplary turns of phrase, while speaking Italian requires daily conversation. Becoming conversant in R similarly requires consistent application.

Just to remind us, here are the dplyr verbs, the adverb, and a few utilities:

Verbs         Result
arrange()     change a table’s row order
select()      keep only a subset of columns
filter()      keep only a subset of rows
mutate()      add a column, normally through computation
summarise()   generate a new table of summary statistics

…and an adverb
group_by()    repeat a function for every group of rows

…and some utilities
slice()       return a subset of rows by position
rename()      rename a column
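
As a quick refresher, here is a minimal sketch that chains each of these functions once. The `toy` table, its team codes, and its column names are all made up for illustration; they are not the columns of the NBA data used below.

```r
library(dplyr)

# hypothetical toy data: one row per game, invented for this sketch
toy <- tibble(
  team    = c("BOS", "BOS", "BOS", "LAL", "LAL", "LAL"),
  pts     = c(110, 98, 121, 120, 105, 99),
  opp_pts = c(102, 104, 115, 111, 108, 97)
)

toy_summary <- toy %>%
  rename(opp = opp_pts) %>%        # rename a column
  mutate(margin = pts - opp) %>%   # add a computed column
  filter(margin > 0) %>%           # keep only the rows that are wins
  select(team, pts, margin) %>%    # keep only a subset of columns
  group_by(team) %>%               # repeat the summary for every team
  summarise(
    wins        = n(),
    mean_margin = mean(margin)
  ) %>%
  arrange(desc(mean_margin))       # largest average winning margin first

# slice() picks rows by position: here, the first row of the summary
slice(toy_summary, 1)
```

Note that `group_by()` does nothing visible on its own; it only changes how the `summarise()` that follows is computed.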

Let’s do some simple exercises on NBA box scores, from the 1948-49 season through the 2022-23 season.

library(tidyverse)
library(magrittr)

# read the box scores straight from GitHub;
# url() opens a connection, which readRDS() then reads
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/nba_games_48_22.rds" %>%
  url %>%
  readRDS
  1. Generate a table or figure which reports the number of regular season games by season

  2. The 3pt line was introduced in the 1979-80 season. Report the number of 3pt field goals attempted, and non-3pt field goals attempted, by season, for every season since the 3pt line’s introduction.

  3. Report the 5 longest within-season winning streaks.
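
The third exercise needs one idea the verb list does not cover: labelling runs of consecutive identical values. Below is one possible sketch of that idea on made-up data, not a solution for `t1`. The `games` table, the team codes, and the `game_n` and `win` columns are hypothetical, so the real data will need its own column names and probably some reshaping first.

```r
library(dplyr)

# hypothetical toy data: two teams, six games each, in playing order
games <- tibble(
  team   = rep(c("AAA", "BBB"), each = 6),
  game_n = rep(1:6, times = 2),
  win    = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
             FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

streaks <- games %>%
  arrange(team, game_n) %>%
  group_by(team) %>%
  # run_id increases by one each time the result differs from the previous game
  mutate(run_id = cumsum(win != lag(win, default = first(win)))) %>%
  group_by(team, run_id) %>%
  # collapse every run to a single row holding its result and its length
  summarise(win = first(win), length = n(), .groups = "drop") %>%
  filter(win) %>%            # keep winning runs only
  arrange(desc(length))      # longest streak first

streaks
```

For within-season streaks, the season would presumably need to sit alongside `team` in both `group_by()` calls, so that a run cannot carry over from one season to the next.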