Research questions:
Are collaborations increasing over time? (using a proxy: credited artist count)
Did collaborations differ between older and modern eras?
This deck uses an illustrative dataset with Grammy-style structure:
year: 1990 to 2025
nominee_rank: 1 is the winner, 2–5 are nominees
is_winner: TRUE when nominee_rank is 1
n_artists: proxy for number of credited artists
popularity: proxy score (for a 3D plot)
era: 1990–1999 vs 2000–2025
| year | nominee_rank | is_winner | era | n_artists | popularity |
|---|---|---|---|---|---|
| 1990 | 1 | TRUE | 1990–1999 | 3 | 77 |
| 1991 | 1 | TRUE | 1990–1999 | 1 | 79 |
| 1992 | 1 | TRUE | 1990–1999 | 1 | 67 |
| 1993 | 1 | TRUE | 1990–1999 | 1 | 55 |
| 1994 | 1 | TRUE | 1990–1999 | 1 | 62 |
| 1995 | 1 | TRUE | 1990–1999 | 1 | 52 |
| 1996 | 1 | TRUE | 1990–1999 | 2 | 69 |
| 1997 | 1 | TRUE | 1990–1999 | 1 | 63 |
| 1998 | 1 | TRUE | 1990–1999 | 2 | 41 |
| 1999 | 1 | TRUE | 1990–1999 | 3 | 53 |
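To make the column definitions concrete, the ten rows shown above can be rebuilt in base R. This is a sketch only: the data frame name `winners_1990s` is hypothetical, and deriving `is_winner` and `era` from `nominee_rank` and `year` is an assumption about how the deck's dataset was constructed.

```r
# Ten winner rows copied from the table above (1990-1999 only)
winners_1990s <- data.frame(
  year         = 1990:1999,
  nominee_rank = rep(1L, 10),
  n_artists    = c(3, 1, 1, 1, 1, 1, 2, 1, 2, 3),
  popularity   = c(77, 79, 67, 55, 62, 52, 69, 63, 41, 53)
)

# Assumed derivations: rank 1 marks the winner; eras split at the year 2000
# (era labels use an ASCII hyphen here rather than the table's en dash)
winners_1990s$is_winner <- winners_1990s$nominee_rank == 1
winners_1990s$era <- ifelse(winners_1990s$year < 2000, "1990-1999", "2000-2025")

str(winners_1990s)
```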
Point estimate for each era mean:
\(\bar{Y}_{old}\) = average credited artists for winners (1990–1999)
\(\bar{Y}_{new}\) = average credited artists for winners (2000–2025)
| era | n_years | mean_artists | sd_artists |
|---|---|---|---|
| 1990–1999 | 10 | 1.600 | 0.843 |
| 2000–2025 | 26 | 2.115 | 0.588 |
Interpretation: Each mean is a single-number summary; on its own it does not show how uncertain the estimate is.
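As a check on the 1990–1999 row, the mean and standard deviation can be recomputed from the ten winner values listed in the data table. A minimal base-R sketch; the vector name `n_artists_1990s` is hypothetical, and the 2000–2025 row cannot be checked this way because those rows are not shown.

```r
# Credited-artist counts for the 1990-1999 winners, copied from the data table
n_artists_1990s <- c(3, 1, 1, 1, 1, 1, 2, 1, 2, 3)

n_years      <- length(n_artists_1990s)
mean_artists <- mean(n_artists_1990s)
sd_artists   <- sd(n_artists_1990s)   # sample SD (n - 1 denominator)

# Standard error of the mean: the uncertainty the point estimate alone hides
se_mean <- sd_artists / sqrt(n_years)

round(c(n = n_years, mean = mean_artists, sd = sd_artists, se = se_mean), 3)
```

The mean and SD should agree with the table's 1.600 and 0.843; the standard error is the ingredient a confidence interval adds on top of the point estimate.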
Let \(Y_i\) be the proxy credited-artist count for the winning record in year \(x_i\). \[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^2) \]
Interpretation:
\(\beta_1\) is the average change in credited artist count per year.
If \(\beta_1 > 0\), collaborations increase over time on average.
We test:
\[ H_0:\ \beta_1 = 0 \quad \text{vs} \quad H_A:\ \beta_1 \ne 0 \]
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -24.6394 | 22.2936 | -1.1052 | 0.2768 | -69.9455 | 20.6667 |
| year | 0.0133 | 0.0111 | 1.1937 | 0.2409 | -0.0093 | 0.0358 |
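The row for year contains the slope estimate \(\widehat{\beta}_1 = 0.0133\) credited artists per year.
conf.low / conf.high give a 95% CI for \(\beta_1\): (-0.0093, 0.0358), which includes 0.
p.value is the regression p-value for \(H_0:\beta_1=0\); here p = 0.2409, so the data do not provide evidence of a linear trend at the 5% level.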
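The test statistic and interval in the year row can be checked by hand. The worked sketch below assumes 36 winner-years (10 + 26, from the era summary table), so the residual degrees of freedom are \(36 - 2 = 34\) and \(t_{0.975,34} \approx 2.032\); small discrepancies from the table come from using its rounded entries.

\[ t = \frac{\widehat{\beta}_1}{SE(\widehat{\beta}_1)} = \frac{0.0133}{0.0111} \approx 1.20 \]

\[ \widehat{\beta}_1 \pm t_{0.975,34}\, SE(\widehat{\beta}_1) = 0.0133 \pm 2.032 \times 0.0111 \approx (-0.009,\ 0.036) \]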
Compare the mean collaboration proxy for winners:
\[ H_0:\ \mu_{\text{new}} - \mu_{\text{old}} = 0 \quad \text{vs} \quad H_A:\ \mu_{\text{new}} - \mu_{\text{old}} \ne 0 \]
A 95% confidence interval for the difference has the form:
\[ \widehat{\Delta} \pm t_{0.975,\nu}\, SE(\widehat{\Delta}) \]
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| -0.5154 | 1.6 | 2.1154 | -1.7738 | 0.1004 | 12.5274 | -1.1455 | 0.1147 | Welch Two Sample t-test | two.sided |
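Interpretation:
The estimate column is estimate1 - estimate2 = 1.6 - 2.1154 = -0.5154, that is, old minus new, the opposite sign of \(\mu_{\text{new}} - \mu_{\text{old}}\); the two-sided test is unaffected by the ordering.
A small p-value is evidence that the era means differ; here p = 0.1004, so the difference is not significant at the 5% level.
A CI that does not include 0 indicates a difference at the 5% level; here the 95% CI (-1.1455, 0.1147) includes 0.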
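The Welch statistic, degrees of freedom, and interval can be approximately reproduced from the era summary table alone. A minimal base-R sketch, assuming the rounded means and SDs reported there (so results agree with the t-test table only to about two decimal places); all variable names are hypothetical.

```r
# Rounded summary statistics from the era table
n_old <- 10; mean_old <- 1.600; sd_old <- 0.843   # 1990-1999
n_new <- 26; mean_new <- 2.115; sd_new <- 0.588   # 2000-2025

# Variance of each sample mean
v_old <- sd_old^2 / n_old
v_new <- sd_new^2 / n_new

# Difference is old minus new, matching estimate1 - estimate2 in the output
delta_hat <- mean_old - mean_new
se_delta  <- sqrt(v_old + v_new)
t_stat    <- delta_hat / se_delta

# Welch-Satterthwaite degrees of freedom
nu <- (v_old + v_new)^2 / (v_old^2 / (n_old - 1) + v_new^2 / (n_new - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df = nu)
ci      <- delta_hat + c(-1, 1) * qt(0.975, df = nu) * se_delta

round(c(t = t_stat, df = nu, p = p_value, low = ci[1], high = ci[2]), 3)
```

Because the smaller 1990–1999 group also has the larger SD, the degrees of freedom land near 12.5, much closer to \(n_{old} - 1 = 9\) than to the pooled value of 34.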
library(dplyr)
library(ggplot2)

# Keep only the winning record for each year
winners <- grammy %>% dplyr::filter(is_winner)

# Fit regression model
lm_fit <- lm(n_artists ~ year, data = winners)

# Plot the trend with a fitted line and its 95% confidence band
ggplot(winners, aes(x = year, y = n_artists)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Winners: credited artist count (proxy) over time",
    x = "Year",
    y = "Credited artists (proxy)"
  )
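The coefficient table, with its standard errors, p-values, and confidence limits, can be pulled from a fitted model using base R. The sketch below fits the same formula to only the ten 1990–1999 winner rows shown in the data table, so its numbers will differ from the full-data regression table; `subset_1990s` and `fit_1990s` are hypothetical names.

```r
# Ten winner rows copied from the data table (a subset, not the full dataset)
subset_1990s <- data.frame(
  year      = 1990:1999,
  n_artists = c(3, 1, 1, 1, 1, 1, 2, 1, 2, 3)
)

# Same model formula as lm_fit, fitted to the subset
fit_1990s <- lm(n_artists ~ year, data = subset_1990s)

# Estimates, standard errors, t statistics, and p-values
coef_table <- summary(fit_1990s)$coefficients

# 95% confidence intervals for the intercept and slope
ci_table <- confint(fit_1990s, level = 0.95)

cbind(coef_table, ci_table)
```

Applied to `lm_fit` on the full `winners` data, the same two calls should give the quantities reported in the regression table.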
Takeaways:
Point estimates summarize typical values; CIs show uncertainty.
The regression slope estimates a linear trend and comes with a p-value and a CI.
A two-sample t-test compares eras and provides inference on the difference in means.