knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(readxl)
library(tidyverse)
library(ggplot2)
library(ggthemes)
tvdata <- read_excel("Streaming Services and Age.xlsx")
count(tvdata, AgeCat)
## # A tibble: 3 × 2
## AgeCat n
## <chr> <int>
## 1 18–25 100
## 2 26–40 100
## 3 41+ 100
count(tvdata, Platform)
## # A tibble: 5 × 2
## Platform n
## <chr> <int>
## 1 Amazon 54
## 2 Disney+ 61
## 3 Hulu 46
## 4 Netflix 111
## 5 Other 28
contingtab <- table(tvdata$AgeCat, tvdata$Platform)
contingtab
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 4 22 23 47 4
## 26–40 11 25 16 41 7
## 41+ 39 14 7 23 17
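The 100% stacked bar chart plots the row proportions of this table, meaning each age group's split across platforms. As a minimal sketch, those proportions can be printed directly with base R's `prop.table()`. Here the counts are re-entered by hand from the printed table (with ASCII hyphens in the age labels), and `counts` is a hypothetical name used only for this example.

```r
# Observed counts re-entered by hand from the printed contingency table
# (hypothetical object name; ASCII hyphens used in the age labels)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# margin = 1 divides each cell by its row total,
# so every age group's proportions sum to 1
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```

Because every age group has exactly 100 respondents, these proportions are just the counts divided by 100 (for example, 0.47 for 18–25 Netflix).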
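stacked <- ggplot(tvdata, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "fill") +   # rescale each bar so its segments sum to 1
  labs(
    title = "Streaming Platform by Age Range",
    y = "Streaming Platform Proportion",
    x = "Age Range"
  ) +
  theme_fivethirtyeight() +
  # theme_fivethirtyeight() blanks axis titles by default; restore them
  theme(axis.title = element_text())
stacked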
clustered <- ggplot(tvdata, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "dodge") +   # side-by-side bars of raw counts
  labs(
    title = "Platform Preference by Age Range",
    x = "Streaming Platform",
    y = "Number of Respondents",
    fill = "Age Range"
  ) +
  theme_economist()
clustered
chitest <- chisq.test(contingtab)
chitest
##
## Pearson's Chi-squared test
##
## data: contingtab
## X-squared = 68.044, df = 8, p-value = 1.203e-11
A Chi-Square test of independence was conducted to examine the relationship between age range and streaming platform preference. The result was statistically significant, χ²(8, N = 300) = 68.04, p < .001, so we can reject the null hypothesis that age range and platform preference are independent. There is a statistically significant relationship between age range and streaming platform preference.
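Pearson's chi-square approximation is usually considered trustworthy when every expected count is at least 5 (a common rule of thumb). A minimal sketch of checking that assumption and recovering the reported p-value from the statistic and degrees of freedom with `pchisq()`, again re-entering the observed counts by hand into a hypothetical `counts` matrix:

```r
# Observed counts re-entered by hand (hypothetical object name)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

test <- chisq.test(counts)

# Rule-of-thumb check: smallest expected count should be at least 5
min(test$expected)

# Upper-tail probability of the chi-square distribution with df = 8
p_value <- pchisq(unname(test$statistic), df = unname(test$parameter),
                  lower.tail = FALSE)
p_value
```

Here the smallest expected count is 28/3, about 9.33 (the Other column), so the rule of thumb is satisfied.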
observed <- chitest$observed
observed
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 4 22 23 47 4
## 26–40 11 25 16 41 7
## 41+ 39 14 7 23 17
expected <- chitest$expected
expected
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 18 20.33333 15.33333 37 9.333333
## 26–40 18 20.33333 15.33333 37 9.333333
## 41+ 18 20.33333 15.33333 37 9.333333
residuals <- chitest$residuals
residuals
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 -3.2998316 0.3696106 1.9578900 1.6439899 -1.7457431
## 26–40 -1.6499158 1.0349098 0.1702513 0.6575959 -0.7637626
## 41+ 4.9497475 -1.4045204 -2.1281413 -2.3015858 2.5095057
(Note: I am using an absolute residual of 2, roughly two standard deviations, as the cutoff for “notable” deviations.)
For 18–25-year-olds, far fewer chose Amazon than expected (residual -3.30). For 26–40-year-olds, there were no notable deviations. For respondents aged 41+, substantially more than expected chose Amazon (4.95) and more than expected chose Other (2.51), while fewer than expected chose Hulu (-2.13) and Netflix (-2.30).
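The ±2 cutoff corresponds most closely to the standard normal scale for adjusted (standardized) residuals, which `chisq.test()` returns as `stdres`. Under independence, Pearson residuals have variance below 1, so a cutoff of 2 applied to them is somewhat conservative. A minimal sketch comparing the two on a hand-entered copy of the counts (the `counts` name is hypothetical):

```r
# Observed counts re-entered by hand (hypothetical object name)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

test <- chisq.test(counts)

# Pearson residuals: (O - E) / sqrt(E)
pearson <- test$residuals

# Adjusted residuals: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
adjusted <- test$stdres
round(adjusted, 2)

# Cells whose adjusted residual exceeds the +/-2 cutoff
which(abs(adjusted) > 2, arr.ind = TRUE)
```

Because each adjusted residual divides by a smaller quantity than the corresponding Pearson residual, its magnitude is always at least as large, so the adjusted version may flag cells that fall just short of 2 on the Pearson scale.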
cellcontributions <- (observed - expected)^2 / expected
cellcontributions
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 10.88888889 0.13661202 3.83333333 2.70270270 3.04761905
## 26–40 2.72222222 1.07103825 0.02898551 0.43243243 0.58333333
## 41+ 24.50000000 1.97267760 4.52898551 5.29729730 6.29761905
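Each cell's contribution is simply its squared Pearson residual, and the contributions add up to the overall X² statistic, which is why dividing by the total yields percentages. A short sketch verifying both identities on a hand-entered copy of the counts (hypothetical `counts` name):

```r
# Observed counts re-entered by hand (hypothetical object name)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

test <- chisq.test(counts)
contrib <- (test$observed - test$expected)^2 / test$expected

# Identity 1: each contribution equals the squared Pearson residual
all.equal(contrib, test$residuals^2)

# Identity 2: the contributions sum to the chi-square statistic
all.equal(sum(contrib), unname(test$statistic))

# Each cell's share of the total, in percent
round(100 * contrib / sum(contrib), 1)
```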
chi_sq_total <- as.numeric(chitest$statistic)
cellcontrib_pct <- 100 * cellcontributions / chi_sq_total
round(cellcontrib_pct, 1)
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 16.0 0.2 5.6 4.0 4.5
## 26–40 4.0 1.6 0.0 0.6 0.9
## 41+ 36.0 2.9 6.7 7.8 9.3
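library(pheatmap)
pheatmap(
  cellcontrib_pct,
  cluster_rows = FALSE,     # keep age groups in their original order
  cluster_cols = FALSE,     # keep platforms in their original order
  display_numbers = TRUE,   # print each cell's percentage
  number_format = "%.1f",
  main = "Contribution of Each Cell to Total Chi-Square (in Percent)"
)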
The cell that contributes the most is Amazon among respondents aged 41+ (36% of the total Chi-Square). The survey data cannot say why, but several explanations are plausible: Amazon may offer more content geared toward adults than the other platforms, or it may provide access to a wider variety of adult-oriented subsidiary platforms (PBS Documentaries, Max, etc.). Respondents aged 41+ may also use Amazon more often simply because they are more likely to already have Amazon Prime accounts.
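# Cramer's V = sqrt(X^2 / (n * (min(rows, cols) - 1)))
n <- sum(contingtab)
chi_sq <- as.numeric(chitest$statistic)
n_rows <- nrow(contingtab)   # descriptive names instead of r and c
n_cols <- ncol(contingtab)   # (c is also a base R function)
cramers_v <- sqrt(chi_sq / (n * min(n_rows - 1, n_cols - 1)))
cramers_v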
## [1] 0.3367584
Cramér’s V = .34, indicating a moderate association between age range and platform preference.
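To put .34 in context, one common convention rescales Cohen's benchmarks for the effect size w (0.1 small, 0.3 medium, 0.5 large) to the V scale by dividing by sqrt(min(r, c) - 1). A minimal sketch that wraps the calculation above in a reusable function; `cramer_v()` and `counts` are hypothetical names chosen for this example, and the benchmark convention is one rule of thumb among several:

```r
# Observed counts re-entered by hand (hypothetical object name)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE
)

# Hypothetical helper: Cramer's V for an r x c table of counts
cramer_v <- function(tab) {
  # correct = FALSE only matters for 2 x 2 tables (Yates' correction)
  stat <- unname(chisq.test(tab, correct = FALSE)$statistic)
  k <- min(nrow(tab), ncol(tab)) - 1
  sqrt(stat / (sum(tab) * k))
}

v <- cramer_v(counts)
round(v, 2)

# Cohen's w benchmarks rescaled to the V metric for this table's shape
k <- min(dim(counts)) - 1
benchmarks <- c(small = 0.1, medium = 0.3, large = 0.5) / sqrt(k)
round(benchmarks, 2)
```

With min(r, c) - 1 = 2, the rescaled benchmarks are roughly .07, .21 and .35, so V = .34 sits between the medium and large thresholds, consistent with describing the association as moderate.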
The Chi-Square test revealed a significant relationship between age range and platform preference, χ²(8, N = 300) = 68.04, p < .001. The largest contributions came from Amazon viewers aged 41+ and Amazon viewers aged 18–25: viewers aged 41+ were far more likely than expected to use Amazon for streaming, while 18–25-year-olds were far less likely to. Viewers aged 41+ choosing Other also accounted for a substantial share of the association. Cramér’s V was .34, indicating a moderate association between age and streaming preference, so other factors (stylistic preference, cost, etc.) are likely also important.
To better understand the mechanisms behind these deviations, streaming services should conduct further survey research on user preferences. In particular, they should examine which subsidiary streaming services Amazon Prime viewers subscribe to through Amazon Prime, as well as which specific services fall within the Other category. There may be considerable overlap between subsidiary Amazon platforms and Other.