Let’s set up and clean the data.
library(readxl)  # provides read_excel()
streaming_data <- read_excel("Streaming Services and Age.xlsx")
head(streaming_data)
## # A tibble: 6 × 2
## AgeCat Platform
## <chr> <chr>
## 1 18–25 Other
## 2 18–25 Hulu
## 3 18–25 Netflix
## 4 18–25 Netflix
## 5 18–25 Amazon
## 6 18–25 Netflix
# One-way frequency tables: respondents per age group and per platform
count_age <- table(streaming_data$AgeCat)
count_plat <- table(streaming_data$Platform)
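To see what `count_age` and `count_plat` should contain without opening the Excel file, here is a minimal base-R sketch that rebuilds a respondent-level data frame from the counts printed in the contingency table in the next step. It assumes the file has no rows that `table()` silently drops as missing. The names `counts`, `long`, and `respondents` are hypothetical, and the age labels use plain hyphens instead of the file's en dashes.

```r
# Counts copied from the printed contingency table (rows = AgeCat, columns = Platform)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# One row per (AgeCat, Platform) combination with its frequency ...
long <- as.data.frame(as.table(counts))
# ... expanded to one row per respondent, mirroring the shape of streaming_data
respondents <- long[rep(seq_len(nrow(long)), long$Freq), c("AgeCat", "Platform")]

table(respondents$AgeCat)    # 100 respondents in each age group
table(respondents$Platform)  # Amazon 54, Disney+ 61, Hulu 46, Netflix 111, Other 28
```

Because every age group has exactly 100 respondents, raw counts can be compared directly across age groups.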
Let’s create a contingency table of age category by preferred platform.
c_table <- table(streaming_data$AgeCat, streaming_data$Platform)
c_table
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 4 22 23 47 4
## 26–40 11 25 16 41 7
## 41+ 39 14 7 23 17
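Row totals and row proportions make the table easier to read: each row then shows how one age group splits across platforms. A minimal base-R sketch using `addmargins()` and `prop.table()` on the printed counts (the `counts`, `c_tab`, and `row_prop` names are hypothetical; age labels use plain hyphens):

```r
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
c_tab <- as.table(counts)

# Adds a "Sum" row and column: 100 per age group, N = 300 overall
addmargins(c_tab)

# Share of each age group choosing each platform (each row sums to 1)
row_prop <- prop.table(c_tab, margin = 1)
round(row_prop, 2)
```

For example, 47% of 18–25 respondents chose Netflix, while 39% of 41+ respondents chose Amazon.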
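library(ggplot2)
library(ggthemes)  # provides theme_fivethirtyeight()
# position = "fill" rescales every bar to 1, so each bar shows the age mix of that platform's users
stacked_bar_chart <- ggplot(streaming_data, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "fill") +
  labs(
    title = "Stacked Bar Chart: Age Group Share Within Each Platform",
    x = "Platform",
    y = "Proportion of respondents",
    fill = "Age Group"
  ) +
  theme_fivethirtyeight()
stacked_bar_chart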
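Because `position = "fill"` rescales each bar to 1, the stacked chart displays the column proportions of the contingency table, that is, the age composition of each platform's users. A base-R sketch of the numbers behind the bars, reusing the printed counts (the `counts` and `col_prop` names are hypothetical; age labels use plain hyphens):

```r
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Within each platform (column), the share of its users in each age group
col_prop <- prop.table(counts, margin = 2)
round(col_prop, 2)
```

For example, 39 of Amazon's 54 users (about 72%) are 41+. Since each age group has the same size (100), these shares are not distorted by unequal group sizes.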
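# Side-by-side (dodged) bars with the count printed above each bar.
# The dodge width matches geom_bar's default bar width (0.9) so labels sit over their bars.
clustered_bar_chart <- ggplot(streaming_data, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = position_dodge(width = 0.9)) +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    position = position_dodge(width = 0.9),
    vjust = -0.2,
    size = 3
  ) +
  labs(
    title = "Clustered Bar Chart: Platform Preference by Age Group",
    x = "Platform",
    y = "Count",
    fill = "Age Group"
  ) +
  theme_fivethirtyeight()
clustered_bar_chart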
Let’s perform a chi-square test of independence to check whether age category and platform preference are related.
chi_square <- chisq.test(c_table)
chi_square
##
## Pearson's Chi-squared test
##
## data: c_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11
The test gives χ² = 68.044 with df = (3 - 1) × (5 - 1) = 8 and p = 1.203e-11. Since p is far below 0.05, we reject the null hypothesis that age category and platform preference are independent.
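The degrees of freedom come from the table's dimensions, and the p-value is the upper-tail probability of a chi-square distribution with those degrees of freedom. A minimal sketch using base R's `pchisq()` with the rounded statistic reported above (the variable names are hypothetical):

```r
chi_sq <- 68.044                   # statistic reported by chisq.test() (rounded)
n_age  <- 3                        # age categories (rows)
n_plat <- 5                        # platforms (columns)
df <- (n_age - 1) * (n_plat - 1)   # (3 - 1) * (5 - 1) = 8

# Probability that a chi-square(8) variable exceeds the observed statistic
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
signif(p_value, 3)                 # about 1.2e-11, in line with chisq.test()
```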
Let’s examine the observed counts, expected counts, and Pearson residuals from the test.
chi_square$observed
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 4 22 23 47 4
## 26–40 11 25 16 41 7
## 41+ 39 14 7 23 17
chi_square$expected
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 18 20.33333 15.33333 37 9.333333
## 26–40 18 20.33333 15.33333 37 9.333333
## 41+ 18 20.33333 15.33333 37 9.333333
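Each expected count is (row total × column total) / N, the count we would see if age and platform were independent. Because every age group has 100 respondents, all three rows of expected counts are identical. The sketch below recomputes them from the printed observed counts and checks the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be trustworthy (names are hypothetical; age labels use plain hyphens).

```r
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
n <- sum(counts)   # 300 respondents

# Expected counts under independence: outer product of the margins divided by N
expected <- outer(rowSums(counts), colSums(counts)) / n
round(expected, 2)

# Rule-of-thumb check for the chi-square approximation
all(expected >= 5)   # TRUE
min(expected)        # smallest expected count, in the "Other" column (28 / 3)
```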
chi_square$residuals
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 -3.2998316 0.3696106 1.9578900 1.6439899 -1.7457431
## 26–40 -1.6499158 1.0349098 0.1702513 0.6575959 -0.7637626
## 41+ 4.9497475 -1.4045204 -2.1281413 -2.3015858 2.5095057
The 18–25 age group chose Amazon far less often than expected (Obs = 4, Exp = 18, Res = -3.30) and Other somewhat less often (Obs = 4, Exp = 9.33, Res = -1.75). It chose Disney+ about as often as expected (Obs = 22, Exp = 20.33, Res = 0.37), and Hulu and Netflix more often than expected (Hulu: Obs = 23, Exp = 15.33, Res = 1.96; Netflix: Obs = 47, Exp = 37, Res = 1.64).
The 26–40 age group stayed close to the expected counts on every platform: slightly fewer than expected for Amazon (Obs = 11, Exp = 18, Res = -1.65) and Other (Obs = 7, Exp = 9.33, Res = -0.76), and slightly more for Disney+ (Obs = 25, Exp = 20.33, Res = 1.03), Hulu (Obs = 16, Exp = 15.33, Res = 0.17), and Netflix (Obs = 41, Exp = 37, Res = 0.66).
The 41+ age group chose Amazon far more often than expected (Obs = 39, Exp = 18, Res = 4.95) and Other more often than expected (Obs = 17, Exp = 9.33, Res = 2.51), while choosing Disney+ (Obs = 14, Exp = 20.33, Res = -1.40), Hulu (Obs = 7, Exp = 15.33, Res = -2.13), and Netflix (Obs = 23, Exp = 37, Res = -2.30) less often than expected.
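The residuals reported by `chisq.test()` are Pearson residuals, (O - E) / √E. A common, informal reading treats cells with |residual| > 2 as notable departures from independence; this is a rule of thumb, not a formal test. A sketch that recomputes the residuals from the printed counts and lists the flagged cells (names are hypothetical; age labels use plain hyphens):

```r
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: (observed - expected) / sqrt(expected)
res <- (counts - expected) / sqrt(expected)
round(res, 2)

# Row/column positions of cells with |residual| > 2
which(abs(res) > 2, arr.ind = TRUE)
```

This flags 18–25 × Amazon and four of the five 41+ cells (Amazon, Hulu, Netflix, Other); 18–25 × Hulu (1.96) falls just short. `chisq.test()` also returns adjusted residuals in `$stdres`, which are approximately standard normal under independence and are often preferred for this kind of screening.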
# Each cell's contribution to the chi-square statistic: (O - E)^2 / E
cell_cont <- (chi_square$observed - chi_square$expected)^2 / chi_square$expected
cell_cont
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 10.88888889 0.13661202 3.83333333 2.70270270 3.04761905
## 26–40 2.72222222 1.07103825 0.02898551 0.43243243 0.58333333
## 41+ 24.50000000 1.97267760 4.52898551 5.29729730 6.29761905
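Each cell's contribution is simply its squared Pearson residual, and the contributions add up to the chi-square statistic. A quick base-R check on the printed counts (names are hypothetical; age labels use plain hyphens):

```r
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

contrib <- (counts - expected)^2 / expected

# Same values as the squared Pearson residuals
all.equal(contrib, ((counts - expected) / sqrt(expected))^2)

# Summing the contributions recovers the chi-square statistic (about 68.044)
sum(contrib)
```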
perc_cont <- cell_cont / chi_square$statistic * 100  # % of chi-square contributed by each cell
perc_cont
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 16.00277665 0.20077087 5.63363056 3.97200744 4.47891125
## 26–40 4.00069416 1.57404361 0.04259834 0.63552119 0.85729161
## 41+ 36.00624747 2.89913133 6.65599073 7.78513459 9.25525020
library(pheatmap)
pheatmap(perc_cont,
display_numbers = TRUE,
cluster_rows = FALSE,
cluster_cols = FALSE,
main = "% Contribution to Chi-Square Statistic")
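The 41+ × Amazon cell contributes the most to the chi-square statistic (36.0%), followed by 18–25 × Amazon (16.0%). By row, the 41+ group accounts for about 63% of χ² and the 18–25 group for about 30%, while the 26–40 group contributes only about 7%, so its platform preferences are close to what independence would predict.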
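The row and column shares quoted above can be obtained by summing the percentage contributions along each margin. A base-R sketch using the printed counts (names are hypothetical; age labels use plain hyphens):

```r
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
contrib  <- (counts - expected)^2 / expected
perc     <- 100 * contrib / sum(contrib)   # % of chi-square per cell

round(rowSums(perc), 1)   # 18-25: 30.3, 26-40: 7.1, 41+: 62.6
round(colSums(perc), 1)   # the Amazon column alone accounts for 56.0
```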
library(rcompanion)  # provides cramerV()
cramerV(c_table)
## Cramer V
## 0.3368
Cramér’s V = 0.3368, which by common rules of thumb indicates a moderate association between AgeCat and Platform.
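Cramér’s V rescales χ² to a 0 to 1 range: V = √(χ² / (N × (min(r, c) - 1))), where r and c are the numbers of rows and columns. With N = 300 and min(3, 5) - 1 = 2, a short sketch reproduces the `cramerV()` value from the reported statistic (variable names are hypothetical):

```r
chi_sq <- 68.044      # chi-square statistic from chisq.test()
n      <- 300         # total respondents
k      <- min(3, 5)   # smaller table dimension: 3 age groups vs 5 platforms

cramers_v <- sqrt(chi_sq / (n * (k - 1)))
round(cramers_v, 4)   # 0.3368
```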
The chi-square test revealed a significant relationship between age and platform preference, χ²(8, N = 300) = 68.044, p < .001. The largest contributions came from the 41+ row and the Amazon column, with the 41+ × Amazon cell alone accounting for about 36% of χ². Cramér’s V = 0.34 indicates a moderate association. Older viewers (41+) favor Amazon, while younger adults (18–25) lean toward Netflix and Hulu. These insights could help streaming services tailor content and marketing strategies to specific age demographics.