Step 1 – Data Preparation

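Let’s load the required packages and read in the data.

library(readxl)      # read_excel()
library(ggplot2)     # ggplot(), geom_bar(), geom_text()
library(ggthemes)    # theme_fivethirtyeight()
library(pheatmap)    # pheatmap()
library(rcompanion)  # cramerV()

streaming_data <- read_excel("Streaming Services and Age.xlsx")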
head(streaming_data)
## # A tibble: 6 × 2
##   AgeCat Platform
##   <chr>  <chr>   
## 1 18–25  Other   
## 2 18–25  Hulu    
## 3 18–25  Netflix 
## 4 18–25  Netflix 
## 5 18–25  Amazon  
## 6 18–25  Netflix
count_age <- table(streaming_data$AgeCat)
count_plat <- table(streaming_data$Platform)

Let’s create a contingency table of age category by preferred platform.

c_table <- table(streaming_data$AgeCat, streaming_data$Platform)
c_table
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
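
As a quick sanity check, the table can also be rebuilt directly from the counts printed above, which makes the later steps reproducible without the Excel file. This is only a sketch: the names `counts`, `c_table_check`, and `row_share` are illustrative and not part of the original analysis.

```r
# Rebuild the contingency table from the printed counts (rows = age groups).
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
c_table_check <- as.table(counts)

# Marginal totals: respondents per age group and per platform.
rowSums(c_table_check)
colSums(c_table_check)

# Share of each platform within each age group (each row sums to 1).
row_share <- prop.table(c_table_check, margin = 1)
round(row_share, 2)
```

Row shares are often easier to compare than raw counts, although here every age group has the same number of respondents, so the two views tell the same story.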

Step 2 – Visualization

stacked_bar_chart <- ggplot(streaming_data, aes(x=Platform, fill = AgeCat)) +
  geom_bar(position = "fill") +
  labs(
    title = "Platform Preferences by Age Group (Stacked Bar Chart)",
    x = "Platform",
    y = "Proportion"
  ) +
  theme_fivethirtyeight()
stacked_bar_chart

clustered_stacked_graph <- ggplot(streaming_data, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "dodge") +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    position = position_dodge(width = 0.9),  # match geom_bar's default dodge width
    vjust = -0.2,
    size = 3
  ) +
  labs(
    title = "Clustered Bar Chart: Platform Preferences by Age Group",
    x = "Platform",
    y = "Count"
  ) +
  theme_fivethirtyeight()
clustered_stacked_graph

Step 3 – Chi-Square Test of Independence

Let’s perform a Chi-square test of independence to see whether Age Category and Platform Preference are related.

chi_square <- chisq.test(c_table)
chi_square
## 
##  Pearson's Chi-squared test
## 
## data:  c_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11

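The test yields a Chi-square statistic of χ² = 68.044 with 8 degrees of freedom (df = 8) and a p-value of 1.203e-11. Because the p-value is far below 0.05, we reject the null hypothesis that age category and platform preference are independent.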
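
To see where these numbers come from, the statistic can be recomputed by hand from the Step 1 counts. The sketch below assumes the usual Pearson formulas (expected count = row total × column total / N, χ² = Σ (O − E)² / E, df = (rows − 1)(columns − 1)); the variable names are illustrative.

```r
# Observed counts copied from the contingency table in Step 1.
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

n <- sum(observed)

# Expected counts under independence: (row total * column total) / N.
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson Chi-square statistic and its degrees of freedom.
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the Chi-square distribution.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(statistic = chi_sq, df = df, p_value = p_value)
```

The result should agree with the `chisq.test()` output above, since no continuity correction is applied to tables larger than 2 × 2.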

Step 4 – Observed, Expected, and Residual Values

Let’s examine Observed counts, Expected counts, and Residuals from the table.

chi_square$observed
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
chi_square$expected
##        
##         Amazon  Disney+     Hulu Netflix    Other
##   18–25     18 20.33333 15.33333      37 9.333333
##   26–40     18 20.33333 15.33333      37 9.333333
##   41+       18 20.33333 15.33333      37 9.333333
chi_square$residuals
##        
##             Amazon    Disney+       Hulu    Netflix      Other
##   18–25 -3.2998316  0.3696106  1.9578900  1.6439899 -1.7457431
##   26–40 -1.6499158  1.0349098  0.1702513  0.6575959 -0.7637626
##   41+    4.9497475 -1.4045204 -2.1281413 -2.3015858  2.5095057

The 18–25 age group chose Amazon far less often than expected (Obs = 4, Exp = 18, Res = -3.30), Disney+ slightly more often than expected (Obs = 22, Exp = 20.33, Res = 0.37), and Hulu and Netflix more often than expected [Hulu (Obs = 23, Exp = 15.33, Res = 1.96), Netflix (Obs = 47, Exp = 37, Res = 1.64)].

The 26–40 age group chose Amazon less often than expected (Obs = 11, Exp = 18, Res = -1.65), while Disney+ (Obs = 25, Exp = 20.33, Res = 1.03), Hulu (Obs = 16, Exp = 15.33, Res = 0.17), and Netflix (Obs = 41, Exp = 37, Res = 0.66) were all chosen slightly more often than expected.

The 41+ age group chose Amazon far more often than expected (Obs = 39, Exp = 18, Res = 4.95), Disney+ somewhat less often (Obs = 14, Exp = 20.33, Res = -1.40), and both Hulu (Obs = 7, Exp = 15.33, Res = -2.13) and Netflix (Obs = 23, Exp = 37, Res = -2.30) clearly less often than expected.
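
The residuals reported by `chisq.test()` are Pearson residuals, (O − E) / √E. A minimal sketch of that calculation follows, with the counts retyped from the observed table. The ±2 cutoff used to flag cells is a common rule of thumb rather than a formal test, and the variable names are illustrative.

```r
# Observed counts copied from the table above.
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residual: signed distance from expectation, scaled by sqrt(E).
pearson_res <- (observed - expected) / sqrt(expected)
round(pearson_res, 2)

# Flag cells whose residual exceeds 2 in absolute value (rule of thumb).
notable <- abs(pearson_res) > 2
flagged <- as.data.frame(as.table(pearson_res))[as.vector(notable), ]
flagged
```

Positive residuals mark cells with more respondents than independence would predict, and negative residuals mark cells with fewer.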

Step 5 – Contributions to the Chi-Square Statistic

cell_cont <- ((chi_square$observed-chi_square$expected)^2)/chi_square$expected
cell_cont
##        
##              Amazon     Disney+        Hulu     Netflix       Other
##   18–25 10.88888889  0.13661202  3.83333333  2.70270270  3.04761905
##   26–40  2.72222222  1.07103825  0.02898551  0.43243243  0.58333333
##   41+   24.50000000  1.97267760  4.52898551  5.29729730  6.29761905
perc_cont <- cell_cont / chi_square$statistic * 100
perc_cont
##        
##              Amazon     Disney+        Hulu     Netflix       Other
##   18–25 16.00277665  0.20077087  5.63363056  3.97200744  4.47891125
##   26–40  4.00069416  1.57404361  0.04259834  0.63552119  0.85729161
##   41+   36.00624747  2.89913133  6.65599073  7.78513459  9.25525020
pheatmap(perc_cont,
           display_numbers = TRUE,
           cluster_rows = FALSE,
           cluster_cols = FALSE,
           main = "% Contribution to Chi-Square Statistic")

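The cell that contributes the most to the Chi-square statistic is 41+ / Amazon (36.0%), followed by 18–25 / Amazon (16.0%). The 41+ row alone accounts for roughly 63% of the statistic. By contrast, the 26–40 cells each contribute about 4% or less, so that group's preferences are close to what independence would predict.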
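
Reading the largest contributors off a heatmap can be error-prone, so it may help to rank the cells explicitly. The sketch below recomputes the percentage contributions from the Step 1 counts and sorts them; `pct` and `ranked` are illustrative names.

```r
# Observed counts copied from the contingency table in Step 1.
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Each cell's contribution to the statistic, as a percentage of the total.
contrib <- (observed - expected)^2 / expected
pct     <- 100 * contrib / sum(contrib)

# Long format, sorted from largest to smallest contribution.
ranked <- as.data.frame(as.table(pct), responseName = "Percent")
ranked <- ranked[order(-ranked$Percent), ]
head(ranked, 5)

# Total contribution of each age group.
round(rowSums(pct), 1)
```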

Step 6 – Effect Size (Cramer’s V)

cramerV(c_table)
## Cramer V 
##   0.3368

The effect size is 0.3368, which is conventionally considered a moderate association between the AgeCat and Platform variables.
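
Cramer’s V can also be computed by hand as a cross-check. The sketch below assumes the standard uncorrected formula V = √(χ² / (N × min(rows − 1, columns − 1))), with the counts retyped from Step 1; the variable names are illustrative.

```r
# Observed counts copied from the contingency table in Step 1.
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE
)

# Pearson Chi-square statistic (no continuity correction for a 3 x 5 table).
chi_sq <- unname(chisq.test(observed)$statistic)

n <- sum(observed)                            # total sample size
k <- min(nrow(observed), ncol(observed)) - 1  # smaller dimension minus 1

cramers_v <- sqrt(chi_sq / (n * k))
round(cramers_v, 4)
```

Because V is scaled by the sample size and the table dimensions, it is easier to compare across studies than the raw Chi-square statistic.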

Step 7 – Final Interpretation

The Chi-Square test revealed a significant relationship between age and platform preference, χ²(8, N = 300) = 68.044, p < .001. The largest contributions came from the 41+ age group and the Amazon cells. Cramer’s V = 0.34 indicates a moderate association. This suggests that older viewers favor Amazon, while younger adults lean toward platforms such as Netflix and Hulu. These insights could help streaming services tailor content and marketing strategies to specific age demographics.