Introduction:

We examine whether age is related to preferred streaming platform (Netflix, Hulu, Disney+, Amazon, or Other) using a sample of 300 people (N = 300). We ran a chi-square test of independence, inspected the residuals, and broke down each cell's contribution to the χ² statistic with a heatmap. Finally, we reported Cramér's V as the effect size. Our goal is to pinpoint which age groups favor which platforms and use that insight to inform marketing and content planning.

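library(readxl)     # read_xlsx()
library(skimr)      # skim()
library(ggplot2)    # ggplot(), geom_bar(), geom_text()
library(ggthemes)   # theme_solarized_2()
library(pheatmap)   # pheatmap()
library(rcompanion) # cramerV(); package assumed from the printed output format

streaming_data<-read_xlsx("Streaming Services and Age.xlsx")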

View(streaming_data)

str(streaming_data)
## tibble [300 × 2] (S3: tbl_df/tbl/data.frame)
##  $ AgeCat  : chr [1:300] "18–25" "18–25" "18–25" "18–25" ...
##  $ Platform: chr [1:300] "Other" "Hulu" "Netflix" "Netflix" ...
skim(streaming_data)
Data summary
Name streaming_data
Number of rows 300
Number of columns 2
_______________________
Column type frequency:
character 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
AgeCat 0 1 3 5 0 3 0
Platform 0 1 4 7 0 5 0
count_age_cat<-table(streaming_data$AgeCat)
count_age_cat
## 
## 18–25 26–40   41+ 
##   100   100   100
count_platform<-table(streaming_data$Platform)
count_platform
## 
##  Amazon Disney+    Hulu Netflix   Other 
##      54      61      46     111      28
contingency_table<-table(streaming_data$AgeCat,streaming_data$Platform)
contingency_table
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
stacked_bar<- ggplot(streaming_data, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "fill") + 
  labs(
    title = "Streaming Platform Preference Proportion",
    y = "Proportion of Respondents",
    x = "Streaming Platform"
  ) +
  theme_solarized_2()

stacked_bar

clustered_bar<- ggplot(streaming_data, aes(Platform, fill = AgeCat)) +
  geom_bar(position = "dodge") +
  geom_text(
    stat = "count",
    aes(label=after_stat(count)),
    position = position_dodge(width = 0.8),
    vjust=-0.2,
    size = 3) +
  labs(
    title = "Streaming Platform Preference by Age",
    x="Streaming Platform",
    y="Number of People"
  )+
  theme_solarized_2()

clustered_bar

chi_square_test<-chisq.test(contingency_table)
chi_square_test
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11

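The test is statistically significant (p < .001), so we reject the null hypothesis that age group and platform preference are independent.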
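As a quick check on the reported p-value, the sketch below recomputes the degrees of freedom and the upper-tail probability from the printed statistic. The variable names (`chi_sq`, `n_rows`, `n_cols`) are illustrative and not part of the original analysis; the statistic is retyped from the output above.

```r
# Statistic retyped from the chisq.test() output above
chi_sq <- 68.044
n_rows <- 3   # age groups
n_cols <- 5   # platforms

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)
df <- (n_rows - 1) * (n_cols - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

df
p_value
```

A p-value this small is conventionally reported as p < .001 rather than as an exact figure.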

chi_square_test$observed
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
chi_square_test$expected
##        
##         Amazon  Disney+     Hulu Netflix    Other
##   18–25     18 20.33333 15.33333      37 9.333333
##   26–40     18 20.33333 15.33333      37 9.333333
##   41+       18 20.33333 15.33333      37 9.333333
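The expected counts are what each cell would hold if age group and platform were independent: row total times column total, divided by N. The sketch below reproduces them by hand. It assumes the counts are retyped from the contingency table into a matrix named `observed` (a name introduced here for illustration).

```r
# Observed counts retyped from the contingency table above
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Expected count per cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Rule-of-thumb assumption check: every expected count should be at least 5
all(expected >= 5)
```

Because each age group has exactly 100 respondents, the three rows of expected counts are identical.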
chi_square_test$residuals
##        
##             Amazon    Disney+       Hulu    Netflix      Other
##   18–25 -3.2998316  0.3696106  1.9578900  1.6439899 -1.7457431
##   26–40 -1.6499158  1.0349098  0.1702513  0.6575959 -0.7637626
##   41+    4.9497475 -1.4045204 -2.1281413 -2.3015858  2.5095057

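The residuals show where the observed counts depart from the counts expected under independence. Amazon produces the largest departures: the 41+ group chose it far more often than expected (residual 4.95), while the 18–25 group (-3.30) and, to a lesser extent, the 26–40 group (-1.65) chose it less often than expected. Netflix shows the opposite pattern, above expectation for ages 18–25 (1.64) and 26–40 (0.66) and below expectation for 41+ (-2.30). Hulu is similar, with higher preference among 18–25-year-olds (1.96) and lower preference among the 41+ group (-2.13). Disney+ was slightly above expectation in the two younger groups and below it in the oldest group (-1.40), although none of these residuals is large. The "Other" category was chosen more often than expected by the 41+ group (2.51).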
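The residuals discussed above are Pearson residuals, (observed - expected) / sqrt(expected). The sketch below recomputes them and flags cells whose absolute residual exceeds 2. That cutoff is a common rule of thumb and an assumption here, not something stated in the original analysis. The matrix `observed` is retyped from the contingency table.

```r
# Observed counts retyped from the contingency table
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residuals: signed distance from expectation, in units of sqrt(expected)
pearson_residuals <- (observed - expected) / sqrt(expected)
round(pearson_residuals, 2)

# Cells that stand out under the |residual| > 2 rule of thumb (assumed cutoff)
notable <- abs(pearson_residuals) > 2
notable

# The squared residuals sum to the chi-square statistic
sum(pearson_residuals^2)
```

Squaring each residual gives that cell's contribution to χ², which is why the residual table and the contribution table tell the same story.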

cell_contributions<-((chi_square_test$observed-chi_square_test$expected)^2)/chi_square_test$expected
cell_contributions
##        
##              Amazon     Disney+        Hulu     Netflix       Other
##   18–25 10.88888889  0.13661202  3.83333333  2.70270270  3.04761905
##   26–40  2.72222222  1.07103825  0.02898551  0.43243243  0.58333333
##   41+   24.50000000  1.97267760  4.52898551  5.29729730  6.29761905
percent_contributions<- cell_contributions / chi_square_test$statistic *100
percent_contributions
##        
##              Amazon     Disney+        Hulu     Netflix       Other
##   18–25 16.00277665  0.20077087  5.63363056  3.97200744  4.47891125
##   26–40  4.00069416  1.57404361  0.04259834  0.63552119  0.85729161
##   41+   36.00624747  2.89913133  6.65599073  7.78513459  9.25525020
pheatmap(percent_contributions,
         display_numbers = TRUE,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         main = "% Contribution to Chi-Square Statistic")

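This heatmap shows how much each cell contributes to the χ² statistic, not how often each platform is watched. The 41+/Amazon cell alone accounts for about 36% of the statistic, and the 18–25/Amazon cell adds another 16%, so the Amazon column is responsible for roughly 56% of the total. The 41+ row as a whole contributes about 63%, while the 26–40 row contributes only about 7%, meaning that group's choices sit close to what independence would predict. The remaining contributions are spread across Netflix, Hulu, Disney+, and Other.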

cramerV(contingency_table)
## Cramer V 
##   0.3368
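The value reported by `cramerV()` can be reproduced by hand as V = sqrt(χ² / (N × (k - 1))), where k is the smaller of the number of rows and columns. The sketch below does this from the raw counts. The names `observed`, `chi_sq`, `n`, `k`, and `cramers_v` are illustrative and not part of the original analysis.

```r
# Observed counts retyped from the contingency table
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Chi-square statistic from the base R test
chi_sq <- unname(chisq.test(observed)$statistic)

n <- sum(observed)        # total sample size
k <- min(dim(observed))   # smaller of rows and columns

# Cramér's V: chi-square scaled by its maximum possible value, n * (k - 1)
cramers_v <- sqrt(chi_sq / (n * (k - 1)))
round(cramers_v, 4)
```

With three age groups, k - 1 = 2, so V ranges from 0 (no association) to 1 (perfect association) regardless of sample size.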

Interpretation:

The chi-square test of independence showed a statistically significant association between age group and streaming platform preference, χ²(8, N = 300) = 68.04, p < .001. Cramér's V = 0.34 indicates a moderate association. The effect is driven mainly by the 41+ age bracket's strong preference for Amazon. Overall, older adults tended to prefer Amazon, while younger adults (18–25 and 26–40) favored Netflix, with Hulu and Disney+ also leaning toward the younger groups.