Introduction

The Streaming Analytics Division (SAD) has hired me as their new data analyst. My mission is to uncover whether age group influences people’s preferred streaming platform.

SAD wants to know if certain platforms appeal more to specific age demographics. This analysis will help guide targeted marketing, promotional strategies, and content investment decisions that align with audience preferences.

The following documents my analysis.

Step 1: Data Preparation

Loading and exploring the data

library(readxl)  # provides read_xlsx()
library(skimr)   # provides skim()

streaming_data<-read_xlsx("Streaming Services and Age.xlsx")

View(streaming_data)

str(streaming_data)
## tibble [300 × 2] (S3: tbl_df/tbl/data.frame)
##  $ AgeCat  : chr [1:300] "18–25" "18–25" "18–25" "18–25" ...
##  $ Platform: chr [1:300] "Other" "Hulu" "Netflix" "Netflix" ...
skim(streaming_data)
Data summary

Name               streaming_data
Number of rows     300
Number of columns  2

Column type frequency:
  character        2

Group variables    None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
AgeCat                 0              1    3    5      0         3           0
Platform               0              1    4    7      0         5           0

Finding total counts for Age Category and Platform Preference

count_age_cat<-table(streaming_data$AgeCat)
count_age_cat
## 
## 18–25 26–40   41+ 
##   100   100   100
count_platform<-table(streaming_data$Platform)
count_platform
## 
##  Amazon Disney+    Hulu Netflix   Other 
##      54      61      46     111      28

The first table shows the total counts for each age category.

The second table shows the total counts for each platform preference.

Creating a contingency table showing how many people from each age group prefer each platform.

contingency_table<-table(streaming_data$AgeCat,streaming_data$Platform)
contingency_table
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
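Because each age group has exactly 100 respondents, the counts above can be read directly as percentages. With unequal group sizes, row proportions would be the safer comparison. A minimal sketch, rebuilding the table from the counts printed above (the names `counts` and `row_props` are illustrative and not part of the original analysis):

```r
# Rebuild the contingency table from the counts printed above
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(AgeCat   = c("18–25", "26–40", "41+"),
                  Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

# Share of each platform within each age group (each row sums to 1)
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```

With the data loaded, `prop.table(contingency_table, margin = 1)` should give the same result without retyping the counts.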

Step 2: Visualization

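Stacked Bar Chart: Showing proportions of platform preferences within each age group

library(ggplot2)
library(ggthemes)  # provides theme_solarized_2()

# Age group on the x-axis, filled by platform, so each bar shows
# the mix of platform preferences within that age group
stacked_bar<- ggplot(streaming_data, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "fill") + 
  labs(
    title = "Proportion of Platform Preferences Within Each Age Group",
    y = "Proportion",
    x = "Age Group"
  ) +
  theme_solarized_2()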

stacked_bar

Clustered Bar Chart: Showing the counts side by side for each platform across age groups.

clustered_bar<- ggplot(streaming_data, aes(Platform, fill = AgeCat)) +
  geom_bar(position = "dodge") +
  geom_text(
    stat = "count",
    aes(label=after_stat(count)),
    position = position_dodge(width = 0.9),  # 0.9 matches the default bar width so labels sit over their bars
    vjust=-0.2,
    size = 3) +
  labs(
    title = "Counts For Each Platform Preference Across Age Groups",
    x="Platform",
    y="Counts of People"
  )+
  theme_solarized_2()

clustered_bar

Step 3: Chi-Square Test of Independence

Running a Chi-Square Test of Independence

chi_square_test<-chisq.test(contingency_table)
chi_square_test
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11
  • Chi Square Statistic: 68.044

  • Degrees of Freedom: 8

  • P-Value: 1.203e-11

The chi-square test of independence reveals a statistically significant relationship between age group and platform preference (p < .001).
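To make the reported statistic less of a black box, the sketch below recomputes it by hand from the printed counts. It assumes the standard Pearson formula (sum of (observed - expected)^2 / expected, with expected = row total × column total / N) and also checks the usual rule of thumb that every expected count is at least 5. The variable names are illustrative only.

```r
# Contingency table rebuilt from the counts printed earlier
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(AgeCat   = c("18–25", "26–40", "41+"),
                  Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

n <- sum(counts)

# Expected count under independence: row total * column total / N
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson chi-square statistic, degrees of freedom, and upper-tail p-value
chi_sq  <- sum((counts - expected)^2 / expected)
df      <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Rule-of-thumb check: smallest expected count should be at least 5
min_expected <- min(expected)

round(c(chi_sq = chi_sq, df = df, min_expected = min_expected), 3)
p_value
```

The hand-computed statistic and degrees of freedom should agree with the `chisq.test()` output above, and the smallest expected count (the Other column) is above 5.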

Step 4: Observed, Expected, and Residual Values

We will be examining observed counts, expected counts, and residuals from our chi-square test.

chi_square_test$observed
##        
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
chi_square_test$expected
##        
##         Amazon  Disney+     Hulu Netflix    Other
##   18–25     18 20.33333 15.33333      37 9.333333
##   26–40     18 20.33333 15.33333      37 9.333333
##   41+       18 20.33333 15.33333      37 9.333333
chi_square_test$residuals
##        
##             Amazon    Disney+       Hulu    Netflix      Other
##   18–25 -3.2998316  0.3696106  1.9578900  1.6439899 -1.7457431
##   26–40 -1.6499158  1.0349098  0.1702513  0.6575959 -0.7637626
##   41+    4.9497475 -1.4045204 -2.1281413 -2.3015858  2.5095057

Identify and discuss patterns that stand out

  • Amazon: Fewer 18-25 year olds than expected (expected: 18, observed: 4, residual: -3.30) and fewer 26-40 year olds than expected (expected: 18, observed: 11, residual: -1.65) preferred Amazon as their streaming platform. Far more participants aged 41+ than expected (expected: 18, observed: 39, residual: +4.95) preferred Amazon.

  • Disney+: Slightly more 18-25 year olds than expected (expected: 20.33, observed: 22, residual: +0.37) and slightly more 26-40 year olds than expected (expected: 20.33, observed: 25, residual: +1.03) preferred Disney+ as their streaming platform. Slightly fewer participants aged 41+ than expected (expected: 20.33, observed: 14, residual: -1.40) preferred Disney+.

  • Hulu: More 18-25 year olds than expected (expected: 15.33, observed: 23, residual: +1.96) and marginally more 26-40 year olds than expected (expected: 15.33, observed: 16, residual: +0.17) preferred Hulu as their streaming platform. Fewer participants aged 41+ than expected (expected: 15.33, observed: 7, residual: -2.13) preferred Hulu.

  • Netflix: More 18-25 year olds than expected (expected: 37, observed: 47, residual: +1.64) and slightly more 26-40 year olds than expected (expected: 37, observed: 41, residual: +0.66) preferred Netflix as their streaming platform. Fewer participants aged 41+ than expected (expected: 37, observed: 23, residual: -2.30) preferred Netflix.

  • Other: Fewer 18-25 year olds than expected (expected: 9.33, observed: 4, residual: -1.75) and slightly fewer 26-40 year olds than expected (expected: 9.33, observed: 7, residual: -0.76) preferred Other as their streaming platform. More participants aged 41+ than expected (expected: 9.33, observed: 17, residual: +2.51) preferred Other.

Overall, 18-25 year olds preferred Netflix and Hulu more than expected and Amazon less than expected. 26-40 year olds preferred Disney+, and to a lesser extent Netflix, more than expected, and Amazon and Other platforms less than expected. Participants aged 41+ preferred Amazon and Other more than expected, and Netflix and Hulu less than expected.
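The residuals discussed above are Pearson residuals, (observed - expected) / sqrt(expected). The sketch below recomputes them from the printed counts and flags cells whose absolute residual exceeds 2. That cutoff is a common rule of thumb assumed here for illustration, not something taken from the original analysis, and the variable names are hypothetical.

```r
# Contingency table rebuilt from the observed counts printed above
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(AgeCat   = c("18–25", "26–40", "41+"),
                  Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residual for each cell
pearson_resid <- (counts - expected) / sqrt(expected)

# Flag cells that depart notably from expectation (assumed cutoff: |residual| > 2)
notable <- abs(pearson_resid) > 2

round(pearson_resid, 2)
which(notable, arr.ind = TRUE)
```

For example, the 41+/Amazon residual is (39 - 18) / sqrt(18), which matches the +4.95 reported above.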

Step 5: Contributions to the Chi-Square Statistic

Cell Contributions

cell_contributions<-((chi_square_test$observed-chi_square_test$expected)^2)/chi_square_test$expected
cell_contributions
##        
##              Amazon     Disney+        Hulu     Netflix       Other
##   18–25 10.88888889  0.13661202  3.83333333  2.70270270  3.04761905
##   26–40  2.72222222  1.07103825  0.02898551  0.43243243  0.58333333
##   41+   24.50000000  1.97267760  4.52898551  5.29729730  6.29761905

Percent Contributions

percent_contributions<- cell_contributions / chi_square_test$statistic *100
percent_contributions
##        
##              Amazon     Disney+        Hulu     Netflix       Other
##   18–25 16.00277665  0.20077087  5.63363056  3.97200744  4.47891125
##   26–40  4.00069416  1.57404361  0.04259834  0.63552119  0.85729161
##   41+   36.00624747  2.89913133  6.65599073  7.78513459  9.25525020

Pheatmap displaying the percentage contribution of each cell to the total chi-square statistic value.

pheatmap(percent_contributions,
         display_numbers = TRUE,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         main = "% Contribution to Chi-Square Statistic")

The cell that contributes the most to the overall chi-square statistic is the 41+ group on Amazon (36.0%), followed by the 18-25 group on Amazon (16.0%). This suggests that participants aged 41+ prefer Amazon far more than expected, while 18-25 year olds prefer it far less than expected. Apart from Amazon, the cells for the 18-25 and 26-40 groups each contribute less than 6%, so their choices across the remaining platforms deviate less from the expected counts, which suggests weaker age-specific preferences.
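As a complement to the heatmap, the contributions can also be ranked in a small table. This is a sketch that rebuilds the percentages from the printed counts; `ranked` and the column name `Percent` are illustrative names, not part of the original analysis.

```r
# Contingency table rebuilt from the observed counts printed earlier
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(AgeCat   = c("18–25", "26–40", "41+"),
                  Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other"))
)

expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Each cell's share of the chi-square statistic, in percent
contrib <- (counts - expected)^2 / expected
pct     <- 100 * contrib / sum(contrib)

# Long format, sorted from largest to smallest contribution
ranked <- as.data.frame(as.table(pct), responseName = "Percent")
ranked <- ranked[order(-ranked$Percent), ]
head(ranked, 5)

# Share of the statistic attributable to each platform (column totals)
round(colSums(pct), 1)
```

The column totals make the pattern easier to see: the three Amazon cells together account for about 56% of the statistic.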

Step 6: Effect Size

cramerV(contingency_table)
## Cramer V 
##   0.3368

The effect size is 0.34 (Cramér's V = 0.3368), which indicates a moderate association between age group and platform preference. In other words, platform preference varies meaningfully across age groups.
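For reference, Cramér's V can be recovered from the chi-square statistic as sqrt(χ² / (N × (k - 1))), where k is the smaller of the number of rows and columns. A minimal sketch using the values reported above (variable names are illustrative):

```r
chi_sq <- 68.044      # X-squared reported by chisq.test() above
n      <- 300         # total number of respondents
k      <- min(3, 5)   # smaller of: 3 age groups, 5 platforms

# Cramér's V: chi-square scaled by sample size and table dimensions
v <- sqrt(chi_sq / (n * (k - 1)))
round(v, 4)
```

This should agree with the value returned by `cramerV(contingency_table)`.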

Step 7: Final Interpretation

The chi-square test of independence revealed a significant relationship between age group and platform preference, χ²(8, N = 300) = 68.04, p < .001 (p = 1.203e-11). The largest contribution came from the cell for participants aged 41+ on Amazon. Cramér's V = 0.34 indicates a moderate association. This suggests that older adults (41+) favor Amazon as their streaming platform, while younger adults (18-25 and 26-40) favor Netflix. These findings can help streaming platforms tailor their marketing and content to specific age demographics.