The Streaming Analytics Division (SAD) has hired me as their new data analyst. My mission is to uncover whether age group influences people’s preferred streaming platform.
SAD wants to know if certain platforms appeal more to specific age demographics. This analysis will help guide targeted marketing, promotional strategies, and content investment decisions that align with audience preferences.
What follows documents my analysis step by step.
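library(readxl)  # read_xlsx()
library(skimr)   # skim()
# Load the survey responses: one row per participant
streaming_data <- read_xlsx("Streaming Services and Age.xlsx")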
View(streaming_data)
str(streaming_data)
## tibble [300 × 2] (S3: tbl_df/tbl/data.frame)
## $ AgeCat : chr [1:300] "18–25" "18–25" "18–25" "18–25" ...
## $ Platform: chr [1:300] "Other" "Hulu" "Netflix" "Netflix" ...
skim(streaming_data)
| Name | streaming_data |
| Number of rows | 300 |
| Number of columns | 2 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| AgeCat | 0 | 1 | 3 | 5 | 0 | 3 | 0 |
| Platform | 0 | 1 | 4 | 7 | 0 | 5 | 0 |
count_age_cat<-table(streaming_data$AgeCat)
count_age_cat
##
## 18–25 26–40 41+
## 100 100 100
count_platform<-table(streaming_data$Platform)
count_platform
##
## Amazon Disney+ Hulu Netflix Other
## 54 61 46 111 28
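The first table gives the total count for each age category on its own: the three age groups are balanced at 100 participants each.
The second table gives the total count for each platform preference on its own: Netflix is the most common choice overall (111 of 300) and Other the least common (28 of 300).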
contingency_table<-table(streaming_data$AgeCat,streaming_data$Platform)
contingency_table
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 4 22 23 47 4
## 26–40 11 25 16 41 7
## 41+ 39 14 7 23 17
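Raw counts can be hard to compare across platforms of different sizes, so it may help to convert the contingency table to proportions. The sketch below is not part of the original analysis: it re-enters the counts printed above by hand (the name `observed` is my own) and uses base R's `prop.table()` to get the age-group share within each platform.

```r
# Counts copied from the contingency table above (rows: age group, cols: platform)
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# margin = 2 divides each cell by its column total,
# giving the age-group mix within each platform
col_props <- prop.table(observed, margin = 2)
round(col_props * 100, 1)
```

Because every age group has exactly 100 participants, the row percentages (`margin = 1`) are numerically identical to the raw counts, which is why only the column view adds information here.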
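library(ggplot2)
library(ggthemes)  # theme_solarized_2()
# position = "fill" rescales every bar to 1, so the y-axis is a proportion
# and each bar shows the age-group mix within one platform
stacked_bar <- ggplot(streaming_data, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "fill") +
  labs(
    title = "Proportion of Age Groups Within Each Platform",
    y = "Proportion",
    x = "Platform",
    fill = "Age Group"
  ) +
  theme_solarized_2()
stacked_bar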
clustered_bar <- ggplot(streaming_data, aes(x = Platform, fill = AgeCat)) +
  geom_bar(position = "dodge") +
  # Dodge the labels by 0.9, the default bar width, so each count sits
  # centered above its own bar
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    position = position_dodge(width = 0.9),
    vjust = -0.2,
    size = 3
  ) +
  labs(
    title = "Counts For Each Platform Preference Across Age Groups",
    x = "Platform",
    y = "Counts of People",
    fill = "Age Group"
  ) +
  theme_solarized_2()
clustered_bar
chi_square_test<-chisq.test(contingency_table)
chi_square_test
##
## Pearson's Chi-squared test
##
## data: contingency_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11
Chi Square Statistic: 68.044
Degrees of Freedom: 8
P-Value: 1.203e-11
The chi-square test of independence reveals a statistically significant relationship between age group and platform preference (p < .001), so we reject the null hypothesis that the two variables are independent.
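To make the test less of a black box, the statistic can be reproduced by hand. The following is a minimal sketch, assuming the counts from the contingency table are re-entered manually (the names `observed`, `expected`, and `chi_sq` are mine, not from the analysis above). Expected counts are row total × column total / N, and the statistic is the sum of (observed − expected)² / expected over all cells.

```r
# Counts copied from the contingency table (rows: 18–25, 26–40, 41+)
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE
)

n <- sum(observed)

# Expected count under independence: row total * column total / N
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson chi-square statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Common rule of thumb: the approximation is considered
# reasonable when all expected counts are at least 5
all(expected >= 5)

c(chi_sq = chi_sq, df = df, p_value = p_value)
```

The values should agree with the `chisq.test()` output above (X-squared = 68.044, df = 8).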
chi_square_test$observed
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 4 22 23 47 4
## 26–40 11 25 16 41 7
## 41+ 39 14 7 23 17
chi_square_test$expected
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 18 20.33333 15.33333 37 9.333333
## 26–40 18 20.33333 15.33333 37 9.333333
## 41+ 18 20.33333 15.33333 37 9.333333
chi_square_test$residuals
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 -3.2998316 0.3696106 1.9578900 1.6439899 -1.7457431
## 26–40 -1.6499158 1.0349098 0.1702513 0.6575959 -0.7637626
## 41+ 4.9497475 -1.4045204 -2.1281413 -2.3015858 2.5095057
Amazon: Fewer 18–25-year-olds than expected (expected: 18, observed: 4, residual: -3.30) and fewer 26–40-year-olds than expected (expected: 18, observed: 11, residual: -1.65) preferred Amazon as their streaming platform. More participants aged 41+ than expected (expected: 18, observed: 39, residual: +4.95) preferred Amazon.
Disney+: Slightly more 18–25-year-olds than expected (expected: 20.33, observed: 22, residual: +0.37) and slightly more 26–40-year-olds than expected (expected: 20.33, observed: 25, residual: +1.03) preferred Disney+ as their streaming platform. Slightly fewer participants aged 41+ than expected (expected: 20.33, observed: 14, residual: -1.40) preferred Disney+.
Hulu: More 18–25-year-olds than expected (expected: 15.33, observed: 23, residual: +1.96) preferred Hulu as their streaming platform, while 26–40-year-olds chose it about as often as expected (expected: 15.33, observed: 16, residual: +0.17). Fewer participants aged 41+ than expected (expected: 15.33, observed: 7, residual: -2.13) preferred Hulu.
Netflix: More 18–25-year-olds than expected (expected: 37, observed: 47, residual: +1.64) and slightly more 26–40-year-olds than expected (expected: 37, observed: 41, residual: +0.66) preferred Netflix as their streaming platform. Fewer participants aged 41+ than expected (expected: 37, observed: 23, residual: -2.30) preferred Netflix.
Other: Fewer 18–25-year-olds than expected (expected: 9.33, observed: 4, residual: -1.75) and slightly fewer 26–40-year-olds than expected (expected: 9.33, observed: 7, residual: -0.76) preferred Other as their streaming platform. More participants aged 41+ than expected (expected: 9.33, observed: 17, residual: +2.51) preferred Other.
Overall, 18–25-year-olds preferred Hulu and Netflix more than expected and Amazon less than expected. 26–40-year-olds preferred Disney+ and, slightly, Netflix more than expected, and Amazon and Other less than expected, although none of their residuals is large. Participants aged 41+ preferred Amazon and Other more than expected and Netflix, Hulu, and Disney+ less than expected.
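Reading fifteen residuals by eye is error-prone, so one option is to list only the cells that stand out. The sketch below assumes the common rule of thumb that a Pearson residual beyond ±2 is noteworthy; that cutoff is my assumption, not something the analysis above specifies, and the counts are again re-entered by hand.

```r
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- chisq.test(observed)$residuals

# Reshape to one row per cell
resid_long <- as.data.frame(as.table(pearson_resid), responseName = "residual")

# Keep cells beyond the assumed +/-2 cutoff, largest deviation first
flagged <- subset(resid_long, abs(residual) > 2)
flagged <- flagged[order(-abs(flagged$residual)), ]
flagged
```

With these counts, four of the five flagged cells belong to the 41+ group, which matches the written interpretation above.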
cell_contributions<-((chi_square_test$observed-chi_square_test$expected)^2)/chi_square_test$expected
cell_contributions
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 10.88888889 0.13661202 3.83333333 2.70270270 3.04761905
## 26–40 2.72222222 1.07103825 0.02898551 0.43243243 0.58333333
## 41+ 24.50000000 1.97267760 4.52898551 5.29729730 6.29761905
percent_contributions<- cell_contributions / chi_square_test$statistic *100
percent_contributions
##
## Amazon Disney+ Hulu Netflix Other
## 18–25 16.00277665 0.20077087 5.63363056 3.97200744 4.47891125
## 26–40 4.00069416 1.57404361 0.04259834 0.63552119 0.85729161
## 41+ 36.00624747 2.89913133 6.65599073 7.78513459 9.25525020
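library(pheatmap)
# Heatmap of each cell's share of the chi-square statistic;
# clustering is off so rows and columns keep the table's order
pheatmap(percent_contributions,
         display_numbers = TRUE,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         main = "% Contribution to Chi-Square Statistic")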
The cell that contributes the most to the overall chi-square statistic is 41+ participants on Amazon (36.0%), followed by 18–25-year-olds on Amazon (16.0%). This suggests that the age difference in viewing habits is driven mainly by Amazon: participants aged 41+ choose it far more often than expected and 18–25-year-olds far less often. The remaining deviations for the 18–25 and 26–40 groups are smaller and spread across several platforms, so their preferences depart less sharply from what independence would predict.
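The percent contributions can also be totaled by row and by column to see which age group and which platform drive the result. This is a sketch under the same assumption as before, namely that the counts are re-entered by hand; the names `by_age` and `by_platform` are illustrative.

```r
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18–25", "26–40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

test <- chisq.test(observed)

# Each cell's share of the chi-square statistic, in percent
cell_contrib <- (test$observed - test$expected)^2 / test$expected
pct_contrib  <- cell_contrib / sum(cell_contrib) * 100

# Share of the statistic attributable to each age group and each platform
by_age      <- rowSums(pct_contrib)
by_platform <- colSums(pct_contrib)

round(by_age, 1)
round(by_platform, 1)
```

With these counts, the 41+ row accounts for roughly 63% of the statistic and the Amazon column for roughly 56%.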
library(rcompanion)  # cramerV()
cramerV(contingency_table)
## Cramer V
## 0.3368
The effect size is Cramér's V = 0.34, which indicates a moderate association between age group and platform preference. Platform preference therefore differs meaningfully across age groups.
A chi-square test of independence revealed a significant relationship between age group and platform preference, χ²(8, N = 300) = 68.04, p < .001 (p = 1.203e-11). The largest contribution came from the 41+ and Amazon cell. Cramér's V = 0.34 indicates a moderate association. This suggests that older adults (41+) prefer Amazon as their streaming platform, while younger adults (18–25 and 26–40) most often chose Netflix. Streaming platforms can use these results to tailor their marketing and content to specific age demographics.
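For reference, Cramér's V can be computed directly from the chi-square statistic as sqrt(χ² / (N × min(r − 1, c − 1))), where r and c are the numbers of rows and columns. The sketch below uses only base R and hand-entered counts; it is meant as a cross-check on the `cramerV()` output, not as a description of how that function is implemented.

```r
observed <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE
)

chi_sq <- unname(chisq.test(observed)$statistic)
n      <- sum(observed)

# The smaller of (rows - 1) and (columns - 1); here min(2, 4) = 2
k <- min(nrow(observed) - 1, ncol(observed) - 1)

cramers_v <- sqrt(chi_sq / (n * k))
round(cramers_v, 4)
```

Under Cohen's commonly cited benchmarks, adjusted for min(r − 1, c − 1) = 2, roughly 0.21 counts as a medium effect and 0.35 as a large one, so a value near 0.34 sits at the upper end of the moderate range.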