library(readxl)
library(skimr)
library(dplyr)
streaming_and_age<- read_xlsx("Streaming Services and Age.xlsx")
View(streaming_and_age)
age_count<- streaming_and_age %>% group_by(AgeCat) %>% summarize(count=n())
age_count
## # A tibble: 3 × 2
##   AgeCat count
##   <chr>  <int>
## 1 18–25    100
## 2 26–40    100
## 3 41+      100
platform_count<- streaming_and_age %>% group_by(Platform) %>% summarize(count=n())
platform_count
## # A tibble: 5 × 2
##   Platform count
##   <chr>    <int>
## 1 Amazon      54
## 2 Disney+     61
## 3 Hulu        46
## 4 Netflix    111
## 5 Other       28
cont_table<- table(streaming_and_age$AgeCat, streaming_and_age$Platform)
cont_table
##
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
library(ggplot2)
library(ggthemes)
grouped_streaming <- ggplot(streaming_and_age, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "fill") +
  labs(
    title = "Proportion of Platform Preferences by Age Group",
    y = "Proportion of Platforms",
    x = "Age"
  ) +
  theme_economist()
grouped_streaming
# Clustered bars show raw counts, so the title says "Count" rather than "Percentage"
clustered_streaming <- ggplot(streaming_and_age, aes(x = AgeCat, fill = Platform)) +
  geom_bar(position = "dodge") +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    position = position_dodge(width = 0.9),
    vjust = -0.3,
    size = 4
  ) +
  labs(
    title = "Count of Streaming Platform Preferences by Age Group",
    x = "Age",
    y = "Count of Platform",
    fill = "Streaming Platform"
  ) +
  theme_solarized()
clustered_streaming
chisq.test(cont_table)
##
## Pearson's Chi-squared test
##
## data: cont_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11
streaming_chi_test<- chisq.test(cont_table)
streaming_chi_test
##
## Pearson's Chi-squared test
##
## data: cont_table
## X-squared = 68.044, df = 8, p-value = 1.203e-11
The chi-square test statistic is 68.044 with 8 degrees of freedom. The p-value (1.203e-11) is far below 0.05, so there is a statistically significant relationship between age group and platform preference.
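As a cross-check, the statistic can be recomputed by hand from the contingency table. The following is a minimal sketch, not part of the original analysis: it retypes the counts into a matrix named `counts` (an assumed name, with ASCII hyphens in the age labels) and also checks the usual rule of thumb that every expected count is at least 5.

```r
# Counts retyped from the contingency table above (assumption: ASCII hyphens in labels)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

n <- sum(counts)

# Expected count for each cell under independence: row total * column total / n
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson chi-square statistic, degrees of freedom, and upper-tail p-value
chi_sq  <- sum((counts - expected)^2 / expected)
df      <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Rule-of-thumb check for the chi-square approximation
all_expected_ok <- all(expected >= 5)

chi_sq
df
p_value
all_expected_ok
```

If the hand-computed values agree with `chisq.test()`, the test output can be reported with more confidence.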
streaming_chi_test$observed
##
##         Amazon Disney+ Hulu Netflix Other
##   18–25      4      22   23      47     4
##   26–40     11      25   16      41     7
##   41+       39      14    7      23    17
streaming_chi_test$expected
##
##         Amazon  Disney+     Hulu Netflix    Other
##   18–25     18 20.33333 15.33333      37 9.333333
##   26–40     18 20.33333 15.33333      37 9.333333
##   41+       18 20.33333 15.33333      37 9.333333
streaming_chi_test$residuals
##
##             Amazon    Disney+       Hulu    Netflix      Other
##   18–25 -3.2998316  0.3696106  1.9578900  1.6439899 -1.7457431
##   26–40 -1.6499158  1.0349098  0.1702513  0.6575959 -0.7637626
##   41+    4.9497475 -1.4045204 -2.1281413 -2.3015858  2.5095057
The observed, expected, and residual tables from the chi-square test show that viewers aged 41+ prefer Amazon far more than expected. This cell has the largest gap between observed and expected counts (39 vs. 18) of any age and platform combination.
Other combinations with more people than expected are 18–25 year olds with Disney+, Hulu, and Netflix; 26–40 year olds with Disney+, Hulu, and Netflix; and 41+ year olds with Other platforms.
Combinations with fewer people than expected are 18–25 year olds with Amazon and Other platforms; 26–40 year olds with Amazon and Other platforms; and 41+ year olds with Disney+, Hulu, and Netflix.
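The sign of a residual shows whether a cell is above or below expectation, but not whether the gap is large enough to matter. One common approach, sketched here as an assumption rather than as part of the original analysis, is to look at the adjusted standardized residuals that `chisq.test()` returns in `$stdres` and flag cells whose absolute value exceeds roughly 2. The matrix name `counts` and the cutoff of 2 are illustrative choices.

```r
# Counts retyped from the contingency table above (assumption: ASCII hyphens in labels)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

res <- chisq.test(counts)

# Adjusted standardized residuals: approximately standard normal under independence
std_res <- res$stdres

# Rule-of-thumb flag (assumed cutoff): cells that depart notably from expectation
flagged <- abs(std_res) > 2

round(std_res, 2)
flagged
```

Cells flagged `TRUE` are the ones driving the association; the sign of the matching residual tells whether the cell is over- or under-represented.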
streaming_contributions<- ((streaming_chi_test$observed-streaming_chi_test$expected)^2)/streaming_chi_test$expected
streaming_contributions
##
##              Amazon    Disney+       Hulu    Netflix      Other
##   18–25 10.88888889 0.13661202 3.83333333 2.70270270 3.04761905
##   26–40  2.72222222 1.07103825 0.02898551 0.43243243 0.58333333
##   41+   24.50000000 1.97267760 4.52898551 5.29729730 6.29761905
streaming_percent_con<- streaming_contributions / streaming_chi_test$statistic*100
streaming_percent_con
##
##              Amazon    Disney+       Hulu    Netflix      Other
##   18–25 16.00277665 0.20077087 5.63363056 3.97200744 4.47891125
##   26–40  4.00069416 1.57404361 0.04259834 0.63552119 0.85729161
##   41+   36.00624747 2.89913133 6.65599073 7.78513459 9.25525020
library(pheatmap)
pheatmap(streaming_percent_con,
display_numbers = TRUE,
cluster_rows = FALSE,
cluster_cols = FALSE,
main = "% Contribution to Chi-Square Statistic")
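Looking at the heatmap, the cells that contributed the most to the overall chi-square statistic are the 41+ age group with Amazon (about 36%) and the 18–25 age group with Amazon (about 16%). The heatmap shows only the size of each contribution, not its direction: the residuals show that 41+ viewers chose Amazon far more often than expected, while 18–25 viewers chose it far less often than expected. This suggests that Amazon's appeal is concentrated among older viewers rather than spread evenly across age groups.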
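Percent contributions are always positive, so they cannot show whether a cell sits above or below its expected count. A minimal sketch of one way to keep the direction is to multiply each percent contribution by the sign of its Pearson residual. The names `counts` and `signed_pct` are illustrative, and the counts are retyped from the contingency table.

```r
# Counts retyped from the contingency table above (assumption: ASCII hyphens in labels)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

res <- chisq.test(counts)

# Percent contribution of each cell: squared Pearson residual over the total statistic
pct <- 100 * res$residuals^2 / unname(res$statistic)

# Attach the residual's sign: positive = more than expected, negative = fewer
signed_pct <- sign(res$residuals) * pct

round(signed_pct, 1)
```

A signed matrix like this could be passed to `pheatmap()` in place of the unsigned one, ideally with a diverging color palette so that positive and negative cells are easy to tell apart.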
library(rcompanion)
cramerV(cont_table)
## Cramer V
## 0.3368
Cramér's V is 0.3368, which indicates a moderate effect size: the association between age group and platform preference is moderate in strength.
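Cramér's V can also be computed directly from the chi-square statistic as V = sqrt(X^2 / (N * (k - 1))), where k is the smaller of the number of rows and columns. The sketch below uses base R only and retypes the counts into an assumed matrix named `counts`; it is a cross-check on the `cramerV()` result, not a replacement for it.

```r
# Counts retyped from the contingency table above (assumption: ASCII hyphens in labels)
counts <- matrix(
  c( 4, 22, 23, 47,  4,
    11, 25, 16, 41,  7,
    39, 14,  7, 23, 17),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    AgeCat   = c("18-25", "26-40", "41+"),
    Platform = c("Amazon", "Disney+", "Hulu", "Netflix", "Other")
  )
)

chi_sq <- unname(chisq.test(counts)$statistic)  # Pearson chi-square statistic
n      <- sum(counts)                           # total sample size
k      <- min(nrow(counts), ncol(counts))       # smaller table dimension

# Cramér's V: chi-square rescaled to the 0-1 range
cramers_v <- sqrt(chi_sq / (n * (k - 1)))

round(cramers_v, 4)
```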
The chi-square test shows a significant relationship between age and platform preference, X^2(8, N = 300) = 68.044, p < .001. The two largest contributions came from the 41+ and 18–25 age groups, both with Amazon, but in opposite directions: 41+ viewers chose Amazon more often than expected and 18–25 viewers less often. Cramér's V = 0.3368, which indicates a moderate association between age and platform. These results could mean that Amazon appeals most strongly to older viewers, and with these findings, the platform could tailor its content either to its older audience or to attract younger viewers.