This analysis explores hypothesis testing using the Social Media and Entertainment Dataset. Key objectives:
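# Compute Engagement Score: the unweighted mean of two time measures (hours)
# and one count (ad interactions), so the three inputs are on different scales
data <- data %>%
  mutate(Engagement_Score = (`Daily Social Media Time (hrs)` +
                             `Time Spent in Online Communities (hrs)` +
                             `Ad Interaction Count`) / 3)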
# Inspect the first 10 scores to confirm the new column exists
head(data$Engagement_Score, n = 10)
## [1] 9.156667 10.773333 18.583333 10.176667 15.630000 8.866667 17.466667
## [8] 10.906667 4.656667 8.780000
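The score is simply the row-wise mean of the three columns, so `rowMeans()` gives the same result in base R. The sketch below uses a two-row toy data frame whose values are hypothetical (only the column names come from the dataset). It also shows a side effect of averaging hours with a count: in the second row, the ad-interaction count dominates the score.

```r
# Toy data with hypothetical values; column names match the dataset
toy <- data.frame(
  `Daily Social Media Time (hrs)` = c(3, 5.5),
  `Time Spent in Online Communities (hrs)` = c(2, 1),
  `Ad Interaction Count` = c(10, 30),
  check.names = FALSE  # keep the original column names with spaces
)

score_cols <- c("Daily Social Media Time (hrs)",
                "Time Spent in Online Communities (hrs)",
                "Ad Interaction Count")

# Row-wise mean of the three columns, equivalent to (a + b + c) / 3;
# like `+`, rowMeans() returns NA for a row with any NA (na.rm = FALSE)
toy$Engagement_Score <- rowMeans(toy[, score_cols])
toy$Engagement_Score
```

If the three inputs should contribute equally, standardizing each column first (for example with `scale()`) is one option. That would be a change to the score's definition, not the method used in this analysis.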
# Create engagement categories (Low: < Median, High: ≥ Median)
median_engagement <- median(data$Engagement_Score, na.rm = TRUE)
data <- data %>%
  mutate(Engagement_Level = ifelse(Engagement_Score >= median_engagement, "High", "Low"))
# Count table
gender_engagement_table <- table(data$Gender, data$Engagement_Level)
gender_engagement_table
##
## High Low
## Female 50096 49777
## Male 49785 50117
## Other 50135 50090
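Raw counts are hard to compare across groups of slightly different sizes, so row proportions are a useful companion to the table. The sketch below rebuilds the table from the counts printed above. The matrix `counts` is a stand-in for `gender_engagement_table`, and `prop.table(gender_engagement_table, margin = 1)` on the live data should give the same result.

```r
# Counts copied from the Gender x Engagement_Level table printed above
counts <- matrix(
  c(50096, 49777,
    49785, 50117,
    50135, 50090),
  nrow = 3, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male", "Other"),
                  Engagement_Level = c("High", "Low"))
)

# Row proportions: within each gender, the share classified High vs Low
row_props <- prop.table(counts, margin = 1)
round(row_props, 4)
```

For every gender, the High share is within about 0.2 percentage points of 50%. This is the pattern the test below formalizes.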
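# Fisher's exact test for the 3x2 table. With ~300,000 observations the exact
# computation is typically too expensive, so the p-value is estimated by Monte
# Carlo simulation (B = 2000 replicates by default). No seed is set, so the
# simulated p-value varies slightly from run to run.
fisher_result <- fisher.test(
  gender_engagement_table,
  simulate.p.value = TRUE  # Monte Carlo p-value instead of exact enumeration
)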
fisher_result
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 2000 replicates)
##
## data: gender_engagement_table
## p-value = 0.3388
## alternative hypothesis: two.sided
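Because the simulated p-value depends on the random draws, a fixed seed makes the result reproducible. With counts this large, Pearson's chi-squared test of independence is a standard large-sample cross-check. The sketch below reruns both tests on the counts printed earlier. The seed (42) and `B = 10000` are arbitrary choices, and the chi-squared test is an addition here, not part of the original analysis.

```r
# Counts copied from the Gender x Engagement_Level table printed above
counts <- matrix(
  c(50096, 49777,
    49785, 50117,
    50135, 50090),
  nrow = 3, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male", "Other"),
                  Engagement_Level = c("High", "Low"))
)

# Reproducible Monte Carlo Fisher test; the seed value itself is arbitrary
set.seed(42)
fisher_repro <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)

# Pearson's chi-squared test of independence (df = (3 - 1) * (2 - 1) = 2);
# no continuity correction is applied to tables larger than 2x2
chi <- chisq.test(counts)

c(fisher = fisher_repro$p.value, chi_squared = chi$p.value)
```

Both p-values should land well above 0.05, in line with the Fisher result above.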
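Interpretation:
- H₀: engagement level (High/Low) is independent of gender.
- The simulated p-value (0.3388) is greater than 0.05, so we fail to reject H₀.
- The data provide no evidence of an association between gender and engagement level. Failing to reject H₀ does not prove the groups are identical, but each gender is split almost exactly evenly between High and Low.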
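A p-value alone says nothing about how strong an association is. With 300,000 rows, an effect size helps confirm that the association is negligible, not merely non-significant. The sketch below computes Cramér's V from the printed counts. Cramér's V is a standard effect-size measure added here for illustration; the original analysis does not use it.

```r
# Counts copied from the Gender x Engagement_Level table printed above
counts <- matrix(
  c(50096, 49777,
    49785, 50117,
    50135, 50090),
  nrow = 3, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male", "Other"),
                  Engagement_Level = c("High", "Low"))
)

# Cramér's V for an r x c table: sqrt(chi^2 / (n * (min(r, c) - 1)));
# 0 means no association, 1 means perfect association
chi2 <- unname(chisq.test(counts)$statistic)
n <- sum(counts)
cramers_v <- sqrt(chi2 / (n * (min(dim(counts)) - 1)))
round(cramers_v, 4)
```

The resulting V is far below 0.1, which is consistent with engagement level being essentially unrelated to gender in this dataset.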
# Stacked bars scaled to 1: the share of High vs Low engagement within each gender
ggplot(data, aes(x = Gender, fill = Engagement_Level)) +
  geom_bar(position = "fill") +
  labs(title = "Engagement Levels by Gender",
       x = "Gender",
       y = "Proportion") +
  theme_minimal()
Analysis:
Key Findings:
1. Social Media Time by Age Group
Analysis: