week_7

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggthemes)

## Warning: package 'ggthemes' was built under R version 4.3.3

library(ggrepel)

## Warning: package 'ggrepel' was built under R version 4.3.3

library(effsize)

## Warning: package 'effsize' was built under R version 4.3.3

library(pwrss)

## Warning: package 'pwrss' was built under R version 4.3.3

## 
## Attaching package: 'pwrss'
## 
## The following object is masked from 'package:stats':
## 
##     power.t.test

data <- read.csv ("C:\\Users\\91630\\OneDrive\\Desktop\\statistics\\age_gaps.CSV")

Hypothesis Testing

Hypothesis I:

Hypothesis 1: Age Difference Affects Movie Success

Null Hypothesis (H0): There is no significant difference between the age difference of romantic couples in movies released before 1990 and movies released after 1990.
Alternative Hypothesis (H1): There is significant difference between the age difference of romantic couples in movies released before 1990 and movies released after 1990.

Statistical assumptions:

Alpha Level or Significance Value:
- Value Chosen: 0.05
- Reason: In many fields, including social sciences and medicine, an alpha level of 0.05 is often used. This is because it provides a balance between minimizing Type I errors (false positives) and maintaining adequate sensitivity to detect effects. Researchers aim to control the risk of falsely concluding that there is an effect when there isn’t one, while also being sensitive enough to detect true effects if they exist.
Power Level:
- Value Chosen: 0.08
- Reason: The choice of power level depends on the consequences of Type II errors (false negatives) and practical constraints such as sample size and resources. In some fields, such as clinical trials or medical research, a higher power level (e.g., 0.80 or higher) is typically preferred to minimize the risk of failing to detect clinically meaningful effects. However, in other areas where data collection may be more challenging or costly, a lower power level may be accepted. A power level of 0.08 suggests that you are willing to tolerate a relatively higher risk of Type II errors, which may be appropriate depending on the specific context of your research.
Minimum Effect Size:
- Value Chosen: 0.20
- Reason: The choice of minimum effect size depends on the practical significance of the findings and the context of the research. In some fields, such as psychology or education, effect sizes of 0.20 or higher are considered small but potentially meaningful. Researchers aim to detect effects that are large enough to have practical implications or real-world significance. Choosing a minimum effect size of 0.20 indicates that you are interested in detecting effects that are larger than trivial but not necessarily large. This threshold aligns with the typical effect sizes observed in many areas of research.

Neyman-Pearson Testing

data$release_year <- as.numeric(data$release_year)

before <- subset(data, release_year < 1990)
after <- subset(data, release_year >= 1990)

avg_age_diff_before <- mean(before$age_difference, na.rm = TRUE)
avg_age_diff_after <- mean(after$age_difference, na.rm = TRUE)

observed_diff <- abs(avg_age_diff_before - avg_age_diff_after)

bootstrap <- function (x, func=mean, n_iter=10^4) {
  # empty vector to be filled with values from each iteration
  func_values <- c(NULL)
  
  # we simulate sampling `n_iter` times
  for (i in 1:n_iter) {
    # pull the sample (e.g., a vector or data frame)
    x_sample <- sample_n(x, size = length(x), replace = TRUE)
    
    # add on this iteration's value to the collection
    func_values <- c(func_values, func(x_sample))
  }
  
  return(func_values)
}

diff_in_avg <- function (x_data) {
  before <- subset(data, release_year < 1999)
after <- subset(data, release_year >= 1999)

avg_age_diff_before <- mean(before$age_difference, na.rm = TRUE)
avg_age_diff_after <- mean(after$age_difference, na.rm = TRUE)
  diff <- abs(avg_age_diff_before - 
           avg_age_diff_after)
  
  return(diff)
}

diffs_in_avgs <- bootstrap(data, diff_in_avg, n_iter = 100)

ggplot() +
  geom_function(xlim = c(-10, 10), 
                fun = function(x) dnorm(x, mean = 0, 
                                        sd = sd(diffs_in_avgs))) +
  geom_vline(mapping = aes(xintercept = observed_diff,
                           color = paste("observed: ",
                                         round(observed_diff)))) +
  labs(title = "Bootstrapped Sampling Distribution of mean Light Difference",
       x = "Observed Mean difference in age",
       y = "Probability Density",
       color = "") +
  scale_x_continuous(breaks = seq(-20, 20, 5)) +
  theme_minimal()

cohen.d(d = after$age_difference,
        f = before$age_difference)

## 
## Cohen's d
## 
## d estimate: -0.6255694 (medium)
## 95 percent confidence interval:
##      lower      upper 
## -0.7912990 -0.4598398

As per the output of the above the above cell, the cohen’s d value/effect size being very small (< 0.2) means that we cannot reject the null hypothesis as the difference between the both the hypothesis is very small. Further testing is required to make a decision.

We are assuming that the acceptable age difference is 8 years

test <- pwrss.t.2means(mu1 = 8, 
                       sd1 = sd(pluck(data, "age_difference")),
                       kappa = 1,
                       power = .80, alpha = 0.05, 
                       alternative = "not equal")

##  Difference between Two means 
##  (Independent Samples t Test) 
##  H0: mu1 = mu2 
##  HA: mu1 != mu2 
##  ------------------------------ 
##   Statistical power = 0.8 
##   n1 = 19 
##   n2 = 19 
##  ------------------------------ 
##  Alternative = "not equal" 
##  Degrees of freedom = 36 
##  Non-centrality parameter = 2.897 
##  Type I error rate = 0.05 
##  Type II error rate = 0.2

plot(test)

Fisher’s Significance Testing

P Value

f_sampling <- function(x) dnorm(x, mean = 0, 
                                sd = sd(diffs_in_avgs))

ggplot() +
  stat_function(mapping = aes(fill = 'more extreme samples'),
                fun = f_sampling, 
                xlim = c(observed_diff, 40),
                geom = "area") +
  stat_function(mapping = aes(fill = 'more extreme samples'),
                fun = f_sampling, 
                xlim = c(-40, -observed_diff),
                geom = "area") +
  geom_function(xlim = c(-40, 40), 
                fun = f_sampling) +
  geom_vline(mapping = aes(xintercept = observed_diff,
                           color = paste("observed: ",
                                         round(observed_diff, 1)))) +
  labs(title = "Bootstrapped Sampling Distribution of Light Differences",
       x = "Difference in Age Difference Calculated",
       y = "Probability Density",
       color = "",
       fill = "") +
  scale_x_continuous(breaks = seq(-40, 40, 10)) +
  scale_fill_manual(values = 'lightblue') +
  theme_minimal()

lm_model <- lm(age_difference ~ release_year, data = data)

se_coef <- summary(lm_model)$coef["release_year", "Std. Error"]

t_statistic <- coef(lm_model)["release_year"] / se_coef

df <- nrow(data) - length(coef(lm_model))

p_value <- 2 * pt(abs(t_statistic), df = df, lower.tail = FALSE)

cat("t-statistic:", t_statistic, "\n")

## t-statistic: -7.086887

cat("p-value:", p_value)

## p-value: 2.383761e-12

The P-Value is <0.5 which means we cannot accept the null hypothesis.

Hypothesis II:

Null Hypothesis (H0): There is no significant difference in the age gap between male-female romantic pairings and same-sex romantic pairings in movies.
Alternative Hypothesis (H1): Male-female romantic pairings have a larger age gap compared to same-sex romantic pairings in movies.

Statistical assumptions:

Alpha Level or Significance Value:
- Value Chosen: 0.05
- Reason: In many fields, including social sciences and medicine, an alpha level of 0.05 is often used. This is because it provides a balance between minimizing Type I errors (false positives) and maintaining adequate sensitivity to detect effects. Researchers aim to control the risk of falsely concluding that there is an effect when there isn’t one, while also being sensitive enough to detect true effects if they exist.
Power Level:
- Value Chosen: 0.08
- Reason: The choice of power level depends on the consequences of Type II errors (false negatives) and practical constraints such as sample size and resources. In some fields, such as clinical trials or medical research, a higher power level (e.g., 0.80 or higher) is typically preferred to minimize the risk of failing to detect clinically meaningful effects. However, in other areas where data collection may be more challenging or costly, a lower power level may be accepted. A power level of 0.08 suggests that you are willing to tolerate a relatively higher risk of Type II errors, which may be appropriate depending on the specific context of your research.
Minimum Effect Size:
- Value Chosen: 0.20
- Reason: The choice of minimum effect size depends on the practical significance of the findings and the context of the research. In some fields, such as psychology or education, effect sizes of 0.20 or higher are considered small but potentially meaningful. Researchers aim to detect effects that are large enough to have practical implications or real-world significance. Choosing a minimum effect size of 0.20 indicates that you are interested in detecting effects that are larger than trivial but not necessarily large. This threshold aligns with the typical effect sizes observed in many areas of research.

Neyman-Pearson Testing

male_female_pairs <- data %>%
  filter(
    (character_1_gender == "man" & character_2_gender == "woman") |
    (character_1_gender == "woman" & character_2_gender == "man")
  )

other_pairs <- data %>%
  filter(
    !(character_1_gender == "man" & character_2_gender == "woman") &
    !(character_1_gender == "woman" & character_2_gender == "man")
  )

avg_male_female_pairs <- mean(before$age_difference, na.rm = TRUE)
avg_other_pairs <- mean(after$age_difference, na.rm = TRUE)

observed_diff <- abs(avg_male_female_pairs - avg_other_pairs)

diff_in_avg <- function (x_data) {
  male_female_pairs <- data %>%
  filter(
    (character_1_gender == "man" & character_2_gender == "woman") |
    (character_1_gender == "woman" & character_2_gender == "man")
  )

other_pairs <- data %>%
  filter(
    !(character_1_gender == "man" & character_2_gender == "woman") &
    !(character_1_gender == "woman" & character_2_gender == "man")
  )

avg_male_female_pairs <- mean(before$age_difference, na.rm = TRUE)
avg_other_pairs <- mean(after$age_difference, na.rm = TRUE)
  
  diff <- abs(avg_male_female_pairs - avg_other_pairs)
  
  return(diff)
}

diffs_in_avgs <- bootstrap(data, diff_in_avg, n_iter = 100)

ggplot() +
  geom_function(xlim = c(-20, 20), 
                fun = function(x) dnorm(x, mean = 0, 
                                        sd = sd(diffs_in_avgs))) +
  geom_vline(mapping = aes(xintercept = observed_diff,
                           color = paste("observed: ",
                                         round(observed_diff)))) +
  labs(title = "Bootstrapped Sampling Distribution of mean Light Difference",
       x = "Difference in Mean of average_age of actors",
       y = "Probability Density",
       color = "") +
  scale_x_continuous(breaks = seq(-20, 20, 5)) +
  theme_minimal()

cohen.d(d = male_female_pairs$age_difference,
        f = other_pairs$age_difference)

## 
## Cohen's d
## 
## d estimate: -0.345523 (small)
## 95 percent confidence interval:
##       lower       upper 
## -0.75900911  0.06796309

Cohen’s d value is -0.34 which is very high, hence we cannot reject the null hypothesis

f_sampling <- function(x) dnorm(x, mean = 0, 
                                sd = sd(diffs_in_avgs))

ggplot() +
  stat_function(mapping = aes(fill = 'more extreme samples'),
                fun = f_sampling, 
                xlim = c(observed_diff, 20),
                geom = "area") +
  stat_function(mapping = aes(fill = 'more extreme samples'),
                fun = f_sampling, 
                xlim = c(-20, -observed_diff),
                geom = "area") +
  geom_function(xlim = c(-20, 20), 
                fun = f_sampling) +
  geom_vline(mapping = aes(xintercept = observed_diff,
                           color = paste("observed: ",
                                         round(observed_diff, 1)))) +
  labs(title = "Bootstrapped Sampling Distribution of Light Differences",
       x = "Difference in Ages",
       y = "Probability Density",
       color = "",
       fill = "") +
  scale_x_continuous(breaks = seq(-20, 20, 5)) +
  scale_fill_manual(values = 'lightblue') +
  theme_minimal()

diffs_in_avgs_d <- diffs_in_avgs - mean(diffs_in_avgs)

# proportion of times the difference is more extreme
paste("p-value ", 
      sum(abs(observed_diff) < abs(diffs_in_avgs_d)) /
        length(diffs_in_avgs_d))

## [1] "p-value  0"

P-Value is < 0.5, we cannot accept the null hypothesis.

week_7

2024-04-09

Hypothesis Testing

Hypothesis I:

Statistical assumptions:

Neyman-Pearson Testing

Fisher’s Significance Testing

P Value

Hypothesis II:

Statistical assumptions:

Neyman-Pearson Testing