Data Dive Week 13

Tasks To Be Performed

Selecting a week to critique
Analyzing issues
Providing solutions

The week we have selected is Week 7

Group Members - Prerana, Sharmitha Yazhini, Sai Dhanush Doddapaneni, Chaitanya

Week 7 critiques

Loading Libraries and Data

Critique 1: No information on how to select a proper alpha level and power level

\[ H_0: \text{Average daily revenue remains equal for both variations of advertisement.} \]

For the null hypothesis above (taken from the notebook), the following should be considered when selecting an alpha value:

The alpha level is the probability of making a Type I error, that is, rejecting a null hypothesis that is actually true. A smaller alpha is more stringent and requires stronger evidence to reject the null hypothesis. If the impact of a false positive is high, we need to choose a smaller alpha to minimize the risk. If we have a large sample, we can opt for a smaller alpha value.

Since the data set is small (40 entries), we should consider an alpha value of 0.05.

Power is the study's ability to detect a difference when one exists. It is denoted 1 - β, where β is the probability of failing to reject the null hypothesis when it is false (a Type II error). The higher the power, the better the ability to detect effects, so it was appropriate to assume a power of 0.85.
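The trade-off between alpha and power can be made concrete with base R's power.t.test(). The sketch below is only an illustration: the group size of 20 matches the notebook, but the difference to detect (100) and the standard deviation (150) are assumed placeholder values, not figures computed from the marketing data.

```r
# Hypothetical planning inputs (sigma is an assumption, not sd(marketing$revenue)).
n_per_group <- 20    # observations per display variation
delta       <- 100   # smallest revenue difference worth detecting
sigma       <- 150   # assumed common standard deviation

alphas <- c(0.05, 0.10, 0.40)

# power.t.test() solves for whichever argument is left NULL; here, power.
powers <- sapply(alphas, function(a) {
  power.t.test(n = n_per_group, delta = delta, sd = sigma,
               sig.level = a)$power
})

# With n fixed, a larger alpha buys more power at the cost of more false positives.
data.frame(alpha = alphas, power = round(powers, 3))
```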

Critique 2: No mention of what test to use for performing a Neyman-Pearson hypothesis test

Although the testing process was mentioned, there were no clear steps on ‘how’ to perform a Neyman-Pearson hypothesis test.

Steps:

Check to see if we can perform a Neyman-Pearson hypothesis test
Identify the distribution of the random variable represented in the null hypothesis
Calculate the “Critical Value”
Calculate the “Delta Value”
Compute the test statistic and interpret the results

In the notebook, the power was given as 0.85 and the alpha value as 0.1.

test <- pwrss.t.2means(mu1 = 100, 
                       sd1 = sd(pluck(marketing, "revenue")),
                       kappa = 1,
                       power = 0.85, alpha = 0.1, 
                       alternative = "not equal")

##  Difference between Two means 
##  (Independent Samples t Test) 
##  H0: mu1 = mu2 
##  HA: mu1 != mu2 
##  ------------------------------ 
##   Statistical power = 0.85 
##   n1 = 34 
##   n2 = 34 
##  ------------------------------ 
##  Alternative = "not equal" 
##  Degrees of freedom = 66 
##  Non-centrality parameter = 2.733 
##  Type I error rate = 0.1 
##  Type II error rate = 0.15

plot(test)

# Mean revenue and group size for each display variation
avg_revenue <- marketing |>
  filter(!is.na(revenue)) |>
  group_by(display) |>
  summarise(avg_revenue = mean(revenue), size = n())
avg_revenue

## # A tibble: 2 × 3
##   display avg_revenue  size
##     <dbl>       <dbl> <int>
## 1       0        222.    20
## 2       1        315.    20

According to the power analysis for the independent samples t-test, we need at least 34 observations in each group. Since we have only 20 in each category, we cannot perform a Neyman-Pearson test with the chosen alpha and power values.
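As a rough cross-check of the required sample size, base R's power.t.test() can solve for n directly. This is a sketch under assumptions: the standard deviation of 150 is a placeholder rather than the value of sd(pluck(marketing, "revenue")), so the result need not match the pwrss output exactly.

```r
# Assumed inputs; sd = 150 is a placeholder, not computed from the data.
res <- power.t.test(delta = 100, sd = 150,
                    sig.level = 0.1, power = 0.85)

n_required  <- ceiling(res$n)  # round up to whole observations per group
n_available <- 20              # group size reported in the notebook

n_required
n_available >= n_required      # FALSE means the design is underpowered
```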

Since there are very few observations, a larger alpha value is more appropriate. Hence we have taken the alpha value as 0.4.

test <- pwrss.t.2means(mu1 = 100, 
                       sd1 = sd(pluck(marketing, "revenue")),
                       kappa = 1,
                       power = 0.85, alpha = 0.4, 
                       alternative = "not equal")

##  Difference between Two means 
##  (Independent Samples t Test) 
##  H0: mu1 = mu2 
##  HA: mu1 != mu2 
##  ------------------------------ 
##   Statistical power = 0.85 
##   n1 = 17 
##   n2 = 17 
##  ------------------------------ 
##  Alternative = "not equal" 
##  Degrees of freedom = 32 
##  Non-centrality parameter = 1.932 
##  Type I error rate = 0.4 
##  Type II error rate = 0.15

plot(test)

Since each category now has enough observations (20 available against 17 required), we can implement a Neyman-Pearson hypothesis test.

Alternative Hypothesis

\[ H_A: \text{Average daily revenue is not same for both variations of advertisement.} \]

Observed Difference:

observed_diff <- (avg_revenue$avg_revenue[1] - avg_revenue$avg_revenue[2])
paste("Observed Difference: ", observed_diff)

## [1] "Observed Difference:  -93.59"

Critical Value

alpha <- 0.4  # Significance level
critical_value <- qnorm(1 - alpha/2)
critical_value

## [1] 0.8416212
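To see how the choice of alpha moves the rejection threshold, the same two-sided formula can be evaluated for several alpha levels. The variable names below are illustrative and separate from the notebook's own objects.

```r
# Two-sided standard normal critical values for a few alpha levels
alpha_levels    <- c(0.05, 0.10, 0.40)
critical_values <- qnorm(1 - alpha_levels / 2)

# A larger alpha gives a smaller critical value, so H0 is rejected more easily.
setNames(round(critical_values, 4), alpha_levels)
```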

Delta Value

effect_size = cohen.d(d = filter(marketing, display == 0) |> pluck("revenue"),
                      f = filter(marketing, display == 1) |> pluck("revenue"))
effect_size

## 
## Cohen's d
## 
## d estimate: -0.6449585 (medium)
## 95 percent confidence interval:
##       lower       upper 
## -1.30156050  0.01164352

Performing T Test

sample_display_0 <- marketing$revenue[marketing$display == 0]
sample_display_1 <- marketing$revenue[marketing$display == 1]

t_test_result <- t.test(sample_display_0, sample_display_1)

# Print t-test results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  sample_display_0 and sample_display_1
## t = -2.0395, df = 37.646, p-value = 0.04846
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -186.5137628   -0.6662372
## sample estimates:
## mean of x mean of y 
##    221.76    315.35

Interpretation:

The p-value of about 0.048 is less than the alpha value of 0.4, so we reject the null hypothesis. The data therefore indicate that the average revenue differs between the two types of display.
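The decision depends on the alpha chosen in advance. The sketch below recomputes the two-sided p-value from the rounded t statistic and degrees of freedom printed above, then applies the rule "reject when p < alpha" for a few alpha levels; the alphas other than 0.4 are included only for comparison.

```r
# Values copied (rounded) from the Welch t-test output
t_stat <- -2.0395
df     <- 37.646

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# Decision rule: reject H0 when the p-value is below alpha
alpha_levels <- c(0.40, 0.10, 0.05, 0.01)
data.frame(alpha = alpha_levels, reject_H0 = p_value < alpha_levels)
```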

Critique 3: Which test statistic to use while doing a hypothesis test

The observed test statistic is a numerical value calculated from the sample data in a hypothesis test. It is used to make a decision about whether to reject or fail to reject the null hypothesis. The choice of the test statistic depends on the type of hypothesis test being conducted.

The test statistic measures the difference between what is observed in the sample and what is expected under the null hypothesis.

The larger the test statistic (in absolute value), the stronger the evidence against the null hypothesis.
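As an illustration of what the test statistic measures, the Welch t statistic can be computed by hand as the observed difference in means divided by its standard error. The data below are simulated with arbitrary means and spread, so the numbers are not those of the marketing data.

```r
# Simulated stand-in data (hypothetical means and standard deviation)
set.seed(1)
x <- rnorm(20, mean = 220, sd = 150)
y <- rnorm(20, mean = 315, sd = 150)

# Standard error of the difference in means (unequal variances)
se <- sqrt(var(x) / length(x) + var(y) / length(y))

# Observed difference, scaled by its standard error
t_manual  <- (mean(x) - mean(y)) / se
t_builtin <- unname(t.test(x, y)$statistic)

all.equal(t_manual, t_builtin)
```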

In this notebook, we have used a t-test because the sample size is small and the population standard deviation is unknown.

The other test we could consider is the Z-test, which is used when the sample size is large.
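One way to see why the sample size matters is to compare critical values from the two distributions. The sketch below uses a two-sided 5% level and a few arbitrary degrees of freedom chosen for illustration.

```r
# Two-sided 5% critical values: t distribution versus standard normal
df_values <- c(5, 38, 1000)
t_crit    <- qt(0.975, df = df_values)
z_crit    <- qnorm(0.975)

# The t critical value shrinks toward the z value as df grows
data.frame(df = df_values,
           t_critical = round(t_crit, 3),
           z_critical = round(z_crit, 3))
```

With few observations the t-test demands a larger statistic before rejecting; with many observations the two tests give nearly the same answer.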

Critique 4: No illustration for a two tailed test

Below is a graphical representation of a two-tailed test:

delta <- -0.6449585
critical_value <- 0.8416212

f_0 <- function(x) dnorm(x, mean = 0)
f_a <- function(x) dnorm(x, mean = delta)

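ggplot() +
  # Power: area under the alternative curve in both rejection regions
  stat_function(mapping = aes(fill = 'power'),
                fun = f_a,
                xlim = c(-4, -critical_value),
                geom = "area") +
  stat_function(mapping = aes(fill = 'power'),
                fun = f_a,
                xlim = c(critical_value, 4),
                geom = "area") +
  # Alpha: area under the null curve in both tails
  stat_function(mapping = aes(fill = 'alpha'),
                fun = f_0,
                xlim = c(-4, -critical_value),
                geom = "area") +
  stat_function(mapping = aes(fill = 'alpha'),
                fun = f_0,
                xlim = c(critical_value, 4),
                geom = "area") +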
  geom_function(mapping = aes(color = 'Null Hypothesis'),xlim = c(-4, 4), fun = f_0) +
  geom_function(mapping = aes(color = 'Alternative Hypothesis'),xlim = c(-4, 4), fun = f_a,linetype=2) +
  geom_vline(mapping = aes(xintercept = critical_value,color = "Critical Value")) +
  geom_vline(mapping = aes(xintercept = -critical_value,color = "Critical Value"), linetype=2) +
  geom_vline(mapping = aes(xintercept = 0),color = 'gray', linetype=2) +

  labs(title = "Two Tailed Test Illustration") +
  scale_x_continuous(breaks = seq(-4, 4, 1)) +
  scale_fill_manual(values = c('lightblue', 'yellow')) +
  scale_color_manual(values = c('darkred', 'darkorange', 
                                'darkgreen')) +
  theme_minimal()

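The green solid line represents the null hypothesis H₀ distribution with a mean of 0.
The red dashed line represents the alternative hypothesis Hₐ distribution with a mean of δ.
The critical values are marked with orange vertical lines, solid at the positive value and dashed at the negative value.
The shaded areas show alpha (light blue) and power (yellow).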
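The sizes of the rejection regions in this illustration can also be computed numerically. The sketch below assumes the same setup as the plot (unit-variance normal curves, with the alternative shifted by delta), so the power it returns describes the picture as drawn rather than the 0.85 used in the sample size planning.

```r
delta          <- -0.6449585
critical_value <- 0.8416212

# Alpha: area under the null curve beyond both critical values
alpha_area <- 2 * pnorm(-critical_value)

# Power: area under the alternative curve in the same two regions
power_area <- pnorm(-critical_value, mean = delta) +
  pnorm(critical_value, mean = delta, lower.tail = FALSE)

c(alpha = alpha_area, power = power_area)
```

Because delta is negative, most of the power comes from the lower tail.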