2024-01-29

Foundations: Null and Alternative Hypotheses

Null Hypothesis

- The statement of “no effect”; e.g., mood does not affect performance

- Represented by H0

Alternative Hypothesis

- The statement of an “effect” that the researcher sets out to support; e.g., mood affects performance

- Represented by Ha

Understanding P-Value

  • The P-value is a standard tool of scientific research and is widely used across many fields

  • P-value acts as a “denotation of significance”: it is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained

  • A small P-value means the data would be unlikely under the null hypothesis, which gives researchers grounds to reject it; a large P-value means they fail to reject it

  • This allows researchers to quantitatively understand outcomes of inquiries into the relationship between variables
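
As a rough sketch of how a p-value follows from a test statistic, the snippet below converts a z statistic into a two-sided p-value with base R's `pnorm()`. The value `z <- 2.1` and the 0.05 threshold are illustrative assumptions, not results from these slides.

```r
# Hypothetical z statistic, chosen only for illustration
z <- 2.1

# Two-sided p-value: the probability, under the null hypothesis, of a
# standard normal value at least as far from 0 as the observed z
p_value <- 2 * pnorm(-abs(z))

# Conventional threshold, assumed here for the sake of the example
alpha <- 0.05
reject_null <- p_value < alpha

print(p_value)
print(reject_null)
```

A smaller `p_value` means the observed statistic would be rarer if there were truly no effect.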

Calculating P-Value

First calculate the test statistic using either a z-test or a t-test.

Examples:

- One-sample Z-test (for a proportion) \[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \]

- Two-sample T-test \[ t = \left| \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \right| \]

The test statistic measures how far the sample result lies from what the null hypothesis predicts, in units of standard error.
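
The two formulas above can be sketched in base R as follows. All numbers and the vectors `x1` and `x2` are made up for illustration. The cross-check relies on `t.test()` defaulting to the unequal-variance (Welch) form, which uses the same denominator as the two-sample formula.

```r
# One-sample z statistic for a proportion (hypothetical inputs)
p_hat <- 0.70   # observed sample proportion
p0    <- 0.50   # proportion stated by the null hypothesis
n     <- 100    # sample size
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sample t statistic with unequal variances (hypothetical data)
x1 <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
x2 <- c(4.2, 4.8, 5.0, 4.4, 4.9, 4.6)
t_manual <- abs(
  (mean(x1) - mean(x2)) /
    sqrt(var(x1) / length(x1) + var(x2) / length(x2))
)

# Cross-check against the built-in Welch t-test
t_builtin <- abs(unname(t.test(x1, x2)$statistic))

print(z)
print(t_manual)
print(t_builtin)
```

Computing the statistic by hand next to `t.test()` is a quick way to confirm which variant of the t-test a formula corresponds to.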

Interpreting test results


- The normal distribution curve helps us interpret test results

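- Assuming no effect, the test statistic is expected to fall near the middle of the curve

- The further the test statistic falls from the middle, the smaller the p-value and the stronger the evidence against the null hypothesis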

Confidence Levels

- A researcher also sets a desired significance level (α), the threshold used to decide whether to reject the null hypothesis

- This bell curve illustrates a 95% confidence interval or CI

- To reject the null hypothesis in favor of the alternative at this level (α = 0.05), the test statistic would need to fall in one of the two tails, outside the central 95% of the curve
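
To make the tails concrete, here is a minimal sketch for a two-sided test on the standard normal curve. The helper `in_tail()` and the example statistics 1.2 and 2.5 are hypothetical names and values introduced only for this illustration.

```r
alpha <- 0.05

# Cutoff that leaves alpha / 2 of the area in each tail (about 1.96)
z_crit <- qnorm(1 - alpha / 2)

# Hypothetical helper: TRUE when a z statistic lands in either tail
in_tail <- function(z, crit = z_crit) abs(z) > crit

# Share of the curve between the two cutoffs
central_area <- pnorm(z_crit) - pnorm(-z_crit)

print(z_crit)
print(central_area)
print(in_tail(1.2))
print(in_tail(2.5))
```

A statistic inside the central region does not prove the null hypothesis; it only means the data are not unusual enough to reject it.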

Z & T tables: Understanding CDF

  • Once you have determined your test statistic, you can compare it to a z or t table
  • These tables are generated from the cumulative distribution function (CDF), which is visualized above

Formula for CDF

The formula for CDF is as follows: \(F(x) = P(X \leq x)\)

Where:

- P() denotes probability,

- X is a random variable,

- and x is the value at which the cumulative probability is evaluated
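
In R the CDF of the standard normal distribution is `pnorm()` and the CDF of the t distribution is `pt()`, so a table lookup can be sketched as below. The evaluation point 1.96 and the 10 degrees of freedom are arbitrary choices for illustration.

```r
# F(x) = P(X <= x) for a standard normal X
F_zero <- pnorm(0)       # half of the area lies below the mean
F_196  <- pnorm(1.96)    # about 0.975

# The CDF is the area under the density curve up to x
F_196_by_area <- integrate(dnorm, lower = -Inf, upper = 1.96)$value

# Upper-tail probability, the quantity a z table is often used to find
upper_tail <- 1 - F_196

# Same lookup for a t distribution with 10 degrees of freedom
F_t <- pt(1.96, df = 10)

print(c(F_zero, F_196, F_196_by_area, upper_tail, F_t))
```

The t distribution has heavier tails, so its CDF at the same point is smaller than the normal one.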

Example: Coin Flip

  • Someone might theorize that a person is using a weighted quarter to cheat during a coin flip
  • To test this claim they could use P-value testing
  • They would need a comparison sample (a trusted quarter) as well as a null hypothesis and an alternative hypothesis
  • Null hypothesis: there is no difference between the heads/tails ratio of the trusted coin and the weighted coin
  • Alternative hypothesis: the weighted coin lands on one side more often than the trusted coin

Example Continued

  • You perform testing and determine that the weighted coin landed on heads 70 times in 100 flips
  • You then determine that this has a P-value of:
## [1] 3.92507e-05
  • Given a significance level of α = 0.05, you conclude that p < α and thus the result is statistically significant
  • You now have statistically significant evidence that the coin is not fair, which supports the claim that it is weighted
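
The slides do not show the call that produced this p-value. One plausible way to obtain a value of this size, offered as an assumption rather than the original method, is a one-sided exact binomial test of 70 heads in 100 flips with base R's `binom.test()`. It also assumes the trusted coin is fair, with a heads probability of 0.5.

```r
heads <- 70
flips <- 100

# One-sided exact binomial test against a fair coin (assumed p = 0.5)
result <- binom.test(heads, flips, p = 0.5, alternative = "greater")
p_value <- result$p.value

# The same tail probability taken directly from the binomial
# distribution: P(X >= 70) when X ~ Binomial(100, 0.5)
p_direct <- pbinom(heads - 1, flips, 0.5, lower.tail = FALSE)

alpha <- 0.05
print(p_value)
print(p_direct)
print(p_value < alpha)
```

The one-sided alternative matches a claim that the coin favors heads; a two-sided test would roughly double the p-value.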

Example Visualized


R Code Utilized

library(ggplot2)  # ggplot(), geom_line(), geom_ribbon(), stat_function()
library(plotly)   # plot_ly(), layout(), and the %>% pipe

# Wrapper around dnorm() so it can be passed to stat_function()
dnorm_label <- function(x, mean, sd) {
  dnorm(x, mean = mean, sd = sd)
}

mean_value <- 0
sd_value <- 1
set.seed(123)

# Standard normal density on a grid from -3 to 3
data <- data.frame(
  x = seq(-3, 3, length.out = 1000),
  y = dnorm(seq(-3, 3, length.out = 1000))
)

# Label each x as below, inside, or above the central 95% region
data$center <- cut(data$x, breaks = c(-Inf, -1.96, 1.96, Inf), labels = c("Lower", "Middle", "Upper"))

# Bell curve with the central 95% region shaded
ggplot(data, aes(x = x, y = y)) +
  geom_line(color = "blue") +
  geom_ribbon(data = subset(data, center == "Middle"), aes(ymin = 0, ymax = y), fill = "gray", alpha = 0.5) +
  labs(title = "Bell Curve with 95% Confidence Interval", x = "X-axis", y = "Density") +
  theme_minimal()

# Standard normal curve drawn from the density function
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  stat_function(fun = dnorm_label, args = list(mean = mean_value, sd = sd_value), color = "blue") +
  labs(title = "Normal Distribution with P-value Labels", x = "Standard Deviations from Mean", y = "Density") +
  theme_minimal()

# Empirical CDF of 1000 standard normal draws
set.seed(42)
data <- rnorm(1000, mean = 0, sd = 1)

sorted_data <- sort(data)
cumulative_prob <- seq(0, 1, length.out = length(sorted_data))

# Normalize to [0, 1] (a no-op here, since the maximum is already 1)
cumulative_prob <- cumulative_prob / max(cumulative_prob)

cdf_plot <- plot_ly(x = sorted_data, y = cumulative_prob, type = "scatter", mode = "lines", name = "CDF")

# Pass the layout settings as named arguments to plotly's layout()
cdf_plot <- cdf_plot %>%
  layout(
    title = "Cumulative Distribution Function (CDF)",
    xaxis = list(title = "Data"),
    yaxis = list(title = "Cumulative Probability")
  )

cdf_plot