What is hypothesis testing?

Hypothesis testing is used to make a decision about a population parameter using two hypotheses: a null hypothesis and an alternative hypothesis. It also uses a p-value and a significance level. The p-value is compared to the significance level, which is commonly set to 0.01, 0.05, or 0.10. Reject the null hypothesis if the p-value is less than the significance level. Fail to reject the null hypothesis if the p-value is equal to or greater than the significance level.
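The decision rule can be sketched in R as follows. This is a minimal illustration: the function name `decide` and its arguments are hypothetical and do not come from the presentation.

```r
# Hypothetical helper: compare a p-value to a significance level (alpha)
# and return the decision stated in the rule above.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject the null hypothesis"
  } else {
    "fail to reject the null hypothesis"
  }
}

decide(0.03, alpha = 0.05)  # p-value below alpha
decide(0.03, alpha = 0.01)  # p-value above alpha
```

The same p-value can lead to different decisions depending on the significance level chosen, which is why the level should be fixed before looking at the data.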

There are different types of hypothesis tests. Some of these are Z-tests, T-tests, Chi-square tests, and ANOVA. This presentation uses Z-tests and T-tests.

Hypothesis testing with Z-tests

The Z-test is used when the population variance is known. It is also used when the sample size is large. There are three kinds of Z-tests:

One-Sample Z-test: Compares the mean of a sample to a known population mean. Used to determine whether the sample mean differs from the population mean or hypothesized mean.

Two-Sample Z-test: Compares the means of two samples to determine whether there is a difference between them.

Proportion Z-test: Compares the proportion of a sample characteristic to a known population proportion. Used to evaluate whether the sample proportion differs from the expected value.
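For reference, the two-sample and proportion Z statistics are usually written as shown here. These are the standard textbook forms, not equations taken from the presentation, and the subscripted symbols and the proportion symbols are introduced only for illustration. The two-sample form assumes the null hypothesis that the two population means are equal.

\[ \text{Two-Sample Z-test} \\ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \\ \text{Proportion Z-test} \\ Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \\ \hat{p} \text{ = sample proportion} \\ p_0 \text{ = hypothesized population proportion} \]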

One-sample Z-test equation

\[ \text{One-Sample Z-test} \\ Z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} \\ \bar{x} \text{ = mean of sample} \\ \mu_0 \text{ = hypothesized mean of population} \\ \sigma \text{ = standard deviation of population} \\ n \text{ = number of observations} \]

Example of a one-sample Z-test

A theme park claims that the average top speed of its new roller coaster is 90 mph. A roller coaster enthusiast runs a simulation modeled on the roller coaster, with 30 test runs. The average top speed achieved was 70 mph with a standard deviation of 30 mph. At a 5% significance level (95% confidence level), determine whether the theme park was wrong in stating that the average top speed is 90 mph.
\[ \tiny{ \hspace{5cm}\text{One-Sample Z-test} \\ H_0: \mu = 90 \hspace{0.5cm}H_1:\mu \neq 90 \hspace{1cm} Z = \frac{70 - 90}{\frac{30}{\sqrt{30}}} = -3.65 \hspace{0.5cm} \text{two-sided p-value} \approx 0.00026\\ \hspace{4.5cm}\text{Reject the null hypothesis, since } 0.00026 < 0.05 } \]
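The arithmetic in this example can be checked in R. This is a minimal sketch that assumes the 30 mph standard deviation is treated as the known population value; the variable names are illustrative. Because the alternative hypothesis is two-sided, the sketch reports both the one-tail area and the two-sided p-value, which is twice the one-tail area.

```r
# Summary values from the roller coaster example
x_bar <- 70   # sample mean top speed (mph)
mu_0  <- 90   # mean claimed by the theme park (mph)
sigma <- 30   # standard deviation (mph), assumed known
n     <- 30   # number of test runs

# One-sample Z statistic
z <- (x_bar - mu_0) / (sigma / sqrt(n))

# Tail areas under the standard normal curve
p_one_sided <- pnorm(-abs(z))
p_two_sided <- 2 * pnorm(-abs(z))

round(z, 2)   # -3.65
p_one_sided
p_two_sided
```

Either p-value is far below 0.05, so the decision to reject the null hypothesis does not depend on which tail convention is used here.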

Hypothesis testing with t-tests and the two-sample t-test equation

The T-test is used when the population variance is unknown. It is also used when the sample size is small. There are also three kinds of t-tests. The one-sample and two-sample t-tests are similar to their z-test counterparts but are used when the sample size is small or the population variance is unknown.

Paired t-test: A test in which the same or related subjects are measured under different conditions in order to determine whether there is a significant difference between them.

The two-sample t-test statistic is:
\[ \small{ \hspace{8cm} t = \frac{\bar{x_1}-\bar{x_2}}{\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2}}} \\ \bar{x_1}, \bar{x_2} = \text{observed mean of 1st sample and 2nd sample respectively} \\ S^2_1, S^2_2 = \text{sample variance of 1st sample and 2nd sample} \\ n_1, n_2 = \text{sample size of 1st sample and 2nd sample} } \]
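For the paired t-test, the statistic is usually computed from the per-pair differences rather than from the two samples separately. The following is the standard textbook form, not an equation from the presentation; the symbols for the differences are introduced only for illustration. It tests the null hypothesis that the mean difference is zero and has n - 1 degrees of freedom.

\[ t = \frac{\bar{d}}{\frac{s_d}{\sqrt{n}}} \\ \bar{d} \text{ = mean of the paired differences} \\ s_d \text{ = standard deviation of the paired differences} \\ n \text{ = number of pairs} \]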

Setting up two-sample t-test

Two fishermen like to fish. One day they wondered whether there was a difference in fish size between the two lakes they frequented. For 7 days, they caught one fish at random from each lake and measured its size.

Solving two-sample t-test

\[ \tiny{ \hspace{5cm}H_0: L_1 = L_2 \hspace{0.5cm}H_1:L_1 \neq L_2 \\ \bar{x_1} = 6.11 \hspace{1cm} S^2_1 = 0.27 \hspace{2cm} \bar{x_2} = 6.03 \hspace{1cm} S^2_2 = 0.41 \hspace{2cm} n_1 = n_2 = 7 \\ \hspace{3cm}t = \frac{6.11 - 6.03}{\sqrt{\frac{0.27}{7} + \frac{0.41}{7}}} = 0.26 \hspace{2cm}\text{two-sided p-value} \approx 0.80\\ \hspace{1.5cm} \alpha = 0.01 \hspace{2cm} 0.80 > 0.01 \hspace{2cm} \text{fail to reject} } \]
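The calculation can be reproduced in R from the summary statistics alone. This is a sketch under two stated assumptions: that 0.27 and 0.41 are sample variances (the arithmetic divides them by n without squaring), and that the degrees of freedom come from the Welch-Satterthwaite approximation, which the presentation does not specify. The variable names are illustrative.

```r
# Summary statistics from the two-lake example
x1 <- 6.11; x2 <- 6.03   # sample means
v1 <- 0.27; v2 <- 0.41   # ASSUMED to be sample variances
n1 <- 7;    n2 <- 7      # sample sizes

# Two-sample t statistic with unpooled variances
se     <- sqrt(v1 / n1 + v2 / n2)
t_stat <- (x1 - x2) / se

# Welch-Satterthwaite degrees of freedom (an assumption, see lead-in)
df <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# One-tail area and two-sided p-value from the t distribution
p_one_sided <- pt(-abs(t_stat), df)
p_two_sided <- 2 * p_one_sided

round(t_stat, 2)   # 0.26
df
p_two_sided
```

With a two-sided alternative the p-value is twice the one-tail area. Either value is well above 0.01, so the test fails to reject the null hypothesis.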

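library(ggplot2)

# Standard normal density; the red areas are the two-sided 5% rejection regions (|z| > 1.96)
ggplot(data.frame(x = c(-3, 3)), aes(x)) + stat_function(fun = dnorm) +
  stat_function(fun = dnorm, xlim = c(-3, -1.96), geom = "area", fill = "red") +
  stat_function(fun = dnorm, xlim = c(1.96, 3), geom = "area", fill = "red") +
  # Green point marks the observed test statistic (0.26) on the curve
  annotate("point", x = 0.26, y = dnorm(0.26), color = "green")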