Hypothesis Testing

3/27/2020

Null and Alternative Hypothesis

\(H_0\) is the null hypothesis (what we are testing)

\(H_a\) is the alternative hypothesis (what is contrary to the null hypothesis)

Example: Test whether the average exam score is 78 (two-sided test)

\(H_0: \mu = 78\)

\(H_a: \mu \neq 78\)

Actual Example

Test if the average speed of the cars from the dataset “cars” is less than 17 mph.

\(H_0: \mu = 17\)

\(H_a: \mu < 17\)

## 
##  One Sample t-test
## 
## data:  cars$speed
## t = -2.1397, df = 49, p-value = 0.01869
## alternative hypothesis: true mean is less than 17
## 95 percent confidence interval:
##     -Inf 16.6537
## sample estimates:
## mean of x 
##      15.4

Code for Previous Output

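# Load the built-in cars data set (speed in mph, stopping distance in ft)
data("cars")
# One-sample, lower-tailed t-test of H_0: mu = 17 against H_a: mu < 17
h <- t.test(cars$speed, mu = 17, alternative = "less")
h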

Interpretation of the Results

From the previous example, the p-value is 0.01869, which is less than our significance level \(\alpha = 0.05\), so we reject the null hypothesis \(H_0: \mu = 17\) in favor of \(H_a: \mu < 17\). At the 5% level, the data give evidence that the true average speed of the cars is less than 17 mph.
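As a check on the reported values, the test statistic and p-value can be reproduced by hand from \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\). This is a minimal sketch in base R; the variable names (`x`, `mu0`, `t_stat`, `p_value`) are illustrative and not part of the original code.

```r
# Reproduce the one-sample t-test by hand (illustrative variable names)
x   <- datasets::cars$speed   # speeds from the built-in cars data set
mu0 <- 17                     # hypothesized mean under H_0
n   <- length(x)

# t statistic: distance of the sample mean from mu0 in standard-error units
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Lower-tailed p-value from the t distribution with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1)

t_stat
p_value
```

These should agree with the `t` and `p-value` lines of the `t.test()` output (t = -2.1397, p-value = 0.01869).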

Check for Normality

Before conducting the hypothesis test, we should have checked whether the data are approximately normally distributed, since the one-sample t-test assumes an approximately normal population. We can check this with a normal Q-Q plot.

Code and Interpretation

library(ggpubr)  # provides ggqqplot()
data("cars")
ggqqplot(cars$speed, ylab = "Speed of Cars")

The plot shows the points following the reference line closely, so the data appear approximately normal and the hypothesis test can proceed.
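If `ggpubr` is not available, a similar check can be sketched in base R. The Shapiro-Wilk test below is an addition that the original analysis does not use; it is included only as a numeric complement to the plot, and the names `x` and `sw` are illustrative.

```r
# Base-R alternative to ggqqplot(); shapiro.test() is an added, optional check
x <- datasets::cars$speed

qqnorm(x, ylab = "Speed of Cars")  # sample quantiles vs. normal quantiles
qqline(x)                          # reference line through the quartiles

# Shapiro-Wilk test: H_0 is that the data come from a normal distribution
sw <- shapiro.test(x)
sw$p.value
```

A p-value above \(\alpha\) means the test does not reject normality, which is consistent with a normal distribution but does not prove it.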

Visualize Normal Plot

We can also visualize the results of the t-test we calculated with cars data.

Code and Interpretation

library(gginference)  # provides ggttest()
data("cars")
ggttest(t.test(cars$speed, mu = 17, alternative = "less"))

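The plot shows the density curve of the test's reference distribution, with our test statistic \(t = -2.14\) falling inside the lower-tail rejection region. For a t distribution with 49 degrees of freedom, the rejection region at \(\alpha = 0.05\) begins at about -1.677 (the value -1.645 is the normal approximation), which once again shows that we can reject \(H_0: \mu = 17\).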
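The boundary of the rejection region can be computed directly. As a rough sketch, assuming a lower-tailed test at \(\alpha = 0.05\), `qt()` gives the critical value for the t distribution and `qnorm()` gives the normal approximation; the names `t_crit`, `z_crit`, and `dof` are illustrative.

```r
x     <- datasets::cars$speed
alpha <- 0.05
dof   <- length(x) - 1          # degrees of freedom, n - 1

t_crit <- qt(alpha, df = dof)   # critical value from the t distribution
z_crit <- qnorm(alpha)          # normal approximation, about -1.645

# Test statistic for H_0: mu = 17
t_stat <- (mean(x) - 17) / (sd(x) / sqrt(length(x)))

# TRUE when the statistic lies in the lower-tail rejection region
t_stat < t_crit
```

Because the t distribution has heavier tails than the normal, `t_crit` is slightly further from zero than `z_crit`; the conclusion is the same with either cutoff here.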

Plotly

Before testing the correlation between the speed and stopping distance of the cars, we can plot the two variables in plotly to get an idea of the relationship.
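The plot itself is not reproduced in this text. One way to draw it, assuming the `plotly` package is installed, is a simple scatter plot; the object name `p` and the axis titles are illustrative choices, not taken from the original slides.

```r
library(plotly)

# Interactive scatter plot of stopping distance against speed
p <- plot_ly(
  data = datasets::cars,
  x = ~speed,
  y = ~dist,
  type = "scatter",
  mode = "markers"
)

# Label the axes with the units documented for the cars data set
p <- layout(
  p,
  xaxis = list(title = "Speed (mph)"),
  yaxis = list(title = "Stopping distance (ft)")
)
```

Typing `p` at the console displays the interactive plot.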

Correlation Hypothesis Test

The plotly plot suggests some correlation, but to assess it formally we can use a hypothesis test of whether the true correlation between speed and distance is different from zero:

\(H_0: \rho = 0\) and \(H_a: \rho \neq 0\), where \(\rho\) is the true (population) correlation

Using Pearson’s product-moment correlation test in R we get:

## 
##  Pearson's product-moment correlation
## 
## data:  cars$speed and cars$dist
## t = 9.464, df = 48, p-value = 1.49e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6816422 0.8862036
## sample estimates:
##       cor 
## 0.8068949

Code and Interpretation

cor.test(cars$speed, cars$dist, method = "pearson")

From the test we see that the p-value (1.49e-12) is far below \(\alpha = 0.05\), so we reject the null hypothesis and conclude that speed and distance are significantly correlated, with a sample correlation of about 0.807.
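The `t` and `df` values in the output follow from the sample correlation through \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n-2\) degrees of freedom. A short base-R sketch (variable names are illustrative) reproduces them:

```r
speed <- datasets::cars$speed
dist  <- datasets::cars$dist
n     <- length(speed)

r <- cor(speed, dist)                      # sample Pearson correlation

# Test statistic for the null hypothesis of no linear correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

c(r = r, t = t_stat, p = p_value)
```

The results should match the `cor.test()` output (t = 9.464, df = 48, p-value = 1.49e-12).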

The sample correlation coefficient itself is computed with the formula:

\(r = {\sum(x-\bar{x})(y-\bar{y})\over(n-1)s_xs_y}\)
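To see the formula at work, the sketch below evaluates it term by term for the cars data and compares the result with R's built-in `cor()`. The variable names (`num`, `den`, `r_manual`) are illustrative.

```r
x <- datasets::cars$speed
y <- datasets::cars$dist
n <- length(x)

# Numerator: sum of cross-products of deviations from the means
num <- sum((x - mean(x)) * (y - mean(y)))

# Denominator: (n - 1) times the two sample standard deviations
den <- (n - 1) * sd(x) * sd(y)

r_manual <- num / den
all.equal(r_manual, cor(x, y))   # the two values should agree
```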