Testing an assumption regarding a population
Null hypothesis assumes no effect
Alternative hypothesis suggests an effect
A z-test is used to determine whether the difference between a sample mean and a hypothesized population mean is statistically significant.
Requires more than 30 data points
Formula to determine the z-score
z = (x - μ) / (σ / √n)
x = The sample mean being evaluated
μ = The population mean assumed under the null hypothesis
σ = Standard deviation
n = Sample size
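The formula can be sketched as a small R function. The name z_score and its argument names are hypothetical, chosen only to mirror the symbols above, and the numbers in the call are made up for illustration.

```r
# Hypothetical helper mirroring z = (x - mu) / (sigma / sqrt(n)).
# The function name and argument names are illustrative, not from the original.
z_score <- function(x, mu, sigma, n) {
  standard_error <- sigma / sqrt(n)  # standard deviation of the sample mean
  (x - mu) / standard_error
}

# Made-up inputs: sample mean 52, hypothesized mean 50, sd 10, n = 25
z_score(x = 52, mu = 50, sigma = 10, n = 25)
```

Splitting out the standard error keeps the denominator readable and makes it easy to check each piece by hand.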
Let’s assume for our null hypothesis that the average height for men in the U.S. is 71 inches. After measuring 100 men, the measured average height is 70 inches, with a standard deviation of 2 inches. Using our z-test:
(70 - 71) / (2 / √100) = -5
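This z-score of -5 is far below -1.96, the lower critical value for a two-tailed test at the 0.05 significance level, so it falls in the rejection region. The average height of men is likely smaller than 71 inches, therefore the null hypothesis is rejected.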
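The decision for the height example can be checked in a few lines of base R. This is a minimal sketch that assumes a two-tailed test at the 0.05 level; the variable names are illustrative.

```r
# Height example: hypothesized mean 71, sample mean 70, sd 2, n = 100
z <- (70 - 71) / (2 / sqrt(100))

alpha    <- 0.05                   # assumed significance level
critical <- qnorm(1 - alpha / 2)   # two-tailed critical value, about 1.96
p_value  <- 2 * pnorm(-abs(z))     # two-tailed p-value

reject_null <- abs(z) > critical   # TRUE means z lies in the rejection region
c(z = z, critical = critical, p_value = p_value)
reject_null
```

Comparing |z| with the critical value and comparing the p-value with alpha are equivalent ways of reaching the same decision.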
library(ggplot2)

# Standard normal density inside the acceptance region (|x| < 1.96);
# the tails are set to NA so they are left unshaded
dnormTwo = function(x){
  twoside = dnorm(x)
  twoside[x <= -1.96 | x >= 1.96] = NA
  return(twoside)
}

g = ggplot(data.frame(x = c(-6, 6)), aes(x = x)) +
  stat_function(fun = dnorm) +
  stat_function(fun = dnormTwo, geom = "area", fill = "green", alpha = 0.3)

# Critical values at -1.96 and 1.96, plus the observed z-score of -5 (dotted red)
g + geom_vline(xintercept = 1.96) +
  geom_vline(xintercept = -1.96) +
  geom_vline(xintercept = -5, linetype = "dotted", color = "red") +
  geom_label(
    label = "Value inside of rejection region", x = -3.0, y = 0.2,
    label.padding = unit(0.05, "lines"), label.size = 0.05,
    color = "black", fill = "#69b3a2"
  )
Using the mtcars data set that ships with R, we’ll assume the average mpg of its 32 cars (1973-74 models) is 20.1.
The measured mean is 20.09, and the standard deviation is 6.03.
(20.09 - 20.1) / (6.03 / √32) = -0.009
This z-score is well within the acceptance region (between -1.96 and 1.96), therefore we fail to reject the null hypothesis.
Again using the mtcars data set, we will assume the mean horsepower of the cars is 150. The standard deviation is 68.56, the actual mean is 146.7, and the sample size is 32.
(146.7 - 150) / (68.56 / √32) = -0.27
This z-score also falls inside the acceptance region (between -1.96 and 1.96), so the null hypothesis is not rejected.
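Both mtcars calculations can be reproduced directly from the data rather than from rounded figures. The sketch below assumes the sample standard deviation from sd() stands in for σ and that the test is two-tailed at the 0.05 level; the helper name z_for_column is hypothetical.

```r
# Hypothetical helper: z statistic for a numeric vector against a hypothesized mean.
# Assumption: sd() (the sample standard deviation) is used in place of sigma.
z_for_column <- function(values, mu0) {
  (mean(values) - mu0) / (sd(values) / sqrt(length(values)))
}

z_mpg <- z_for_column(mtcars$mpg, mu0 = 20.1)  # hypothesized mean mpg
z_hp  <- z_for_column(mtcars$hp,  mu0 = 150)   # hypothesized mean horsepower

critical <- qnorm(0.975)  # two-tailed critical value at the 0.05 level

z_values <- c(mpg = z_mpg, hp = z_hp)
round(z_values, 3)
abs(z_values) > critical  # TRUE would mean the null hypothesis is rejected
```

Working from the raw columns avoids the rounding and transcription slips that creep in when the mean and standard deviation are copied by hand.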