Parametric tests require at least four assumptions:
How to tell whether data are normally distributed:
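# First, visual inspection: a histogram with a kernel density curve overlaid...
hist(mtcars$mpg, probability = TRUE)
lines(density(mtcars$mpg), col = "red", lwd = 2)
# ...and a normal Q-Q plot: points lying close to the reference line suggest normality
qqnorm(mtcars$mpg)
qqline(mtcars$mpg, col = "red")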
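To put a number on what the Q-Q plot shows, one option is the correlation between the sample values and the theoretical normal quantiles that `qqnorm()` plots them against; values close to 1 mean the points hug a straight line. This is a minimal sketch, and `qq_correlation()` is a hypothetical helper, not a base R function. It relies on `qqnorm(..., plot.it = FALSE)`, which returns the plotted coordinates without drawing anything.

```r
# Correlation between theoretical normal quantiles (qq$x) and the sample (qq$y).
# qq_correlation() is a hypothetical helper written for this example.
qq_correlation <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)  # list(x = theoretical, y = sample); no plot drawn
  cor(qq$x, qq$y)
}

qq_correlation(mtcars$mpg)  # near 1 when the Q-Q points fall on a straight line
qq_correlation(mtcars$hp)
```

This idea is closely related to correlation-based normality tests such as Shapiro-Francia, but the raw correlation has no p-value attached, so treat it as a descriptive aid rather than a test.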
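We can also test normality formally with the Shapiro-Wilk test, whose null hypothesis is that the sample comes from a normal distribution: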
shapiro.test(mtcars$hp)
##
## Shapiro-Wilk normality test
##
## data: mtcars$hp
## W = 0.93342, p-value = 0.04881
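Interpretation: the p-value (0.049) is just below 0.05, so at the 5% significance level we reject the null hypothesis of normality for hp. In other words, HP is only barely “not normal”.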
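When several variables need checking, the same test can be applied column by column and each p-value compared with the significance level. A minimal sketch in base R; the choice of columns and `alpha = 0.05` are illustrative assumptions.

```r
# Shapiro-Wilk p-value for each selected column, flagged against alpha.
alpha <- 0.05
vars <- c("mpg", "hp", "wt")  # illustrative choice of columns
p_values <- sapply(mtcars[vars], function(x) shapiro.test(x)$p.value)

data.frame(p_value = round(p_values, 4),
           reject_normality = p_values < alpha)
```

The `hp` row reproduces the p-value from the single test above; any column with `reject_normality` equal to `TRUE` is a candidate for transformation.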
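We can transform a variable with the square-root (SQRT) transformation. Because it compresses large values much more than small ones, it reduces right skew, which often brings a non-negative, right-skewed variable such as hp closer to a normal distribution and helps it meet the normality assumption of parametric tests.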
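A small numeric illustration of why the square root pulls in a long right tail: it shrinks the gaps between large values far more than the gaps between small ones. The vector `x` is made up for the example.

```r
# Arbitrary example values: compare the gaps between neighbours before and after sqrt().
x <- c(1, 4, 100, 400)
sqrt(x)        # 1 2 10 20
diff(x)        # gaps before: 3 96 300
diff(sqrt(x))  # gaps after:  1 8 10
```

Note that `sqrt()` only makes sense for non-negative data; `sqrt(-1)` returns `NaN` with a warning.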
library(dplyr)  # provides %>%, select() and mutate()
horsepower <- mtcars %>% select(hp) %>% mutate(horsepower_sqrt = sqrt(hp))
head(horsepower)
## hp horsepower_sqrt
## Mazda RX4 110 10.488088
## Mazda RX4 Wag 110 10.488088
## Datsun 710 93 9.643651
## Hornet 4 Drive 110 10.488088
## Hornet Sportabout 175 13.228757
## Valiant 105 10.246951
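The same table can be built in base R without dplyr. A minimal sketch; the name `horsepower_base` is chosen here for illustration. Subsetting with `mtcars["hp"]` keeps a one-column data frame, so the car names survive as row names.

```r
# Base R equivalent of the dplyr pipeline: keep hp, then add its square root.
horsepower_base <- mtcars["hp"]  # one-column data frame, row names preserved
horsepower_base$horsepower_sqrt <- sqrt(horsepower_base$hp)
head(horsepower_base)
```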
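Notice the difference between these two histograms: the raw hp values have a long right tail, while the square-root values are more symmetric.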
hist(horsepower$hp, probability = TRUE)
lines(density(horsepower$hp), col = "red", lwd = 2)
hist(horsepower$horsepower_sqrt, probability = TRUE)
lines(density(horsepower$horsepower_sqrt), col = "red", lwd = 2)
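To check the effect of the transformation beyond eyeballing the histograms, one can rerun the Shapiro-Wilk test on the transformed values and compare a skewness measure before and after. A sketch under these assumptions: `sample_skewness()` is a hypothetical helper using the simple moment estimator m3 / m2^1.5, and the exact p-value for the transformed data is not reproduced here, so read it from your own output and compare it with your chosen α.

```r
# Moment-based sample skewness: positive for a long right tail, about 0 for symmetric data.
# sample_skewness() is a hypothetical helper, not a base R function.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

hp_raw  <- mtcars$hp
hp_sqrt <- sqrt(mtcars$hp)

shapiro.test(hp_sqrt)               # compare with p = 0.04881 for the raw values
c(raw  = sample_skewness(hp_raw),   # clearly positive
  sqrt = sample_skewness(hp_sqrt))  # smaller after the transformation
```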