Week 7 Review

Parametric tests require at least four assumptions:

  1. Data (or model residuals) are normally distributed
  2. Variance is homogeneous (roughly equal across groups)
  3. Data are measured on an interval (or ratio) scale
  4. Observations are independent of one another
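Assumption 2 can also be checked formally. The following is a minimal sketch using base R's `bartlett.test()`. It compares the variance of `mpg` across the cylinder groups in `mtcars`, and that grouping was chosen only for illustration. Bartlett's test is itself sensitive to departures from normality, so read it alongside the plots below.

```r
# Bartlett's test: the null hypothesis is that the variances are equal across groups
res_var <- bartlett.test(mpg ~ factor(cyl), data = mtcars)
print(res_var)

# A small p-value (e.g. < 0.05) suggests the variances are NOT homogeneous
res_var$p.value < 0.05
```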

How to tell whether data are normally distributed:

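# First, visual inspection: a histogram with a density curve overlaid

hist(mtcars$mpg, probability = TRUE)
lines(density(mtcars$mpg), col = "red", lwd = 2)

# Q-Q plot: points lying close to the reference line suggest normality
qqnorm(mtcars$mpg)
qqline(mtcars$mpg, col = "red")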
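To view both visual checks at once, the two plots can be drawn side by side. This is a small sketch. `plot_normality()` is a hypothetical helper name and is not a built-in R function.

```r
# Hypothetical helper: histogram + density curve and Q-Q plot in one window
plot_normality <- function(x, main = "Data") {
  old_par <- par(mfrow = c(1, 2))  # one row, two panels
  on.exit(par(old_par))            # restore the previous layout when done

  hist(x, probability = TRUE, main = main)
  lines(density(x), col = "red", lwd = 2)

  qqnorm(x, main = main)
  qqline(x, col = "red")
  invisible(x)
}

plot_normality(mtcars$mpg, main = "mpg")
```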

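We can also test for normality formally with the Shapiro-Wilk test. Its null hypothesis is that the data come from a normal distribution: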

shapiro.test(mtcars$hp)
## 
##  Shapiro-Wilk normality test
## 
## data:  mtcars$hp
## W = 0.93342, p-value = 0.04881

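Interpretation: the p-value (0.049) is just below 0.05, so at the 5% significance level we reject the null hypothesis of normality. HP is only barely “not normal”.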
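The same decision can be made in code by comparing the p-value stored in the test result with a chosen significance level. This sketch assumes the conventional α = 0.05.

```r
alpha <- 0.05  # assumed significance level

sw <- shapiro.test(mtcars$hp)
sw$statistic   # the W statistic
sw$p.value     # the p-value

# TRUE means we reject the null hypothesis that hp is normally distributed
reject_normality <- sw$p.value < alpha
reject_normality
```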

Transformation of Variables

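We can transform a variable with the square-root (SQRT) transformation. Because it compresses large values more than small ones, it reduces right skew. This can bring a variable closer to normal, so that it better satisfies the normality assumption of parametric tests.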

library(dplyr)  # provides %>%, select() and mutate()

horsepower <- mtcars %>% select(hp) %>% mutate(horsepower_sqrt = sqrt(hp))

head(horsepower)
##                    hp horsepower_sqrt
## Mazda RX4         110       10.488088
## Mazda RX4 Wag     110       10.488088
## Datsun 710         93        9.643651
## Hornet 4 Drive    110       10.488088
## Hornet Sportabout 175       13.228757
## Valiant           105       10.246951
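To check whether the transformation actually helped, the Shapiro-Wilk test can be rerun on the transformed values. This is a base-R sketch that rebuilds the column without dplyr. Its output is not reproduced here, so run it to see whether the p-value now rises above 0.05.

```r
hp_sqrt <- sqrt(mtcars$hp)  # same values as horsepower$horsepower_sqrt

# Sanity check: squaring undoes the transformation
all.equal(hp_sqrt^2, mtcars$hp)

# Rerun the normality test on the transformed variable
sw_sqrt <- shapiro.test(hp_sqrt)
sw_sqrt
```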

Compare the two histograms. The raw hp values are skewed to the right, and the square-root values should look more symmetric:

# Raw horsepower
hist(horsepower$hp, probability = TRUE)
lines(density(horsepower$hp), col = "red", lwd = 2)

# Square-root-transformed horsepower
hist(horsepower$horsepower_sqrt, probability = TRUE)
lines(density(horsepower$horsepower_sqrt), col = "red", lwd = 2)
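The change can also be quantified rather than judged by eye. The following is a minimal sketch that computes the moment-based sample skewness by hand. `skewness()` here is a hypothetical helper and not a base R function, although add-on packages such as moments provide their own version. Values near 0 indicate a symmetric distribution, and positive values indicate right skew.

```r
# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew_raw  <- skewness(mtcars$hp)
skew_sqrt <- skewness(sqrt(mtcars$hp))

# Compare skewness before and after the square-root transformation
c(raw = skew_raw, sqrt = skew_sqrt)
```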