Distribution - mtcars project

Let’s take a look at our dataset, mtcars:

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

It’s a data frame with 32 observations and 11 variables, all of them numeric.

Let’s work with the wt variable, which contains the weight of each car in units of 1000 lbs (not tons). First let’s store it in ‘wt’ and then plot a histogram to see what the distribution looks like:

wt <- mtcars$wt
hist(wt, main = "Histogram of Car Weight", xlab = "Car weight (1000 lbs)")

The distribution seems to have outliers on the right side. Let’s check with a larger number of breaks, so that the bins are narrower:

hist(wt, breaks = 15, main = "Histogram of Car Weight", xlab = "Car weight (1000 lbs)")

Yes, there is an obvious group of outliers beyond a value of 5, that is, cars heavier than 5000 lbs.
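
The cutoff of 5 is read off the histogram by eye. As a cross-check, here is a minimal sketch using the 1.5 × IQR rule of thumb. That rule is an assumption added here, not part of the original analysis, and the names q, iqr and upper_fence are only illustrative.

```r
# Sketch: flag unusually high values of wt with the 1.5 * IQR rule of thumb.
wt <- mtcars$wt

q <- quantile(wt, probs = c(0.25, 0.75))   # first and third quartiles
iqr <- q[[2]] - q[[1]]                      # interquartile range
upper_fence <- q[[2]] + 1.5 * iqr           # values above this are flagged

upper_fence                                 # the computed cutoff
wt[wt > upper_fence]                        # the flagged weights
```

Note that boxplot() builds its fences from hinges rather than from quantile(), so it may flag a slightly different set of points.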

Let’s keep checking the distribution of wt, this time with a normal Q-Q plot:

qqnorm(wt); qqline(wt, col = 3)

The 3 data points at the far right of the plot are the same outliers we saw in the last histogram.
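
To make the plot easier to read, the following sketch rebuilds by hand the coordinates that qqnorm() draws: the sorted sample values against standard normal quantiles at the plotting positions given by ppoints(). The variable names are illustrative only.

```r
# Sketch: compute the points of a normal Q-Q plot without drawing it.
wt <- mtcars$wt
n <- length(wt)

theoretical <- qnorm(ppoints(n))   # standard normal quantiles at n plotting positions
observed <- sort(wt)               # sample quantiles, smallest to largest

# qqnorm() with plot.it = FALSE returns the same pairs, in the original data order
qq <- qqnorm(wt, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), observed)

# The three largest weights are the points at the right end of the plot
tail(cbind(theoretical, observed), 3)
```

If the data were normal, these pairs would fall close to the straight line drawn by qqline(), which by default passes through the first and third quartiles.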

We can find their positions in the vector, if we need to:

which(wt >= 5)

## [1] 15 16 17

We can see now that observations 15, 16 and 17 are the outliers of the distribution.
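
Since the row names of mtcars are the car models, the indices can be turned into names. This is a small sketch; the variable name heavy is only illustrative.

```r
# Sketch: look up the car models behind the heavy observations.
heavy <- which(mtcars$wt >= 5)       # the same indices as found with which()
rownames(mtcars)[heavy]              # car models at those positions
mtcars[heavy, "wt", drop = FALSE]    # one-column data frame: model and weight
```

Keeping drop = FALSE preserves the data frame, so each weight stays next to its model name.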

The last check: the Shapiro-Wilk test, which tests the null hypothesis that the data come from a normal distribution and reports a p-value.

shapiro.test(wt)

## 
##  Shapiro-Wilk normality test
## 
## data:  wt
## W = 0.94326, p-value = 0.09265

The p-value is about 0.09, which is higher than 0.05 but still close. At the 5% level we therefore cannot reject the hypothesis that wt is normally distributed (with the given data), though there are visible deviations in the right tail.
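
The decision can also be made in code rather than by reading the printout. A minimal sketch, where the 0.05 significance level and the names res and alpha are assumptions for illustration:

```r
# Sketch: pull the numbers out of the Shapiro-Wilk result.
res <- shapiro.test(mtcars$wt)

res$statistic            # the W statistic
res$p.value              # the p-value from the printed output

alpha <- 0.05            # assumed significance level
res$p.value > alpha      # TRUE means normality is not rejected at this level
```

Failing to reject normality is not proof of normality, especially with only 32 observations.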

The project is finished.

Elena

October 29, 2016