2/11/2019

Analysis of "mtcars" dataset

In this presentation, we look at the "mtcars" dataset and test a few simple, intuitive ideas. We draw a simple boxplot using "plotly" to check whether:

  • Cars with more cylinders (cyl) have lower mileage (mpg), because their more powerful engines consume more fuel.
  • Cars with automatic transmission are more fuel-efficient than cars with manual transmission.

"mtcars" dataset structure

Let's look at the mtcars dataset.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Creating the Plot

Now, let's create a simple boxplot of mileage (mpg) against the number of cylinders (cyl), split by transmission type (am):

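suppressPackageStartupMessages(library(plotly))

# Recode "am" (0/1) as a labelled factor and "cyl" as an ordered factor
mtcars2 <- within(mtcars, {
  am  <- factor(am, labels = c("automatic", "manual"))
  cyl <- ordered(cyl)
})

# One box per cylinder count, coloured by transmission type,
# with every individual car drawn as a jittered point beside its box
plot_ly(mtcars2, y = ~mpg, x = ~cyl,
        color = ~am, type = "box",
        boxpoints = "all", jitter = 0.3, pointpos = -1.8) %>%
  layout(boxmode = "group")  # place automatic/manual boxes side by side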

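Here, we converted "am" (coded 0 = automatic, 1 = manual) into a labelled factor and "cyl" into an ordered factor, so that both are treated as discrete categories in the plot.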
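To double-check that the recoding maps each code to the intended label, the original 0/1 values can be cross-tabulated against the new factor. This is a minimal sketch using only base R (`table()`, `levels()`, `is.ordered()`); it repeats the same `within()` recoding as the plotting code so that it runs on its own, and the variable name `recoding` is purely illustrative.

```r
# Same recoding as in the plotting code, repeated so this block is self-contained
mtcars2 <- within(mtcars, {
  am  <- factor(am, labels = c("automatic", "manual"))
  cyl <- ordered(cyl)
})

# Cross-tabulate the original 0/1 codes against the new labels:
# every 0 should land in "automatic" and every 1 in "manual"
recoding <- table(original = mtcars$am, recoded = mtcars2$am)
print(recoding)

# cyl is now an ordered factor with levels 4 < 6 < 8
print(levels(mtcars2$cyl))
print(is.ordered(mtcars2$cyl))
```

Only the diagonal of the cross-table should be non-zero; any count off the diagonal would mean the labels were attached to the wrong codes.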

Basic Boxplot

Observations

Looking at the plot, we can see the following trends:

  • Cars with fewer cylinders have better mileage.
  • Contrary to our second idea, cars with "manual" transmission have better mileage than cars with "automatic" transmission within the same cylinder category, although the difference is small for 8-cylinder cars.
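The trends read off the boxplot can also be checked numerically. The following is a minimal sketch, not part of the original analysis, that uses base R's `tapply()` to compute the mean and median mpg for each cylinder/transmission combination, assuming the same recoding as the plotting code. The names `mpg_means` and `mpg_medians` are illustrative.

```r
# Same recoding as in the plotting code, repeated so this block is self-contained
mtcars2 <- within(mtcars, {
  am  <- factor(am, labels = c("automatic", "manual"))
  cyl <- ordered(cyl)
})

# Mean mpg for each cylinder count (rows) and transmission type (columns)
mpg_means <- tapply(mtcars2$mpg, list(cyl = mtcars2$cyl, am = mtcars2$am), mean)
print(round(mpg_means, 2))

# Medians correspond to the line drawn inside each box
mpg_medians <- tapply(mtcars2$mpg, list(cyl = mtcars2$cyl, am = mtcars2$am), median)
print(mpg_medians)
```

Reading down a column shows mileage falling as the cylinder count rises; reading across a row compares automatic and manual cars with the same number of cylinders.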

Note, however, that the dataset contains only 32 cars, which is far too few to draw firm conclusions or make meaningful predictions.
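To see how thinly the 32 cars are spread across the boxes, the number of cars in each cylinder/transmission group can be counted. This is a small sketch using base R's `table()` and `addmargins()`, assuming the same recoding as the plotting code; `group_sizes` is an illustrative name.

```r
# Same recoding as in the plotting code, repeated so this block is self-contained
mtcars2 <- within(mtcars, {
  am  <- factor(am, labels = c("automatic", "manual"))
  cyl <- ordered(cyl)
})

# Number of cars in each cylinder/transmission group, with row and column totals
group_sizes <- table(cyl = mtcars2$cyl, am = mtcars2$am)
print(addmargins(group_sizes))
```

With the standard mtcars data, the 8-cylinder manual group holds only two cars (the Ford Pantera L and the Maserati Bora), so its box summarizes just two points.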

Thank You