Project: mtcars

Description of project:

In the mtcars data set, each row represents one car model and each column records a property of that car, such as mileage (mpg), number of cylinders (cyl), and horsepower (hp).

The data set covers 32 car models. Do more cylinders go with higher horsepower? And does the relationship between cylinders and horsepower always hold?

Let us investigate

Step 1: Load the data

data(mtcars)

View(mtcars)
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

First we load the data into the global environment, then we check its structure to get a better understanding of the data set.
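Since the question is about cyl, hp and am, it can also help to see how many cars fall into each group before plotting. The following is a minimal sketch in base R; the name `counts` is only illustrative, and the coding of am (0 = automatic, 1 = manual) is taken from the mtcars help page.

```r
data(mtcars)

# Cross-tabulate cylinder count against transmission type
# (am: 0 = automatic, 1 = manual, as documented in ?mtcars)
counts <- table(cyl = mtcars$cyl, am = mtcars$am)
print(counts)

# The distinct cylinder values present in the data
print(sort(unique(mtcars$cyl)))
```

Because cyl takes only a few distinct values, many points will share the same x position in the scatter plots that follow.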

cyl <- mtcars$cyl
hp <- mtcars$hp
am <- mtcars$am
# Copy the three columns of interest into standalone vectors.
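The standalone vectors can be used for a quick numeric check before any plotting. As a rough sketch, the correlation between cylinders and horsepower can be computed with base R's `cor()`; the names `r_pearson` and `r_spearman` are illustrative and are not part of the original analysis.

```r
data(mtcars)
cyl <- mtcars$cyl
hp <- mtcars$hp

# Pearson correlation: strength of the linear association
r_pearson <- cor(cyl, hp)

# Spearman rank correlation: only assumes a monotonic relationship,
# which may suit cyl better since it takes just a few distinct values
r_spearman <- cor(cyl, hp, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
```

A value close to 1 suggests that horsepower tends to rise with cylinder count. It does not show that the relationship holds for every individual car.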

Step 2: Ensure you have the right package for the analysis - ggplot2

library(ggplot2)

Step 3: Scatter plot

ggplot(data=mtcars, aes(x=cyl, y=hp)) + geom_point()

Step 4: Now we can add the transmission variable (am) as color

ggplot(data = mtcars, aes(x=cyl, y=hp, color=am)) + geom_point()

Step 5: The scatter plot has many overlapping points and is not very clear. Let us make it clearer.

ggplot(data = mtcars, aes(x=cyl, y=hp, color=am)) + geom_point()
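The call above is the same as in Step 4. One possible way to reduce the overlap, which may not be the one originally intended, is to treat am as a categorical variable and add a little horizontal jitter. This is a sketch under those assumptions; the name `p_clear`, the jitter width, and the legend labels are illustrative choices.

```r
library(ggplot2)
data(mtcars)

# Treat transmission as a category so the legend shows two discrete colors
# instead of a continuous gradient (0 = automatic, 1 = manual, per ?mtcars)
p_clear <- ggplot(
  data = mtcars,
  aes(x = cyl, y = hp,
      color = factor(am, labels = c("automatic", "manual")))
) +
  # Spread points sideways only, so the hp values stay exact
  geom_point(position = position_jitter(width = 0.15, height = 0, seed = 1)) +
  labs(color = "Transmission")

print(p_clear)
```

Jittering only along the x axis separates cars with the same cylinder count and similar horsepower without changing the hp values shown.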

Step 6: For better clarity, we plot automatic (am = 0) and manual (am = 1) cars separately

ggplot(data = mtcars[mtcars$am <1,], aes(x=cyl, y=hp, color=am)) + geom_point()

ggplot(data = mtcars[mtcars$am >0,], aes(x=cyl, y=hp, color=am)) + geom_point()

Step 7: To see how horsepower changes on average across cylinder counts, let us add a smoothed trend line with geom_smooth().

ggplot(data = mtcars[mtcars$am <1,], aes(x=cyl, y=hp, color=am)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 3.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 4.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 8.3971e-17
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : There are other near singularities as well. 4.0804
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
## 3.98
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 4.02
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 8.3971e-17
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other near
## singularities as well. 4.0804

ggplot(data = mtcars[mtcars$am >0,], aes(x=cyl, y=hp, color=am)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 3.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 2.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : There are other near singularities as well. 16.16
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
## 3.98
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 2.02
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 0
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other near
## singularities as well. 16.16
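The loess warnings above most likely arise because cyl takes only three distinct values (4, 6 and 8), which leaves the local regression very few unique x positions to work with. If the goal is simply the average horsepower per cylinder count, one alternative is to mark the group means with `stat_summary()`. This is a sketch, not the original method; the names `auto_cars` and `p_mean` and the marker styling are illustrative.

```r
library(ggplot2)
data(mtcars)

# Automatic cars only, as in the first plot of this step
auto_cars <- mtcars[mtcars$am == 0, ]

p_mean <- ggplot(auto_cars, aes(x = cyl, y = hp)) +
  geom_point() +
  # Mark the mean hp at each distinct cyl value with a red cross
  stat_summary(fun = mean, geom = "point",
               shape = 4, size = 4, colour = "red")

print(p_mean)
```

The same code with `mtcars$am == 1` would give the corresponding plot for manual cars.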

Conclusion:

1. From the above analysis we can see that horsepower tends to increase with the number of cylinders: 4-cylinder cars have the lowest horsepower and 8-cylinder cars the highest.
2. The same pattern appears whether the car is manual or automatic: in both groups, more cylinders go with more horsepower.
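These conclusions can be checked numerically. The sketch below uses base R only: it computes the mean horsepower for each combination of cylinder count and transmission, then compares the horsepower ranges of neighbouring cylinder groups. The names `mean_hp`, `hp_min` and `hp_max` are illustrative.

```r
data(mtcars)

# Mean hp by cylinder count (rows) and transmission (columns; 0 = automatic, 1 = manual)
mean_hp <- tapply(mtcars$hp, list(cyl = mtcars$cyl, am = mtcars$am), mean)
print(round(mean_hp, 1))

# Lowest and highest hp within each cylinder group
hp_min <- tapply(mtcars$hp, mtcars$cyl, min)
hp_max <- tapply(mtcars$hp, mtcars$cyl, max)
print(rbind(min = hp_min, max = hp_max))

# TRUE if at least one 4-cylinder car has more hp than some 6-cylinder car
print(hp_max[["4"]] > hp_min[["6"]])
```

If the ranges of neighbouring groups overlap, the relationship holds for the group averages but not for every individual pair of cars.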