Multiple Models on Dataset - mtcars

This is a case study on a dataset that describes different car models in terms of several variables (features). The aim is to apply analytical operations to it, namely Model Creation and Prediction.

Create a model on Dataset - mtcars

data(mtcars)                # load the built-in mtcars dataset into the Global Environment
View(mtcars)                # open the dataset in RStudio's data viewer (interactive sessions only)
str(mtcars)                 # show the structure of the data frame: variables and their types, no calculations
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars)             # descriptive statistics for each variable (min, quartiles, median, mean, max)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl

The code chunk above uses data() to load the dataset into the Global Environment in RStudio. The dataset ships with R in the ‘datasets’ package, so nothing needs to be installed. View() opens the chosen dataset in RStudio's data viewer tab. str() shows the structure of ‘mtcars’: a data frame of 32 observations and 11 variables (columns), each listed with its data type. summary() computes descriptive statistics for each variable, as can be seen in the output above. The last two lines extract the mpg and cyl columns into the vectors miles_per_gallon and cylinder, which are used in the models that follow.
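View() only works in an interactive session, so a few non-interactive checks can be handy when the script is run from the command line. The following is a minimal sketch, not part of the original analysis; the names n_dims, col_types and has_missing are chosen here purely for illustration.

```r
# Load the built-in dataset (ships with R in the 'datasets' package)
data(mtcars)

# Number of rows and columns; str() above reports 32 obs. of 11 variables
n_dims <- dim(mtcars)
print(n_dims)

# First few rows, as a console-friendly alternative to View()
print(head(mtcars, 3))

# Class of every column, and a check for missing values
col_types <- sapply(mtcars, class)
has_missing <- anyNA(mtcars)
print(col_types)
print(has_missing)
```

Checking for missing values before modelling is a cautious habit: lm() silently drops incomplete rows by default, which can change the number of observations used in the fit.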

Let us create the model using Simple Linear Regression

To create the model, the lm() function is used, as shown in the code chunk below. The chunk above must be run first, because it creates the two variables the model needs: miles_per_gallon is the dependent variable for the current model and cylinder is the independent variable.

model1 <- lm(miles_per_gallon ~ cylinder)
model1                       # type the variable name to print the fitted model
## 
## Call:
## lm(formula = miles_per_gallon ~ cylinder)
## 
## Coefficients:
## (Intercept)     cylinder  
##      37.885       -2.876

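The chunk above stores the fitted model in the variable ‘model1’, which then appears in the Global Environment in RStudio. Typing the name ‘model1’ on the second line prints the model call and its coefficients: an intercept of 37.885 and a slope of -2.876, meaning that each additional cylinder is associated with a predicted drop of about 2.88 mpg.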
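Printing the model shows only the coefficients. As a hedged sketch of how one might look a little deeper, the code below extracts the coefficients with coef() and the R-squared value from summary(). It also fits the same model through the data argument of lm(), which avoids creating separate vectors. The names model1_coefs, model1_r2 and model1_alt are illustrative and not part of the original analysis.

```r
# Recreate the variables and the model so this block runs on its own
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
model1 <- lm(miles_per_gallon ~ cylinder)

# Named numeric vector with the intercept and the slope for cylinder
model1_coefs <- coef(model1)
print(model1_coefs)

# Share of the variance in mpg explained by the model (between 0 and 1)
model1_r2 <- summary(model1)$r.squared
print(model1_r2)

# Same regression, written against the data frame columns directly
model1_alt <- lm(mpg ~ cyl, data = mtcars)
same_fit <- isTRUE(all.equal(unname(coef(model1)), unname(coef(model1_alt))))
print(same_fit)
```

The data argument form keeps the original column names (mpg, cyl), so any newdata passed to predict() would then need a column called cyl rather than cylinder.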

Let us create a Predictive Model on ‘model1’

Now that model1 has been created, it can be used for prediction. The predictions are done for different criteria and requirements, one by one, as shown below.

1. Criterion: A car having 8 cylinders

To check the mileage/performance of cars having 8 cylinders

predict(model1, newdata = data.frame(cylinder = 8))
##        1 
## 14.87826

In the above chunk, predict() is used to predict the miles per gallon for a car with 8 cylinders, which comes to 14.87826 mpg. Interpretation: According to model1, a car having 8 cylinders is expected to achieve about 14.88 mpg.

2. Criterion: A car having 6 cylinders

To check the mileage/performance of cars having 6 cylinders

predict(model1, newdata = data.frame(cylinder = 6))
##        1 
## 20.62984

In the above chunk, predict() is used to predict the miles per gallon for a car with 6 cylinders, which comes to 20.62984 mpg. Interpretation: According to model1, a car having 6 cylinders is expected to achieve about 20.63 mpg.

3. Criterion: A car having 4 cylinders

To check the mileage/performance of cars having 4 cylinders

predict(model1, newdata = data.frame(cylinder = 4))
##        1 
## 26.38142

In the above chunk, predict() is used to predict the miles per gallon for a car with 4 cylinders, which comes to 26.38142 mpg. Interpretation: According to model1, a car having 4 cylinders is expected to achieve about 26.38 mpg.
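The three predictions above can also be obtained in a single call by passing several rows in newdata. The sketch below does that and additionally asks predict() for 95% prediction intervals, which give a rough sense of how far an individual car might deviate from the point estimate. The names new_cars, point_pred and interval_pred are illustrative and not part of the original analysis.

```r
# Recreate the variables and the model so this block runs on its own
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
model1 <- lm(miles_per_gallon ~ cylinder)

# One row per scenario; the column name must match the predictor name
new_cars <- data.frame(cylinder = c(4, 6, 8))

# Point predictions for 4, 6 and 8 cylinders in one call
point_pred <- predict(model1, newdata = new_cars)
print(point_pred)

# Matrix with columns fit, lwr and upr: the point prediction plus the
# lower and upper bounds of a 95% prediction interval
interval_pred <- predict(model1, newdata = new_cars,
                         interval = "prediction", level = 0.95)
print(cbind(new_cars, interval_pred))
```

The fit column should reproduce the values 26.38142, 20.62984 and 14.87826 printed in the three chunks above.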

Let us create some Visuals

plot(cylinder, miles_per_gallon, type = "p", main = "Gas Consumption as Explained by Number of Cylinders", sub = "General Information", xlab = "Number of Cylinders", ylab = "Miles (U.S.) per gallon", col = "firebrick")
abline(model1)              # add the fitted regression line from model1

To show the model graphically, the plot() function draws a scatter plot of miles per gallon against the number of cylinders, and abline() adds the fitted line from model1. Both functions belong to base R graphics, so no additional package such as ggplot2 is required. The result of plot() is not assigned to a variable, because base plot() draws on the graphics device and returns NULL.

Create a model using Multiple Regression

Now, two independent variables are used for multiple regression: ‘cylinder’, created earlier, and ‘am’, created in the chunk below. ‘miles_per_gallon’ remains the dependent variable.

Variables to be used: mpg, cyl, am

am <- mtcars$am
model2 <- lm(miles_per_gallon ~ cylinder + am)
model2
## 
## Call:
## lm(formula = miles_per_gallon ~ cylinder + am)
## 
## Coefficients:
## (Intercept)     cylinder           am  
##      34.522       -2.501        2.567

In the above code chunk, the variable model2 is created using the lm() function with ‘cylinder’ and ‘am’ as independent variables. Typing model2 on the last line prints the fitted coefficients.

Interpretation for the Multiple Regression Model
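Reading the coefficients printed for model2: with am held fixed, each additional cylinder is associated with about 2.5 mpg less (coefficient -2.501). With the number of cylinders held fixed, cars with am = 1 are predicted to achieve about 2.57 mpg more than cars with am = 0 (coefficient 2.567). In the mtcars documentation, am codes the transmission (0 = automatic, 1 = manual). The sketch below shows one way to check this reading by predicting for a single hypothetical car and rebuilding the same number from the coefficients. The names new_car, pred_model2 and manual_pred are illustrative, and the example car (4 cylinders, am = 1) is an assumption chosen for demonstration.

```r
# Recreate the variables and the model so this block runs on its own
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
am <- mtcars$am
model2 <- lm(miles_per_gallon ~ cylinder + am)

# Hypothetical car: 4 cylinders with am = 1
new_car <- data.frame(cylinder = 4, am = 1)
pred_model2 <- predict(model2, newdata = new_car)
print(pred_model2)

# The same prediction by hand: intercept + slope_cyl * 4 + slope_am * 1
manual_pred <- sum(coef(model2) * c(1, 4, 1))
print(manual_pred)

# Standard errors and p-values, useful before leaning on either coefficient
print(summary(model2)$coefficients)
```

Using the rounded coefficients shown above, the hand calculation is 34.522 - 2.501 * 4 + 2.567, which is roughly 27.1 mpg. Whether the am effect is statistically distinguishable from zero should be judged from the summary table rather than from the coefficient alone.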

Conclusion

This case study shows how to apply simple linear regression and multiple linear regression to a given dataset (mtcars in this case). The models are created and used for prediction as shown above. In both models, the predicted miles per gallon falls as the number of cylinders rises.