This case study uses a dataset in which several features of different car models are recorded as variables. The aim is to build regression models on these data and use them for prediction.
data(mtcars) # load the built-in mtcars dataset into the Global Environment
View(mtcars)
str(mtcars) # show the structure of the data frame (columns and their types); no statistics are computed
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars) # descriptive statistics (min, quartiles, mean, max) for each column
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
The code chunk above uses data() to load the dataset into the Global Environment in RStudio. The dataset ships with R in the ‘datasets’ package, so nothing needs to be installed. View() opens the chosen dataset in a spreadsheet-style viewer tab. str() shows the structure of ‘mtcars’: a data frame whose variables (columns) are listed with their data types. summary() computes descriptive statistics for each variable, as can be seen in the output above. Finally, the mpg and cyl columns are stored in the variables miles_per_gallon and cylinder.
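View() only works in an interactive session, so a knitted report may prefer printed output. The following is a minimal sketch of a non-interactive inspection; the names dims and col_types are illustrative and not part of the original analysis.

```r
# Non-interactive inspection of mtcars (ships with R in the 'datasets' package)
data(mtcars)

dims <- dim(mtcars)                 # number of rows and columns
col_types <- sapply(mtcars, class)  # class of every column

print(dims)
print(head(mtcars, 3))              # first three rows instead of View()
print(col_types)
```

The dimensions and column classes printed here should agree with the first line and the `num` entries of the str() output above.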
To create the model, the lm() function is used, as shown in the code chunk below. It relies on the variables created in the chunk above: miles_per_gallon is the dependent variable for the current model and cylinder is the independent variable.
model1 <- lm(miles_per_gallon ~ cylinder)
model1 # typing the object's name prints the fitted model
##
## Call:
## lm(formula = miles_per_gallon ~ cylinder)
##
## Coefficients:
## (Intercept) cylinder
## 37.885 -2.876
The chunk above stores the fitted model in the variable ‘model1’ in the Global Environment in RStudio. Typing the name ‘model1’ on the second line prints the model call and its coefficients: an intercept of 37.885 and a slope of -2.876, so each additional cylinder lowers the predicted mileage by about 2.876 mpg.
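Printing the model shows only the coefficients. As a hedged sketch, the values can also be extracted programmatically with coef(), and summary() reports further detail such as standard errors and R-squared. The names coefs and r_squared are illustrative.

```r
# Rebuild the simple regression so the example is self-contained
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
model1 <- lm(miles_per_gallon ~ cylinder)

coefs <- coef(model1)                    # named vector: (Intercept), cylinder
r_squared <- summary(model1)$r.squared   # share of variance in mpg explained

print(round(coefs, 3))
print(r_squared)
print(summary(model1))                   # full table with standard errors and p-values
```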
With model1 fitted, it can now be used for prediction. The predictions are made for several cylinder counts, one at a time, as shown below.
predict(model1, newdata = data.frame(cylinder = 8))
## 1
## 14.87826
In the above chunk, predict() is used to predict the miles per gallon for a car with 8 cylinders, which was found to be 14.87826 mpg. Interpretation: A car having 8 cylinders would have a mileage of 14.87826 mpg.
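For a simple linear model, predict() evaluates intercept + slope × cylinder. The sketch below reproduces the 8-cylinder prediction by hand to make that arithmetic visible; b0, b1, by_hand and by_predict are illustrative names.

```r
# Rebuild the simple regression so the example is self-contained
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
model1 <- lm(miles_per_gallon ~ cylinder)

b0 <- coef(model1)[["(Intercept)"]]   # about 37.885
b1 <- coef(model1)[["cylinder"]]      # about -2.876

by_hand <- b0 + b1 * 8                # intercept + slope * number of cylinders
by_predict <- predict(model1, newdata = data.frame(cylinder = 8))

print(by_hand)
print(by_predict)
```

Both values should agree with the 14.87826 mpg reported above. Using the rounded coefficients gives 37.885 - 2.876 × 8 = 14.877, which differs only by rounding.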
predict(model1, newdata = data.frame(cylinder = 6))
## 1
## 20.62984
In the above chunk, predict() is used to predict the miles per gallon for a car with 6 cylinders which was found to be 20.62984 mpg. Interpretation: A car having 6 cylinders would have a mileage of 20.62984 mpg.
predict(model1, newdata = data.frame(cylinder = 4))
## 1
## 26.38142
In the above chunk, predict() is used to predict the miles per gallon for a car with 4 cylinders which was found to be 26.38142 mpg. Interpretation: A car having 4 cylinders would have a mileage of 26.38142 mpg.
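The three predictions can also be obtained in a single call by passing several rows in newdata. The sketch below additionally requests 95% confidence intervals for the mean prediction through the interval argument of predict(); the interval is an addition for illustration and was not part of the original analysis.

```r
# Rebuild the simple regression so the example is self-contained
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
model1 <- lm(miles_per_gallon ~ cylinder)

# One row per car to predict for
new_cars <- data.frame(cylinder = c(4, 6, 8))

# Matrix with columns fit (prediction), lwr and upr (interval bounds)
preds <- predict(model1, newdata = new_cars,
                 interval = "confidence", level = 0.95)

result <- cbind(new_cars, preds)
print(result)
```

The fit column should match the three values computed one by one above.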
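plot(cylinder, miles_per_gallon, type = "p", main = "Gas Consumption as Explained by Number of Cylinders", sub = "General Information", xlab = "Number of Cylinders", ylab = "Miles (U.S.) per gallon", col = "firebrick") # base R scatter plot; plot() returns NULL, so its result is not assigned
abline(model1) # add the fitted regression line from model1
To show the relationship graphically, the plot() function has been used to draw a scatter plot of miles per gallon against the number of cylinders, and abline() adds the regression line from model1. plot() belongs to the base ‘graphics’ package that ships with R, not to ggplot2, so no additional package is needed. Its result does not need to be stored in a variable, because plot() draws the graph as a side effect and returns NULL.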
Next, two independent variables are used for multiple regression. ‘cylinder’ already exists, so only ‘am’ (the transmission type) needs to be created; ‘miles_per_gallon’ remains the dependent variable.
am <- mtcars$am
model2 <- lm(miles_per_gallon ~ cylinder + am)
model2
##
## Call:
## lm(formula = miles_per_gallon ~ cylinder + am)
##
## Coefficients:
## (Intercept) cylinder am
## 34.522 -2.501 2.567
In the above code chunk, model2 has been created using the lm() function with ‘cylinder’ and ‘am’ as independent variables. Typing model2 prints the fitted coefficients.
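model2 can be used for prediction in the same way as model1, provided newdata supplies a value for every independent variable. The sketch below predicts for a hypothetical 4-cylinder car with a manual transmission (in mtcars, am = 1 means manual and am = 0 automatic) and compares the adjusted R-squared of the two models. The comparison is an illustrative addition, and new_car, pred2 and adj_r2 are illustrative names.

```r
# Rebuild both models so the example is self-contained
data(mtcars)
miles_per_gallon <- mtcars$mpg
cylinder <- mtcars$cyl
am <- mtcars$am

model1 <- lm(miles_per_gallon ~ cylinder)
model2 <- lm(miles_per_gallon ~ cylinder + am)

# Hypothetical car: 4 cylinders, manual transmission (am = 1)
new_car <- data.frame(cylinder = 4, am = 1)
pred2 <- predict(model2, newdata = new_car)
print(pred2)

# Adjusted R-squared penalises the extra predictor, which allows a fair comparison
adj_r2 <- c(model1 = summary(model1)$adj.r.squared,
            model2 = summary(model2)$adj.r.squared)
print(adj_r2)
```

With the rounded coefficients printed above, the prediction works out to roughly 34.522 - 2.501 × 4 + 2.567 ≈ 27.09 mpg.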
This case study shows how to apply simple linear regression and multiple linear regression to a given dataset (mtcars in this case). The models are created and used for prediction as shown above, and the interpretations state what each prediction means.