The datasets contain different estimates of fuel economy for passenger cars and trucks. For each vehicle, various characteristics are recorded such as the engine displacement or number of cylinders. Along with these values, laboratory measurements are made for the city and highway fuel economy (FE) of the car.
This project examines how fuel economy depends on engine displacement.
Step 1 : Load the given dataset.
library(readr)
FE2010 <- read_csv("FE2010.csv")
## Parsed with column specification:
## cols(
## `EngDisplacement(X)` = col_double(),
## `Fuel efficiency(Y)` = col_double()
## )
Step 2 : Perform linear regression.
library(knitr)
model<-lm(FE2010$`Fuel efficiency(Y)` ~ FE2010$`EngDisplacement(X)`)
model
##
## Call:
## lm(formula = FE2010$`Fuel efficiency(Y)` ~ FE2010$`EngDisplacement(X)`)
##
## Coefficients:
##                 (Intercept)  FE2010$`EngDisplacement(X)`
##                      50.563                       -4.521
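As a side note, the same kind of model can be written with bare column names in the formula plus a `data =` argument. The sketch below uses a small made-up data frame (`toy`, with invented values, not FE2010.csv) only to illustrate that both forms give the same coefficients, and that the `data =` form lets `predict()` work on new displacement values.

```r
# Hypothetical toy data that reuses the column names of FE2010.csv.
# The numbers are invented for illustration only.
toy <- data.frame(
  `EngDisplacement(X)` = c(1.6, 2.0, 2.4, 3.0, 3.5, 5.0),
  `Fuel efficiency(Y)` = c(44.1, 41.0, 40.2, 36.5, 35.1, 27.9),
  check.names = FALSE  # keep the non-syntactic column names as they are
)

# Form used in this document: columns referenced with `$` inside the formula
fit_dollar <- lm(toy$`Fuel efficiency(Y)` ~ toy$`EngDisplacement(X)`)

# Alternative form: column names in the formula, data frame passed separately
fit_data <- lm(`Fuel efficiency(Y)` ~ `EngDisplacement(X)`, data = toy)

# Both fits produce the same intercept and slope
same_coef <- all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))

# With the `data =` form, predict() can be given new displacement values
new_cars <- data.frame(`EngDisplacement(X)` = c(2.2, 4.0), check.names = FALSE)
pred <- predict(fit_data, newdata = new_cars)
```

With the `$` form, `predict()` looks the predictor up by the full expression in the formula rather than in `newdata`, so it is mainly useful for predictions on the original rows.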
The output contains two coefficients: the intercept and the regression coefficient (slope) for engine displacement (X).
Let us name the intercept b0 and the regression coefficient b1.
The detailed calculations of these values are in the file FE2010.xlsx, which can be found in my GitHub.
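For readers without the spreadsheet, here is a minimal sketch of the standard least-squares formulas for b0 and b1, checked against `lm()`. It is not claimed to mirror the steps in FE2010.xlsx, and the vectors `x` and `y` are made-up values, not the FE2010 data.

```r
# Hypothetical values standing in for engine displacement (x) and fuel efficiency (y)
x <- c(1.6, 2.0, 2.4, 3.0, 3.5, 5.0)
y <- c(44.1, 41.0, 40.2, 36.5, 35.1, 27.9)

# Slope: sum of cross-deviations divided by sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Compare with the coefficients that lm() reports
fit <- lm(y ~ x)
matches_lm <- all.equal(unname(coef(fit)), c(b0, b1))
```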
Step 3 : Plot the linear model.
library("ggplot2")
ggplot() +
  geom_point(aes(x = FE2010$`EngDisplacement(X)`, y = FE2010$`Fuel efficiency(Y)`), colour = 'red') +
  geom_line(aes(x = FE2010$`EngDisplacement(X)`, y = predict(model, newdata = FE2010)), colour = 'blue') +
  ggtitle('FUEL ECONOMY VS ENGINE DISPLACEMENT') +
  xlab('Engine Displacement') + ylab('Fuel Economy')
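Now we substitute b0 and b1 into the regression equation
y=a+bX
y=b0+b1X
Fuel Efficiency = 50.563 - 4.5209 * EngineDisplacement
This equation states that each 1-unit increase in Engine Displacement is associated with a decrease of about 4.52 units in predicted Fuel Efficiency. Equivalently, raising predicted Fuel Efficiency by 1 unit corresponds to lowering Engine Displacement by about 0.22 units (1 / 4.5209).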
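To make the slope concrete, the fitted line can be wrapped in a small helper. This is only a sketch: `predict_fe` is a hypothetical name, the coefficients are the rounded values from the output, and the displacement values passed in are arbitrary examples.

```r
# Rounded coefficients taken from the fitted equation
b0 <- 50.563
b1 <- -4.5209

# Hypothetical helper: predicted fuel efficiency for a given engine displacement
predict_fe <- function(displacement) {
  b0 + b1 * displacement
}

fe_at_2 <- predict_fe(2.0)                    # 50.563 - 4.5209 * 2 = 41.5212
step    <- predict_fe(3.0) - predict_fe(2.0)  # change in FE per extra unit of displacement, equals b1
needed  <- -1 / b1                            # displacement reduction for +1 unit of predicted FE
```

The value of `step` reproduces the slope, while `needed` is its reciprocal, roughly 0.22.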
Step 4 : Now print the summary of the model.
summary(model)
##
## Call:
## lm(formula = FE2010$`Fuel efficiency(Y)` ~ FE2010$`EngDisplacement(X)`)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.486  -3.192  -0.365   2.671  27.215
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  50.5632     0.3985  126.89   <2e-16 ***
## FE2010$`EngDisplacement(X)`  -4.5209     0.1065  -42.46   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.624 on 1105 degrees of freedom
## Multiple R-squared: 0.62, Adjusted R-squared: 0.6196
## F-statistic: 1803 on 1 and 1105 DF, p-value: < 2.2e-16
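Several numbers in this summary are tied together, which can be checked by hand. The sketch below uses only the rounded values printed above, so small discrepancies from rounding are expected. The variable names are illustrative.

```r
# Rounded values copied from the summary output
estimate  <- -4.5209   # slope estimate
std_error <- 0.1065    # its standard error
f_stat    <- 1803      # F-statistic
df_resid  <- 1105      # residual degrees of freedom

# t value is the estimate divided by its standard error
t_value <- estimate / std_error

# With a single predictor, the F-statistic is the square of the slope's t value
f_from_t <- t_value^2

# With a single predictor, R-squared can be recovered from F and the residual df
r2_from_f <- f_stat / (f_stat + df_resid)

# Residual df = n - 2 for a simple linear regression, so n is:
n_obs <- df_resid + 2
```

Here `t_value` comes out close to the reported -42.46, `f_from_t` close to 1803, and `r2_from_f` close to the reported 0.62.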
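SST stands for the total sum of squares, SSR for the regression (explained) sum of squares, and SSE for the error (residual) sum of squares. They satisfy SST = SSR + SSE, and R-squared = SSR / SST, so the R-squared of 0.62 above means the model explains about 62% of the variation in fuel efficiency.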
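A minimal sketch of how these three quantities can be computed from a fitted model, using made-up vectors `x` and `y` rather than the FE2010 data:

```r
# Hypothetical values standing in for engine displacement (x) and fuel efficiency (y)
x <- c(1.6, 2.0, 2.4, 3.0, 3.5, 5.0)
y <- c(44.1, 41.0, 40.2, 36.5, 35.1, 27.9)
fit <- lm(y ~ x)

# SST: total variation of y around its mean
sst <- sum((y - mean(y))^2)

# SSR: variation of the fitted values around the mean of y (explained part)
ssr <- sum((fitted(fit) - mean(y))^2)

# SSE: variation of y around the fitted values (unexplained part)
sse <- sum(residuals(fit)^2)

# R-squared is the explained share of the total variation
r2 <- ssr / sst
```

For a least-squares fit with an intercept, `sst` equals `ssr + sse`, and `r2` agrees with `summary(fit)$r.squared`.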