1 Data Description

The data were obtained from the UFL website and form a data set comparing the technological advancement of hybrid electric vehicles. The source does not say how the data were collected.

2 Questions

The main question for this data set is how the other variables affect the manufacturer’s suggested retail price (MSRP).

3 Data Analysis

The vehicle variable names the car that each row describes, and carid is an integer code for that car; vehicle is a character variable. We will not be concerned with the vehicle variable in this report, because it is not split up into brands. The year variable was split into two groups, labeled [1997-2006] and [2007-2013], for the multiple linear regression and stored as yeargroup. The carclass variable is categorical and gives the type of car: C is Compact, M is Midsize, TS is 2 Seater, L is Large, PT is Pickup Truck, MV is Minivan, and SUV is Sport Utility Vehicle. The carclass_id variable was removed from the data set.

The accelrate variable is the car’s acceleration rate in km/hour/second. The mpg variable is the miles per gallon, and the mpgmpge variable is the maximum of the electric miles and the gas miles. The msrp variable is the manufacturer’s suggested retail price in 2013, which is the response variable here.

library(dplyr)   # select()
library(pander)  # pander()
library(knitr)   # kable()

car = read.csv("https://raw.githubusercontent.com/emmalaughin/sta321/main/data/hybrid_reg%20(1).csv")
car2 = select(car, -1, -2, -9) # drop columns 1, 2, and 9 of the original CSV
# ifelse() is vectorized, so the whole column is recoded in one step; note that year > 2007 places 2007 itself in the first group
car2$yeargroup = ifelse(car2$year > 2007, "[2007-2013]", "[1997-2006]")
pander(head(car2[, c("year", "yeargroup")])) # yeargroup is the new variable added to the data

year   yeargroup
1997   [1997-2006]
2000   [1997-2006]
2000   [1997-2006]
2000   [1997-2006]
2001   [1997-2006]
2001   [1997-2006]

car3 = select(car2, -1) # drop the first column (year); car3 is the data set for MLR
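
The grouping step above depends on how the boundary year is treated. The following is a minimal sketch on made-up years (not rows from the data set) comparing the report's condition with a `>=` variant and with `cut()`; the column names `group_gt`, `group_ge`, and `group_cut` are illustrative only.

```r
# Toy years only; these values are illustrative and not taken from the data set.
toy <- data.frame(year = c(2005, 2006, 2007, 2008, 2013))

# Same condition as in the report: strictly greater than 2007.
toy$group_gt <- ifelse(toy$year > 2007, "[2007-2013]", "[1997-2006]")

# Alternative condition that matches the group labels literally.
toy$group_ge <- ifelse(toy$year >= 2007, "[2007-2013]", "[1997-2006]")

# cut() expresses the same split with explicit break points.
# Intervals are right-closed by default: (1996, 2006] and (2006, 2013].
toy$group_cut <- cut(toy$year,
                     breaks = c(1996, 2006, 2013),
                     labels = c("[1997-2006]", "[2007-2013]"))

print(toy)
```

In this sketch the 2007 row is the only one where `group_gt` and `group_ge` disagree, so the choice matters only for cars from that model year.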

4 Multiple Linear Regression

#options(digits = 7) # optionally print 7 significant digits
full.model = lm(msrp ~ ., data = car3) # full model: msrp regressed on all remaining variables
kable(summary(full.model)$coefficients, caption = "Statistics of Regression Coefficients")
Statistics of Regression Coefficients

Term                     Estimate   Std. Error     t value   Pr(>|t|)
(Intercept)            9094.89984   9978.99205   0.9114047  0.3636268
accelrate              3793.57569    494.89550   7.6654075  0.0000000
mpg                    -499.15095    172.63771  -2.8913205  0.0044400
mpgmpge                  69.67344     82.33238   0.8462460  0.3988394
carclassL             27174.22420   6219.53416   4.3691736  0.0000239
carclassM             -4309.02967   3301.39418  -1.3052151  0.1939306
carclassMV            12113.05751   7212.85145   1.6793715  0.0952786
carclassPT            -5835.70554   7033.21999  -0.8297345  0.4080816
carclassSUV             977.06786   4163.55775   0.2346714  0.8148018
carclassTS            -5942.56060   5570.00832  -1.0668854  0.2878341
yeargroup[2007-2013]   -415.94796   2854.42418  -0.1457204  0.8843487
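
The carclass rows in the table are indicator (dummy) variables measured against a baseline level, which is why there is no row for Compact. The sketch below uses a made-up factor with a few of the same class codes to show how R builds these columns; it illustrates the mechanism and is not the report's data.

```r
# Toy factor with some of the report's class codes; the rows are made up.
toy <- data.frame(carclass = factor(c("C", "M", "L", "SUV", "C")))

# factor() sorts the levels, so "C" comes first and becomes the baseline.
print(levels(toy$carclass))

# model.matrix() shows the indicator columns that lm() builds internally.
X <- model.matrix(~ carclass, data = toy)
print(X)

# relevel() changes the baseline if a different comparison group is wanted.
toy$carclass_m <- relevel(toy$carclass, ref = "M")
print(levels(toy$carclass_m))
```

Rows for the baseline class have zeros in every carclass column, so each carclass coefficient is read as a difference from Compact.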

4.1 Summary of the Model

The fitted model is \[ msrp = 9094.89984 + 3793.57569\times accelrate - 499.15095\times mpg + 69.67344\times mpgmpge + 27174.22420\times carclassL - 4309.02967\times carclassM + 12113.05751\times carclassMV - 5835.70554\times carclassPT + 977.06786\times carclassSUV - 5942.56060\times carclassTS - 415.94796\times yeargroup[2007-2013] \]

The accelrate and mpgmpge coefficients are positive, so, holding the other variables fixed, higher values of these variables are associated with a higher MSRP, although only the accelrate coefficient is statistically significant (the p-value for mpgmpge is 0.399). The mpg coefficient is negative and significant (p = 0.004), so a higher miles-per-gallon rating is associated with a lower MSRP. The yeargroup[2007-2013] coefficient is negative, so cars in the [2007-2013] group are estimated to cost about 416 less than comparable cars in the baseline [1997-2006] group, but this difference is not statistically significant (p = 0.884).

The carclass coefficients are differences from the baseline class, which is Compact. Large, Minivan, and SUV have positive coefficients, so these classes are estimated to be more expensive than a comparable compact car. Midsize, Pickup Truck, and 2 Seater have negative coefficients, so these classes are estimated to be less expensive than a comparable compact car. Of these, only the Large coefficient is significant at the 5% level.
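
To make the coefficient interpretation concrete, the sketch below plugs a hypothetical car into the fitted equation by hand. The coefficients are copied from the regression table; the car's values (accelrate 10, mpg 40, mpgmpge 40, compact, [2007-2013] group) are assumptions chosen only for illustration.

```r
# Coefficient estimates copied from the regression table in the report.
b <- c("(Intercept)" = 9094.89984, accelrate = 3793.57569, mpg = -499.15095,
       mpgmpge = 69.67344, carclassL = 27174.22420, carclassM = -4309.02967,
       carclassMV = 12113.05751, carclassPT = -5835.70554,
       carclassSUV = 977.06786, carclassTS = -5942.56060,
       "yeargroup[2007-2013]" = -415.94796)

# Hypothetical car (assumed values): compact class, [2007-2013] group.
x <- c("(Intercept)" = 1, accelrate = 10, mpg = 40, mpgmpge = 40,
       carclassL = 0, carclassM = 0, carclassMV = 0, carclassPT = 0,
       carclassSUV = 0, carclassTS = 0, "yeargroup[2007-2013]" = 1)

# Predicted MSRP is the sum of coefficient times value over all terms.
pred_compact <- sum(b * x[names(b)])

# Same car, but in the Large class instead of Compact.
x_large <- x
x_large["carclassL"] <- 1
pred_large <- sum(b * x_large[names(b)])

print(pred_compact)
print(pred_large - pred_compact) # the gap is the carclassL coefficient
```

For these assumed inputs the compact prediction works out to about 29,436, and switching the class to Large raises it by exactly the carclassL estimate, which is what "relative to Compact" means in the interpretation above.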

4.2 Discussion

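The model is likely inadequate because it has no variable for car brand, which is probably a strong predictor of the MSRP. Leaving out such a variable can bias the estimated coefficients of the variables that remain.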
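
One possible way to address this would be to attach a brand to each vehicle through a hand-built lookup table and include it as a factor. The sketch below is hypothetical: the rows, the msrp values, and the names `cars`, `brand_lookup`, and `cars_b` are placeholders, not part of the report's data or code.

```r
# Made-up rows standing in for the report's data; msrp values are placeholders.
cars <- data.frame(vehicle = c("Prius", "Civic", "Insight"),
                   msrp = c(24000, 25000, 19000))

# Hypothetical lookup table; a real one would need an entry for every vehicle.
brand_lookup <- data.frame(vehicle = c("Prius", "Civic", "Insight"),
                           brand = c("Toyota", "Honda", "Honda"))

# Left join on the vehicle name; all.x = TRUE keeps vehicles with no match.
cars_b <- merge(cars, brand_lookup, by = "vehicle", all.x = TRUE)
cars_b$brand <- factor(cars_b$brand)

print(cars_b)
```

In the report's workflow the join would have to happen before the vehicle column is dropped, and brand could then enter the regression like carclass does, with one indicator per brand relative to a baseline.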