The data were obtained from the UFL website and form a data set comparing the technological advancement of hybrid electric vehicles. The source did not say how the data were collected.
The main question for this data set is how the other variables affect the manufacturer’s suggested retail price.
The vehicle variable identifies the car that each row describes, and carid is a numeric code for that car, so vehicle is a character variable and carid is an integer. We will not use the vehicle variable in this report because it is not split up into brands. To fit the multiple linear regression, the year variable was split into two groups, before and after 2007, and stored as yeargroup. The carclass variable is categorical and gives the type of car: C is Compact, M is Midsize, TS is Two-Seater, L is Large, PT is Pickup Truck, MV is Minivan, and SUV is Sport Utility Vehicle. The carclass_id variable was removed from the data set. The accelrate variable is the car’s acceleration rate in km/hour/second. The mpg variable is the fuel economy in miles per gallon, and mpgmpge is the maximum of the electric mileage and the gas mileage. The msrp variable is the manufacturer’s suggested retail price in 2013, which is the response variable here.
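library(dplyr) # provides select()
library(pander) # provides pander()
library(knitr) # provides kable()
car = read.csv("https://raw.githubusercontent.com/emmalaughin/sta321/main/data/hybrid_reg%20(1).csv")
car2 = select(car, -1, -2, -9) # drop columns 1, 2, and 9 of the original CSV by position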
car2$yeargroup = ifelse(car2$year > 2007, "[2007-2013]", "[1997-2006]") # vectorized recode of year into two groups in a single step; with ">" the year 2007 itself is labelled "[1997-2006]"
pander(head(car2[, c("year", "yeargroup")])) # yeargroup is the new variable added to the data
| year | yeargroup |
|---|---|
| 1997 | [1997-2006] |
| 2000 | [1997-2006] |
| 2000 | [1997-2006] |
| 2000 | [1997-2006] |
| 2001 | [1997-2006] |
| 2001 | [1997-2006] |
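The grouping rule depends on whether the comparison is strict. The following is a minimal sketch on made-up years (the vector `years` and the names `strict` and `inclusive` are illustrative and not part of the report's code) showing where the year 2007 lands under each comparison:

```r
# Toy years chosen for illustration; they are not taken from the data set
years <- c(2006, 2007, 2008)

# Same comparison as in the report: strictly greater than 2007
strict <- ifelse(years > 2007, "[2007-2013]", "[1997-2006]")

# Alternative comparison that places 2007 in the later group
inclusive <- ifelse(years >= 2007, "[2007-2013]", "[1997-2006]")

# Side-by-side view of the two labellings
print(data.frame(years, strict, inclusive))
```

Under the strict comparison, 2007 is labelled with the earlier group even though the later label starts at 2007, so it may be worth checking which rule is intended before interpreting the yeargroup coefficient.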
car3 = select(car2, -1) # drop the first column (year), leaving the data set for MLR
#options(digits = 7) # use 7 significant digits in the output
full.model = lm(msrp ~ ., data = car3) # full linear model with all remaining predictors
kable(summary(full.model)$coef, caption = "Statistics of Regression Coefficients")
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 9094.89984 | 9978.99205 | 0.9114047 | 0.3636268 |
| accelrate | 3793.57569 | 494.89550 | 7.6654075 | 0.0000000 |
| mpg | -499.15095 | 172.63771 | -2.8913205 | 0.0044400 |
| mpgmpge | 69.67344 | 82.33238 | 0.8462460 | 0.3988394 |
| carclassL | 27174.22420 | 6219.53416 | 4.3691736 | 0.0000239 |
| carclassM | -4309.02967 | 3301.39418 | -1.3052151 | 0.1939306 |
| carclassMV | 12113.05751 | 7212.85145 | 1.6793715 | 0.0952786 |
| carclassPT | -5835.70554 | 7033.21999 | -0.8297345 | 0.4080816 |
| carclassSUV | 977.06786 | 4163.55775 | 0.2346714 | 0.8148018 |
| carclassTS | -5942.56060 | 5570.00832 | -1.0668854 | 0.2878341 |
| yeargroup[2007-2013] | -415.94796 | 2854.42418 | -0.1457204 | 0.8843487 |
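The table above has no row for carclass C or for yeargroup [1997-2006]. This is because `lm()` turns each categorical predictor into 0/1 indicator columns and, by default, treats the first level in sorted order as the reference. A minimal sketch on a made-up factor (the values below are illustrative, not rows of the data set) shows the idea:

```r
# Toy factor using some of the report's class codes; the values are made up
carclass <- factor(c("C", "L", "M", "C", "SUV"))

# Levels are sorted, so "C" comes first and becomes the reference level
print(levels(carclass))

# Design matrix: an intercept plus one 0/1 column per non-reference level
X <- model.matrix(~ carclass)
print(X)
```

Each carclass coefficient is therefore an estimated difference in msrp relative to a compact car with the other predictors held fixed, and the yeargroup coefficient is a difference relative to the [1997-2006] group.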
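The final model is \[ \text{msrp} = 9094.89984 + 3793.57569 \times \text{accelrate} - 499.15095 \times \text{mpg} + 69.67344 \times \text{mpgmpge} + 27174.22420 \times \text{carclassL} - 4309.02967 \times \text{carclassM} + 12113.05751 \times \text{carclassMV} - 5835.70554 \times \text{carclassPT} + 977.06786 \times \text{carclassSUV} - 5942.56060 \times \text{carclassTS} - 415.94796 \times \text{yeargroup[2007-2013]} \] The accelrate and mpgmpge variables both have positive coefficients, so, holding the other variables fixed, higher values are associated with a more expensive car, although only accelrate is statistically significant (the p-value for mpgmpge is about 0.40). The mpg variable has a negative, statistically significant coefficient (p-value about 0.004), so higher gas mileage is associated with a lower msrp. The coefficient for yeargroup [2007-2013] is negative relative to the baseline group [1997-2006], meaning the newer cars are estimated to be slightly less expensive than the older cars, but this difference is not statistically significant (p-value about 0.88). The carclass coefficients are measured against the baseline class, which is Compact. Large, Minivan, and SUV have positive coefficients, meaning they are estimated to be more expensive than a comparable compact car, though only the Large coefficient is significant at the 5% level. Midsize, Pickup Truck, and Two-Seater have negative coefficients, meaning they are estimated to be less expensive than a comparable compact car, though none of these differences is statistically significant.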
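As a worked illustration of how the fitted coefficients combine, consider a hypothetical compact car from the [2007-2013] group with accelrate = 10, mpg = 40, and mpgmpge = 40. These input values are assumptions chosen only for the arithmetic and are not taken from the data set. All carclass indicators are 0 for a compact car, so using the estimates from the coefficient table:

\[
\begin{aligned}
\text{msrp} &= 9094.89984 + 3793.57569(10) - 499.15095(40) + 69.67344(40) - 415.94796(1) \\
&= 9094.89984 + 37935.7569 - 19966.038 + 2786.9376 - 415.94796 \\
&\approx 29435.61
\end{aligned}
\]

The same car in the Large class would add the carclassL estimate of 27174.22420 to this value, which shows how strongly the class term can dominate the prediction.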
The model is invalid because it does not include a variable for car brand, which is a big indicator of the msrp.