This data set compares the technological advancement of hybrid electric vehicles (HEVs) across market segments. As the technology has improved, electric vehicles have become more popular, but they remain expensive. The vehicles in the data set come from different countries and model years. The sample consists of 154 HEVs, including 11 plug-in HEVs, from 1997 to 2013. The required information on each vehicle was collected from the EPA database.
The numeric variables are vehicle ID (carid), model year (year), manufacturer's suggested retail price in 2013 dollars (msrp), acceleration rate in km/hour/second (accelrate), fuel economy in miles/gallon (mpg), the maximum of mpg and mpge (mpgmpge), and model class ID (carclass_id). The categorical variables are vehicle (vehicle) and model class (carclass): C = compact, M = midsize, TS = 2 seater, L = large, PT = pickup truck, MV = minivan, SUV = sport utility vehicle. Using these variables, is there a linear model that can predict the price of a hybrid electric vehicle manufactured between 1997 and 2013? The data should contain enough information to answer this question.
The variables msrp (response) and accelrate (explanatory) appear to have a linear relationship. The variables mpg and mpgmpge appear to be almost perfectly linearly correlated, but that is because they measure nearly the same thing. None of the other variables appears to have a strong linear relationship with any other variable.
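The near-perfect correlation between mpg and mpgmpge can be illustrated with a short Pearson correlation calculation. The sketch below is illustrative only: the numbers are synthetic (the real data set is not reproduced here), the function name `pearson_r` is hypothetical, and the synthetic values assume that mpgmpge equals mpg for every vehicle except the 11 plug-ins.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-in for the 154 vehicles (NOT the real data).
rng = random.Random(1)
mpg = [rng.uniform(17, 50) for _ in range(154)]

# Assumption: only the first 11 "plug-in" vehicles have mpge above mpg,
# so max(mpg, mpge) differs from mpg for those vehicles alone.
mpgmpge = [m + rng.uniform(10, 60) if i < 11 else m for i, m in enumerate(mpg)]

r = pearson_r(mpg, mpgmpge)
print(f"correlation between mpg and mpgmpge: {r:.3f}")
```

With real data, the same function could be applied to each pair of numeric columns to back up what the scatterplots suggest.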
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -21282.174 | 5244.5880 | -4.057931 | 7.92e-05 |
| accel | 5067.661 | 425.9614 | 11.896995 | 0.00e+00 |
Both the top left and bottom left plots show that the residuals are clustered rather than evenly spread, which suggests that the constant variance assumption is violated. In the Normal Q-Q plot, the points at both ends stray from the line, indicating that the residuals are not normally distributed and that the normality assumption is violated. The Residuals vs Leverage plot indicates that there are no serious outliers or high-leverage points.
The t test for the slope gives t = 11.897 with a p-value < 0.0001. This is highly significant, which means that the slope of the linear regression model is significantly different from zero and positive. However, some of the assumptions have been violated, so we will also try the bootstrap method.
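As a reminder of where the t value comes from, the slope's t statistic is the estimate divided by its standard error, with n - 2 degrees of freedom. The following is a minimal sketch using only the Python standard library. The data are synthetic, the helper name `simple_ols` is hypothetical, and the block does not reproduce the numbers in the table above.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares and return estimates,
    standard errors, and the t statistic for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    se_b0 = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    return {
        "b0": b0, "b1": b1,
        "se_b0": se_b0, "se_b1": se_b1,
        "t_b1": b1 / se_b1, "df": n - 2,
    }


# Synthetic stand-in for accelrate and msrp (NOT the real data).
rng = random.Random(0)
accelrate = [rng.uniform(7, 20) for _ in range(154)]
msrp = [-21000 + 5000 * a + rng.gauss(0, 15000) for a in accelrate]

fit = simple_ols(accelrate, msrp)
print(f"slope = {fit['b1']:.1f}, SE = {fit['se_b1']:.1f}, "
      f"t = {fit['t_b1']:.2f} on {fit['df']} df")
```

The p-value would then be read from a t distribution with `df` degrees of freedom. That step is omitted here because the standard library has no t distribution.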
| | 2.5% | 97.5% |
|---|---|---|
| boot.beta0.ci | -32581.620 | -9906.084 |
| boot.beta1.ci | 4059.339 | 6068.927 |
The bootstrap 95% confidence interval for the slope of the regression is (4059.339, 6068.927). Neither 0 nor any negative number lies in this interval, so the slope of the regression equation is statistically significant and positive. This means that there is a positive linear relationship between the variables vehicle price (msrp) and acceleration rate (accelrate). The p-value method and the bootstrap method agree. Because some of the assumptions behind the p-value method were violated, I would choose to report the bootstrap result.
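The report does not say which bootstrap scheme was used. The sketch below assumes a case-resampling bootstrap (vehicles are resampled with replacement and the line is refit each time) with a percentile interval. The data are synthetic and the names `ols_coefs`, `percentile`, and `bootstrap_ci` are hypothetical, so the output will not match the table above.

```python
import random


def ols_coefs(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def percentile(sorted_vals, q):
    """Percentile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap intervals for intercept and slope,
    resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    b0s, b1s = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # degenerate resample: slope undefined
        b0, b1 = ols_coefs(xb, yb)
        b0s.append(b0)
        b1s.append(b1)
    b0s.sort()
    b1s.sort()
    alpha = (1 - level) / 2
    return {
        "b0": (percentile(b0s, alpha), percentile(b0s, 1 - alpha)),
        "b1": (percentile(b1s, alpha), percentile(b1s, 1 - alpha)),
    }


# Synthetic stand-in for accelrate and msrp (NOT the real data).
data_rng = random.Random(0)
accelrate = [data_rng.uniform(7, 20) for _ in range(154)]
msrp = [-21000 + 5000 * a + data_rng.gauss(0, 15000) for a in accelrate]

ci = bootstrap_ci(accelrate, msrp)
print("95% CI for intercept:", ci["b0"])
print("95% CI for slope:", ci["b1"])
```

Because the resampling is random, the interval endpoints change slightly from run to run unless the seed is fixed, so small differences between bootstrap runs are expected.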