1 Executive Summary

This study analyzes a cars data set to identify the factors (x variables) that influence our chosen response variable, Suggested Retail Price. Management can use the resulting model to better understand what drives the retail price of the 234 vehicles in the data. This could give management a competitive advantage, because understanding the factors that affect the retail price of a vehicle can help them build vehicles with the features that more consumers want.

We started by preprocessing the data and checking the normality of Suggested Retail Price. The summary statistics showed that its mean was noticeably higher than its median, and further exploratory analysis (histograms and boxplots) led us to conclude that the response variable was right skewed and not normally distributed. We therefore transformed it by taking the logarithm of Suggested Retail Price, and the histogram of the new response variable (LogPrice) was approximately normal. We then checked which columns hold categorical data and need indicator (dummy) variables. Hybrid and Cylinders were both treated as categorical, so indicator variables were assigned for both.

For the predictive analysis, linear regression was appropriate for the data set and two models were created: Model 1 contains all of the original x variables, and Model 2 uses a smaller set that includes the two product (interaction) variables we built. In Model 1, the adjusted R-squared value was 0.970, which meant that the influence of the x variables on the response variable was very strong. We also found which variables were significant at the 0.05 and 0.001 levels. In Model 2, the adjusted R-squared value was 0.950, which meant that the x variables again had a very strong influence on the response variable, and every variable in the model was significant at the 0.05 level.

From our findings in both models we came to several conclusions on how management can raise or lower retail price to create a competitive advantage in the automobile industry. From Model 1, we concluded that to increase suggested retail price, management should manufacture faster vehicles (cars with more cylinders, more horsepower and more weight) and increase costs to the dealerships. From Model 2, we concluded that to decrease suggested retail price, management should manufacture hybrid vehicles (with better highway and city miles per gallon) and decrease costs to the dealerships. The following analysis goes into more depth on the factors that influence the suggested retail price of a car.

Vehicle.Name                 Hybrid  SuggestedRetailPrice  DealerCost  EngineSize  Cylinders  Horsepower  CityMPG  HighwayMPG  Weight  WheelBase  Length  Width
Chevrolet Aveo 4dr                0                 11690       10965         1.6          4         103       28          34    2370         98     167     66
Chevrolet Aveo LS 4dr hatch       0                 12585       11802         1.6          4         103       28          34    2348         98     153     66
Chevrolet Cavalier 2dr            0                 14610       13697         2.2          4         140       26          37    2617        104     183     69
Chevrolet Cavalier 4dr            0                 14810       13884         2.2          4         140       26          37    2676        104     183     68
Chevrolet Cavalier LS 2dr         0                 16385       15357         2.2          4         140       26          37    2617        104     183     69
Dodge Neon SE 4dr                 0                 13670       12849         2.0          4         132       29          36    2581        105     174     67

2 Data Preprocessing

For the preprocessing step, we checked the Excel file for blank cells and found none in this data set. In the summary statistics, the response variable we want to analyze (Suggested Retail Price) has a mean greater than its median, which suggests that it is right skewed and does not follow a normal distribution. A strongly right-skewed response is a problem for linear regression because the outliers in the tail region can adversely affect the model’s performance. We therefore have to transform Suggested Retail Price so that it is closer to a normal distribution. Once it is transformed, we can re-examine the histogram of the response variable, confirm that it is approximately normal, and then perform linear regression (see Exploratory Analysis for full detail of the transformation).
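The two checks described above (missing values, then mean versus median) can be sketched in R as follows. This is only an illustration: it rebuilds three columns of the six rows shown in the data preview, the name `cars6` is hypothetical, and six rows are far too few to reproduce the skew seen in the full 234-row data set.

```r
# Six rows copied from the data preview; `cars6` is a hypothetical name.
cars6 <- data.frame(
  Vehicle.Name = c("Chevrolet Aveo 4dr", "Chevrolet Aveo LS 4dr hatch",
                   "Chevrolet Cavalier 2dr", "Chevrolet Cavalier 4dr",
                   "Chevrolet Cavalier LS 2dr", "Dodge Neon SE 4dr"),
  SuggestedRetailPrice = c(11690, 12585, 14610, 14810, 16385, 13670),
  DealerCost = c(10965, 11802, 13697, 13884, 15357, 12849),
  stringsAsFactors = FALSE
)

# Count missing values per column; all zeros means no blank cells.
na_counts <- colSums(is.na(cars6))
print(na_counts)

# A mean well above the median is a quick hint of right skew.
price_mean <- mean(cars6$SuggestedRetailPrice)
price_median <- median(cars6$SuggestedRetailPrice)
print(c(mean = price_mean, median = price_median))
```

On the full data set the same two calls would be applied to `cars` rather than to this six-row sample.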

Additionally, this data set has two columns (x variables) that we treat as categorical data. The Hybrid column has exactly two values, where 0 defines the vehicle as “not hybrid” and 1 defines the vehicle as “hybrid.” The Cylinders column records the number of cylinders in the engine and takes only a few distinct values. We must process this categorical data into indicator variables so that it can be used in the linear regression model (see Exploratory Analysis for full detail of indicator variables).

3 Exploratory Analysis

The purpose of this exploratory analysis is to assess the normality of the response variable, Suggested Retail Price, by looking at its histogram and boxplot and by running the Shapiro-Wilk test of normality. We check the normality of the response variable to see whether it is suitable for our linear regression model; if it is not normal, we need to apply a transformation (for example a log or a power) to bring it closer to normal. Additionally, we need to explore the correlation of our response variable with the x variables and identify any strong correlations before building a linear regression model. Finally, we need to take the categorical data in this data set and create indicator (dummy) variables so that they can be used in the linear regression model and multiplied by other x variables to look for an interaction effect.

3.0.1 Histogram of Suggested Retail Price

> with(cars, Hist(SuggestedRetailPrice, scale="frequency", breaks="Sturges", 
+   col="darkgray"))

By observing the histogram of Suggested Retail Price, it is obvious that the response variable is right skewed. The mean is greater than the median and the bulk of the distribution hugs the Y axis and the tail slopes down to the right.

3.0.2 Boxplot of Suggested Retail Price

> Boxplot( ~ SuggestedRetailPrice, data=cars, id=list(method="y"))

[1] "194" "199" "200" "210" "212" "222" "223" "228" "229"

The boxplot of Suggested Retail Price also shows that the response variable is right skewed. The median line sits near the bottom of the interquartile range (IQR) and the upper whisker is longer than the lower whisker. Outliers are also present in rows “194”, “199”, “200”, “210”, “212”, “222”, “223”, “228”, and “229” of the data. In all of these rows, the suggested retail price is much higher than the average. For most of these outliers the vehicle weight is also much higher than the average, which could mean that weight has a significant influence on suggested retail price.
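For readers unfamiliar with how a boxplot flags outliers, the usual rule marks any point lying more than 1.5 times the box height beyond the box. The sketch below applies that rule with base R's `boxplot.stats()` to a small made-up vector; the values are hypothetical and are not taken from the cars data, and we assume the `Boxplot()` call above follows the same convention.

```r
# Hypothetical toy values (not from the cars data) with one extreme point.
x <- c(10, 12, 13, 14, 15, 16, 18, 60)

# boxplot.stats() returns the five-number summary and the flagged points.
bs <- boxplot.stats(x, coef = 1.5)
hinges <- bs$stats[c(2, 4)]          # lower and upper hinge (box edges)
fence_width <- 1.5 * diff(hinges)    # 1.5 times the box height

upper_fence <- hinges[2] + fence_width
lower_fence <- hinges[1] - fence_width
outliers <- bs$out                   # points outside the fences
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```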

3.0.3 Testing Response Variable Normality

> normalityTest(~SuggestedRetailPrice, test="shapiro.test", data=cars)

    Shapiro-Wilk normality test

data:  SuggestedRetailPrice
W = 0.84592, p-value = 1.624e-14

The null hypothesis in the Shapiro-Wilk normality test is that the response variable (Suggested Retail Price) is normally distributed. Since the test gives a p-value very close to 0, which is less than the 0.05 level of significance, we reject the null hypothesis. This means we reject the claim that the response variable is normally distributed.

3.0.4 Transforming the Response Variable

> summary(powerTransform(SuggestedRetailPrice ~ 1, data=cars, 
+   family="bcPower"))
bcPower Transformation to Normality 
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1   -0.2107           0      -0.4497       0.0282

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                           LRT df     pval
LR test, lambda = (0) 3.062229  1 0.080132

Likelihood ratio test that no transformation is needed
                           LRT df       pval
LR test, lambda = (1) 112.7978  1 < 2.22e-16
> cars$LogPrice <- with(cars, log(SuggestedRetailPrice))

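The Box-Cox output above supports a log transformation. The estimated power is -0.21, which rounds to 0 (the log transformation), and the Wald interval (-0.45 to 0.03) contains 0. The likelihood ratio test does not reject a power of 0 (p = 0.080), while it clearly rejects a power of 1, that is, no transformation (p < 2.22e-16). We therefore take the natural log of Suggested Retail Price and store it as LogPrice. The log compresses the long right tail, which reduces the influence of the high-priced outliers and brings the response much closer to a normal distribution, improving the fit of the model.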
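For reference, the `bcPower` family named in the call above is the Box-Cox power family. Its standard textbook definition, with y the response and λ the transformation parameter, is:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\log y, & \lambda = 0
\end{cases}
```

Under this definition λ = 1 leaves the data unchanged apart from a shift, while λ = 0 is the natural log, which is why the output labels the two likelihood ratio tests as "no transformation is needed" and "log transformation".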

3.0.4.1 Histogram of Transformed Response Variable: LogPrice

> with(cars, Hist(LogPrice, scale="frequency", breaks="Sturges", 
+   col="darkgray"))

By looking at the histogram of our newly transformed response variable, LogPrice, we can see that it follows a more normal distribution. The bulk of the distribution is in the middle and the tails fall to both the left and right sides of the graph.

3.0.4.2 Normality Test for LogPrice

> normalityTest(~LogPrice, test="shapiro.test", data=cars)

    Shapiro-Wilk normality test

data:  LogPrice
W = 0.98922, p-value = 0.07783

The null hypothesis in the Shapiro-Wilk normality test is that the response variable (LogPrice) is normally distributed. Since the test gives a p-value of 0.078, which is greater than the 0.05 level of significance, we fail to reject the null hypothesis. We therefore have no evidence against normality, and we treat LogPrice as approximately normally distributed.
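The before-and-after comparison can be sketched with base R's `shapiro.test()` on simulated data. The prices below are drawn from a log-normal distribution purely for illustration; they are not the cars data, and the seed and distribution parameters are arbitrary assumptions.

```r
# Simulated, right-skewed prices; NOT the cars data.
set.seed(1)
price <- rlnorm(234, meanlog = 10, sdlog = 0.5)

raw_test <- shapiro.test(price)        # normality test on the raw values
log_test <- shapiro.test(log(price))   # same test after the log transform

# W closer to 1 means the sample looks more like a normal sample.
print(c(W_raw = unname(raw_test$statistic),
        W_log = unname(log_test$statistic)))
print(c(p_raw = raw_test$p.value, p_log = log_test$p.value))
```

A large p-value after the transformation means only that the test found no evidence against normality, which is the same reading applied to LogPrice above.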

3.0.5 Correlation Matrices

> knitr::kable(cor(cars[,c("CityMPG","Cylinders","DealerCost","EngineSize","LogPrice")], 
+   use="complete"))
              CityMPG  Cylinders DealerCost EngineSize   LogPrice
CityMPG     1.0000000 -0.6430625 -0.5224286 -0.6631293 -0.6162975
Cylinders  -0.6430625  1.0000000  0.7540660  0.9275933  0.7725827
DealerCost -0.5224286  0.7540660  1.0000000  0.6965503  0.9363557
EngineSize -0.6631293  0.9275933  0.6965503  1.0000000  0.7392290
LogPrice   -0.6162975  0.7725827  0.9363557  0.7392290  1.0000000
> knitr::kable(cor(cars[,c("HighwayMPG","Horsepower","Hybrid","Length","LogPrice")], 
+   use="complete"))
           HighwayMPG Horsepower     Hybrid     Length   LogPrice
HighwayMPG  1.0000000 -0.7189530  0.5655500 -0.5174236 -0.6491948
Horsepower -0.7189530  1.0000000 -0.1922594  0.5739366  0.8781931
Hybrid      0.5655500 -0.1922594  1.0000000 -0.1550270 -0.0686683
Length     -0.5174236  0.5739366 -0.1550270  1.0000000  0.5226263
LogPrice   -0.6491948  0.8781931 -0.0686683  0.5226263  1.0000000
> knitr::kable(cor(cars[,c("LogPrice","Weight","WheelBase","Width")], use="complete"))
           LogPrice    Weight WheelBase     Width
LogPrice  1.0000000 0.8794300 0.7420595 0.5543109
Weight    0.8794300 1.0000000 0.8524790 0.7843869
WheelBase 0.7420595 0.8524790 1.0000000 0.8360213
Width     0.5543109 0.7843869 0.8360213 1.0000000

The above correlation matrices show the strength of the correlation between the response variable, LogPrice, and each x variable. We are looking for the values closest to 1 or -1, which indicate the strongest correlation with LogPrice. From our matrices, the x variables with the strongest correlation with LogPrice are: DealerCost (0.936), Weight (0.879), Horsepower (0.878), Cylinders (0.773), WheelBase (0.742), and EngineSize (0.739). Interesting to note, HighwayMPG (-0.649) and CityMPG (-0.616) show a moderate negative correlation with LogPrice. To investigate the correlations further, let’s create scatterplots of the relationships.
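The ranking above can be reproduced by sorting the LogPrice correlations by absolute value, so that strong negative relationships are not overlooked. The sketch below copies the LogPrice column from the three matrices; the vector name `r_logprice` is hypothetical.

```r
# Correlations with LogPrice, copied from the matrices above.
r_logprice <- c(
  CityMPG = -0.6162975, Cylinders = 0.7725827, DealerCost = 0.9363557,
  EngineSize = 0.7392290, HighwayMPG = -0.6491948, Horsepower = 0.8781931,
  Hybrid = -0.0686683, Length = 0.5226263, Weight = 0.8794300,
  WheelBase = 0.7420595, Width = 0.5543109
)

# Rank by absolute value so the sign does not affect the ordering.
ranked <- r_logprice[order(abs(r_logprice), decreasing = TRUE)]
print(round(ranked, 3))
```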

3.0.6 Scatterplots

3.0.6.1 Scatterplot of LogPrice vs DealerCost & EngineSize

> scatterplotMatrix(~DealerCost+EngineSize+LogPrice, regLine=FALSE, smooth=FALSE, 
+   diagonal=list(method="density"), data=cars)

When looking at the above scatterplot, you can notice that there is a strong positive correlation between the response variable LogPrice and the x variables DealerCost and EngineSize. This could mean that DealerCost and EngineSize are good predictors of LogPrice (suggested retail price of a vehicle).

3.0.6.2 Scatterplot of LogPrice vs Horsepower & Weight

> scatterplotMatrix(~Horsepower+LogPrice+Weight, regLine=FALSE, smooth=FALSE, 
+   diagonal=list(method="density"), data=cars)

When looking at the above scatterplot, you can notice that there is a strong positive correlation between the response variable LogPrice and the x variables Horsepower and Weight. This could mean that Horsepower and Weight are good predictors of LogPrice (suggested retail price of a vehicle).

3.0.6.3 Scatterplot of LogPrice vs CityMPG & HighwayMPG

> scatterplotMatrix(~CityMPG+HighwayMPG+LogPrice, regLine=FALSE, smooth=FALSE, 
+   diagonal=list(method="density"), data=cars)

When looking at the above scatterplot, you can see a clear negative correlation between the response variable LogPrice and the x variables CityMPG and HighwayMPG, although it is more moderate than the positive correlations above (about -0.62 and -0.65). This could mean that the more fuel efficient a vehicle is, the lower its suggested retail price. This would make sense given that in the automobile industry, cars with good fuel economy are generally more budget-friendly options for consumers.

3.0.7 Indicator Variables

In the cars data set, two columns are treated as categorical data: Hybrid and Cylinders. Both take a small set of discrete values that act as labels. Hybrid contains 1s and 0s, where 1 means the car is a hybrid and 0 means the car is not a hybrid. Cylinders takes the values 3, 4, 5, 6, 8, and 12, where each value is the number of cylinders in the engine of that vehicle. Since both columns are treated as categorical, they need indicator variables to be used in the linear regression. The following indicator variables were assigned:

> cars$IsHybrid <- with(cars, ifelse(Hybrid=="1", 1, 0))
> cars$NotHybrid <- with(cars, ifelse(Hybrid=="0", 1, 0))
> cars$ThreeCyl <- with(cars, ifelse(Cylinders=="3", 1, 0))
> cars$FourCyl <- with(cars, ifelse(Cylinders=="4", 1, 0))
> cars$FiveCyl <- with(cars, ifelse(Cylinders=="5", 1, 0))
> cars$SixCyl <- with(cars, ifelse(Cylinders=="6", 1, 0))
> cars$EightCyl <- with(cars, ifelse(Cylinders=="8", 1, 0))
> cars$TwelveCyl <- with(cars, ifelse(Cylinders=="12", 1, 0))
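As an alternative to building each indicator by hand, R can generate them from a factor. The sketch below uses a made-up six-row data frame (`toy`, hypothetical) holding the six cylinder counts named in the text. With an intercept in the model, one level serves as the reference and gets no column of its own, which is why a regression needs only five of the six cylinder indicators and only one of the two hybrid indicators.

```r
# The six cylinder counts reported in the text, one hypothetical car each.
toy <- data.frame(Cylinders = c(3, 4, 5, 6, 8, 12))

# Treat the count as a category; the first level (3) becomes the reference.
toy$CylFactor <- factor(toy$Cylinders)

# model.matrix() shows the indicator columns lm() would build internally.
X <- model.matrix(~ CylFactor, data = toy)
print(colnames(X))
```

The reference level can be changed with `relevel()` if a different baseline is easier to interpret.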

3.0.8 Interaction Effect

To observe an interaction effect, we take a categorical (indicator) variable, multiply it by a different x variable, and then analyze the influence the product has on the response variable. If the two variables combined have a greater influence on the response variable than the two variables alone, an interaction effect can be observed. First we need to find which variables might have a greater influence on LogPrice when multiplied together. It is possible that a car that is a hybrid and has good highway MPG has a lower retail price. Additionally, it is possible that a car with more cylinders and a larger engine has a higher retail price. Let’s examine these variables using correlation matrices to see how strongly they are related before combining them.

> knitr::kable(cor(cars[,c("HighwayMPG","IsHybrid","LogPrice")], use="complete"))
           HighwayMPG   IsHybrid   LogPrice
HighwayMPG  1.0000000  0.5655500 -0.6491948
IsHybrid    0.5655500  1.0000000 -0.0686683
LogPrice   -0.6491948 -0.0686683  1.0000000

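HighwayMPG has a moderate negative correlation with LogPrice (-0.649), while IsHybrid has only a very weak negative correlation (-0.069). This suggests that vehicles with good highway MPG tend to have a lower suggested retail price, but hybrid status on its own tells us little about price.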

> knitr::kable(cor(cars[,c("EightCyl","EngineSize","LogPrice")], use="complete"))
             EightCyl EngineSize   LogPrice
EightCyl    1.0000000  0.7173619  0.5489916
EngineSize  0.7173619  1.0000000  0.7392290
LogPrice    0.5489916  0.7392290  1.0000000

Both EightCyl (0.549) and EngineSize (0.739) have positive correlations with LogPrice, and they are strongly correlated with each other (0.717). This means that 8-cylinder engines tend to be large and that such vehicles tend to have a higher suggested retail price.

We can now compute the two product variables to be used in linear regression.

> cars$HybridxHwyMPG <- with(cars, IsHybrid*HighwayMPG)
> cars$CylxEngineSize <- with(cars, EightCyl*EngineSize)
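R's formula syntax can also express product terms directly. The sketch below only inspects how two formulas expand and does not fit a model; it uses the variable names from this report. The `*` operator adds both main effects plus their product, which is the conventional way to test an interaction, whereas `I()` keeps only the arithmetic product, mirroring the hand-built columns above.

```r
# `*` in a formula expands to both main effects plus their product term.
f_full <- LogPrice ~ IsHybrid * HighwayMPG
full_terms <- attr(terms(f_full), "term.labels")
print(full_terms)

# I() keeps only the arithmetic product, mirroring the hand-built column.
f_product_only <- LogPrice ~ I(IsHybrid * HighwayMPG)
product_terms <- attr(terms(f_product_only), "term.labels")
print(product_terms)
```

Including the main effects lets the product term be read as a change in slope rather than as a stand-in for the omitted variables.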

4 Predictive Analysis

4.1 Linear Regression

Now that our response variable is normally distributed and indicator variables were assigned, we can conduct linear regression to see what factors influence the suggested retail price of a car. The purpose of this regression is to find the best model that management can use to predict the suggested retail price of a car which will create a competitive advantage.

4.1.1 Regression Model 1

> RegModel.1 <- 
+   lm(LogPrice~CityMPG+DealerCost+EightCyl+EngineSize+FiveCyl+FourCyl+HighwayMPG+Horsepower+Length+NotHybrid+SixCyl+ThreeCyl+Weight+WheelBase+Width,
+    data=cars)
> summary(RegModel.1)

Call:
lm(formula = LogPrice ~ CityMPG + DealerCost + EightCyl + EngineSize + 
    FiveCyl + FourCyl + HighwayMPG + Horsepower + Length + NotHybrid + 
    SixCyl + ThreeCyl + Weight + WheelBase + Width, data = cars)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.280419 -0.039364  0.005487  0.055369  0.200999 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.187e+00  3.362e-01  24.349  < 2e-16 ***
CityMPG     -6.078e-03  4.544e-03  -1.337   0.1825    
DealerCost   2.163e-05  1.033e-06  20.943  < 2e-16 ***
EightCyl     9.758e-01  1.055e-01   9.252  < 2e-16 ***
EngineSize  -4.842e-02  2.093e-02  -2.314   0.0216 *  
FiveCyl      1.042e+00  1.240e-01   8.403 5.62e-15 ***
FourCyl      1.043e+00  1.276e-01   8.178 2.37e-14 ***
HighwayMPG   3.901e-03  3.984e-03   0.979   0.3287    
Horsepower   1.217e-03  2.246e-04   5.416 1.60e-07 ***
Length      -2.265e-03  1.138e-03  -1.990   0.0479 *  
NotHybrid   -2.333e-01  9.598e-02  -2.430   0.0159 *  
SixCyl       1.090e+00  1.179e-01   9.247  < 2e-16 ***
ThreeCyl     1.354e+00  1.598e-01   8.471 3.62e-15 ***
Weight       3.663e-04  3.585e-05  10.218  < 2e-16 ***
WheelBase    5.462e-04  2.698e-03   0.202   0.8398    
Width       -4.979e-03  4.910e-03  -1.014   0.3117    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08229 on 218 degrees of freedom
Multiple R-squared:  0.9719,    Adjusted R-squared:  0.9699 
F-statistic: 501.8 on 15 and 218 DF,  p-value: < 2.2e-16

In Model 1, the multiple R-squared value is equal to 0.972, which means that 97.2% of the variation in LogPrice can be explained by the factors: DealerCost, EngineSize, Horsepower, Length, Width, Weight, WheelBase, CityMPG, HighwayMPG, NotHybrid, ThreeCyl, FourCyl, FiveCyl, SixCyl and EightCyl.

The adjusted R-squared value is 0.970, which means that the combined variables in Model 1 have a very strong influence on the LogPrice (suggested retail price) of a car, even after accounting for the number of variables in the model.

The variables significant at the 0.05 level were: NotHybrid, Length and EngineSize.

The variables significant at the 0.001 level were: DealerCost, Horsepower, Weight, ThreeCyl, FourCyl, FiveCyl, SixCyl, and EightCyl.

For the significant variables in this model at both the 0.05 and 0.001 levels we reject the null hypothesis. Therefore, the significant x variables in Model 1 have an influence on the response variable LogPrice (suggested retail price).
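Because the response is the log of price, each coefficient describes a multiplicative effect on price rather than a dollar amount. The sketch below converts three Model 1 estimates into approximate percentage changes. The increments in `delta` are hypothetical values chosen only for illustration, and the unit of Weight is not stated in the data, so it is left unnamed.

```r
# Model 1 estimates copied from the summary output above.
beta <- c(DealerCost = 2.163e-05, Horsepower = 1.217e-03, Weight = 3.663e-04)

# Hypothetical increments chosen only for illustration:
# +1000 dollars of dealer cost, +100 horsepower, +500 units of weight.
delta <- c(DealerCost = 1000, Horsepower = 100, Weight = 500)

# With log(price) as the response, a change of delta in x multiplies
# the predicted price by exp(beta * delta), other predictors held fixed.
pct_change <- 100 * (exp(beta * delta) - 1)
print(round(pct_change, 1))
```

Under these assumptions the implied changes are roughly 2%, 13% and 20% respectively, each with all other variables in the model held fixed.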

4.1.2 Confidence Interval for Model 1

> confint(RegModel.1)
                    2.5 %        97.5 %
(Intercept)  7.524329e+00  8.849698e+00
CityMPG     -1.503426e-02  2.878671e-03
DealerCost   1.959771e-05  2.366945e-05
EightCyl     7.679438e-01  1.183706e+00
EngineSize  -8.967261e-02 -7.171268e-03
FiveCyl      7.977231e-01  1.286597e+00
FourCyl      7.918551e-01  1.294727e+00
HighwayMPG  -3.952148e-03  1.175361e-02
Horsepower   7.738649e-04  1.659280e-03
Length      -4.508279e-03 -2.147025e-05
NotHybrid   -4.224408e-01 -4.408808e-02
SixCyl       8.580731e-01  1.322911e+00
ThreeCyl     1.038850e+00  1.668812e+00
Weight       2.956598e-04  4.369717e-04
WheelBase   -4.772237e-03  5.864576e-03
Width       -1.465699e-02  4.698246e-03

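A 95% confidence interval that does not contain 0 corresponds to a coefficient that is significant at the 0.05 level. In Model 1, the intervals for CityMPG, HighwayMPG, WheelBase, and Width contain 0, so we fail to reject the null hypothesis for those four variables. For every other variable the interval excludes 0, so we reject the null hypothesis and conclude that the variable has an influence on LogPrice.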
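Checking which intervals contain 0 can be automated. The sketch below copies the limits from the `confint(RegModel.1)` output (intercept omitted) into a matrix with the hypothetical name `ci` and flags the intervals that straddle 0.

```r
# 95% limits copied from the confint(RegModel.1) output above.
ci <- rbind(
  CityMPG    = c(-1.503426e-02,  2.878671e-03),
  DealerCost = c( 1.959771e-05,  2.366945e-05),
  EightCyl   = c( 7.679438e-01,  1.183706e+00),
  EngineSize = c(-8.967261e-02, -7.171268e-03),
  FiveCyl    = c( 7.977231e-01,  1.286597e+00),
  FourCyl    = c( 7.918551e-01,  1.294727e+00),
  HighwayMPG = c(-3.952148e-03,  1.175361e-02),
  Horsepower = c( 7.738649e-04,  1.659280e-03),
  Length     = c(-4.508279e-03, -2.147025e-05),
  NotHybrid  = c(-4.224408e-01, -4.408808e-02),
  SixCyl     = c( 8.580731e-01,  1.322911e+00),
  ThreeCyl   = c( 1.038850e+00,  1.668812e+00),
  Weight     = c( 2.956598e-04,  4.369717e-04),
  WheelBase  = c(-4.772237e-03,  5.864576e-03),
  Width      = c(-1.465699e-02,  4.698246e-03)
)
colnames(ci) <- c("lower", "upper")

# An interval contains 0 when its limits have opposite signs.
contains_zero <- ci[, "lower"] < 0 & ci[, "upper"] > 0
print(names(which(contains_zero)))
```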

4.1.3 Regression Model 2

> RegModel.2 <- 
+   lm(LogPrice~CylxEngineSize+DealerCost+Horsepower+HybridxHwyMPG+Length+Weight,
+    data=cars)
> summary(RegModel.2)

Call:
lm(formula = LogPrice ~ CylxEngineSize + DealerCost + Horsepower + 
    HybridxHwyMPG + Length + Weight, data = cars)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.80466 -0.04665 -0.00023  0.06386  0.24152 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     8.865e+00  1.196e-01  74.128  < 2e-16 ***
CylxEngineSize -2.053e-02  5.636e-03  -3.644 0.000333 ***
DealerCost      1.748e-05  1.011e-06  17.299  < 2e-16 ***
Horsepower      9.589e-04  2.451e-04   3.912 0.000121 ***
HybridxHwyMPG   4.854e-03  1.137e-03   4.271 2.87e-05 ***
Length         -4.518e-03  8.918e-04  -5.066 8.40e-07 ***
Weight          4.525e-04  3.243e-05  13.953  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1057 on 227 degrees of freedom
Multiple R-squared:  0.9516,    Adjusted R-squared:  0.9503 
F-statistic:   744 on 6 and 227 DF,  p-value: < 2.2e-16

In Model 2, we use a different set of variables and include the product variables we created to try to observe an interaction effect. The multiple R-squared value is equal to 0.952, which means that 95.2% of the variation in LogPrice can be explained by the factors: CylxEngineSize, DealerCost, Horsepower, HybridxHwyMPG, Length, and Weight.

The adjusted R-squared value is 0.950, which means that the combined variables in Model 2 have a very strong influence on the LogPrice (suggested retail price) of a car.

All six variables were significant at the 0.05 level, and in fact at the 0.001 level: CylxEngineSize, DealerCost, Horsepower, HybridxHwyMPG, Length, and Weight.

For the significant variables in this model at the 0.05 level we reject the null hypothesis. Therefore, the significant x variables in Model 2 have an influence on the response variable LogPrice (suggested retail price).
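The adjusted R-squared values reported for both models can be checked from the multiple R-squared, the sample size, and the number of predictors using the standard formula. The sketch below uses a small helper with the hypothetical name `adj_r2`. The inputs are the rounded values printed in the two summaries, so the last digit can differ slightly from R's own output.

```r
# Adjusted R-squared from R-squared, sample size n and predictor count p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 234  # vehicles in the data set (residual df + predictors + 1)
model1 <- adj_r2(r2 = 0.9719, n = n, p = 15)
model2 <- adj_r2(r2 = 0.9516, n = n, p = 6)
print(round(c(Model1 = model1, Model2 = model2), 3))
```

The penalty term grows with the number of predictors, so Model 1 pays a larger penalty for its 15 variables than Model 2 does for its 6.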

4.1.4 Confidence Interval for Model 2

> confint(RegModel.2)
                       2.5 %        97.5 %
(Intercept)     8.629470e+00  9.100776e+00
CylxEngineSize -3.163979e-02 -9.429694e-03
DealerCost      1.549333e-05  1.947676e-05
Horsepower      4.759579e-04  1.441837e-03
HybridxHwyMPG   2.614395e-03  7.093518e-03
Length         -6.275346e-03 -2.760825e-03
Weight          3.886287e-04  5.164457e-04

In Model 2, none of the 95% confidence intervals contains 0. We therefore reject the null hypothesis for every variable in the model and conclude that each one has an influence on LogPrice.

5 Conclusions

The objective of analyzing the cars data set was to find what factors (x variables) had an influence on the suggested retail price of 234 different vehicles, so that management could generate a competitive advantage in their market. After conducting linear regression we came up with 2 models containing variables that have significant influence on suggested retail price.

In Model 1, we found that DealerCost, Horsepower, Weight, NotHybrid, Length, EngineSize, ThreeCyl, FourCyl, FiveCyl, SixCyl, and EightCyl were all significant variables for which we reject the null hypothesis. These variables have an influence on the suggested retail price of a car. Additionally, the adjusted R-squared value of 0.970 means that the influence of the x variables on the response variable was very strong and that this model would be a good predictor of suggested retail price. Management can use this model to adjust the retail price of vehicles given the significant factors. For example, if the dealer cost of the vehicle is high, then the retail price of the car will also be high because the dealership needs to keep a profit margin on the cars it sells. In the future, if management wants to lower the retail price of vehicles to sell more models, they should focus on reducing the costs incurred by the dealers. Additionally, the number of cylinders in the engine of a vehicle also influences its retail price. More cylinders are associated with a higher retail price and overall value of the car (many high-cost vehicle models offer more cylinders). Important to note, cars with more cylinders often have more horsepower, so manufacturing faster vehicles tends to raise the suggested retail price.

In Model 2, we found that CylxEngineSize, DealerCost, Horsepower, HybridxHwyMPG, Length, and Weight were all significant variables for which we reject the null hypothesis. These variables have an influence on the suggested retail price of a car, and the adjusted R-squared value of 0.950 means that the x variables had a strong influence on the response variable, so Model 2 would also be a good predictor of suggested retail price. Interesting to note, CylxEngineSize and HybridxHwyMPG were significant in this model and were not included in Model 1. However, we did not observe an interaction effect in Model 2: the model is slightly weaker than Model 1 (its adjusted R-squared is lower), where the x variables were analyzed alone. An interaction effect would mean that the variables together explain the response variable better than they do alone, and this was not the case in Model 2. Management can still use this information to their advantage by advertising hybrid car models as budget options with great highway and city miles per gallon, which could reduce the suggested retail price of hybrid vehicles. This recommendation rests on the negative correlations between miles per gallon and LogPrice; the HybridxHwyMPG coefficient in Model 2 is itself positive, so it should be read with caution. As in Model 1, dealer cost has a positive coefficient, so reducing costs to the dealership would be expected to lower the suggested retail price and raising them to increase it. Eight-cylinder models tend to carry higher prices (EightCyl correlates positively with LogPrice), although the CylxEngineSize coefficient is negative once the other variables in Model 2 are held fixed.

Overall, if management wants to increase the suggested retail price of vehicles they can manufacture them to be faster by adding more cylinders. By adding more cylinders, the engine size, horsepower and weight all increase which will increase the suggested retail price of the car. If management wants to decrease the suggested retail price they could manufacture more hybrid cars with good highway and city miles per gallon and advertise them as budget options. Additionally, they could decrease costs to the dealerships which could lower the retail price of a vehicle or increase costs to the dealerships which could increase the retail price of a vehicle.