library(mosaic)
library(tidyverse)
library(readr)
library(pander)
library(car)
library(ggplot2)
FordEscape <- read_csv("FordEscape.csv")
view(FordEscape)
The question I wanted to analyze was when would be the best time to sell my car based off of mileage and price trends in the market currently and how it relates to where I bought my car. I bought my car at $9,500 and 86,000 miles in 2020 so I will see how that compares to the chart as well.
I performed a logistic transformation to form a curve fit rather then a linear fit to the model which seems to predict the data in a stronger manner.
escape.lm <- lm(log(Price) ~ Mileage, data = FordEscape)
coeff_ford <- coef(escape.lm)
car_conf <- pander(exp(predict(escape.lm, data.frame(Mileage=185000),interval = "predict")))
ggplot(FordEscape, aes(x = Mileage, y = Price))+
geom_point(fill = "darkblue", color = "skyblue2", pch = 21)+
stat_function(fun=function(x) (exp(coeff_ford[1] + coeff_ford[2]*x))) +
geom_point(x = 87000, y = 9500, size = 5, color = "red") +
geom_point(x = 185000, y = 9929, size = 5, color = "green") +
geom_text(x = 87000, y = 11000, label = "My Purchase \n (87,000, 9,500)", color = "blue")+
geom_text(x = 175000, y = 13000, label = "Plan to Sell \n (185,000, 9,929)", color = "blue")+
geom_vline(xintercept = 106000, col = "red", linetype = "dashed") +
geom_segment(x = 87000, y = 9500, xend = 185000, yend = 9929, size = 1, color = "gray", linetype = "longdash", alpha = 0.75) +
geom_segment(aes(x=185000, xend=185000, y=8332, yend=11831), lwd=4, color= "skyblue2", alpha = .02) +
labs(title = "Selling Model for 2016 Ford Escape", x = "Mileage", y= "Price") +
theme_bw()
Here I have a point at where I bought my car at and how it compares to my data I collected. I bought my car at the beginning of the pandemic when prices were dropping and as you can see I was able to get a great deal. The red dotted line is where my car is at currently in mileage, and the green dot is where I plan on selling my car. The blue box around the point is my prediction interval of the predicted price of the car at that time.
In conclusion, I was able to get my car at a great price and was well below the curve. Using the predict function, I was able to see that a value of about 185,000 miles would have a slightly higher value (as shown below) as when I bought my car originally and could be a good time to sell back while getting a lot of value out of the car. That is about another 80,000 miles for me or about 8 years from now on average.
pander(exp(predict(escape.lm, data.frame(Mileage=185000))))
| 1 |
|---|
| 9929 |
\[ \underbrace{Y'_i}_\text{Log Price} = \overbrace{\beta_0}^\text{y-int} + \overbrace{\beta_1}^\text{slope} \underbrace{X_i}_\text{Mileage} + \epsilon_i \quad \text{where} \ \epsilon_i \sim N(0, \sigma^2) \]
\[ \left.\begin{array}{ll} H_0: \beta_1 = 0 \\ H_a: \beta_1 \neq 0 \end{array} \right\} \ \text{Slope Hypotheses} \]
\[ \alpha = .05 \]
\[ \underbrace{\hat{Y'}_i}_\text{Log Price} = \overbrace{b_0}^\text{est. y-int} + \overbrace{b_1}^\text{est. slope} \underbrace{X_i}_\text{Mileage} \]
Here is the breakdown of the regression. We can see that both our slope and intercept are very significant values
escape.lm <- lm(log(Price) ~ Mileage, data = FordEscape)
pander(summary(escape.lm))
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 10.02 | 0.02056 | 487.4 | 1.349e-93 |
| Mileage | -4.428e-06 | 2.321e-07 | -19.08 | 1.334e-24 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 52 | 0.08257 | 0.8792 | 0.8768 |
We see that both our intercept and slope are significant with a relatively high r^2 value of .88 showing that our model fits the data well and that it is important. So we will replace our model with our estimated values from the regression and use them for interpretation.
\[ \underbrace{\hat{Y'}_i}_\text{Log Price} = \overbrace{10.02}^\text{est. y-int} + \overbrace{-4.428e-06}^\text{est. slope} \underbrace{X_i}_\text{Mileage} \]
\[ \underbrace{\hat{Y}_i}_\text{Price} = \overbrace{e^{10.02}}^\text{est. y-int} + \overbrace{e^{0.956686}}^\text{est. slope} \underbrace{X_i}_\text{Mileage} \]
If we solve for price to put the model into actual price we get the following model:
\[ \underbrace{\hat{Y}_i}_\text{Price} = \overbrace{22,471.43}^\text{est. y-int} + \overbrace{0.956686}^\text{est. slope} \underbrace{X_i}_\text{Mileage} \]
Our model tells that a new Ford Escape should be around $22,500 with no miles on it. Our slope is saying how much value our car is retaining after a years worth of driving, or 10,000 miles. So for every 10,000 miles you drive, you keep about 95% of the original value of the car. Pretty good deal if you ask me. If we interpret for every 1,000 miles we receive a value of 0.9955818. This tells us that at this level we retain 99% of our cars value.
Here is our prediction function with 185,000 miles plugged into the predict function and the predicted price as the output.
pander(exp(predict(escape.lm, data.frame(Mileage=185000),interval = "predict")))
| fit | lwr | upr |
|---|---|---|
| 9929 | 8332 | 11831 |
par(mfrow=c(1,3))
plot(escape.lm,which = 1:2)
plot(escape.lm$residuals)
Our Residuals vs fitted plot shows constant variance and our QQ Plot shows normality for the most part with a little tail at the top but our model should be appropriate enough with our transformation. Our linearity model now looks good so our log transformation looks approproate and there are no more needed adjustments. Independence is also satisfied as we see no clear trend in our orders plot.