Used-Car Prices

Group Members: Ian Callaway, Soren Dudley, Iris Vrioni, Liana Ray

Introduction

We chose to look at used Ford 350 trucks, using listings found on http://www.cars.com/. The variables we considered were “Price”, “Mileage”, “Age” and “Location” of each truck. We expressed the location of each truck as its distance in miles from zip code 55105 (Saint Paul, home of Macalester College). Expressing location as a distance from a fixed point turns the categorical variable “Location” into a quantitative one. Our response variable is “Price”, and the explanatory variables are “Age”, “Mileage” and “Location”. In the spreadsheet and in the R code below, the mileage variable is spelled “Milage”.
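The report does not say how the distances from 55105 were obtained. As one possible illustration of turning a location into a single number, the sketch below computes a straight-line (great-circle) distance with the haversine formula. The function name `haversine_miles`, the Earth radius of 3958.8 miles, and the approximate coordinates used for 55105 and for the sample listing are assumptions for illustration, not values taken from our data.

```r
# Great-circle (haversine) distance in miles between points given in decimal degrees.
# The Earth radius of 3958.8 miles is an assumed constant.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing a slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Approximate, hypothetical coordinates for zip code 55105 (St. Paul, MN).
ref_lat <- 44.94
ref_lon <- -93.17

# Distance to a hypothetical listing in Minneapolis (about 44.98, -93.27).
haversine_miles(ref_lat, ref_lon, 44.98, -93.27)
```

Because the function is vectorized, it could be applied to whole columns of latitudes and longitudes at once, if such columns were available.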

Reading in the Spreadsheet

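library(mosaic)  # provides fetchGoogle() and qdata(); also attaches lattice for xyplot() and bwplot()
dataSource = "https://docs.google.com/spreadsheet/pub?key=0AtElPC-OuJSWdFFQTV9nZ05GaF9kRTM0YWpGWmYyX3c&single=true&gid=0&output=csv"
cars = fetchGoogle(dataSource)  # read the published spreadsheet into a data frame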

Description of Data

xyplot(Price~Age, data=cars, 
       ylab="Price ($)", xlab="Age (yrs)")

[Figure: Price ($) against Age (yrs)]

This scatter plot shows “Price” against “Age”. Older trucks tend to sell for less than younger ones: price falls as we move right along the x axis and age increases.

xyplot(Price~Milage, data=cars, 
       ylab="Price ($)", xlab="Mileage (miles)")

[Figure: Price ($) against Mileage (miles)]

This scatter plot shows “Price” against “Mileage”. Again, price tends to be lower for trucks with higher mileage. Note that each point is a different truck, so the plot compares many trucks at one point in time rather than following a single truck as it accumulates miles.

xyplot(Price~Location, data=cars, 
       ylab="Price ($)", xlab="Location (miles from 55105)")

[Figure: Price ($) against Location (miles from 55105)]

This scatter plot shows “Price” against each truck's distance from Saint Paul (“Location”). The points are widely scattered, and no clear increase or decrease in price with distance is visible without fitting a model.

Note: This does not mean that location is irrelevant to the price of a Ford 350. A truck's distance from St. Paul is the radius of a circle centered at St. Paul, and the truck could be anywhere on that circle. Because distance does not pin down where a truck actually is, it does not allow us to say anything about the relationship between price and the truck's actual location.

bwplot(~Milage, data=cars)

[Figure: boxplot of Milage]

This boxplot shows that all but three of the cars in our data set have a mileage between 0 and 100,000 miles. To find the range of mileages covering the middle 50% of the cars, we compute the 25th and 75th percentiles.

qdata(c(0.25, 0.75), Milage, data = cars)
##   25%   75% 
## 18297 58274

The middle 50% of the cars we looked at have mileages between 18,297 and 58,274 miles, and the median mileage is 36,500 miles, as shown below:

median(Milage, data=cars)
## [1] 36500
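The quartiles above also determine which cars a boxplot draws as separate points. The sketch below applies the conventional 1.5 × IQR rule to the printed quartiles. Lattice's `bwplot` uses `boxplot.stats()` hinges, which can differ slightly from the `qdata()` quartiles, so treat these fences as approximate; the variable names are ours.

```r
# Quartiles of mileage as printed by qdata() above.
q1 <- 18297
q3 <- 58274

# Interquartile range: the spread of the middle 50% of the cars.
iqr <- q3 - q1

# Conventional 1.5 * IQR fences; boxplots draw points beyond them individually.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

c(IQR = iqr, lower = lower_fence, upper = upper_fence)
```

The lower fence is negative, so no car can be flagged as a low-mileage outlier; any individually drawn points must lie above an upper fence of roughly 118,000 miles, if the hinges match these quartiles.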

In the same way, we proceed with age:

bwplot(~Age, data=cars)

[Figure: boxplot of Age]

qdata(c(0.25, 0.75), Age, data = cars)
## 25% 75% 
##   2   5

From the boxplot and the calculations, we see that the middle 50% of the cars we looked at have ages between 2 and 5 years.

median(Age, data=cars)
## [1] 3

The median age of our cars is 3 years, so the second quarter of the cars are between 2 and 3 years old and the third quarter are between 3 and 5 years old.

Models

Model 1

mod1 = lm( Price ~ Milage*Age, data=cars)
coef(mod1)
## (Intercept)      Milage         Age  Milage:Age 
##   6.069e+04  -1.886e-01  -4.049e+03   1.629e-02

The intercept says that a brand new Ford 350 (age 0, zero miles) is predicted to cost about $60,690. For a new truck, each additional mile driven lowers the predicted price by about $0.19, and for a truck with zero miles, each additional year of age lowers it by about $4,049. Because the model includes a Milage:Age interaction, these slopes are not constant: the positive interaction coefficient (0.01629) means the per-mile decrease shrinks as the truck gets older, and the per-year decrease shrinks as mileage grows. The graph of the fitted model and the prices follows:

xyplot(Price+fitted(mod1)~Age, data = cars)

[Figure: observed Price and fitted values from mod1 against Age]
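To make the interaction in Model 1 concrete, the sketch below computes the slope of price with respect to one variable at chosen values of the other. The coefficients are copied (rounded) from the `coef(mod1)` output above; the names `b`, `milage_slope` and `age_slope` are ours. With the fitted object available, `coef(mod1)` could be used directly instead of retyping the numbers.

```r
# Coefficients of mod1 as printed above (rounded to four significant figures).
b <- c(intercept = 6.069e+04, milage = -1.886e-01,
       age = -4.049e+03, milage_age = 1.629e-02)

# With an interaction, each slope depends on the other variable:
# d(Price)/d(Milage) = b_milage + b_milage_age * Age
milage_slope <- function(age) unname(b["milage"] + b["milage_age"] * age)

# d(Price)/d(Age) = b_age + b_milage_age * Milage
age_slope <- function(milage) unname(b["age"] + b["milage_age"] * milage)

milage_slope(c(0, 3, 5))  # dollars per mile at ages 0, 3 and 5 years
age_slope(c(0, 36500))    # dollars per year at 0 miles and at the median mileage
```

At the median age of 3 years, each extra mile lowers the predicted price by about $0.14 rather than $0.19, and at the median mileage of 36,500 miles, each extra year lowers it by about $3,454 rather than $4,049.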

Model 2

mod2 = lm( Price ~ Age*Location, data=cars)
coef(mod2)
##  (Intercept)          Age     Location 
##    43343.249    -1778.481       19.297 
## Age:Location 
##       -3.598

According to the second model, a brand new Ford 350 located at zip code 55105 (Macalester College) would cost about $43,343. For a truck at that location, each year of age lowers the predicted price by about $1,778, and for a brand new truck, each additional mile of distance from Macalester raises the predicted price by about $19.30. The interaction coefficient is negative (-3.598), so the farther a truck is from Macalester, the faster its predicted price falls with age; equivalently, the positive effect of distance shrinks as the truck ages. The graph of the fitted model and the prices follows:

xyplot(Price+fitted(mod2)~Age, data = cars)

[Figure: observed Price and fitted values from mod2 against Age]
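The sign of the Age:Location interaction in Model 2 has a concrete consequence: the distance effect eventually turns negative. The sketch below uses the coefficients printed by `coef(mod2)` above; the names `b2`, `location_slope`, `age_slope2` and `crossover_age` are ours.

```r
# Coefficients of mod2 as printed above.
b2 <- c(intercept = 43343.249, age = -1778.481,
        location = 19.297, age_location = -3.598)

# d(Price)/d(Location) = b_location + b_age_location * Age
location_slope <- function(age) unname(b2["location"] + b2["age_location"] * age)

# d(Price)/d(Age) = b_age + b_age_location * Location
age_slope2 <- function(location) unname(b2["age"] + b2["age_location"] * location)

# Age at which the distance effect changes sign.
crossover_age <- unname(-b2["location"] / b2["age_location"])

location_slope(c(0, 3, 6))  # dollars per mile of distance at ages 0, 3 and 6
age_slope2(c(0, 100))       # dollars per year at 0 and 100 miles from 55105
crossover_age
```

The crossover is at about 5.4 years: for older trucks, the model predicts lower prices farther from 55105. At 100 miles away, each year of age lowers the predicted price by about $2,138 instead of $1,778.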

Model 3

mod3 = lm( Price ~ Milage, data=cars)
coef(mod3)
## (Intercept)      Milage 
##  51127.0621     -0.1877

According to this model, a brand new (zero-mile) Ford 350 would cost about $51,127, and each additional mile driven lowers the predicted price by about $0.19. The graph of the fitted model and the prices follows:

xyplot(Price+fitted(mod3)~Milage, data = cars)

[Figure: observed Price and fitted values from mod3 against Milage]
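A fitted line like Model 3 is easiest to read through a few predictions. The sketch below evaluates the printed coefficients at the mileage quartiles and median reported earlier; the names `intercept`, `slope` and `predict_price` are ours. With the fitted object, `predict(mod3, newdata = data.frame(Milage = ...))` would give the same values without retyping coefficients.

```r
# Coefficients of mod3 as printed above.
intercept <- 51127.0621
slope <- -0.1877

# Predicted price for a given mileage: Price = intercept + slope * Milage
predict_price <- function(milage) intercept + slope * milage

# Predictions at the first quartile, median and third quartile of mileage.
predict_price(c(18297, 36500, 58274))
```

A truck with the median mileage of 36,500 miles has a predicted price of about $44,276.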

Model 4

mod4 = lm( Price ~ Location, data=cars)
coef(mod4)
## (Intercept)    Location 
##   40432.263       2.754

This model does not include age, so its intercept is the predicted price of a truck located at zip code 55105, about $40,432. For every additional mile of distance from 55105, the predicted price increases by about $2.75. However, as the earlier scatter plot of Price against Location showed, the points are widely scattered, indicating only a weak relationship between price and distance from 55105. The graph of the fitted model and the prices follows:

xyplot(Price+fitted(mod4)~Location, data = cars)

[Figure: observed Price and fitted values from mod4 against Location]
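To judge how much the Location coefficient in Model 4 matters in dollars, the sketch below multiplies it by a few distance gaps. The coefficient comes from the `coef(mod4)` output above; the gap values and variable names are ours.

```r
# Location coefficient of mod4 as printed above (dollars per mile of distance).
location_slope4 <- 2.754

# Predicted price difference between two trucks that differ only in
# distance from zip code 55105, for several gaps in miles.
distance_gap <- c(10, 100, 500)
price_difference <- location_slope4 * distance_gap
data.frame(distance_gap, price_difference)
```

Even a 500-mile gap shifts the predicted price by under $1,400, which is small next to the per-year age effects estimated by the other models.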

Model 5

mod5 = lm( Price ~ Age, data=cars)
coef(mod5)
## (Intercept)         Age 
##       56971       -4426

According to this model, a brand new Ford 350 would cost about $56,971, and each year of age lowers the predicted price by $4,426. The graph of the fitted model and the prices follows:

xyplot(Price+fitted(mod5)~Age, data = cars)

[Figure: observed Price and fitted values from mod5 against Age]
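Because Model 5 is a straight line in age, its predictions drop by the same amount each year. The sketch below tabulates predictions over the range of ages covering the middle of our data, using the coefficients printed by `coef(mod5)` above; the names `b5`, `ages` and `predicted` are ours.

```r
# Coefficients of mod5 as printed above.
b5 <- c(intercept = 56971, age = -4426)

# Predicted price for trucks aged 0 to 5 years: Price = intercept + age * Age
ages <- 0:5
predicted <- unname(b5["intercept"] + b5["age"] * ages)
data.frame(age = ages, predicted_price = predicted)
```

At the median age of 3 years, the model predicts a price of $43,693.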