For this project, I extracted data from the VINs in a used car dataset to help answer some business questions about the used car marketplace.
The questions I will try to answer are…
library(dplyr)
library(ggplot2)
library(ggthemes)
loc <- 'C:/Users/Owner/Desktop/true_car_listings.csv'
cardata_raw <- read.csv(loc)
cardata <- sample_n(cardata_raw, 100000)
cardata$Vin <- as.character(cardata$Vin)
cardata$country <- substr(cardata$Vin, start = 1, stop = 1)
unique(cardata$country)
# Map the first VIN character (the manufacturing region code) to a country.
# Codes not listed here are marked 'DOP' so they can be dropped afterwards.
# NOTE: 'S' -> 'Germany' is kept from the original analysis; in the standard
# WMI region codes 'W' is the usual German prefix, so this may need review.
country_codes <- c('1' = 'USA', '4' = 'USA', '5' = 'USA', 'J' = 'Japan',
                   'K' = 'Korea', 'S' = 'Germany', 'L' = 'China')
# Vectorized lookup by name; unmatched codes come back as NA.
mapped <- unname(country_codes[cardata$country])
cardata$country <- ifelse(is.na(mapped), 'DOP', mapped)
# Drop listings from unmapped countries and listings with extreme mileage.
cardata <- cardata[cardata$country != 'DOP', ]
cardata <- cardata[cardata$Mileage <= 250000, ]
carbrands <- c('Toyota','Honda','Ford','Mercedes-Benz', 'Chevrolet')
# Use %in% for membership; == would recycle carbrands element by element.
cardata_brand <- cardata[cardata$Make %in% carbrands, ]
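When a column is filtered against a vector of several values, `==` and `%in%` behave differently. The minimal sketch below uses made-up values (not rows from the dataset) to show the difference; `makes` and `wanted` are hypothetical names introduced only for this illustration.

```r
# Made-up values for illustration only.
makes  <- c('Honda', 'Toyota', 'Ford', 'Toyota', 'Honda', 'Kia')
wanted <- c('Toyota', 'Honda')

# `==` compares element by element, recycling `wanted` along `makes`,
# so a match is found only where the positions happen to line up.
eq_match <- makes == wanted

# `%in%` tests each element of `makes` for membership in `wanted`.
in_match <- makes %in% wanted

sum(eq_match)  # 0 rows kept
sum(in_match)  # 4 rows kept
```

With `==`, rows can be dropped silently, which leaves each brand with fewer listings than the data actually contains.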
pl1 <- ggplot(cardata, aes(x = Mileage, y = Price)) +
  ggtitle('Car depreciation by Manufacturing Country') +
  geom_smooth(aes(color = country)) +
  ylim(0, 100000)
print(pl1)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 64 rows containing non-finite values (stat_smooth).
pl2 <- ggplot(cardata_brand, aes(x = Mileage, y = Price)) +
  ggtitle('Car depreciation by Brand') +
  geom_smooth(aes(color = Make)) +
  ylim(0, 100000)
print(pl2)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
In these plots, cars from Germany lose value fastest as mileage increases. Of the five brands compared (Toyota, Honda, Ford, Mercedes-Benz, and Chevrolet), Mercedes-Benz shows the fastest depreciation. The stereotype that Mercedes-Benz makes cars that depreciate quickly is supported by the data.
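One way to put a number on "depreciates the fastest" is to estimate the average price change per 10,000 miles for each group. The sketch below is not part of the original analysis: `depreciation_per_10k` is a hypothetical helper, and it assumes a straight-line fit of `Price` on `Mileage`, which is a simplification of the smoothed curves in the plots.

```r
# Hypothetical helper (not from the original analysis): estimates the average
# price change per 10,000 miles within each group using a linear fit.
# Assumes `df` has numeric columns `Price` and `Mileage`.
depreciation_per_10k <- function(df, group_col) {
  # One data frame per group (e.g. per country or per make).
  groups <- split(df, df[[group_col]], drop = TRUE)
  slopes <- vapply(groups, function(g) {
    fit <- lm(Price ~ Mileage, data = g)
    # Slope is price change per mile; scale it to 10,000 miles.
    unname(coef(fit)['Mileage']) * 10000
  }, numeric(1))
  # Most negative value (fastest depreciation) first.
  sort(slopes)
}
```

It could be called as `depreciation_per_10k(cardata, 'country')` or `depreciation_per_10k(cardata_brand, 'Make')`. A more negative value means a steeper average loss in price per 10,000 miles.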