This analysis visualises used car prices as a function of vehicle age and horsepower. The dataset comes from Kaggle and can be found here: https://www.kaggle.com/orgesleka/used-cars-database. Plotly is used to create the graph. After some data cleaning, a random sample of 5000 cars is taken to limit the computational effort when plotting.
library(plotly)
library(dplyr)
library(data.table)
cars <- fread(file = "E:/Coursera/DataProducts/autos.csv")
## Read 371824 rows and 20 (of 20) columns from 0.064 GB file in 00:00:04
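# Keep only the three columns used below (columns 5, 8 and 10 of the file)
cars <- select(cars, price, yearOfRegistration, powerPS)
# Drop rows with missing values
cars <- cars[complete.cases(cars), ]
# Age in years, taking 2016 as the reference year
cars <- mutate(cars, age = 2016 - yearOfRegistration)
# Keep ages 0 to 30, prices up to 100,000 and power strictly between 10 and 500 PS
cars <- filter(cars, age >= 0, age <= 30, price <= 1e5, powerPS > 10, powerPS < 500)
# Sample 5000 cars to keep the plot responsive
cars <- sample_n(cars, 5000)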
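The filtering and sampling logic can be checked on a small made-up data frame before running it on the full file. The sketch below is only an illustration: it is written in base R so that it runs without the dataset or extra packages, the helper name `clean_cars`, the toy values and the seed value 42 are all assumptions rather than part of the original analysis, and setting a seed is simply one way to make the random sample repeatable.

```r
# Toy stand-in for the three columns kept above; the values are made up.
toy <- data.frame(
  price              = c(500, 12000, 250000, 8000, 3000, NA, 15000),
  yearOfRegistration = c(2010, 1975, 2012, 2018, 2005, 2001, 2014),
  powerPS            = c(75, 120, 400, 90, 5, 110, 150)
)

# Hypothetical helper applying the same filters as the cleaning step.
clean_cars <- function(df, ref_year = 2016) {
  df <- df[complete.cases(df), ]              # drop rows with missing values
  df$age <- ref_year - df$yearOfRegistration  # age in years
  keep <- df$age >= 0 & df$age <= 30 &
    df$price <= 1e5 &
    df$powerPS > 10 & df$powerPS < 500
  df[keep, ]
}

cleaned <- clean_cars(toy)

# Resetting the same seed (42 is arbitrary) before each draw
# makes both draws pick the same row.
set.seed(42)
first_draw <- cleaned[sample(nrow(cleaned), 1), ]
set.seed(42)
second_draw <- cleaned[sample(nrow(cleaned), 1), ]
```

Only the first and last toy rows pass every filter; the others are rejected for age, price, power or a missing value. Calling `set.seed()` once before `sample_n(cars, 5000)` would make the sampled cars, and therefore the plot, the same on every run.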
A 3D scatter plot of price against horsepower and age:
p <- plot_ly(cars, x = ~powerPS, y = ~age, z = ~price,
             marker = list(color = ~price,
                           colorscale = c('#FF1111', '#664444'),
                           showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Horsepower'),
                      yaxis = list(title = 'Age'),
                      zaxis = list(title = 'Price')))
p