Data Description

The dataset contains used car listings. It covers cars from about nine different brands, with details such as model, mileage, fuel type and tax. I took this data from the Used Car dataset on Kaggle. The dataset includes cars registered from 1970 to 2020, but listings from 1970 to 2000 are sparse, so for this analysis I kept only cars registered from 2000 onward and removed the earlier years.
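One way to check that the pre-2000 years really are sparse is to count listings on each side of the cutoff. The sketch below uses a small invented `listings` table (a stand-in for the real CSV, with made-up years) and dplyr's `count()`.

```r
library(dplyr)

# Invented stand-in for the raw listings; only the `year` column is used here.
listings <- tibble(
  year = c(1970, 1998, 2005, 2010, 2015, 2016, 2017, 2019, 2019, 2020)
)

# Label each listing by period, then count listings per period.
period_counts <- listings %>%
  mutate(period = if_else(year < 2000, "before 2000", "2000 to 2020")) %>%
  count(period)

print(period_counts)
```

Running the same pipeline on `usedcars` before the `year >= 2000` filter is applied would give the real counts for this dataset.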

My aim is to plot histograms of mpg (miles per gallon) for six brands (Audi, Mercedes, BMW, Toyota, Ford and Hyundai), with one panel per brand and transmission type and the bars colored by fuel type.

To do this, I filtered the dataset down to the rows I want to analyze, as shown in the code below:

library(tidyverse)

# Read the used-car listings (the path is specific to the author's machine)
usedcars <- read_csv('/Users/swethainuganti/Desktop/Cincinnati /1st Sem/Data Wrangling/week 5/Week 5/Homework/archive/cars_used.csv')

# Keep cars registered in 2000 or later, drop mpg values of 100 or more,
# drop the 'Other' transmission and fuel categories, and keep six brands
# (brand codes as stored in the data: 'merc' = Mercedes, 'hyundi' = Hyundai)
usedcars <- usedcars %>%
  filter(year >= 2000,
         mpg < 100,
         transmission != 'Other',
         fuelType != 'Other',
         brand %in% c('audi', 'bmw', 'toyota', 'ford', 'merc', 'hyundi'))
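After filtering, a quick sanity check can confirm that every condition holds and show how many rows land in each facet panel. The helper `check_filters()` below is hypothetical (it is not part of the original analysis), and the `toy` rows are invented values that only share the column names used above.

```r
library(dplyr)

# Hypothetical helper: stops with an error if any cleaning condition is
# violated, otherwise returns TRUE invisibly.
check_filters <- function(df) {
  stopifnot(
    all(df$year >= 2000),
    all(df$mpg < 100),
    all(df$transmission != "Other"),
    all(df$fuelType != "Other"),
    all(df$brand %in% c("audi", "bmw", "toyota", "ford", "merc", "hyundi"))
  )
  invisible(TRUE)
}

# Invented example rows with the same column names as the filtered data.
toy <- tibble(
  brand        = c("audi", "bmw", "merc", "ford", "toyota", "hyundi"),
  year         = c(2016, 2019, 2018, 2017, 2015, 2020),
  transmission = c("Manual", "Automatic", "Semi-Auto", "Manual", "Automatic", "Manual"),
  fuelType     = c("Petrol", "Diesel", "Diesel", "Petrol", "Hybrid", "Petrol"),
  mpg          = c(55.4, 47.9, 61.4, 65.7, 69.0, 52.3)
)

check_filters(toy)

# Rows per transmission and brand, i.e. how much data feeds each facet panel.
panel_counts <- count(toy, transmission, brand)
print(panel_counts)
```

On the real data, `check_filters(usedcars)` would be called right after the `filter()` step; sparse panels in the count are worth knowing about before reading the histograms.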

Data Visualization

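# mpg histograms: one panel per transmission (rows) and brand (columns),
# with bars filled by fuel type
ggplot(usedcars) +
  geom_histogram(mapping = aes(x = mpg, fill = fuelType), bins = 30) +
  facet_grid(rows = vars(transmission), cols = vars(brand)) +
  labs(x = 'Miles per gallon (mpg)', y = 'Number of listings', fill = 'Fuel type')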
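A numeric summary can complement the histograms by giving the count and median mpg for each brand and fuel type. The sketch below runs on a small invented `cars` table; replacing `cars` with `usedcars` would apply it to the filtered data.

```r
library(dplyr)

# Invented example values; only the columns used below are included.
cars <- tibble(
  brand    = c("audi", "audi", "audi", "ford", "ford"),
  fuelType = c("Petrol", "Petrol", "Diesel", "Petrol", "Petrol"),
  mpg      = c(50, 54, 60, 58, 62)
)

# Number of listings and median mpg per brand and fuel type.
mpg_summary <- cars %>%
  group_by(brand, fuelType) %>%
  summarise(n = n(), median_mpg = median(mpg), .groups = "drop")

print(mpg_summary)
```

The median is used here rather than the mean because it is less sensitive to the long tails that histograms of mpg can show.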