Data visualization homework no.1
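library(readr)        # read_csv()
library(tidyverse)    # attaches readr, dplyr, ggplot2 and other core packages
library(DataExplorer)
library(treemap)
library(scales)
library(dplyr)        # already attached by tidyverse
library(ggplot2)      # already attached by tidyverse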
cars <- read_csv("train.csv")
Preparing the data set for visualization.
data <- select(cars, -ID, -Doors)             # drop the ID and Doors columns
data <- as.data.frame(data)
data <- data[rowSums(is.na(data)) == 0, ]     # keep only rows with no NA values
data1 <- data[data$Fueltype != "Hydrogen", ]  # exclude Hydrogen cars (used in Question 1)
head(data)
##   Price Levy Manufacturer   Model Prod.year  Category Leatherinterior Fueltype
## 1 13328 1399        LEXUS   RX450      2010      Jeep             Yes   Hybrid
## 2 16621 1018    CHEVROLET Equinox      2011      Jeep              No   Petrol
## 3  8467    -        HONDA     FIT      2006 Hatchback              No   Petrol
## 4  3607  862         FORD  Escape      2011      Jeep             Yes   Hybrid
## 5 11726  446        HONDA     FIT      2014 Hatchback             Yes   Petrol
## 6 39493  891      HYUNDAI SantaFE      2016      Jeep             Yes   Diesel
##   Enginevolume Mileage Cylinders Gearboxtype Drivewheels           Wheel  Color
## 1          3.5  186005         6   Automatic         4x4       Leftwheel Silver
## 2            3  192000         6   Tiptronic         4x4       Leftwheel  Black
## 3          1.3  200000         4    Variator       Front Right-handdrive  Black
## 4          2.5  168966         4   Automatic         4x4       Leftwheel  White
## 5          1.3   91901         4   Automatic       Front       Leftwheel Silver
## 6            2  160931         4   Automatic       Front       Leftwheel  White
##   Airbags
## 1      12
## 2       8
## 3       2
## 4       0
## 5       4
## 6       4
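The printed rows show that Levy holds "-" where no value is recorded (row 3), so the column is read as text and `rowSums(is.na(data))` does not count those entries as missing. A minimal sketch, assuming "-" simply means "no levy recorded", converts the placeholder to `NA` with dplyr's `na_if()` and makes the column numeric. The toy rows and the `clean_levy()` helper are hypothetical.

```r
library(dplyr)

# Hypothetical toy rows modelled on the printed head(): Levy is text because
# the file uses "-" where no levy is recorded
toy <- tibble(
  Price = c(13328, 16621, 8467),
  Levy  = c("1399", "1018", "-")
)

# Replace the "-" placeholder with a real NA, then make the column numeric
# (assumes "-" means "no levy recorded", not a literal zero)
clean_levy <- function(df) {
  df %>%
    mutate(Levy = as.numeric(na_if(Levy, "-")))
}

cleaned <- clean_levy(toy)
print(cleaned)
```

readr's `read_csv()` also accepts `na = c("", "NA", "-")` to do the same at load time. Either way, if the conversion runs before the NA filter, cars without a levy are dropped as well, so it is worth deciding whether that is wanted.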
This dataset is published at https://www.kaggle.com/sidharth178/car-prices-dataset for car price prediction. Cars increasingly differ in capabilities and features such as model, production year, category, brand, fuel type, engine volume, mileage, cylinders, color, airbags and many more, and the dataset's author uses them to pose a car price prediction challenge open to everyone.
I chose this dataset for the variety of its variables and its current relevance.
Question 1. What kind of transmission is the most common for each fuel type?
# Stacked bar chart: number of cars per fuel type, split by gearbox type
ggplot(data = data1, aes(x = Fueltype, fill = Gearboxtype)) +
  geom_bar() +
  xlab('FUEL TYPE') +
  ylab('COUNT') +
  ggtitle('GEARBOX TYPE BY FUEL TYPE') +
  labs(fill = "GEARBOX TYPE")
Answer: Automatic transmission is the most common, and the largest group overall is petrol cars with automatic transmission. Among petrol cars, continuously variable transmission ("Variator" in the data) is the least common. The second most popular combination is diesel with automatic transmission.
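The chart answers the question visually; the same answer can also be read off as a table. A sketch on hypothetical toy data (not the real counts) that counts every fuel/gearbox pair with `count()` and keeps the most frequent gearbox per fuel type with `slice_max()`. Running the pipeline on `data1` instead of `toy` would give the real per-fuel-type answer.

```r
library(dplyr)

# Hypothetical toy data standing in for data1 (not the real counts)
toy <- tibble(
  Fueltype    = c("Petrol", "Petrol", "Petrol", "Diesel", "Diesel", "Diesel",
                  "Hybrid", "Hybrid", "Hybrid"),
  Gearboxtype = c("Automatic", "Automatic", "Manual", "Automatic", "Automatic",
                  "Tiptronic", "Variator", "Variator", "Automatic")
)

# Count every fuel/gearbox pair, then keep the most frequent gearbox
# within each fuel type (with_ties = TRUE keeps all gearboxes tied for first)
most_common_gearbox <- toy %>%
  count(Fueltype, Gearboxtype, name = "cars") %>%
  group_by(Fueltype) %>%
  slice_max(cars, n = 1, with_ties = TRUE) %>%
  ungroup()

print(most_common_gearbox)
```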
Question 2. Which car brand is the most popular in the data?
# Number of cars per manufacturer, most frequent first
model_count <- data %>%
  group_by(Manufacturer) %>%
  summarize(count = n()) %>%
  arrange(desc(count))
model_count_10 <- model_count[1:10, ]
# One bar per manufacturer, tallest first
ggplot(model_count_10, aes(x = reorder(Manufacturer, -count), y = count, fill = Manufacturer)) +
  geom_col() +
  scale_fill_brewer(palette = "Paired") +
  theme_minimal() +
  labs(x = "MANUFACTURER", y = "COUNT", fill = "MANUFACTURER") +
  ggtitle('TOP 10 MANUFACTURERS BY NUMBER OF CARS') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Answer: The most common manufacturer in the data is Hyundai with 3769 cars, followed by Toyota (3661) and Mercedes-Benz (2073).
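The `group_by()` + `summarize(n())` + `arrange(desc())` pipeline can be written more compactly with dplyr's `count(sort = TRUE)`, and `slice_head()` can replace the `[1:10, ]` indexing. A sketch with a hypothetical `top_makers()` helper on toy data:

```r
library(dplyr)

# Hypothetical toy data standing in for `data`
toy <- tibble(Manufacturer = c("BRAND_A", "BRAND_B", "BRAND_A",
                               "BRAND_C", "BRAND_B", "BRAND_A"))

# count(sort = TRUE) replaces group_by() + summarize(n()) + arrange(desc());
# slice_head() then keeps the first k rows
top_makers <- function(df, k = 10) {
  df %>%
    count(Manufacturer, name = "count", sort = TRUE) %>%
    slice_head(n = k)
}

res <- top_makers(toy, k = 2)
print(res)
```

With the real data, `top_makers(data)` would produce the same table as `model_count_10`.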
Question 3. Which manufacturers have the most expensive cars on average?
# Average price per manufacturer, top 10, as a horizontal bar chart
data %>%
  group_by(Manufacturer) %>%
  summarise(AVERAGE = mean(Price)) %>%
  arrange(desc(AVERAGE)) %>%
  head(10) %>%
  ggplot(aes(x = reorder(Manufacturer, AVERAGE), y = AVERAGE, fill = factor(Manufacturer))) +
  geom_col() +
  coord_flip() +
  theme_light() +
  ggtitle("TOP 10 MANUFACTURERS BY AVERAGE PRICE") + xlab("MANUFACTURER") + ylab("AVERAGE PRICE") +
  theme(legend.position = "none")
Answer: The graph shows the 10 manufacturers with the highest average price. Lamborghini clearly has the highest average price.
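An average says nothing about how many cars stand behind it, and a brand with only a handful of listings can top the ranking on the strength of one expensive car. A sketch that reports the number of cars next to each average and can drop brands below a minimum count. The `avg_price_by_maker()` helper, the `min_cars` threshold and the toy prices are all hypothetical.

```r
library(dplyr)

# Hypothetical toy prices (not taken from the dataset)
toy <- tibble(
  Manufacturer = c("BRAND_A", "BRAND_B", "BRAND_B", "BRAND_B", "BRAND_C"),
  Price        = c(300000, 40000, 50000, 60000, 15000)
)

# Average price and number of cars per manufacturer; brands with fewer than
# min_cars listings are dropped (the threshold is an assumption to tune)
avg_price_by_maker <- function(df, min_cars = 1) {
  df %>%
    group_by(Manufacturer) %>%
    summarise(cars = n(), AVERAGE = mean(Price), .groups = "drop") %>%
    filter(cars >= min_cars) %>%
    arrange(desc(AVERAGE))
}

print(avg_price_by_maker(toy))
print(avg_price_by_maker(toy, min_cars = 2))
```

In the toy data, BRAND_A leads on a single car but disappears once at least two cars are required.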
Question 4. What is the most popular car color?
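# Number of cars per color, most frequent first
color <- data %>%
  group_by(Color) %>%
  summarize(count = n()) %>%
  arrange(desc(count))
old_par <- par(mar = c(7, 4, 2, 2) + 0.2)  # widen the bottom margin for vertical labels
# Fill colors are matched to bars by position, so this vector assumes the
# color labels come out of the sort in exactly this order
barplot(color$count, names.arg = color$Color,
        col = c("Black", "White", "lightgray", "Grey40", "Blue", "Red", "Green", "Orange",
                "saddlebrown", "orangered3", "goldenrod2", "beige", "skyblue1", "yellow",
                "purple", "pink"),
        las = 2, main = "Car popularity by color")
par(old_par)                               # restore the previous margins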
Answer: The most popular colors are black and white, and the least popular are purple and pink.
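In the `barplot()` call, fill colors are paired with bars by position, so they are only correct if the sorted labels happen to come out in the order of the `col` vector. A sketch that looks each fill up by label instead. The spellings in the hypothetical `color_lookup` are assumptions and must match the values in `Color`; labels not covered by the lookup fall back to grey, so a missing spelling shows up as a grey bar rather than a wrong color.

```r
# Plotting color for each Color label. The label spellings are assumptions
# and must match the values in the Color column exactly.
color_lookup <- c(
  Black = "black", White = "white", Silver = "lightgray", Grey = "grey40",
  Blue = "blue", Red = "red", Green = "green", Orange = "orange",
  Brown = "saddlebrown", Golden = "goldenrod2", Beige = "beige",
  Yellow = "yellow", Purple = "purple", Pink = "pink"
)

# Look fills up by label, so each color follows its bar whatever the sort
# order; labels missing from the lookup get a neutral fallback color
bar_fills <- function(labels, lookup = color_lookup, fallback = "gray80") {
  fills <- unname(lookup[as.character(labels)])
  fills[is.na(fills)] <- fallback
  fills
}

# Hypothetical counts, deliberately not in the lookup's order
color_counts <- data.frame(Color = c("White", "Black", "Pink", "Teal"),
                           count = c(50, 40, 3, 1))
fills <- bar_fills(color_counts$Color)
barplot(color_counts$count, names.arg = color_counts$Color, col = fills, las = 2)
```

With the real data, `col = bar_fills(color$Color)` would replace the hand-ordered vector.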
Question 5. Which car body types are the most common, and do they have a leather interior?
# Side-by-side bars: number of cars per body type, split by leather interior
body_plot <- ggplot(data) +
  geom_bar(aes(x = Category, fill = Leatherinterior), position = position_dodge(preserve = 'single')) +
  ggtitle("CAR BODY TYPES BY LEATHER INTERIOR") + xlab("CATEGORY") + ylab("COUNT") +
  labs(fill = "LEATHER INTERIOR")
# Rotate the category labels so they do not overlap
body_plot + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Answer: Sedan is the most common body type, and sedans with a leather interior form the largest single group.
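The dodged bars compare raw counts, which makes it hard to tell whether a small body type is more or less likely to have a leather interior than sedans are. A sketch that turns the counts into within-category shares; the toy data are hypothetical, and replacing `toy` with `data` would give the real shares.

```r
library(dplyr)

# Hypothetical toy data standing in for `data`
toy <- tibble(
  Category        = c("Sedan", "Sedan", "Sedan", "Jeep", "Jeep", "Hatchback"),
  Leatherinterior = c("Yes", "Yes", "No", "Yes", "No", "No")
)

# Count each body type / leather combination, then divide by the category
# total so body types of different sizes can be compared on one scale
leather_share <- toy %>%
  count(Category, Leatherinterior, name = "cars") %>%
  group_by(Category) %>%
  mutate(share = cars / sum(cars)) %>%
  ungroup()

print(leather_share)
```

The `share` column could be plotted with `geom_col(position = "fill")` or used directly as the y aesthetic.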