---
title: "Car"
author: "Sholah"
date: "2022-10-10"
output: 
  html_notebook: 
    code_folding: hide
    highlight: zenburn
    theme: spacelab
    fig_caption: yes
    number_sections: yes
    toc: True
    toc_float: True
---
# CAR DATASET
This data set provides information about cars and related topics, such as reviews collected from multiple websites, the fuel type each car uses, and other specifications that can enrich the reader's knowledge about cars.

By the end of this project, I hope the reader will understand what makes people review a car, and what leaves people with a good impression of a particular car. Let's wait no more and dive into the data :)

# DATA PREPARATION

## Import Data
```{r}
car <- read.csv("CARS_1.csv")
```
Now that the data has been imported, let's move on to inspecting it!

## Inspect Data
```{r}
head(car)
```
```{r}
dim(car)
```

```{r}
names(car)
```
After inspecting the data, we can draw some conclusions:

1. The data consists of 203 rows and 16 columns.
2. The column `car_name` holds the name of each car, while the other columns give different kinds of information about the car in each row.

## Data Cleansing & Coercions
```{r}
str(car)
```


```{r}
# Seating capacity is a count of seats, so store it as an integer
car$seating_capacity <- as.integer(car$seating_capacity)
```
Now that every column has the appropriate data type, we can treat missing values. But first, we need to check whether there are any.

## Check Missing Values

```{r}
colSums(is.na(car))
```
Across all columns, there is one missing value, in the `seating_capacity` column. We will handle it by dropping the row that contains it.
```{r}
car <- na.exclude(car)
```
After this treatment, we check again whether the missing value has been removed successfully.
```{r}
colSums(is.na(car))
```
```{r}
dim(car)
```

There are no more missing values in any column, and the row count has dropped by one, from 203 to 202. With the data cleansed and coerced, we can now proceed to process, analyze, and explain it.
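
The two steps above, counting missing values per column and dropping incomplete rows, can be tried on a tiny made-up data frame. This is a minimal sketch. The `toy` data frame and its values are hypothetical and not taken from CARS_1.csv.

```r
# Hypothetical toy data (not from CARS_1.csv): one missing seating_capacity
toy <- data.frame(
  car_name = c("A", "B", "C"),
  rating = c(4.5, 4.0, 5.0),
  seating_capacity = c(5L, NA, 7L)
)

# Count missing values in each column
missing_per_col <- colSums(is.na(toy))
print(missing_per_col)

# Drop every row that contains at least one NA
toy_clean <- na.exclude(toy)
print(dim(toy_clean))
```

`na.omit()` would return the same rows here. According to the R documentation, `na.exclude()` differs only in how model functions such as `residuals()` and `predict()` pad their output for the dropped rows.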

# DATA EXPLANATION
## Summarize the Data Set

```{r}
summary(car)
```

From the summary, we can conclude the following:

1. There are 202 cars in the data.

2. The lowest rating of a single car is 3 and the highest is 5, with a mean rating of 4.43 across all cars.

3. One car received only 1 review, while another received an outstanding 2,392 reviews.

4. Seating capacity ranges from 2 to 8 seats.

5. The lowest starting price is 339,000, while the most expensive car starts at an amazing 70,600,000.

6. The lowest ending price is 361,000, while the highest ending price is an awesome 90,000,000.

7. Max torque ranges from 16.1 Nm to 1,020.0 Nm.

8. The RPM at which max torque is reached (`max_torque_rpm`) ranges from 0 to 7,000.

9. Max power ranges from 10.8 BHP to 788.5 BHP.

10. The RPM at which max power is reached (`max_power_rp`) ranges from 0 to 8,500.

```{r}
table(car$body_type)
```
11. Based on body type, most cars in the data are SUVs.
```{r}
table(car$fuel_type)
```
12. Based on fuel type, most cars run on petrol.
```{r}
table(car$transmission_type)
```
13. Based on transmission type, most cars are automatic.
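
The figures in the list above are read off the printed `summary()` and `table()` output. If you want them as values instead, a sketch like the following pulls them out directly. The `toy` data frame is hypothetical and only mimics the `rating`, `reviews_count`, and `body_type` columns.

```r
# Hypothetical toy data standing in for car (not from CARS_1.csv)
toy <- data.frame(
  rating = c(3, 4.5, 5, 4.5),
  reviews_count = c(1L, 120L, 2392L, 40L),
  body_type = c("SUV", "SUV", "Sedan", "Hatchback")
)

# Minimum/maximum and mean, as reported by summary()
rating_range <- range(toy$rating)
rating_mean <- mean(toy$rating)

# Most frequent category, as read off table()
counts <- table(toy$body_type)
most_common <- names(counts)[which.max(counts)]

print(rating_range)
print(round(rating_mean, 2))
print(most_common)
```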


# WHAT PEOPLE THINK ABOUT CARS BY TYPE
## Body Type
```{r}
# Mean rating for each body type
Convertible <- mean(car$rating[car$body_type == "Convertible"])
Coupe <- mean(car$rating[car$body_type == "Coupe"])
Hatchback <- mean(car$rating[car$body_type == "Hatchback"])
Hybrid <- mean(car$rating[car$body_type == "Hybrid"])
Luxury <- mean(car$rating[car$body_type == "Luxury"])
Minivan <- mean(car$rating[car$body_type == "Minivan"])
MUV <- mean(car$rating[car$body_type == "MUV"])
Pickup_truck <- mean(car$rating[car$body_type == "Pickup Truck"])
Sedan <- mean(car$rating[car$body_type == "Sedan"])
SUV <- mean(car$rating[car$body_type == "SUV"])
Wagon <- mean(car$rating[car$body_type == "Wagon"])

# Combine the body-type labels and their mean ratings into one data frame
bt_mean <- cbind(body_type = c("Convertible", "Coupe", "Hatchback", "Hybrid", "Luxury", "Minivan", "MUV", "Pickup Truck", "Sedan", "SUV", "Wagon"))
bt_mean <- cbind(bt_mean, as.data.frame(c(Convertible, Coupe, Hatchback, Hybrid, Luxury, Minivan, MUV, Pickup_truck, Sedan, SUV, Wagon)))
names(bt_mean)[2] <- "Mean"

# Sort from the highest to the lowest mean rating
btmo <- bt_mean[order(bt_mean$Mean, decreasing = TRUE), ]
btmo
```

**By body type, Coupe, Hybrid, Luxury, Minivan, MUV, Pickup Truck, and Wagon cars share the highest mean rating, 4.5.**
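
The eleven hand-written `mean()` calls above can also be produced by `aggregate()`, which picks up every body type present in the data without listing each one by hand. Adding the number of cars behind each mean helps judge the tie at 4.5, because a body type with very few cars can reach the top on a handful of ratings. This is a minimal sketch on a hypothetical `toy` data frame, not CARS_1.csv.

```r
# Hypothetical toy data standing in for car (not from CARS_1.csv)
toy <- data.frame(
  body_type = c("SUV", "SUV", "SUV", "Coupe", "Sedan", "Sedan"),
  rating = c(4.0, 4.5, 4.6, 4.5, 4.2, 4.4)
)

# Mean rating per body type
bt_mean <- aggregate(rating ~ body_type, data = toy, FUN = mean)
names(bt_mean)[2] <- "Mean"

# Number of cars per body type
bt_n <- aggregate(rating ~ body_type, data = toy, FUN = length)
names(bt_n)[2] <- "n"

# Join both and sort from highest to lowest mean
bt_stats <- merge(bt_mean, bt_n, by = "body_type")
bt_stats <- bt_stats[order(bt_stats$Mean, decreasing = TRUE), ]
print(bt_stats)
```

In the toy output, Coupe ranks first on a single car's rating. Running the same code on `car` would show whether that pattern also holds in this data set.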

## Fuel Type
```{r}
# Mean rating for each fuel type
CNG.rev <- mean(car$rating[car$fuel_type == "CNG"])
Diesel.rev <- mean(car$rating[car$fuel_type == "Diesel"])
Electric.rev <- mean(car$rating[car$fuel_type == "Electric"])
Petrol.rev <- mean(car$rating[car$fuel_type == "Petrol"])

# Combine labels and means, then sort from highest to lowest
ft_mean <- cbind(fuel_type = c("CNG", "Diesel", "Electric", "Petrol"))
ft_mean <- cbind(ft_mean, as.data.frame(c(CNG.rev, Diesel.rev, Electric.rev, Petrol.rev)))
names(ft_mean)[2] <- "Mean"
ftmo <- ft_mean[order(ft_mean$Mean, decreasing = TRUE), ]
ftmo
```


**By fuel type, Electric cars have the highest mean rating**, slightly ahead of Diesel cars.

## Transmission Type

```{r}
# Mean rating for each transmission type
AutomaticTT <- mean(car$rating[car$transmission_type == "Automatic"])
ManualTT <- mean(car$rating[car$transmission_type == "Manual"])
ElectricTT <- mean(car$rating[car$transmission_type == "Electric"])

# Combine labels and means, then sort from highest to lowest
tt_mean <- cbind(transmission_type = c("Automatic", "Manual", "Electric"))
tt_mean <- cbind(tt_mean, as.data.frame(c(AutomaticTT, ManualTT, ElectricTT)))
names(tt_mean)[2] <- "Mean"
ttmo <- tt_mean[order(tt_mean$Mean, decreasing = TRUE), ]
ttmo
```

**By transmission type, cars with an Electric transmission have the highest mean rating**, at 4.46.

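Therefore, taking the top category in each grouping, **the highest-rated profile** is an **electric-transmission coupe that runs on electricity**. Because the three means are computed separately, this combination is inferred from them rather than observed directly, and Coupe shares its top spot with six other body types.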
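
Because the body-type, fuel-type, and transmission-type means are computed separately, a quick check is to filter on all three attributes at once and to average the rating over the combinations that actually occur. This sketch uses made-up rows, and the `toy` data frame is hypothetical.

```r
# Hypothetical toy data standing in for car (not from CARS_1.csv)
toy <- data.frame(
  body_type = c("Coupe", "SUV", "Coupe", "Sedan"),
  fuel_type = c("Petrol", "Electric", "Petrol", "Electric"),
  transmission_type = c("Automatic", "Electric", "Manual", "Electric"),
  rating = c(4.5, 4.6, 4.4, 4.3)
)

# Rows matching the top category from each of the three groupings
combo <- subset(toy, body_type == "Coupe" &
                     fuel_type == "Electric" &
                     transmission_type == "Electric")
print(nrow(combo))

# Mean rating for every combination that actually occurs
combo_means <- aggregate(rating ~ body_type + fuel_type + transmission_type,
                         data = toy, FUN = mean)
print(combo_means[order(combo_means$rating, decreasing = TRUE), ])
```

In this toy example, no car matches all three top categories, so the combined profile would not appear in the data at all. Applying the same filter to `car` shows whether that is the case here.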

# CORRELATIONS BETWEEN CAR ATTRIBUTES AND RATING

```{r}
# Pearson correlation between each numeric attribute and the rating
cor(car$max_power_rp, car$rating)
cor(car$max_torque_nm, car$rating)
cor(car$max_torque_rpm, car$rating)
cor(car$max_power_bhp, car$rating)
cor(car$no_cylinder, car$rating)
cor(car$fuel_tank_capacity, car$rating)
cor(car$seating_capacity, car$rating)
```
From the correlations above, we can conclude that **none of the performance attributes is meaningfully correlated with the rating given by reviewers**, although **fuel tank capacity and seating capacity** show a **weak positive correlation with the car's rating**.
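
You can check whether a weak correlation is distinguishable from zero with `cor.test()`, which reports a p-value for the null hypothesis that the true correlation is 0. The sketch below uses randomly generated, hypothetical data. In that data the attributes are independent of the rating by construction, so any nonzero `r` it prints is pure noise.

```r
set.seed(42)

# Hypothetical toy data: 50 cars with independent random attributes
n <- 50
toy <- data.frame(
  rating = round(runif(n, 3, 5), 1),
  max_power_bhp = runif(n, 50, 400),
  seating_capacity = sample(2:8, n, replace = TRUE)
)
attrs <- c("max_power_bhp", "seating_capacity")

# Pearson correlation of each attribute with rating
r <- sapply(attrs, function(col) cor(toy[[col]], toy$rating))

# p-value for each correlation (H0: true correlation is 0)
p <- sapply(attrs, function(col) cor.test(toy[[col]], toy$rating)$p.value)

print(round(data.frame(r = r, p_value = p), 3))
```

To apply this to the real data, swap `toy` for `car` and add the other columns used above to `attrs`. That gives a correlation and p-value for every attribute in one table.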


# PROPORTION OF CAR REVIEWS BY TYPE
## Fuel Type

```{r}
# Total number of reviews for each fuel type
agg1 <- aggregate(reviews_count ~ fuel_type, car, sum)
pie(agg1$reviews_count, agg1$fuel_type, main = "Car Reviews by Fuel Type")
```


By **fuel type**, **Diesel** cars receive the most **reviews**.

## Body Type
```{r}
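# Total reviews per body type, sorted from most to least reviewed
agg2 <- aggregate(reviews_count ~ body_type, car, sum)
agg22 <- agg2[order(agg2$reviews_count, decreasing = TRUE), ]
# Keep the six most-reviewed body types (head() returns 6 rows by default)
agg222 <- head(agg22)
pie(agg222$reviews_count, agg222$body_type, cex = 0.5, main = "Car Reviews by Body Type")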
```


By **body type**, **SUVs** receive the most **reviews** (the chart shows only the six most-reviewed body types).


## Transmission Type
```{r}
# Total number of reviews for each transmission type
agg3 <- aggregate(reviews_count ~ transmission_type, car, sum)
pie(agg3$reviews_count, agg3$transmission_type, main = "Car Reviews by Transmission Type")
```


By **transmission type**, **Automatic** cars receive the most **reviews**.

Therefore, taking the top category in each grouping, **the most reviewed profile** is an **automatic-transmission SUV that runs on diesel**.
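
Pie slices are hard to compare by eye, so it can help to print the exact share of reviews in each category with `prop.table()`. This is a minimal sketch that uses a hypothetical `toy` data frame in place of `car`.

```r
# Hypothetical toy data standing in for car (not from CARS_1.csv)
toy <- data.frame(
  fuel_type = c("Diesel", "Petrol", "Diesel", "Electric"),
  reviews_count = c(300L, 150L, 250L, 50L)
)

# Total reviews per fuel type, as in the chunks above
agg <- aggregate(reviews_count ~ fuel_type, data = toy, FUN = sum)

# Share of all reviews per fuel type, in percent
agg$share_pct <- round(100 * prop.table(agg$reviews_count), 1)
print(agg[order(agg$share_pct, decreasing = TRUE), ])

# Slice labels that include the percentage
labels <- paste0(agg$fuel_type, " (", agg$share_pct, "%)")
print(labels)
```

You can pass the `labels` vector to `pie(agg$reviews_count, labels = labels)` so the chart also shows the percentages.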

# CONCLUSION
From the data set, we can see that **automatic-transmission SUVs that run on diesel are the most likely to be reviewed**. However, drawing the most reviews is not the same as earning the best ratings. **The highest mean ratings go to electric-transmission coupes that run on electricity, regardless of the car's performance**. As the correlations above show, **performance attributes such as horsepower and torque show no meaningful positive correlation with the rating**. **Seating capacity and fuel tank capacity, however, have a weak positive correlation with the car's rating.** The preference for electric cars could be related to concern about global warming, as many people around the world favor green energy.