Environment Setup

Package Loading

To analyze car performance, we use three packages: psych (for `describe()`), tidyverse (for `filter()` and general data handling), and sm (for comparing kernel density estimates).

library(psych)
## Warning: package 'psych' was built under R version 4.1.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.8     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x ggplot2::%+%()   masks psych::%+%()
## x ggplot2::alpha() masks psych::alpha()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
library(sm)
## Warning: package 'sm' was built under R version 4.1.2
## Package 'sm', version 2.2-5.7: type help(sm) for summary information

Loading Data

Next, we read the data from file. The dataset is `mtcars`, which contains specifications and performance measures for 32 cars across 11 variables related to performance and design (the CSV also has a first column, `X`, holding the car names). The main focus of this analysis is the cars' quarter-mile time (`qsec`) and the factors that affect it.

data <- read.csv('../data/data.csv')
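If `../data/data.csv` is not at hand, a minimal sketch like the following rebuilds an equivalent table from R's built-in `mtcars` dataset. It assumes (the `summary()` output below suggests this but does not confirm it) that the CSV is a plain export of `mtcars` with the car names written out as a first column called `X`.

```r
# Assumption: data.csv is an export of the built-in mtcars data frame,
# with the row names (car models) stored in a column named "X".
data <- data.frame(X = rownames(mtcars), mtcars, row.names = NULL)

# Quick structural checks: 32 cars, 11 numeric variables plus the name column
dim(data)          # expected: 32 rows, 12 columns
str(data[, 1:4])   # X is character; mpg, cyl, disp are numeric
```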

Analysis

Descriptive Statistics

As a starting point, I used descriptive statistics to summarize the data and describe the overall picture. This gives us a better understanding of the general performance of the cars.

summary(data)
##       X                  mpg             cyl             disp      
##  Length:32          Min.   :10.40   Min.   :4.000   Min.   : 71.1  
##  Class :character   1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
##  Mode  :character   Median :19.20   Median :6.000   Median :196.3  
##                     Mean   :20.09   Mean   :6.188   Mean   :230.7  
##                     3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0  
##                     Max.   :33.90   Max.   :8.000   Max.   :472.0  
##        hp             drat             wt             qsec      
##  Min.   : 52.0   Min.   :2.760   Min.   :1.513   Min.   :14.50  
##  1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89  
##  Median :123.0   Median :3.695   Median :3.325   Median :17.71  
##  Mean   :146.7   Mean   :3.597   Mean   :3.217   Mean   :17.85  
##  3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90  
##  Max.   :335.0   Max.   :4.930   Max.   :5.424   Max.   :22.90  
##        vs               am              gear            carb      
##  Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000
describe(data)
##      vars  n   mean     sd median trimmed    mad   min    max  range  skew
## X*      1 32  16.50   9.38  16.50   16.50  11.86  1.00  32.00  31.00  0.00
## mpg     2 32  20.09   6.03  19.20   19.70   5.41 10.40  33.90  23.50  0.61
## cyl     3 32   6.19   1.79   6.00    6.23   2.97  4.00   8.00   4.00 -0.17
## disp    4 32 230.72 123.94 196.30  222.52 140.48 71.10 472.00 400.90  0.38
## hp      5 32 146.69  68.56 123.00  141.19  77.10 52.00 335.00 283.00  0.73
## drat    6 32   3.60   0.53   3.70    3.58   0.70  2.76   4.93   2.17  0.27
## wt      7 32   3.22   0.98   3.33    3.15   0.77  1.51   5.42   3.91  0.42
## qsec    8 32  17.85   1.79  17.71   17.83   1.42 14.50  22.90   8.40  0.37
## vs      9 32   0.44   0.50   0.00    0.42   0.00  0.00   1.00   1.00  0.24
## am     10 32   0.41   0.50   0.00    0.38   0.00  0.00   1.00   1.00  0.36
## gear   11 32   3.69   0.74   4.00    3.62   1.48  3.00   5.00   2.00  0.53
## carb   12 32   2.81   1.62   2.00    2.65   1.48  1.00   8.00   7.00  1.05
##      kurtosis    se
## X*      -1.31  1.66
## mpg     -0.37  1.07
## cyl     -1.76  0.32
## disp    -1.21 21.91
## hp      -0.14 12.12
## drat    -0.71  0.09
## wt      -0.02  0.17
## qsec     0.34  0.32
## vs      -2.00  0.09
## am      -1.92  0.09
## gear    -1.07  0.13
## carb     1.26  0.29

To summarize this part: to give a comprehensive view of car performance, we first summarized the 11 automotive indicators. For each variable across the 32 cars, `summary()` reports the minimum, 1st quartile, median, mean, 3rd quartile, and maximum. The variables are miles per gallon (mpg), number of cylinders (cyl), displacement (disp), gross horsepower (hp), rear axle ratio (drat), weight (wt, in 1000 lbs), quarter-mile time (qsec), engine shape (vs: 0 = V-shaped, 1 = straight), transmission (am: 0 = automatic, 1 = manual), number of forward gears (gear), and number of carburetors (carb). Next, `describe()` adds further statistics for the same indicators, including the standard deviation, trimmed mean, median absolute deviation (mad), range, skewness, kurtosis, and standard error (se).
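Two of the less familiar `describe()` columns can be reproduced by hand, which may help when reading the table: `se` is the standard error of the mean, sd / sqrt(n), and `range` is max minus min. The sketch below uses the built-in `mtcars` data as a stand-in for `data` (an assumption about the CSV's contents).

```r
# Stand-in for `data`: the built-in mtcars data frame (assumed equivalent)
qsec <- mtcars$qsec
n <- length(qsec)

qsec_se <- sd(qsec) / sqrt(n)          # standard error of the mean
qsec_range <- max(qsec) - min(qsec)    # range = max - min

round(c(n = n, sd = sd(qsec), se = qsec_se, range = qsec_range), 2)
```

These values agree with the `qsec` row of the `describe()` output above (sd 1.79, se 0.32, range 8.40).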

Data Manipulation

While reviewing the data, I wondered about the relationship between a car's speed and its transmission type. NOTE: data_a contains the automatic-transmission cars (am = 0); data_b contains the manual-transmission cars (am = 1).

table(data$am)
## 
##  0  1 
## 19 13
data_a <- filter(data, am == 0)
data_b <- filter(data, am == 1)
mean(data_a$qsec)
## [1] 18.18316
mean(data_b$qsec)
## [1] 17.36
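Instead of splitting the data into `data_a` and `data_b`, the group means and group sizes can be computed in one call (dplyr's `group_by()` with `summarise()` would work equally well). A minimal base-R sketch, again using the built-in `mtcars` as a stand-in for `data`:

```r
# am: 0 = automatic, 1 = manual (the coding used in mtcars)
qsec_means <- tapply(mtcars$qsec, mtcars$am, mean)
qsec_n <- table(mtcars$am)

qsec_means   # mean quarter-mile time per transmission type
qsec_n       # number of cars per transmission type

# Difference in means (automatic minus manual), in seconds
diff_means <- unname(qsec_means["0"] - qsec_means["1"])
diff_means
```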

The average quarter-mile time of automatic-transmission cars is 18.18 seconds, and that of manual-transmission cars is 17.36 seconds. Since a shorter quarter-mile time means faster acceleration, the manual cars are faster on average.

Visualization

The histogram, kernel density plots, and boxplot below help us understand the distribution of quarter-mile times, both overall and by transmission type.

hist(data$qsec)

plot(density(data$qsec))

sm.density.compare(data$qsec, data$am, model='equal')
## Test of equal densities:  p-value =  0.55

boxplot(qsec ~ am, data=data)
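The numbers drawn in the boxplot can also be printed instead of read off the figure. Passing `plot = FALSE` to `boxplot()` returns the statistics it would draw; row 3 of `$stats` holds the group medians. A sketch using `mtcars` as a stand-in for `data`:

```r
# Compute, but do not draw, the boxplot of qsec by transmission type
bp <- boxplot(qsec ~ am, data = mtcars, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats <- bp$stats
colnames(box_stats) <- bp$names   # "0" = automatic, "1" = manual
box_stats

bp$out   # points beyond 1.5 x IQR from the box, drawn individually (if any)
```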

Modeling

T-test

A two-sample t-test checks whether the difference in mean quarter-mile time between the two transmission types in the sample is statistically significant.

m1 <- t.test(qsec ~ am, data = data)
m1   # print the test result, including the p-value

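According to the results, the p-value (about 0.21 for the Welch test that `t.test()` runs by default) is greater than 0.05, so we cannot reject the null hypothesis that both transmission types have the same mean quarter-mile time. Thus, the difference is not significant.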
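To make the test less of a black box, the sketch below recomputes the Welch statistic that `t.test()` uses by default (it does not assume equal variances): t = (mean_0 - mean_1) / sqrt(s_0^2 / n_0 + s_1^2 / n_1). It also runs the pooled-variance version (`var.equal = TRUE`), whose p-value coincides with that of the regression `qsec ~ am`, because a pooled two-sample t-test and a regression on a 0/1 indicator are the same model. `mtcars` stands in for `data` here.

```r
# Split quarter-mile times by transmission (0 = automatic, 1 = manual)
x0 <- mtcars$qsec[mtcars$am == 0]
x1 <- mtcars$qsec[mtcars$am == 1]

# Welch two-sample t statistic (unequal variances), computed by hand
se_welch <- sqrt(var(x0) / length(x0) + var(x1) / length(x1))
t_welch <- (mean(x0) - mean(x1)) / se_welch

# Same test via t.test(); the two statistics should agree
m1 <- t.test(qsec ~ am, data = mtcars)
c(by_hand = t_welch, t.test = unname(m1$statistic))
m1$p.value

# Pooled-variance t-test: equivalent to lm(qsec ~ am)
m1_pooled <- t.test(qsec ~ am, data = mtcars, var.equal = TRUE)
m1_pooled$p.value
```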

m2 <- lm(qsec ~ am, data = data)
summary(m2)
## 
## Call:
## lm(formula = qsec ~ am, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8600 -0.9583 -0.3516  1.2517  4.7168 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.1832     0.4056  44.833   <2e-16 ***
## am           -0.8232     0.6363  -1.294    0.206    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.768 on 30 degrees of freedom
## Multiple R-squared:  0.05284,    Adjusted R-squared:  0.02126 
## F-statistic: 1.674 on 1 and 30 DF,  p-value: 0.2057

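As we can see, the coefficient estimate for am is -0.8232: manual cars (am = 1) have, on average, a quarter-mile time 0.8232 seconds shorter than automatic cars (am = 0). However, its p-value is 0.206, so the difference is not statistically significant in this regression model either. The multiple R-squared of 0.053 also shows that transmission type alone explains only about 5% of the variation in quarter-mile time.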
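Because `am` is a 0/1 indicator, the coefficients of `m2` have a direct reading: the intercept is the mean quarter-mile time of automatic cars (am = 0), and intercept + slope is the mean for manual cars (am = 1). A quick check, with `mtcars` standing in for `data`:

```r
m2 <- lm(qsec ~ am, data = mtcars)
b <- coef(m2)

# Fitted value (= group mean) of quarter-mile time for each transmission type
c(automatic = unname(b["(Intercept)"]),
  manual    = unname(b["(Intercept)"] + b["am"]))

# Compare with the raw group means
tapply(mtcars$qsec, mtcars$am, mean)
```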

In conclusion, automatic-transmission cars take longer to cover a quarter mile than manual-transmission cars on average. Although the manual cars are faster in this sample, the difference is not statistically significant.