#Generate practice dataset
#data <- mtcars
#Write.csv(data, 'data/data.csv')
library(psych)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(sm)
## Package 'sm', version 2.2-5.7: type help(sm) for summary information
Extracting data from psych, tidyverse and sm.
data <- read.csv('../data/data.csv')
Reading data from data of data.csv
Descriptive statistics are used in this report’s initial analysis of the overall situation of the dataset “mtcars,” which aids in understanding the general performance of the sample’s cars from the perspectives of central tendency and variability of the 11 car variables.
summary(data)
## X mpg cyl disp
## Length:32 Min. :10.40 Min. :4.000 Min. : 71.1
## Class :character 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
## Mode :character Median :19.20 Median :6.000 Median :196.3
## Mean :20.09 Mean :6.188 Mean :230.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
## Max. :33.90 Max. :8.000 Max. :472.0
## hp drat wt qsec
## Min. : 52.0 Min. :2.760 Min. :1.513 Min. :14.50
## 1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89
## Median :123.0 Median :3.695 Median :3.325 Median :17.71
## Mean :146.7 Mean :3.597 Mean :3.217 Mean :17.85
## 3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90
## Max. :335.0 Max. :4.930 Max. :5.424 Max. :22.90
## vs am gear carb
## Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4375 Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :8.000
describe(data)
## vars n mean sd median trimmed mad min max range skew
## X* 1 32 16.50 9.38 16.50 16.50 11.86 1.00 32.00 31.00 0.00
## mpg 2 32 20.09 6.03 19.20 19.70 5.41 10.40 33.90 23.50 0.61
## cyl 3 32 6.19 1.79 6.00 6.23 2.97 4.00 8.00 4.00 -0.17
## disp 4 32 230.72 123.94 196.30 222.52 140.48 71.10 472.00 400.90 0.38
## hp 5 32 146.69 68.56 123.00 141.19 77.10 52.00 335.00 283.00 0.73
## drat 6 32 3.60 0.53 3.70 3.58 0.70 2.76 4.93 2.17 0.27
## wt 7 32 3.22 0.98 3.33 3.15 0.77 1.51 5.42 3.91 0.42
## qsec 8 32 17.85 1.79 17.71 17.83 1.42 14.50 22.90 8.40 0.37
## vs 9 32 0.44 0.50 0.00 0.42 0.00 0.00 1.00 1.00 0.24
## am 10 32 0.41 0.50 0.00 0.38 0.00 0.00 1.00 1.00 0.36
## gear 11 32 3.69 0.74 4.00 3.62 1.48 3.00 5.00 2.00 0.53
## carb 12 32 2.81 1.62 2.00 2.65 1.48 1.00 8.00 7.00 1.05
## kurtosis se
## X* -1.31 1.66
## mpg -0.37 1.07
## cyl -1.76 0.32
## disp -1.21 21.91
## hp -0.14 12.12
## drat -0.71 0.09
## wt -0.02 0.17
## qsec 0.34 0.32
## vs -2.00 0.09
## am -1.92 0.09
## gear -1.07 0.13
## carb 1.26 0.29
In order to present a comprehensive view of a car’s performance, we first summarize the data information of the 11 automotive indicators. The output shows the data for 32 cars as “mean, minimum, 1st quartile, median, 3rd quartile, maximum.” Miles per gallon (mpg), the number of cylinders (cyl), displacement (disp), gross horsepower (hp), rear axle ratio (drat), weight (wt), quarter-mile time (qsec), engine (vs), transmission (am), number of forward gears (gear), and number of carburetors (carb) are some of the other metrics.Next, we use the “describe” function to provide additional descriptive metrics for the 11 automobile indications, including the standard deviation, trimmed mean, median absolute deviation, range, skewness, kurtosis, and standard error of the mean.
data_a <- filter(data, am == 0)
data_b <- filter(data, am != 0)
From data a, filter data when am equals 0 From data b, filter data when am equals 1. 19 vehicles with automatic transmissions are represented in the dataset by “0,” while 13 vehicles have manual transmissions and are represented by “1.” We alter the data by splitting the data sample into two categories in accordance with the two transmission kinds because we want to assess the performance of vehicles with various transmission types. Cars with automatic transmissions are represented by “data a” and those with manual transmissions by “data b.”
We first create a histogram of the 1/4 mile time of the automobiles in order to determine the relationship between that time and the transmission type of the cars.
hist(data$qsec)
plot(density(data$qsec))
sm.density.compare(data$qsec, data$am, model = "equal")
## Test of equal densities: p-value = 0.49
Comparing the quarter-mile time distribution of vehicles with two different types of transmissions is made easier with the “sm.density.compare” function. We can see from the graph that vehicles with two different types of transmissions have different 1/4-mile times. The average 1/4-mile time for cars with automatic transmissions is around 17.5 seconds, whereas the average 1/4-mile time for cars with manual transmissions is between 16 and 20 seconds. That is, there is little variation in the amount of vehicles having quarter-mile times between 16 and 20 seconds.
boxplot(qsec ~ am, data = data)
data %>%
ggplot(
aes(x=am, y=qsec, group = am)
)+
geom_boxplot()+
geom_jitter(color="black", size=0.4, alpha=0.9)
We then create a boxplot to display the quarter-mile differences between vehicles with two different types of transmissions, and the Kernel density displays a result that is similar. The maximum, lowest, and median 1/4 mile times for the automatic transmission automobile group are faster than those for the manual transmission car group. In general, automatic transmission cars typically have faster 1/4-mile times than manual transmission cars. The difference is not that big, though.
We can determine that there is a 1/4-mile difference in time between vehicles with two distinct types of transmissions from the kernel density and boxplot. The mean statistic and visualization, however, indicate that the difference is not very significant.
The t-test must be used to determine whether the difference in car speed (1/4 mile time) between the two types of transmissions found in the sample is accurate due to the small size of our sample.
t.test(qsec ~ am, data = data)
##
## Welch Two Sample t-test
##
## data: qsec by am
## t = 1.2878, df = 25.534, p-value = 0.2093
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.4918522 2.1381679
## sample estimates:
## mean in group 0 mean in group 1
## 18.18316 17.36000
Using t-test to compare the means of the two samples. find the relationship between am and qsec, we need to use linear regression, finding the coefficient.
In order to analyze the association between 1/4 mile time and different transmission types for cars, we use the “lm” function.
m1 <- lm(qsec ~ am, data = data)
summary(m1)
##
## Call:
## lm(formula = qsec ~ am, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8600 -0.9583 -0.3516 1.2517 4.7168
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.1832 0.4056 44.833 <2e-16 ***
## am -0.8232 0.6363 -1.294 0.206
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.768 on 30 degrees of freedom
## Multiple R-squared: 0.05284, Adjusted R-squared: 0.02126
## F-statistic: 1.674 on 1 and 30 DF, p-value: 0.2057
The coefficient estimate of am (transmission type) is -0.8232, according to the summary of the regression model’s fit. Therefore, “1/4 mile time=18.1832-0.8232*am (am=0, 1)” can be used to represent the calculated regression equation. Additionally, the regression model is not statistically significant overall with a p-value of 0.2057. Since the difference in 1/4-mile times between automatic and manual gearbox cars is not statistically significant, the linear regression model confirms our earlier findings. This report examines the performance of cars using the dataset “mtcars”. We may infer from 1/4 mile time graphs and modeling that automatic transmission automobiles have greater 1/4 mile times than manual transmission cars, showing the quick speed of manual transmission cars, but the difference is not very large.