Title: Understanding Probability Distributions
In this blog entry, we explore the mtcars dataset that ships with R. It records 11 attributes for 32 car models from the early 1970s, including miles per gallon (mpg), horsepower (hp), and weight (wt).
We begin with a structural overview of the dataset and its variables, then run a series of exploratory data analysis (EDA) tasks to look for patterns, relationships, and distributions in the data.
To see how the variables relate to one another, we use scatterplot matrices, boxplots, histograms, and density plots. These plots show how mpg, hp, and wt move together and help flag trends and potential outliers.
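The structure listing and summary table that follow have the shape of output from base R's `str()` and `summary()`. The calls themselves are not shown in the post, so the sketch below is an assumption about how that output was produced:

```r
# Assumed calls (not shown in the original post) that would print
# the structure listing and the per-column summary statistics.
data(mtcars)      # mtcars ships with base R's datasets package
str(mtcars)       # one line per variable: type and first few values
summary(mtcars)   # min, quartiles, mean, and max for each column
```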
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
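In the summary above, the mean of hp (146.7) sits well above its median (123.0), which hints at a right-skewed distribution. A histogram is one quick way to check this. The sketch below is an illustration only, and the choice of `breaks = 10` is an assumption rather than something taken from the original analysis:

```r
# Histogram of horsepower to inspect the shape of its distribution.
# breaks = 10 is only a suggestion to hist(); R may pick a nearby number of bins.
hp_hist <- hist(mtcars$hp, breaks = 10,
                main = "Histogram of Horsepower",
                xlab = "Horsepower", col = "lightgray")

# Mark the mean (red, dashed) and median (blue, dotted) so any skew is visible
abline(v = mean(mtcars$hp), col = "red", lty = 2)
abline(v = median(mtcars$hp), col = "blue", lty = 3)
```

If the mean line falls to the right of the median line, the long tail is on the high-horsepower side.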
#Correlation analysis
correlation_matrix <- cor(mtcars[, c("mpg", "hp", "wt")])
print(correlation_matrix)
##            mpg         hp         wt
## mpg  1.0000000 -0.7761684 -0.8676594
## hp  -0.7761684  1.0000000  0.6587479
## wt  -0.8676594  0.6587479  1.0000000
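The matrix reports point estimates only. With 32 observations it may be worth checking how much uncertainty surrounds them. A minimal sketch using `cor.test()` from base R's stats package, applied here to mpg and wt purely as an illustration:

```r
# Pearson correlation test for mpg vs. wt, with a 95% confidence interval
ct <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

ct$estimate   # the same coefficient that appears in the matrix
ct$conf.int   # confidence interval for the correlation
ct$p.value    # p-value for the null hypothesis of zero correlation
```

The same call can be repeated for the mpg/hp and hp/wt pairs.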
#Scatterplot matrix
pairs(~ mpg + hp + wt, data = mtcars, main = "Scatterplot Matrix of mpg, hp, and wt")

#Boxplot of mpg grouped by number of cylinders
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot of mpg by number of cylinders",
        xlab = "Number of Cylinders", ylab = "Miles per Gallon")

#Density plot of weight
plot(density(mtcars$wt), main = "Density Plot of Weight", xlab = "Weight", ylab = "Density")

#Scatterplot of mpg vs. hp colored by number of cylinders
plot(mpg ~ hp, data = mtcars, col = mtcars$cyl, pch = 16, main = "Scatterplot of mpg vs. hp",
     xlab = "Horsepower", ylab = "Miles per Gallon")
legend("topright", legend = unique(mtcars$cyl), col = unique(mtcars$cyl), pch = 16)

#Regression modeling
#Simple linear regression
model_simple <- lm(mpg ~ hp, data = mtcars)
summary(model_simple)
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7121 -2.1122 -0.8854  1.5819  8.2360
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
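Two things in this output can be checked directly. The slope says that each additional unit of horsepower is associated with about 0.068 fewer miles per gallon, and for a one-predictor model the multiple R-squared equals the squared correlation between mpg and hp (0.7761684 squared is about 0.6024). A short sketch that verifies both, with `new_cars` as a hypothetical data frame introduced only for illustration:

```r
model_simple <- lm(mpg ~ hp, data = mtcars)

# R-squared of a simple regression equals the squared Pearson correlation
r2_model <- summary(model_simple)$r.squared
r2_cor   <- cor(mtcars$mpg, mtcars$hp)^2
all.equal(r2_model, r2_cor)

# Predicted mpg for hypothetical cars with 100 and 200 hp
new_cars <- data.frame(hp = c(100, 200))
pred <- predict(model_simple, newdata = new_cars)
pred
diff(pred)   # equals 100 times the hp slope
```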
#Plotting the regression line
plot(mtcars$hp, mtcars$mpg, main = "Simple Linear Regression of mpg on hp",
     xlab = "Horsepower", ylab = "Miles per Gallon")
abline(model_simple, col = "red")

#Multiple linear regression
model_multiple <- lm(mpg ~ hp + wt, data = mtcars)
summary(model_multiple)
##
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.941 -1.600 -0.182  1.050  5.854
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 **
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
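Adding wt raises the adjusted R-squared from 0.5892 to 0.8148. Because the simple model is nested inside the multiple one, a partial F-test is one way to check whether that gain is more than noise. A minimal sketch using base R's `anova()`:

```r
model_simple   <- lm(mpg ~ hp, data = mtcars)
model_multiple <- lm(mpg ~ hp + wt, data = mtcars)

# Partial F-test: does adding wt significantly reduce residual variance?
cmp <- anova(model_simple, model_multiple)
print(cmp)

# With a single added term, the F statistic equals the squared t value of wt
f_stat <- cmp$F[2]
t_wt   <- summary(model_multiple)$coefficients["wt", "t value"]
all.equal(f_stat, t_wt^2)
```

With only one added predictor this test is equivalent to the t-test on the wt coefficient reported above. It becomes more useful when several terms are added at once.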
#Plotting the regression plane
#install.packages("scatterplot3d")
library(scatterplot3d)
s3d <- scatterplot3d(mtcars$hp, mtcars$wt, mtcars$mpg, main = "Multiple Linear Regression of mpg on hp and wt",
                     xlab = "Horsepower", ylab = "Weight", zlab = "Miles per Gallon")
s3d$plane3d(model_multiple, col = "blue", lty = "dotted")

## Conclusion
Overall, this blog entry shows how a few lines of R can surface the main factors associated with fuel efficiency in the mtcars data. Both horsepower and weight are strongly negatively correlated with mpg (about -0.78 and -0.87, respectively). Horsepower alone explains roughly 60% of the variance in mpg, and adding weight raises that to roughly 83%, with each additional 1,000 lbs of weight associated with about 3.9 fewer miles per gallon at a fixed horsepower. These results offer a starting point for further modeling and exploration of automotive data.