Blog Entry 2: Understanding Probability Distributions

Overview

In this blog entry, we explore automotive data using the mtcars dataset that ships with R. The dataset records 11 attributes for 32 car models from the early 1970s, including miles per gallon (mpg), horsepower (hp), and weight (wt, in units of 1,000 lbs).

We begin with a structural overview of the dataset and its variables, then carry out exploratory data analysis (EDA) to uncover patterns, relationships, and distributions in the data.

To examine how the variables relate to one another, we use a scatterplot matrix, a boxplot, a histogram, and a density plot. These plots show how mpg, hp, and wt vary together and help us spot trends and potential outliers. We then fit simple and multiple linear regression models of mpg.

#Loading the mtcars dataset
data(mtcars)

#Displaying the structure of the dataset
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

#Summary statistics
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

#Correlation analysis
correlation_matrix <- cor(mtcars[, c("mpg", "hp", "wt")])
print(correlation_matrix)
##            mpg         hp         wt
## mpg  1.0000000 -0.7761684 -0.8676594
## hp  -0.7761684  1.0000000  0.6587479
## wt  -0.8676594  0.6587479  1.0000000

#Scatterplot matrix
pairs(~ mpg + hp + wt, data = mtcars, main = "Scatterplot Matrix of mpg, hp, and wt")
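
The correlation matrix gives point estimates only. As a hedged sketch (an addition, not part of the original analysis), base R's cor.test() also reports a confidence interval and a p-value for one pair of variables. The choice of mpg and wt here is purely illustrative.

```r
# Test whether the Pearson correlation between mpg and wt differs from zero.
# cor.test() is part of base R's stats package; the variable pair is an illustrative choice.
ct <- cor.test(mtcars$mpg, mtcars$wt)

ct$estimate   # sample correlation (the mpg/wt entry of the correlation matrix)
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # p-value for the null hypothesis of zero correlation
```

The estimate should match the mpg/wt entry printed by cor(), while the interval indicates how precisely 32 observations pin it down.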

#Boxplot of mpg grouped by number of cylinders
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot of mpg by number of cylinders",
        xlab = "Number of Cylinders", ylab = "Miles per Gallon")
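
The boxplot compares the mpg distribution across cylinder counts. The same comparison can be made numerically. The sketch below, an addition to the original analysis, uses base R's tapply() to compute the mean and median mpg for each group.

```r
# Mean and median mpg for each cylinder count, complementing the boxplot.
mean_mpg   <- tapply(mtcars$mpg, mtcars$cyl, mean)
median_mpg <- tapply(mtcars$mpg, mtcars$cyl, median)

# One column per cylinder count (4, 6, 8), rounded for display.
round(rbind(mean = mean_mpg, median = median_mpg), 2)
```

Mean mpg falls from about 26.7 for four-cylinder cars to about 15.1 for eight-cylinder cars, which is the downward shift the boxplot shows.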

#Histogram of horsepower
hist(mtcars$hp, main = "Histogram of Horsepower", xlab = "Horsepower")

#Density plot of weight
plot(density(mtcars$wt), main = "Density Plot of Weight", xlab = "Weight", ylab = "Density")
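
A density plot is easier to judge against the raw data. A minimal sketch, added here for illustration, overlays the kernel density estimate on a histogram drawn on the density scale (freq = FALSE). It also checks that the estimate integrates to roughly 1, as any probability density should.

```r
# Kernel density estimate and histogram bins for weight.
wt_density <- density(mtcars$wt)
wt_hist <- hist(mtcars$wt, plot = FALSE)

# Draw the histogram on the density scale, leaving room for the taller of the two shapes.
plot(wt_hist, freq = FALSE, main = "Histogram and Density of Weight",
     xlab = "Weight (1000 lbs)",
     ylim = c(0, max(wt_hist$density, wt_density$y)))
lines(wt_density, col = "blue", lwd = 2)

# Riemann-sum approximation of the area under the estimated density curve.
area <- sum(wt_density$y) * diff(wt_density$x[1:2])
area
```

The area comes out slightly below 1 because density() evaluates the curve on a finite grid that cuts off the far tails.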

#Scatterplot of mpg vs. hp colored by number of cylinders
#(the cyl values 4, 6, and 8 are used directly as indices into the color palette)
cyl_levels <- sort(unique(mtcars$cyl))
plot(mpg ~ hp, data = mtcars, col = mtcars$cyl, pch = 16, main = "Scatterplot of mpg vs. hp",
     xlab = "Horsepower", ylab = "Miles per Gallon")
legend("topright", legend = cyl_levels, col = cyl_levels, pch = 16, title = "Cylinders")

#Regression modeling

#Simple linear regression
model_simple <- lm(mpg ~ hp, data = mtcars)
summary(model_simple)
## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

#Plotting the regression line
plot(mtcars$hp, mtcars$mpg, main = "Simple Linear Regression of mpg on hp",
     xlab = "Horsepower", ylab = "Miles per Gallon")
abline(model_simple, col = "red")
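
Once fitted, the simple model can be used for estimation. The sketch below, an addition to the original analysis, reports 95% confidence intervals for the coefficients with confint() and predicts mpg for a hypothetical 150 hp car with predict(). The 150 hp value is an arbitrary illustration.

```r
# Refit the simple model so this block is self-contained.
model_simple <- lm(mpg ~ hp, data = mtcars)

# 95% confidence intervals for the intercept and the hp slope.
confint(model_simple)

# Predicted mpg for a hypothetical 150 hp car, with a 95% prediction interval.
new_car <- data.frame(hp = 150)
pred <- predict(model_simple, newdata = new_car, interval = "prediction")
pred
```

From the coefficients reported above, the point prediction is 30.09886 - 0.06823 * 150, or about 19.9 mpg. The prediction interval is much wider than that single number suggests, reflecting the residual standard error of 3.863.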

#Multiple linear regression
model_multiple <- lm(mpg ~ hp + wt, data = mtcars)
summary(model_multiple)
## 
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

#Plotting the regression plane (requires the scatterplot3d package)
#install.packages("scatterplot3d")  #uncomment and run once if the package is not installed
library(scatterplot3d)
s3d <- scatterplot3d(mtcars$hp, mtcars$wt, mtcars$mpg, main = "Multiple Linear Regression of mpg on hp and wt",
                     xlab = "Horsepower", ylab = "Weight", zlab = "Miles per Gallon")
s3d$plane3d(model_multiple, col = "blue", lty = "dotted")
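
The summaries above show R-squared rising from 0.60 to 0.83 when wt is added. One way to check formally whether that improvement is more than noise is an F-test on the nested models. The sketch below, added for illustration, uses base R's anova() and also compares adjusted R-squared.

```r
# Refit both models so this block is self-contained.
model_simple   <- lm(mpg ~ hp, data = mtcars)
model_multiple <- lm(mpg ~ hp + wt, data = mtcars)

# F-test on the nested models: does adding wt significantly reduce residual error?
comparison <- anova(model_simple, model_multiple)
comparison

# Adjusted R-squared penalizes the extra predictor, so it is a fairer comparison.
adj_r2 <- c(simple   = summary(model_simple)$adj.r.squared,
            multiple = summary(model_multiple)$adj.r.squared)
adj_r2
```

Because the two models differ by a single term, this F-test is equivalent to the t-test on wt in the multiple regression summary (p = 1.12e-06).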

Conclusion

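Overall, this blog entry shows how summary statistics, visualization, and regression can draw concrete findings out of a small real-world dataset. In mtcars, fuel efficiency falls as horsepower and weight rise: mpg has a correlation of about -0.78 with hp and -0.87 with wt. Horsepower alone explains about 60% of the variance in mpg, and adding weight raises that to about 83%. In the two-predictor model, each additional 1,000 lbs of weight is associated with roughly 3.9 fewer miles per gallon at a fixed horsepower, and each additional horsepower with about 0.03 fewer at a fixed weight. These associations come from only 32 cars, but they give a useful starting point for further exploration of the factors influencing fuel efficiency and performance.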