Title: Analyzing Car Performance Data with Statistical Methods
In this blog entry, we explore the “cars” dataset that ships with R. Cars are integral parts of our daily lives, and understanding how they behave on the road matters to both drivers and manufacturers. The dataset is small: 50 observations of two variables, speed (the car's speed in miles per hour) and dist (the distance in feet the car took to stop), recorded in the 1920s. Using exploratory data analysis, correlation, and simple linear regression, we examine how stopping distance relates to speed.
#Loading library
library(ggplot2)
#Loading cars dataset
data(cars)
#Exploring the structure of the dataset
str(cars)
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
#Summary statistics of the dataset
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
#Histogram of speed
ggplot(cars, aes(x = speed)) +
  geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Speed",
       x = "Speed",
       y = "Frequency")
#Scatter plot of speed vs. distance
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "darkblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatter Plot of Speed vs. Distance",
       x = "Speed",
       y = "Distance")
## `geom_smooth()` using formula = 'y ~ x'
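The red line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of dist on speed. As a minimal sketch of what that fit computes, the slope and intercept can be reproduced by hand in base R. The variable names slope and intercept are illustrative only and are not part of the original analysis.

```r
# Reproduce the least-squares line by hand (base R only)
data(cars)

# Slope: covariance of speed and dist divided by the variance of speed
slope <- cov(cars$speed, cars$dist) / var(cars$speed)

# Intercept: the fitted line passes through the point of means
intercept <- mean(cars$dist) - slope * mean(cars$speed)

cat("Slope:", slope, "Intercept:", intercept, "\n")

# Cross-check against the coefficients that lm() estimates
coef(lm(dist ~ speed, data = cars))
```

Both approaches should agree up to floating-point error, since lm() solves the same least-squares problem.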
#Correlation between speed and distance
correlation <- cor(cars$speed, cars$dist)
cat("Correlation between Speed and Distance:", correlation, "\n")
## Correlation between Speed and Distance: 0.8068949
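A correlation coefficient on its own does not say how precisely it is estimated. One possible follow-up, sketched here with base R's cor.test(), is to attach a confidence interval and a p-value to the estimate. The object name ct is an assumption made for this illustration.

```r
# Pearson correlation test for speed vs. stopping distance (base R only)
data(cars)
ct <- cor.test(cars$speed, cars$dist, method = "pearson")

# Point estimate; this is the same value cor() returns
ct$estimate

# 95% confidence interval for the correlation
ct$conf.int

# p-value for the null hypothesis that the true correlation is zero
ct$p.value
```

With only 50 observations the interval is fairly wide, so it is worth reporting alongside the point estimate.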
#Running linear regression
model <- lm(dist ~ speed, data = cars)
#Summary of the regression model
summary(model)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
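To make the fitted coefficients more concrete, the sketch below adds confidence intervals for them and predicts stopping distance at a few speeds. It refits the same model so that it runs on its own. The speeds 10, 15, and 20 mph and the names new_speeds and pred are hypothetical choices for illustration, all inside the observed range of 4 to 25 mph.

```r
# Refit the model from the post so this block is self-contained
data(cars)
model <- lm(dist ~ speed, data = cars)

# 95% confidence intervals for the intercept and the slope
confint(model, level = 0.95)

# Hypothetical speeds (mph) at which to predict stopping distance
new_speeds <- data.frame(speed = c(10, 15, 20))

# Point predictions with 95% prediction intervals for a single new car
pred <- predict(model, newdata = new_speeds,
                interval = "prediction", level = 0.95)
cbind(new_speeds, pred)
```

Prediction intervals are used here rather than confidence intervals for the mean because they describe the plausible range for an individual car, which is wider.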
In conclusion, the analysis shows a strong positive linear relationship between speed and stopping distance in the cars dataset. The correlation is about 0.81, and the regression estimates that each additional mile per hour adds roughly 3.9 feet of stopping distance, with speed explaining about 65% of the variance in distance (R-squared of 0.6511). The negative intercept of -17.58 has no physical meaning, since it extrapolates below the observed speed range of 4 to 25 mph. Together, EDA, correlation analysis, and regression analysis give a compact picture of how stopping distance grows with speed. Because the dataset contains only these two variables, the conclusions are limited to that relationship.