1 Data Exploration

1
data(mtcars)
?mtcars
summary(mtcars)
mean(mtcars$mpg)
sd(mtcars$mpg)
hist(mtcars$mpg, main = "Distribution of Miles per Gallon",
     xlab = "Miles per Gallon (mpg)", col = "blue", border = "black")
plot(mtcars$wt, mtcars$mpg, main = "Weight vs. MPG",
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon", pch = 19, col = "green")
boxplot(mpg ~ am, data = mtcars, main = "MPG by Transmission Type",
        xlab = "Transmission (0 = Auto, 1 = Manual)", ylab = "Miles per Gallon",
        col = c("red", "purple"))

2
From the histogram, most cars in the dataset get between 15 and 25 miles per gallon, and very few exceed 30 mpg, so the majority of cars are moderately fuel-efficient. The distribution has a slight right skew. The Weight vs. MPG plot shows a clear downward slope: as weight increases, miles per gallon decreases, so heavier cars tend to consume more fuel.

3
cor(mtcars$mpg, mtcars)
Cylinders have a correlation of -0.85 with mpg, a strong negative relationship. Cars with more cylinders have larger engines, which burn more fuel, so mpg decreases. Weight has a correlation of -0.87, another strong negative relationship: heavier cars need more energy to move, so they burn more fuel and get lower mpg. Finally, displacement has a correlation of -0.85, also strongly negative. Displacement is the total volume of all of the engine's cylinders; once again, bigger engines use more fuel and therefore have lower mpg.
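The claims in answers 2 and 3 can also be checked numerically instead of only by eye. The sketch below is an illustration added for that purpose, not part of the original analysis; `mpg_vals` and `mpg_cor` are hypothetical helper names. A mean above the median is consistent with the right skew seen in the histogram, and sorting the correlations shows which variables are most negatively related to mpg.

```r
data(mtcars)
mpg_vals <- mtcars$mpg

# A mean above the median is consistent with a longer right tail (right skew).
c(mean = mean(mpg_vals), median = median(mpg_vals))

# Proportion of cars between 15 and 25 mpg, and number of cars above 30 mpg.
mean(mpg_vals >= 15 & mpg_vals <= 25)
sum(mpg_vals > 30)

# Correlation of mpg with every other variable, most negative first.
mpg_cor <- sort(cor(mtcars)[, "mpg"])
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]  # drop mpg's correlation with itself
round(mpg_cor, 2)

# The three strongest negative correlations.
head(mpg_cor, 3)
```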
2 Data Processing

1
any(is.na(mtcars))
colSums(is.na(mtcars))
There is no missing data.

2
summary(mtcars)
unique(mtcars$cyl)
unique(mtcars$vs)
unique(mtcars$am)
apply(mtcars, 2, function(x) any(x < 0))
All variables are within valid ranges: the binary variables vs and am contain only 0 or 1, and all other variables are positive and take reasonable values, so no column contains a negative value.
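The same checks can be written as assertions, so that a failing check stops the script instead of printing a `FALSE` that is easy to overlook. This is a minimal sketch; `binary_cols` is an illustrative name, and the allowed cylinder and gear values (4/6/8 and 3/4/5) are the values that occur in `mtcars`.

```r
data(mtcars)

# No missing values anywhere in the data frame.
stopifnot(!anyNA(mtcars))

# Binary indicators may only take the values 0 and 1.
binary_cols <- c("vs", "am")
stopifnot(all(vapply(mtcars[binary_cols],
                     function(x) all(x %in% c(0, 1)), logical(1))))

# No column may contain a negative value (vs and am can be 0).
stopifnot(all(vapply(mtcars, function(x) all(x >= 0), logical(1))))

# Cylinder and gear counts should be small whole numbers.
stopifnot(all(mtcars$cyl %in% c(4, 6, 8)), all(mtcars$gear %in% c(3, 4, 5)))

# Per-column minimum (row 1) and maximum (row 2), for eyeballing the ranges.
sapply(mtcars, range)
```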
3 Linear Regression using lm

1
model <- lm(mpg ~ ., data = mtcars)
summary(model)

2
Based on the output above, weight has the largest negative coefficient: holding the other predictors fixed, every 1,000 lb increase in car weight lowers predicted mpg by about 3.7. Heavier cars burn more fuel, reducing efficiency. Cylinders also have a negative coefficient, but once displacement, horsepower, and weight are in the model it is small (about -0.1 mpg per cylinder) and far from statistically significant, because those predictors are strongly correlated with cylinder count and absorb most of its effect. Transmission (am) has a positive effect: holding the other predictors fixed, cars with am = 1 (manual) get about 2.5 mpg more than automatic cars, so manual cars tend to be more fuel efficient. None of these coefficients is significant at the 5% level in the full model, which is not surprising with 10 correlated predictors and only 32 cars.

3
Linear regression assumes a linear relationship between the predictors and mpg, independent errors, constant error variance, and approximately normally distributed errors; it is also sensitive to extreme outliers and influential points. The diagnostic plots for the mpg model suggest these assumptions are reasonably met. The residuals are roughly centered around zero with fairly consistent spread across the fitted values, the Q-Q plot suggests approximate normality, and there are no extremely influential points. The model therefore seems appropriate for analyzing the relationship between mpg and the predictors.

4
model <- lm(mpg ~ ., data = mtcars)
predicted <- predict(model, newdata = mtcars)
res <- mtcars$mpg - predicted
mse <- mean(res^2)
mse
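As a cross-check, the hand-computed MSE should match the mean of the residuals stored in the fitted model. The sketch below also shows why this MSE is smaller than the squared residual standard error printed by `summary(model)`: the MSE divides the residual sum of squares by n = 32, while the residual standard error divides by the residual degrees of freedom (32 - 11 = 21). The names `mse_manual`, `mse_resid`, and `rse_sq` are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ ., data = mtcars)

# In-sample MSE computed by hand, as in question 4.
mse_manual <- mean((mtcars$mpg - predict(model, newdata = mtcars))^2)

# The same quantity from the residuals stored in the fitted model.
mse_resid <- mean(residuals(model)^2)

# Squared residual standard error: RSS divided by the residual degrees of
# freedom instead of by n, so it is larger than the MSE.
rss <- sum(residuals(model)^2)
rse_sq <- rss / df.residual(model)

c(mse_manual = mse_manual, mse_resid = mse_resid, rse_squared = rse_sq)
```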
5
model_interaction <- lm(mpg ~ . + wt:hp, data = mtcars)
summary(model_interaction)
The wt:hp interaction is significant, meaning that the effect of weight on mpg depends on horsepower. The negative effect of weight on mpg becomes slightly smaller for cars with high horsepower.
summary(model)$r.squared
summary(model_interaction)$r.squared
Adding the interaction improved the model fit. wt and hp still have negative effects on mpg, and the effect of weight depends on horsepower. R² is slightly higher, suggesting that the interaction captures some variation the original model missed. Because R² can never decrease when a term is added to a model, this improvement should be confirmed with adjusted R² or an F-test before it is taken as evidence that the interaction is needed.
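A fairer comparison of the two models uses adjusted R², which penalizes the extra parameter, and a partial F-test via `anova()` on the nested models. The last lines compute the slope of mpg with respect to wt at a few horsepower values, since with an interaction that slope is `b_wt + b_wt:hp * hp`. This is a sketch: `fit_stats`, `b`, and the horsepower values in `hp_grid` are arbitrary illustrative choices within the observed hp range.

```r
data(mtcars)
model <- lm(mpg ~ ., data = mtcars)
model_interaction <- lm(mpg ~ . + wt:hp, data = mtcars)

# Plain R-squared cannot decrease when a term is added to a nested OLS model,
# so also report adjusted R-squared, which penalizes the extra parameter.
fit_stats <- rbind(
  additive    = c(r2     = summary(model)$r.squared,
                  adj_r2 = summary(model)$adj.r.squared),
  interaction = c(r2     = summary(model_interaction)$r.squared,
                  adj_r2 = summary(model_interaction)$adj.r.squared)
)
round(fit_stats, 3)

# Partial F-test: does adding wt:hp reduce the residual sum of squares
# by more than would be expected by chance?
anova(model, model_interaction)

# Slope of mpg in wt at selected hp values: b_wt + b_wt:hp * hp.
b <- coef(model_interaction)
hp_grid <- c(100, 150, 250)
setNames(b["wt"] + b["wt:hp"] * hp_grid, hp_grid)
```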
6
boxplot(mtcars, main ="Boxplot of mtcars variables")
model <- lm(mpg ~ ., data = mtcars)
rstandard(model)
# DescTools::Winsorize() can also do this; the manual version below needs no
# extra package and makes the 5th and 95th percentile cutoffs explicit.
# install.packages("DescTools")
p5 <- quantile(mtcars$hp, 0.05)
p95 <- quantile(mtcars$hp, 0.95)
mtcars$hp_w <- mtcars$hp
mtcars$hp_w[mtcars$hp_w < p5] <- p5
mtcars$hp_w[mtcars$hp_w > p95] <- p95
head(mtcars$hp_w)
str(mtcars)
# Exclude hp_w from the original model; otherwise "." would include it
# alongside hp.
model_original <- lm(mpg ~ . - hp_w, data = mtcars)
model_winsor <- lm(mpg ~ . - hp + hp_w, data = mtcars)
summary(model_original)$r.squared
summary(model_winsor)$r.squared
Winsorization reduces the influence of outliers, especially the most extreme ones. The model becomes somewhat more stable, and its coefficients are less extreme. R² is similar or slightly higher. Predictions for cars with extreme horsepower are less extreme, which could help the model generalize.

7
A higher R² does not guarantee better predictive performance. R² measures in-sample fit only and can rise simply because the model adapts to noise in the training data, so predictive ability should be judged on data the model was not fit to, for example with cross-validation.
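One way to check predictive ability without a separate test set is leave-one-out cross-validation (LOOCV). For least squares there is an exact shortcut that avoids refitting: the held-out prediction error for car i equals its residual divided by (1 - h_ii), where h_ii is its leverage from `hatvalues()`. The sketch below applies this to the two question 6 models. It works on a copy taken from `datasets::mtcars`, so it does not modify `mtcars` from question 6; `mt`, `hp_cut`, and `loocv_mse` are illustrative names. One caveat: the 5th/95th percentile cutoffs are computed on all 32 cars, so the winsorized model's LOOCV error is slightly optimistic.

```r
# Fresh copy of the data; the global mtcars is left untouched.
mt <- datasets::mtcars

# Winsorize hp at its 5th and 95th percentiles, as in question 6.
hp_cut <- quantile(mt$hp, c(0.05, 0.95))
mt$hp_w <- pmin(pmax(mt$hp, hp_cut[1]), hp_cut[2])

# Each model uses exactly one version of horsepower.
model_original <- lm(mpg ~ . - hp_w, data = mt)
model_winsor   <- lm(mpg ~ . - hp,   data = mt)

# LOOCV MSE for OLS without refitting: residual / (1 - leverage) is the
# error the model would make on that car if it had been left out.
loocv_mse <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)

rbind(
  in_sample = c(original = mean(residuals(model_original)^2),
                winsor   = mean(residuals(model_winsor)^2)),
  loocv     = c(original = loocv_mse(model_original),
                winsor   = loocv_mse(model_winsor))
)
```

The LOOCV row is always at least as large as the in-sample row; a large gap between the two signals that the in-sample R² overstates how well the model would predict new cars.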