R Markdown

1 Data Exploration

1

data(mtcars)

?mtcars

summary(mtcars)

mean(mtcars$mpg)
sd(mtcars$mpg)

hist(mtcars$mpg, main = "Distribution of Miles per Gallon", xlab = "Miles per Gallon (mpg)", col = "blue", border = "black")
plot(mtcars$wt, mtcars$mpg, main = "Weight vs. MPG", xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon (mpg)", pch = 19, col = "green")
boxplot(mpg ~ am, data = mtcars, main = "MPG by Transmission Type", xlab = "Transmission (0 = Auto, 1 = Manual)", ylab = "Miles per Gallon", col = c("red", "purple"))

2

The histogram shows that most cars in the dataset get between 15 and 25 miles per gallon, and very few exceed 30 mpg, so it is fair to say that most of these cars are moderately fuel-efficient. The distribution has a slight right skew.

The Weight vs. MPG plot slopes clearly downward: as weight increases, miles per gallon decreases. We can conclude that heavier cars tend to consume more fuel.

3

cor(mtcars$mpg, mtcars)

Cylinders (cyl) has a correlation of -0.85 with mpg, a strong negative relationship. Cars with more cylinders generally have larger engines that burn more fuel, so mpg decreases.

Weight (wt) has a correlation of -0.87, another strong negative relationship. Heavier cars need more energy to move, so they burn more fuel and have lower mpg.

Finally, displacement (disp) has a correlation of -0.85, also strongly negative. Displacement is the total volume of all the engine's cylinders. Once again, bigger engines use more fuel and therefore have lower mpg.
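The three correlations discussed above are easier to compare when the mpg row of the correlation matrix is sorted by strength. This is a minimal sketch; the names `mpg_cor` and `strongest` are illustrative and do not come from the original analysis.

```r
data(mtcars)

# Correlation of mpg with every variable, then drop mpg's correlation with itself
mpg_cor <- cor(mtcars)["mpg", ]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Order by absolute strength, strongest relationship first
strongest <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(strongest, 2)
```

Sorting by absolute value keeps strong negative and strong positive relationships side by side, so the sign can still be read from each entry.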

2 Data Processing

1

any(is.na(mtcars))
colSums(is.na(mtcars))

There is no missing data.

2

summary(mtcars)
unique(mtcars$cyl)
unique(mtcars$vs)
unique(mtcars$am)
apply(mtcars, 2, function(x) any(x < 0))

All variables are within valid ranges: the binary variables vs and am contain only 0 or 1, no variable has a negative value, and the remaining numeric variables look reasonable.
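The same validity checks can be collected into named assertions so that a failed check stops the report instead of passing unnoticed. This is a sketch that assumes vs and am are meant to be binary and that no column should be negative; `is_binary` and `checks` are hypothetical names introduced for illustration.

```r
data(mtcars)

# Hypothetical helper: TRUE when a column contains only 0s and 1s
is_binary <- function(x) all(x %in% c(0, 1))

# One named logical per check, so a failing check is easy to spot
checks <- c(
  no_missing   = !anyNA(mtcars),
  vs_binary    = is_binary(mtcars$vs),
  am_binary    = is_binary(mtcars$am),
  non_negative = all(mtcars >= 0)
)
checks

# Stop with an error if any check fails
stopifnot(all(checks))
```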

3 Linear Regression using lm

1

model <- lm(mpg ~ ., data = mtcars)
summary(model)

2

Based on the summary above, weight (wt) has the largest negative coefficient: for every 1,000 lb increase in car weight, mpg decreases by about 3.7, holding the other predictors fixed. Heavier cars burn more fuel, reducing efficiency.

Cylinders (cyl) also has a negative coefficient, but in the full model it is small (about -0.1 mpg per cylinder) and not statistically significant, even though cyl on its own correlates strongly with mpg. Cars with more cylinders are typically less fuel efficient, but much of that effect is shared with disp, hp, and wt once they are all in the model.

Transmission (am) has a positive coefficient. Cars with am = 1 (manual) get about 2.5 mpg more than otherwise comparable automatic cars, so manual cars tend to be more fuel efficient.

3

Linear regression assumes linearity, constant variance, independent and approximately normal errors, and no extreme outliers. Diagnostic plots for the mpg model show that these assumptions are generally met. Residuals are roughly centered around zero and their spread is generally consistent across fitted values. The residuals look approximately normal, and there are no extreme influential points. The model seems appropriate for analyzing the relationship between mpg and the predictors.

4

model <- lm(mpg ~ ., data = mtcars)
predicted <- predict(model, newdata = mtcars)
residuals <- mtcars$mpg - predicted
mse <- mean(residuals^2)
mse

5

model_interaction <- lm(mpg ~ . + wt:hp, data = mtcars)
summary(model_interaction)

The wt:hp term is significant, so the effect of weight on mpg depends on horsepower. The negative effect of weight on mpg becomes slightly smaller for cars with high horsepower.

summary(model)$r.squared
summary(model_interaction)$r.squared

Adding the interaction improved the model fit. wt and hp still negatively affect mpg, and the effect of weight depends on horsepower. The slightly higher R² suggests the interaction captures some variation the original model missed, although R² cannot decrease when a term is added, so adjusted R² is the fairer comparison.
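Item 3 refers to diagnostic plots without showing how they were produced. One common way to get them is sketched below, under the assumption that base R's `plot()` method for `lm` objects is acceptable here; `fit` and `cooks` are illustrative names, and the Shapiro-Wilk test and Cook's distance are added as numeric companions to the plots rather than being part of the original analysis.

```r
data(mtcars)
fit <- lm(mpg ~ ., data = mtcars)

# Four standard diagnostic panels: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)

# Numeric companion to the Q-Q plot: test of residual normality
shapiro.test(residuals(fit))

# Numeric companion to the leverage plot: the three most influential cars
cooks <- cooks.distance(fit)
head(sort(cooks, decreasing = TRUE), 3)
```

Reading the plots and the numbers together is usually safer than relying on either alone, since 32 observations give the formal tests little power.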
6

boxplot(mtcars, main = "Boxplot of mtcars variables")
model <- lm(mpg ~ ., data = mtcars)
rstandard(model)

# install.packages("DescTools")  # run once, outside the report
# library(DescTools)
# DescTools::Winsorize() can winsorize in one call, but its argument names
# differ between package versions, so the limits are applied by hand below.

p5 <- quantile(mtcars$hp, 0.05)
p95 <- quantile(mtcars$hp, 0.95)
mtcars$hp_w <- mtcars$hp
mtcars$hp_w[mtcars$hp_w < p5] <- p5
mtcars$hp_w[mtcars$hp_w > p95] <- p95
head(mtcars$hp_w)
str(mtcars)

# mtcars now contains hp_w, so exclude it to recover the original model
model_original <- lm(mpg ~ . - hp_w, data = mtcars)
model_winsor <- lm(mpg ~ . - hp + hp_w, data = mtcars)
summary(model_original)$r.squared
summary(model_winsor)$r.squared

Winsorization reduces the influence of outliers, especially extreme ones. The model becomes a bit more stable, and the coefficients are less extreme. R² is generally similar or slightly higher. Predictions for extreme cars are less extreme, which could help the model generalize.

7

An improved R² does not guarantee better predictive performance, but it is a good sign for in-sample fit.
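Because in-sample R² alone cannot show whether a model predicts better, a leave-one-out cross-validation (LOOCV) error gives a complementary view. The sketch below relies on the standard identity for linear models that the leave-one-out residual equals the ordinary residual divided by one minus its leverage. The helper `loocv_mse` and the working copy `cars` are hypothetical names introduced for illustration, and the 5th and 95th percentile limits are assumed to match the winsorization above.

```r
data(mtcars)

# Work on a copy so the built-in dataset is left unchanged
cars <- mtcars
lims <- quantile(cars$hp, c(0.05, 0.95))
cars$hp_w <- pmin(pmax(cars$hp, lims[[1]]), lims[[2]])  # clamp hp to the limits

fit_original <- lm(mpg ~ . - hp_w, data = cars)  # uses raw hp
fit_winsor   <- lm(mpg ~ . - hp, data = cars)    # uses winsorized hp_w

# Hypothetical helper: LOOCV mean squared error via the leverage shortcut,
# where each leave-one-out residual is e_i / (1 - h_ii)
loocv_mse <- function(fit) {
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

c(
  in_sample_original = mean(residuals(fit_original)^2),
  loocv_original     = loocv_mse(fit_original),
  loocv_winsor       = loocv_mse(fit_winsor)
)
```

The LOOCV error is always at least as large as the in-sample error for the same model, so the gap between the two gives a rough sense of how optimistic the in-sample fit is.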