library(ggplot2)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

data(mtcars)
head(mtcars, 10)

##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

Module Two Problem Set: Interaction Terms and Qualitative Predictors

In this notebook, you have been given a set of steps that will show you how to create multiple regression models that include interaction terms and qualitative predictors in R. It is very important to run the steps in order. Some steps depend on the outputs of earlier steps. Once you have run all the steps, you will be asked to create your own regression models to help you answer the questions in the Module Two Problem Set. You are expected to write the R script yourself to answer these questions.

Reminder: If you have not already reviewed the Problem Set Report template for your Module Two Problem Set, be sure to do so now. That will give you an idea of the questions you will need to answer with the outputs of this script. You should use the code you are given as reference when writing your own scripts.

Step 1: Loading the Data Set

You are an analyst working for a car maker. You have access to a set of data that can be used to study the fuel economy of a car. Car makers are interested in studying factors that are associated with better fuel economy. This data set includes several important variables that are associated with fuel economy. You will use this data set to create models to predict fuel economy.

This block of R code loads the data set from the mtcars.csv file. Here are the variables contained in the dataset.

Reference: R data sets. (1974). Motor trend car road tests [Data file]. Retrieved from https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars

Click the code section below and hit the Run button above.

Converting appropriate variables to factors

mtcars2 <- within(mtcars, {
  vs   <- factor(vs)
  am   <- factor(am)
  cyl  <- factor(cyl)
  gear <- factor(gear)
  carb <- factor(carb)
})

Print the first six rows

print("head")

## [1] "head"

head(mtcars2, 6)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Step 2: Subsetting Data and Correlation Matrix

In this step, you will subset the original data set to select some variables and create a new data set. You will then print the correlation matrix for these variables in the new data set.

Click the block of code below and hit the Run button above.

myvars <- c("mpg","wt","drat")
mtcars_subset <- mtcars2[myvars]

Print the first six rows

print("head")

## [1] "head"

head(mtcars_subset, 6)

##                    mpg    wt drat
## Mazda RX4         21.0 2.620 3.90
## Mazda RX4 Wag     21.0 2.875 3.90
## Datsun 710        22.8 2.320 3.85
## Hornet 4 Drive    21.4 3.215 3.08
## Hornet Sportabout 18.7 3.440 3.15
## Valiant           18.1 3.460 2.76

Print the correlation matrix

print("cor")

## [1] "cor"

corr_matrix <- cor(mtcars_subset, method = "pearson")
round(corr_matrix, 4)

##          mpg      wt    drat
## mpg   1.0000 -0.8677  0.6812
## wt   -0.8677  1.0000 -0.7124
## drat  0.6812 -0.7124  1.0000

Step 3: Multiple Regression With Interaction Term

In this step, you will create a multiple regression model for fuel economy as the response variable, and weight and rear axle ratio as predictors. A numerically higher rear axle ratio sends more of the engine’s available torque to the tires, thus increasing the fuel economy. We expect that the fuel economy and weight of the car are negatively correlated, meaning that as weight increases, the fuel economy decreases (or vice versa). However, the rate of decrease in fuel economy will be offset by a numerically higher rear axle ratio. This favors a multiple regression model in which the two predictor variables interact.

The general form of this regression model is:

\[\begin{equation*} \large E(y) = {\beta}_0\ +\ {\beta}_1\ {x}_1\ +\ {\beta}_2\ {x}_2\ +\ {\beta}_3\ {x}_1\ {x}_2 \end{equation*}\]

The prediction regression model is:

\[\begin{equation*} \large \hat{y} = \hat{{\beta}_0} +\ \hat{{\beta}_1}\ {x}_1\ +\ \hat{{\beta}_2}\ {x}_2+\ \hat{{\beta}_3}\ {x}_1\ {x}_2 \end{equation*}\]

\[\begin{equation*} \text{where } \hat{y} \text{ is the predicted value of the fuel economy,}\ {x}_1\ \text{is weight,}\ {x}_2\ \text{is rear axle ratio} \end{equation*}\]

\[\begin{equation*} \hat{{\beta}_0} \text{,} \hspace{0.25cm} \hat{{\beta}_1} \text{, } \hspace{0.25cm} \hat{{\beta}_2} \text{, } \hspace{0.25cm} \hat{{\beta}_3} \text{ } \text{ are estimates of} \text{ } {\beta}_0\ \text{,} \hspace{0.25cm} {\beta}_1\ \text{,} \hspace{0.25cm} {\beta}_2\ \text{,} \hspace{0.25cm} {\beta}_3\ \text{ respectively } \end{equation*}\]

Click the block of code below and hit the Run button above.

Create the multiple regression model and print summary statistics. Note that this model includes the interaction term.

model1 <- lm(mpg ~ wt + drat + wt:drat, data=mtcars_subset)
summary(model1)

##
## Call:
## lm(formula = mpg ~ wt + drat + wt:drat, data = mtcars_subset)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.8913 -1.8634 -0.3398  1.3247  6.4730
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    5.550     12.631   0.439   0.6637
## wt             3.884      3.798   1.023   0.3153
## drat           8.494      3.321   2.557   0.0162 *
## wt:drat       -2.543      1.093  -2.327   0.0274 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.839 on 28 degrees of freedom
## Multiple R-squared:  0.7996, Adjusted R-squared:  0.7782
## F-statistic: 37.25 on 3 and 28 DF,  p-value: 6.567e-10
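As a side note on formula syntax, the sketch below illustrates that R's `*` operator in a formula is shorthand for both main effects plus their interaction. It is self-contained and fits directly on the built-in `mtcars` data; the object names `fit_explicit` and `fit_shorthand` are illustrative and do not appear in the notebook.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)

# Explicit form, as in Step 3: main effects plus the wt:drat interaction.
fit_explicit <- lm(mpg ~ wt + drat + wt:drat, data = mtcars)

# Shorthand: wt * drat expands to wt + drat + wt:drat.
fit_shorthand <- lm(mpg ~ wt * drat, data = mtcars)

# The two formulas should give the same coefficient estimates.
all.equal(coef(fit_explicit), coef(fit_shorthand))
```

Writing the interaction explicitly, as the notebook does, makes it easier to see which terms are in the model; the shorthand is convenient when every pairwise interaction is wanted.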

Interpretation of Beta Estimates

From the output in previous step, the prediction model equation is:

\[\begin{equation*} \large \hat{y} = 5.5500\ +\ 3.8840\ {x}_1\ +\ 8.4940\ {x}_2\ -\ 2.5430\ {x}_1\ {x}_2 \end{equation*}\]

\[\begin{equation*} \text{where } \hat{y} \text{ is the predicted fuel economy,}\ {x}_1\ \text{is weight, and}\ {x}_2\ \text{is rear axle ratio} \end{equation*}\]

Interpret the estimated coefficient of rear axle ratio variable.

Let us assume there was no interaction present and the multiple regression model was 5.55 + 3.884 x1 + 8.494 x2. Then, the fuel economy of a car would increase on average by 8.494 for each unit increase in rear axle ratio (since 8.494 is the estimated coefficient for a rear axle ratio variable).

However, weight and rear axle ratio interact in the model in Step 3, so the rate of change of average fuel economy with rear axle ratio depends on the weight of the car. Collecting the terms that involve x2 gives:

8.4940 x2 - 2.5430 (x1)(x2) = ( 8.4940 - 2.5430 x1 ) x2

The effective coefficient for x2 is therefore ( 8.494 - 2.543 x1 ), which changes with the weight x1.

Suppose we have a car with weight 2.50 (the wt variable is recorded in units of 1,000 lbs). The estimated coefficient for rear axle ratio can be calculated as:

( 8.494 - 2.543 x1 ) = 8.494 - 2.543 (2.50) = +2.1365

In other words, we estimate that the fuel economy of a car weighing 2,500 lbs will increase by 2.1365 units for each unit increase in rear axle ratio. Note that this differs from the 8.494 obtained above under the assumption of no interaction. This is because the multiple regression model has an interaction term between weight and rear axle ratio.

Suppose we now have a car with weight 2.32. If there were no interaction term included in the model, then the fuel economy of this car would increase on average by 8.494 for each unit increase in rear axle ratio. However, since we now have the interaction term in the model, the fuel economy of the car will increase by 8.494 - 2.543 (2.32) = 2.59 units for each unit increase in rear axle ratio.

Therefore, the rate of increase in fuel economy now varies due to the presence of an interaction term.
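The weight-dependent slope worked out by hand above can also be computed from the fitted coefficients. The following is a minimal sketch, assuming the built-in `mtcars` data; `fit_interaction` and the helper `drat_slope_at()` are hypothetical names introduced only for this illustration.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
fit_interaction <- lm(mpg ~ wt + drat + wt:drat, data = mtcars)
b <- coef(fit_interaction)

# Hypothetical helper: estimated change in mpg per unit increase in drat
# for a car of the given weight (wt is in units of 1,000 lbs).
drat_slope_at <- function(wt) {
  unname(b["drat"] + b["wt:drat"] * wt)
}

# These should be close to the hand calculations above, which used
# coefficients rounded to three decimals.
drat_slope_at(2.50)
drat_slope_at(2.32)
```

Because the helper uses unrounded coefficients, its results may differ from the hand calculations in the third decimal place.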

Step 4: Adding in a Qualitative Predictor

In this step, you will add a qualitative predictor into the multiple regression model from Step 3. The transmission variable am is a qualitative variable with two levels: manual and automatic. Since this variable has two levels, we can use one dummy variable to represent it. R will create the dummy variable automatically and will also label it appropriately.

The general form of this regression model is:

\[\begin{equation*} \large E(y) = {\beta}_0\ +\ {\beta}_1\ {x}_1\ +\ {\beta}_2\ {x}_2\ +\ {\beta}_3\ {x}_1\ {x}_2\ +\ {\beta}_4\ {x}_3 \end{equation*}\]

The prediction regression model is:

\[\begin{equation*} \large \hat{y} = \hat{{\beta}_0}\ +\ \hat{{\beta}_1}\ {x}_1\ +\ \hat{{\beta}_2}\ {x}_2\ +\ \hat{{\beta}_3}\ {x}_1\ {x}_2\ +\ \hat{{\beta}_4}\ {x}_3 \end{equation*}\]

\[\begin{equation*} \text{In the model above, } \hat{y} \text{ is the predicted fuel economy (mpg), x1 is weight (wt), x2 is rear axle ratio (drat), and x3 is the dummy variable for transmission (am).} \end{equation*}\]

\[\begin{equation*} \text{Note that am='1' for a manual transmission and am='0' for an automatic transmission.} \end{equation*}\]

\[\begin{equation*} \hat{{\beta}_0} \text{,} \hspace{0.25cm} \hat{{\beta}_1} \text{, } \hspace{0.25cm} \hat{{\beta}_2} \text{,} \hspace{0.25cm} \hat{{\beta}_3} \text{,} \hspace{0.25cm} \hat{{\beta}_4} \text{ } \text{ are estimates of} \text{ } {\beta}_0\ \text{,} \hspace{0.25cm} {\beta}_1\ \text{,} \hspace{0.25cm} {\beta}_2\ \text{,} \hspace{0.25cm} {\beta}_3\ \text{,} \hspace{0.25cm} {\beta}_4\ \text{ respectively }\\ \end{equation*}\]

Click the block of code below and hit the Run button above.

Subsetting data to only include the variables that are needed

myvars <- c("mpg","wt","drat","am")
mtcars_subset <- mtcars2[myvars]

Create the model

model2 <- lm(mpg ~ wt + drat + wt:drat + am, data=mtcars_subset)
summary(model2)

##
## Call:
## lm(formula = mpg ~ wt + drat + wt:drat + am, data = mtcars_subset)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.6907 -1.4711 -0.2512  0.9344  6.7453
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    3.247     12.914   0.251   0.8034
## wt             4.168      3.822   1.091   0.2851
## drat           9.562      3.529   2.710   0.0116 *
## am1           -1.464      1.597  -0.917   0.3674
## wt:drat       -2.708      1.111  -2.438   0.0216 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.847 on 27 degrees of freedom
## Multiple R-squared:  0.8057, Adjusted R-squared:  0.7769
## F-statistic: 27.99 on 4 and 27 DF,  p-value: 2.948e-09
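To see the dummy variable that R builds for am, one option is to inspect the design matrix. The sketch below is self-contained and assumes the built-in `mtcars` data; `cars_f` and `X` are illustrative names, not objects from the notebook.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
cars_f <- transform(mtcars, am = factor(am))

# The first factor level is the reference category; here that is "0" (automatic).
levels(cars_f$am)

# model.matrix() shows the columns that lm() actually regresses on.
X <- model.matrix(mpg ~ wt + drat + wt:drat + am, data = cars_f)
colnames(X)
head(X)
```

The column `am1` equals 1 for manual cars and 0 for automatic cars, which is why the coefficient labelled `am1` in the summary is read as the estimated difference for a manual transmission relative to an automatic one.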

Step 5: Fitted Values

In this step, you will obtain the fitted values of the data set using the model from step 4. Recall that the fitted value is just the predicted value of the dependent variable (miles per gallon) for data points from the data set.

Click the block of code below and hit the Run button above.

predicted values

print("fitted")

## [1] "fitted"

fitted_values <- fitted.values(model2)
fitted_values

##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
##           22.320207           20.689507           24.074750           19.278594
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D
##           18.356600           18.194854           17.782915           19.945503
##            Merc 230            Merc 280           Merc 280C          Merc 450SE
##           20.415578           18.545349           18.545349           15.724427
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
##           17.134382           16.927036           11.483047           10.468474
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
##            9.650796           25.654704           34.090685           28.809681
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
##           24.198310           17.996415           18.378418           16.124986
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
##           16.648968           27.478543           27.385767           28.689013
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
##           19.115459           20.784240           16.283559           21.723885

Step 6: Residuals

In this step, you will obtain the residuals using the model in step 4. Recall that the residual is the difference between the actual value and the predicted value of the dependent variable (miles per gallon).

Click the block of code below and hit the Run button above.

residuals

print("residuals")

## [1] "residuals"

residuals <- residuals(model2)
residuals

##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
##         -1.32020710          0.31049250         -1.27475025          2.12140638
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D
##          0.34339960         -0.09485431         -3.48291480          4.45449689
##            Merc 230            Merc 280           Merc 280C          Merc 450SE
##          2.38442164          0.65465150         -0.74534850          0.67557281
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
##          0.16561816         -1.72703557         -1.08304682         -0.06847434
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
##          5.04920376          6.74529611         -3.69068489          5.09031922
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
##         -2.69830968         -2.49641509         -3.17841839         -2.82498557
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
##          2.55103235         -0.17854335         -1.38576690          1.71098689
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
##         -3.31545871         -1.08423985         -1.28355913         -0.32388453
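The definition of a residual (actual value minus fitted value) can be checked directly. This is a small self-contained sketch on the built-in `mtcars` data; `cars_f`, `fit_am`, and `manual_resid` are illustrative names.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
cars_f <- transform(mtcars, am = factor(am))
fit_am <- lm(mpg ~ wt + drat + wt:drat + am, data = cars_f)

# Residual = observed mpg minus fitted mpg, computed by hand.
manual_resid <- cars_f$mpg - fitted(fit_am)

# Should agree with what resid() returns for the same model.
all.equal(manual_resid, resid(fit_am))

# For a least-squares fit with an intercept, the residuals sum to
# (numerically) zero.
sum(resid(fit_am))
```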

Step 7: Diagnostic Plots — Residuals against Fitted Values

In this step, you will generate a plot of residuals against fitted values to test the assumption of homoscedasticity.

Click the block of code below and hit the Run button above.
NOTE: If the plot is not created, click the code section and hit the Run button again.

plot(fitted_values, residuals, main = "Residuals against Fitted Values", xlab = "Fitted Values", ylab = "Residuals", col="red", pch = 19, frame = FALSE)

Step 8: Diagnostic Plots — Q-Q Plot

In this step, you will generate a Q-Q plot to test assumptions of normality of the residuals.

Click the block of code below and hit the Run button above.
NOTE: If the plot is not created, click the code section and hit the Run button again.

qqnorm(residuals, pch = 19, col="red", frame = FALSE)
qqline(residuals, col = "blue", lwd = 2)
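A Q-Q plot is a visual check. As an optional numerical complement, which is not part of the original steps, the Shapiro-Wilk test from base R's stats package can be applied to the same residuals. The sketch below is self-contained; `cars_f`, `fit_am`, and `sw` are illustrative names.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
cars_f <- transform(mtcars, am = factor(am))
fit_am <- lm(mpg ~ wt + drat + wt:drat + am, data = cars_f)

# Shapiro-Wilk test: the null hypothesis is that the residuals come
# from a normal distribution.
sw <- shapiro.test(resid(fit_am))
sw$statistic
sw$p.value
```

A small p-value would suggest a departure from normality; with only 32 observations the test has limited power, so it is best read alongside the Q-Q plot rather than in place of it.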

Step 9: Confidence Interval for Parameter Estimates

In this step, you will use the confint function to create 90% confidence intervals for the beta parameters.

Click the block of code below and hit the Run button above.

confidence intervals for model parameters

print("confint")

## [1] "confint"

conf_90_int <- confint(model2, level=0.90)
round(conf_90_int, 4)

##                  5 %    95 %
## (Intercept) -18.7488 25.2427
## wt           -2.3414 10.6771
## drat          3.5516 15.5725
## am1          -4.1845  1.2564
## wt:drat      -4.6004 -0.8164
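For a linear model, each of these intervals is the estimate plus or minus a t critical value times the standard error. The sketch below reproduces that calculation by hand; it is self-contained, and `cars_f`, `fit_am`, `t_crit`, and `manual_ci` are illustrative names.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
cars_f <- transform(mtcars, am = factor(am))
fit_am <- lm(mpg ~ wt + drat + wt:drat + am, data = cars_f)

coefs <- coef(summary(fit_am))
est <- coefs[, "Estimate"]
se  <- coefs[, "Std. Error"]

# A 90% two-sided interval leaves 5% in each tail, hence the 0.95 quantile.
t_crit <- qt(0.95, df = df.residual(fit_am))

manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)
round(manual_ci, 4)
```

The rounded result should match the confint output above row for row.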

Step 10: Predictions, Prediction Interval, and Confidence Interval

In this step, you will predict the fuel economy for a car that has a weight of 3.88, a rear axle ratio of 3.05, and has a manual transmission. You will also obtain a 90% prediction interval and confidence interval for this prediction. Note that this observation is not from the dataset that was used to create this model.

Click the block of code below and hit the Run button above.

newdata <- data.frame(wt=3.88, drat=3.05, am='1')
print("prediction interval")

## [1] "prediction interval"

prediction_pred_int <- predict(model2, newdata, interval="predict", level=0.90)
round(prediction_pred_int, 4)

##       fit    lwr     upr
## 1 15.0672 9.4501 20.6844

print("confidence interval")

## [1] "confidence interval"

prediction_conf_int <- predict(model2, newdata, interval="confidence", level=0.90)
round(prediction_conf_int, 4)

##       fit     lwr     upr
## 1 15.0672 12.2316 17.9029
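Both intervals are centered on the same fitted value, but the prediction interval is wider because it also accounts for the scatter of an individual car around the mean response. The sketch below compares the two widths; it is self-contained, and `cars_f`, `fit_am`, `new_car`, `pi90`, and `ci90` are illustrative names.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
cars_f <- transform(mtcars, am = factor(am))
fit_am <- lm(mpg ~ wt + drat + wt:drat + am, data = cars_f)

# Same hypothetical car as above: wt 3.88, drat 3.05, manual transmission.
new_car <- data.frame(wt = 3.88, drat = 3.05,
                      am = factor("1", levels = c("0", "1")))

pi90 <- predict(fit_am, new_car, interval = "prediction", level = 0.90)
ci90 <- predict(fit_am, new_car, interval = "confidence", level = 0.90)

# Interval widths: upper bound minus lower bound.
pi_width <- unname(pi90[, "upr"] - pi90[, "lwr"])
ci_width <- unname(ci90[, "upr"] - ci90[, "lwr"])
c(prediction = pi_width, confidence = ci_width)
```

The confidence interval describes uncertainty in the average mpg of all cars with these characteristics, while the prediction interval describes uncertainty for a single such car.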

Your Code

You have been asked to create regression models in the Module Two Problem Set. Review the Problem Set Report template to see the questions you will be answering about your models.

Use the empty blocks below to write the R code for your models and get outputs. Then use the outputs to answer the questions in your problem set report.

Note: Use the + (plus) button to add new code blocks or the scissor icon to remove empty code blocks, if needed.

Pearson correlation coefficients

cor_mpg_hp <- cor(mtcars$mpg, mtcars$hp)
# Note: this correlates mpg with the arithmetic sum hp + qsec + drat,
# not with the three predictors jointly.
cor_mpg_hp_qsec_drat <- cor(mtcars$mpg, mtcars$hp+mtcars$qsec+mtcars$drat)
cor_hp_qsec <- cor(mtcars$hp, mtcars$qsec)
cor_mpg_qsec <- cor(mtcars$mpg, mtcars$qsec)
cor_mpg_drat <- cor(mtcars$mpg, mtcars$drat)
print(cor_mpg_hp)

## [1] -0.7761684

print(cor_mpg_hp_qsec_drat)

## [1] -0.7768859

print(cor_hp_qsec)

## [1] -0.7082234

print(cor_mpg_qsec)

## [1] 0.418684

print(cor_mpg_drat)

## [1] 0.6811719

More details

cor.test(mtcars$mpg, mtcars$hp)

##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor
## -0.7761684

cor.test(mtcars$mpg, mtcars$hp+mtcars$qsec+mtcars$drat)

##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$hp + mtcars$qsec + mtcars$drat
## t = -6.7581, df = 30, p-value = 1.713e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8856589 -0.5872846
## sample estimates:
##        cor
## -0.7768859

cor.test(mtcars$hp, mtcars$qsec)

##
##  Pearson's product-moment correlation
##
## data:  mtcars$hp and mtcars$qsec
## t = -5.4946, df = 30, p-value = 5.766e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8475998 -0.4774331
## sample estimates:
##        cor
## -0.7082234

cor.test(mtcars$mpg, mtcars$qsec)

##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$qsec
## t = 2.5252, df = 30, p-value = 0.01708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08195487 0.66961864
## sample estimates:
##      cor
## 0.418684

cor.test(mtcars$mpg, mtcars$drat)

##
##  Pearson's product-moment correlation
##
## data:  mtcars$mpg and mtcars$drat
## t = 5.096, df = 30, p-value = 1.776e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4360484 0.8322010
## sample estimates:
##       cor
## 0.6811719
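The pairwise correlations above can also be read from a single correlation matrix, in the same way as Step 2. This is a minimal sketch, assuming the built-in `mtcars` data; `vars` and `cor_matrix` are illustrative names. It also avoids correlating mpg with the sum hp + qsec + drat, which mixes variables measured on very different scales.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)

# Variables of interest for the problem set models.
vars <- c("mpg", "hp", "qsec", "drat")

# One matrix holding every pairwise Pearson correlation.
cor_matrix <- round(cor(mtcars[, vars], method = "pearson"), 4)
cor_matrix
```

Each off-diagonal entry should agree, after rounding, with the corresponding `cor()` value printed above.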

Linear Regression Models

“Write the general form and the prediction equation of the regression model for fuel economy using horsepower, quarter mile time, and rear axle ratio as predictors. Include interaction terms for horsepower and quarter mile time; horsepower and rear axle ratio. Create the regression model for fuel economy using horsepower, quarter mile time, and rear axle ratio as predictors. Include interaction terms for horsepower and quarter mile time; horsepower and rear axle ratio. Write the prediction model equation using outputs obtained from your R script.”
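One way the model described in this prompt could be specified is sketched below, following the formula style of Step 3. This is only an illustration of the syntax under the assumption that the built-in `mtcars` data is used; the name `fit_task` is hypothetical, and no results are quoted because this model is not fitted elsewhere in the notebook.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)

# Main effects for hp, qsec, and drat, plus the two requested
# interactions: hp with qsec, and hp with drat.
fit_task <- lm(mpg ~ hp + qsec + drat + hp:qsec + hp:drat, data = mtcars)

# Coefficient names show which terms ended up in the model.
names(coef(fit_task))
summary(fit_task)
```

With x1 = hp, x2 = qsec, and x3 = drat, this corresponds to a general form with six parameters: an intercept, three main-effect slopes, and one coefficient for each of the products x1 x2 and x1 x3.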

Linear Regression Model for MPG and HP

“MPG = β0 + β1 * HP + ϵ”

“MPG: the dependent variable that I aim to predict”
“HP: the independent variable used as a predictor”
“β0: the intercept of the regression line, representing the predicted MPG when HP is 0”
“β1: the slope coefficient, representing the change in MPG for each unit increase in HP”
“ϵ: the error term, representing the difference between the observed MPG values and the values predicted by the model”

model <- lm(mpg ~ hp, data = mtcars)
summary(model)

##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7121 -2.1122 -0.8854  1.5819  8.2360
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

cat("R-squared:", summary(model)$r.squared, "\n")

## R-squared: 0.6024373

cat("Adjusted R-squared:", summary(model)$adj.r.squared)

## Adjusted R-squared: 0.5891853

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", col = "red") +
  labs(
    title = "Linear Regression of MPG on HP",
    subtitle = "SNHU Module Two MTCARS",
    caption = "The red line is the linear regression line. The closer the points lie to the line, the stronger the linear relationship between the two variables. Points farther from the line have larger residuals and could be considered potential outliers.",
    x = "Horsepower (HP)",
    y = "Miles Per Gallon (MPG)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

Multiple Linear Regression Model for MPG on QSEC, DRAT and HP

“MPG = β0 + β1 * HP + β2 * QSEC + β3 * DRAT + ϵ”

“MPG: the dependent variable (fuel economy) that I aim to predict”
“HP: predictor variable for horsepower”
“QSEC: predictor variable for quarter-mile time”
“DRAT: predictor variable for rear axle ratio”
“β0: the intercept of the regression line, representing the predicted MPG when all predictors are zero”
“β1, β2, β3: coefficients representing the change in MPG for a one-unit increase in HP, QSEC, and DRAT, respectively, holding the other predictors constant”
“ϵ: the error term, representing the difference between observed and predicted MPG values”

model1 <- lm(mpg ~ hp + qsec + drat, data = mtcars)
summary(model1)

##
## Call:
## lm(formula = mpg ~ hp + qsec + drat, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.7977 -2.4804 -0.4937  1.1381  7.3188
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.73662   13.01979   1.362 0.183968
## hp          -0.05797    0.01421  -4.080 0.000339 ***
## qsec        -0.28407    0.48923  -0.581 0.566116
## drat         4.42875    1.29169   3.429 0.001897 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.207 on 28 degrees of freedom
## Multiple R-squared:  0.7443, Adjusted R-squared:  0.7168
## F-statistic: 27.16 on 3 and 28 DF,  p-value: 1.937e-08

cat("R-squared:", summary(model1)$r.squared, "\n")

## R-squared: 0.7442512

cat("Adjusted R-squared:", summary(model1)$adj.r.squared)

## Adjusted R-squared: 0.7168495

# Note: the x-axis is the arithmetic sum hp + qsec + drat, so the blue line is a
# simple regression of mpg on that sum, not the fitted multiple regression model1.
ggplot(data = mtcars, aes(x = hp + qsec + drat, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", col = "blue") +
  labs(
    title = "Multiple Linear Regression of MPG on QSEC, DRAT and HP",
    subtitle = "SNHU Module Two MTCARS",
    caption = "The blue line is the linear regression line. The closer the points lie to the line, the stronger the linear relationship between the combined variables and MPG. Points farther from the line have larger residuals and could be considered potential outliers.",
    x = "QSEC, DRAT and HP",
    y = "Miles Per Gallon (MPG)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

Linear Regression Model for HP and QSEC

“HP = β0 + β1 * QSEC + ϵ”

“HP: the dependent variable that I aim to predict”
“QSEC: the independent variable used as a predictor”
“β0: the intercept of the regression line, representing the predicted HP when QSEC is 0”
“β1: the slope coefficient, representing the change in HP for each unit increase in QSEC”
“ϵ: the error term, representing the difference between the observed HP values and the values predicted by the model”

model2 <- lm(hp ~ qsec, data = mtcars)
summary(model2)

##
## Call:
## lm(formula = hp ~ qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -86.903 -33.629   5.336  27.925 100.032
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  631.704     88.700   7.122 6.38e-08 ***
## qsec         -27.174      4.946  -5.495 5.77e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49.2 on 30 degrees of freedom
## Multiple R-squared:  0.5016, Adjusted R-squared:  0.485
## F-statistic: 30.19 on 1 and 30 DF,  p-value: 5.766e-06

cat("R-squared:", summary(model2)$r.squared, "\n")

## R-squared: 0.5015804

cat("Adjusted R-squared:", summary(model2)$adj.r.squared)

## Adjusted R-squared: 0.4849664

ggplot(data = mtcars, aes(x = qsec, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", col = "purple") +
  labs(
    title = "Linear Regression of HP on QSEC",
    subtitle = "SNHU Module Two MTCARS",
    caption = "The purple line is the linear regression line. The closer the points lie to the line, the stronger the linear relationship between the two variables. Points farther from the line have larger residuals and could be considered potential outliers.",
    x = "Quarter Mile Time (QSEC)",
    y = "Horsepower (HP)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

Linear Regression Model for HP and DRAT

“HP = β0 + β1 * DRAT + ϵ”

“HP: the dependent variable that I aim to predict”
“DRAT: the independent variable used as a predictor”
“β0: the intercept of the regression line, representing the predicted HP when DRAT is 0”
“β1: the slope coefficient, representing the change in HP for each unit increase in DRAT”
“ϵ: the error term, representing the difference between the observed HP values and the values predicted by the model”

model3 <- lm(hp ~ drat, data = mtcars)
summary(model3)

##
## Call:
## lm(formula = hp ~ drat, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -89.828 -40.261  -7.934   7.247 185.058
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   353.65      76.05    4.65 6.24e-05 ***
## drat          -57.55      20.92   -2.75  0.00999 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.28 on 30 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.1748
## F-statistic: 7.565 on 1 and 30 DF,  p-value: 0.009989

cat("R-squared:", summary(model3)$r.squared, "\n")

## R-squared: 0.2013847

cat("Adjusted R-squared:", summary(model3)$adj.r.squared)

## Adjusted R-squared: 0.1747642

# x is mapped to drat so that the plot matches its title, axis label, and model3.
ggplot(data = mtcars, aes(x = drat, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", col = "orange") +
  labs(
    title = "Linear Regression of HP on DRAT",
    subtitle = "SNHU Module Two MTCARS",
    caption = "The orange line is the linear regression line. The closer the points lie to the line, the stronger the linear relationship between the two variables. Points farther from the line have larger residuals and could be considered potential outliers.",
    x = "Rear Axle Ratio (DRAT)",
    y = "Horsepower (HP)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Multiple Linear Regression Model for MPG with 160HP with each unit of change of QSEC

“MPG = β0 + β1 * HP + β2 * QSEC”

“MPG: the dependent variable (fuel economy) that I aim to predict”
“HP: predictor variable for horsepower”
“QSEC: predictor variable for quarter-mile time”
“β0: the intercept, representing the predicted value of MPG when both HP and QSEC are zero”
“β1: the coefficient for HP, representing the change in MPG for a one-unit increase in HP while holding QSEC constant”
“β2: the coefficient for QSEC, representing the change in MPG for each unit increase in QSEC while holding HP constant”

model4 <- lm(mpg ~ hp + qsec, data = mtcars)
summary(model4)

##
## Call:
## lm(formula = mpg ~ hp + qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.1782 -2.6030 -0.5098  1.2866  8.7178
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.32371   11.10331   4.352 0.000153 ***
## hp          -0.08459    0.01393  -6.071 1.31e-06 ***
## qsec        -0.88658    0.53459  -1.658 0.108007
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.755 on 29 degrees of freedom
## Multiple R-squared:  0.6369, Adjusted R-squared:  0.6118
## F-statistic: 25.43 on 2 and 29 DF,  p-value: 4.176e-07

“To see the actual change, I will create a prediction model to show the change”

new_data <- data.frame(hp = 160, qsec = c(1, 2))
prediction_model <- predict(model4, newdata = new_data)
prediction_model

##        1        2
## 33.90224 33.01566

difference <- prediction_model[1] - prediction_model[2]
difference

##         1
## 0.8865796
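In a model without interaction terms, the change in predicted MPG for a one-unit increase in QSEC at a fixed HP is simply the QSEC coefficient. The sketch below checks this; it is self-contained, assumes the built-in `mtcars` data, and uses the illustrative names `fit_hp_qsec` and `preds`.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
fit_hp_qsec <- lm(mpg ~ hp + qsec, data = mtcars)

# Predictions at hp = 160 for qsec = 1 and qsec = 2.
preds <- predict(fit_hp_qsec, newdata = data.frame(hp = 160, qsec = c(1, 2)))

# Change in predicted mpg when qsec increases by one unit (second minus first).
change_per_unit <- unname(preds[2] - preds[1])
change_per_unit

# This equals the estimated qsec coefficient, whatever value hp is held at.
unname(coef(fit_hp_qsec)["qsec"])
```

Note the sign: the difference printed above is the first prediction minus the second, so it is the negative of the change per unit increase in QSEC. Also, qsec values of 1 and 2 lie far outside the range observed in the data, so the predictions themselves are extrapolations; only their difference is meaningful here.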

# Note: the y-axis is the arithmetic sum mpg + hp, so the green line is a simple
# regression of that sum on qsec, not the fitted multiple regression model4.
ggplot(data = mtcars, aes(x = qsec, y = mpg + hp)) +
  geom_point() +
  geom_smooth(method = "lm", col = "green") +
  labs(
    title = "Multiple Linear Regression of MPG & HP on QSEC",
    subtitle = "SNHU Module Two MTCARS",
    caption = "The green line is the linear regression line. The closer the points lie to the line, the stronger the linear relationship between QSEC and MPG combined with HP. Points farther from the line have larger residuals and could be considered potential outliers.",
    x = "QSEC",
    y = "MPG & HP"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'

Display the summary to get the F-test results

model4_summary <- summary(model4)
cat("Overall F-test (F value, numerator df, denominator df):", model4_summary$fstatistic, "\n")

## Overall F-test (F value, numerator df, denominator df): 25.43136 2 29
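The `fstatistic` component of a model summary holds the F value and its two degrees of freedom, not the p-value itself. A minimal sketch of how the p-value can be recovered with `pf()` follows; it is self-contained, and `fit_hp_qsec`, `fstat`, and `f_p_value` are illustrative names.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
fit_hp_qsec <- lm(mpg ~ hp + qsec, data = mtcars)

# Named vector with elements "value", "numdf", and "dendf".
fstat <- summary(fit_hp_qsec)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value.
f_p_value <- unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                       lower.tail = FALSE))
f_p_value
```

The result should match the p-value shown on the last line of the model summary.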

T-Test

Extract coefficients table for p-values

coeff_table <- model4_summary$coefficients
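The coefficients table is a numeric matrix, so the individual t-test p-values can be pulled out by column name. The sketch below is self-contained and uses a 0.05 threshold purely as an illustration; `fit_hp_qsec`, `coef_tab`, and `p_values` are illustrative names.

```r
# Self-contained sketch using the built-in mtcars data.
data(mtcars)
fit_hp_qsec <- lm(mpg ~ hp + qsec, data = mtcars)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|).
coef_tab <- summary(fit_hp_qsec)$coefficients

# p-values of the individual t-tests, one per coefficient.
p_values <- coef_tab[, "Pr(>|t|)"]
round(p_values, 6)

# Terms whose p-value falls below the illustrative 0.05 threshold.
names(p_values)[p_values < 0.05]
```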

Collect fitted values and residuals for the diagnostic plots (Normal Q-Q plot and residuals against fitted values)

fitted_values <- fitted(model4)
fitted_df <- data.frame(Car = names(fitted_values), Fitted = fitted_values)
residuals <- resid(model4)
residuals_df <- data.frame(Car = names(residuals), Residuals = residuals)

View the data frame

print(fitted_df)

##                                     Car    Fitted
## Mazda RX4                     Mazda RX4 24.425370
## Mazda RX4 Wag             Mazda RX4 Wag 23.928885
## Datsun 710                   Datsun 710 23.957305
## Hornet 4 Drive           Hornet 4 Drive 21.783362
## Hornet Sportabout     Hornet Sportabout 18.430337
## Valiant                         Valiant 21.514796
## Duster 360                   Duster 360 13.554988
## Merc 240D                     Merc 240D 25.347344
## Merc 230                       Merc 230 19.984693
## Merc 280                       Merc 280 21.694354
## Merc 280C                     Merc 280C 21.162406
## Merc 450SE                   Merc 450SE 17.670472
## Merc 450SL                   Merc 450SL 17.493156
## Merc 450SLC                 Merc 450SLC 17.138524
## Cadillac Fleetwood   Cadillac Fleetwood 15.041430
## Lincoln Continental Lincoln Continental 14.337352
## Chrysler Imperial     Chrysler Imperial 13.423088
## Fiat 128                       Fiat 128 25.478859
## Honda Civic                 Honda Civic 27.505412
## Toyota Corolla           Toyota Corolla 25.182223
## Toyota Corona             Toyota Corona 22.377722
## Dodge Challenger       Dodge Challenger 20.678150
## AMC Javelin                 AMC Javelin 20.296921
## Camaro Z28                   Camaro Z28 13.936217
## Pontiac Firebird       Pontiac Firebird 18.403740
## Fiat X1-9                     Fiat X1-9 25.984209
## Porsche 914-2             Porsche 914-2 25.819858
## Lotus Europa               Lotus Europa 23.781496
## Ford Pantera L           Ford Pantera L 13.135737
## Ferrari Dino               Ferrari Dino 19.777938
## Maserati Bora             Maserati Bora  7.040973
## Volvo 142E                   Volvo 142E 22.612682

print(residuals_df)

##                                     Car   Residuals
## Mazda RX4                     Mazda RX4 -3.42536974
## Mazda RX4 Wag             Mazda RX4 Wag -2.92888515
## Datsun 710                   Datsun 710 -1.15730529
## Hornet 4 Drive           Hornet 4 Drive -0.38336246
## Hornet Sportabout     Hornet Sportabout  0.26966269
## Valiant                         Valiant -3.41479557
## Duster 360                   Duster 360  0.74501179
## Merc 240D                     Merc 240D -0.94734397
## Merc 230                       Merc 230  2.81530738
## Merc 280                       Merc 280 -2.49435367
## Merc 280C                     Merc 280C -3.36240589
## Merc 450SE                   Merc 450SE -1.27047184
## Merc 450SL                   Merc 450SL -0.19315591
## Merc 450SLC                 Merc 450SLC -1.93852406
## Cadillac Fleetwood   Cadillac Fleetwood -4.64142957
## Lincoln Continental Lincoln Continental -3.93735187
## Chrysler Imperial     Chrysler Imperial  1.27691194
## Fiat 128                       Fiat 128  6.92114100
## Honda Civic                 Honda Civic  2.89458775
## Toyota Corolla           Toyota Corolla  8.71777720
## Toyota Corona             Toyota Corona -0.87772164
## Dodge Challenger       Dodge Challenger -5.17815035
## AMC Javelin                 AMC Javelin -5.09692111
## Camaro Z28                   Camaro Z28 -0.63621745
## Pontiac Firebird       Pontiac Firebird  0.79626007
## Fiat X1-9                     Fiat X1-9  1.31579062
## Porsche 914-2             Porsche 914-2  0.18014154
## Lotus Europa               Lotus Europa  6.61850442
## Ford Pantera L           Ford Pantera L  2.66426292
## Ferrari Dino               Ferrari Dino -0.07793834
## Maserati Bora             Maserati Bora  7.95902698
## Volvo 142E                   Volvo 142E -1.21268239

qqnorm(residuals, main = "Normal Q-Q Plot")
qqline(residuals, col = "red", lwd = 2)

plot(fitted_values, residuals,
     main = "Residuals vs Fitted Values for fuel economy (MPG) of a car with 160 horsepower (HP) for each unit increase in quarter mile time (QSEC)",
     xlab = "Fitted Values", ylab = "Residuals",
     pch = 19, col = "darkgreen")
abline(h = 0, lty = 2, col = "red")
grid()

Multiple Linear Regression Model for MPG of a Car with 160 HP, per Unit Change in DRAT

“MPG = β0 + β1 * HP + β2 * DRAT”

“MPG: the dependent variable, the fuel economy I aim to predict.”
“HP: predictor variable for horsepower.”
“DRAT: predictor variable for rear axle ratio.”
“β0: the intercept, the predicted value of MPG when both HP and DRAT are zero.”
“β1: the slope for HP, the change in MPG for a one-unit increase in HP while holding DRAT constant.”
“β2: the slope for DRAT, the change in MPG for a one-unit increase in DRAT while holding HP constant.”

model5 <- lm(mpg ~ hp + drat, data = mtcars)
summary(model5)

## 
## Call:
## lm(formula = mpg ~ hp + drat, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0369 -2.3487 -0.6034  1.1897  7.7500 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.789861   5.077752   2.125 0.042238 *  
## hp          -0.051787   0.009293  -5.573 5.17e-06 ***
## drat         4.698158   1.191633   3.943 0.000467 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.17 on 29 degrees of freedom
## Multiple R-squared:  0.7412, Adjusted R-squared:  0.7233 
## F-statistic: 41.52 on 2 and 29 DF,  p-value: 3.081e-09

“To see the actual change, I will predict MPG at two DRAT values one unit apart while holding HP at 160.”

new_data <- data.frame(hp = 160, drat = c(1, 2))
prediction_model2 <- predict(model5, newdata = new_data)
prediction_model2

##         1         2 
##  7.202155 11.900313

difference2 <- prediction_model2[1] - prediction_model2[2]
difference2

##         1 
## -4.698158
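Because this model has no interaction term, the gap between the two predictions should equal the DRAT coefficient, whatever HP value is held fixed. The sketch below checks that by refitting the same model so the block runs on its own; the names `fit`, `preds`, and `change_per_unit` are illustrative and not part of the notebook.

```r
# Refit the HP + DRAT model from the built-in mtcars data
fit <- lm(mpg ~ hp + drat, data = mtcars)

# Two hypothetical cars with the same horsepower, one unit apart in DRAT
preds <- predict(fit, newdata = data.frame(hp = 160, drat = c(1, 2)))

# Change in predicted MPG for a one-unit increase in DRAT (second minus first)
change_per_unit <- unname(preds[2] - preds[1])

# Compare with the estimated DRAT coefficient
all.equal(change_per_unit, unname(coef(fit)["drat"]))
```

DRAT values of 1 and 2 lie below the range observed in mtcars (roughly 2.76 to 4.93), so the two individual predictions are extrapolations. Their difference is still the fitted slope.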

ggplot(data = mtcars, aes(x = drat, y = mpg + hp)) +
  geom_point() +
  geom_smooth(method = "lm", col = "brown") +
  labs(
    title = "Multiple Linear Regression of MPG & HP on DRAT",
    subtitle = "SNHU Module Two MTCARS",
    caption = "The brown line is the linear regression line. Points closer to the line indicate a stronger linear relationship between DRAT and MPG combined with HP. Points farther from the line have larger residuals and could be considered potential outliers.",
    x = "DRAT",
    y = "MPG & HP"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(face = "italic", hjust = 0.5)
  )

## `geom_smooth()` using formula = 'y ~ x'
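As the message indicates, `geom_smooth(method = "lm")` fits a one-predictor regression of whatever is mapped to y on whatever is mapped to x, so the drawn line is not the fitted two-predictor model. As a rough illustration of the difference, the sketch below compares the DRAT slope with and without HP in the model. It uses `mpg` alone as the response, which is an assumption made for the comparison since the plot maps y to `mpg + hp`; `simple_fit`, `multi_fit`, and `slopes` are illustrative names.

```r
# Simple regression: the kind of line geom_smooth(method = "lm") draws for y ~ x
simple_fit <- lm(mpg ~ drat, data = mtcars)

# Multiple regression used in this section
multi_fit <- lm(mpg ~ hp + drat, data = mtcars)

# DRAT slope without and with HP held constant
slopes <- c(simple = unname(coef(simple_fit)["drat"]),
            adjusted_for_hp = unname(coef(multi_fit)["drat"]))
slopes
```

The two slopes differ because HP and DRAT are correlated in mtcars, so a plot of one predictor at a time cannot show the "holding HP constant" interpretation directly.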

Display the summary to get the F-test results and P-value

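model5_summary <- summary(model5)
# fstatistic holds the F value and its two degrees of freedom, not the p-value
cat("Overall F-test (F value, numerator df, denominator df):", model5_summary$fstatistic, "\n")

## Overall F-test (F value, numerator df, denominator df): 41.52167 2 29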
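The `fstatistic` component stores the F value and its two degrees of freedom; the p-value itself is not stored there. A minimal sketch of recovering it from the upper tail of the F distribution with `pf()`, refitting the model so the block is self-contained (`fit`, `f`, and `f_p_value` are illustrative names):

```r
fit <- lm(mpg ~ hp + drat, data = mtcars)
f <- summary(fit)$fstatistic   # named vector: value, numdf, dendf

# Upper-tail probability of the F distribution gives the overall p-value
f_p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
cat("Overall F-test p-value:", format(f_p_value, digits = 4), "\n")
```

This should agree with the p-value printed on the last line of `summary(model5)`.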

T-Test

Extract coefficients table for p-values

coeff_table <- model5_summary$coefficients
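The coefficient table is a numeric matrix with one row per term, so the t-test p-values can be read from its `Pr(>|t|)` column. A short sketch, assuming a 5% significance level (the level is an assumption for illustration; `fit`, `p_values`, and `significant` are illustrative names):

```r
fit <- lm(mpg ~ hp + drat, data = mtcars)
coeff_table <- summary(fit)$coefficients

# The last column holds the two-sided t-test p-value for each coefficient
p_values <- coeff_table[, "Pr(>|t|)"]

# TRUE where the coefficient differs from zero at the 5% level
significant <- p_values < 0.05
data.frame(p_value = signif(p_values, 3), significant = significant)
```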

Create a Normal Q-Q plot of residuals

fitted_values1 <- fitted(model5)
fitted_df1 <- data.frame(Car = names(fitted_values1), Fitted = fitted_values1)
residuals1 <- resid(model5)
residuals_df1 <- data.frame(Car = names(residuals1), Residuals = residuals1)
qqnorm(residuals1, main = "Normal Q-Q Plot")
qqline(residuals1, col = "red", lwd = 2)

plot(fitted_values1, residuals1,
     main = "Residuals vs Fitted Values for fuel economy (MPG) of a car with 160 horsepower (HP) for each unit increase in rear axle ratio (DRAT)",
     xlab = "Fitted Values", ylab = "Residuals",
     pch = 19, col = "purple")
abline(h = 0, lty = 2, col = "blue")
grid()
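Points far from the zero line in a residual plot can be listed numerically as well as judged by eye. One possible approach, sketched here with standardized residuals and a cutoff of 2 (a common rule of thumb, not a value taken from this notebook; `fit`, `std_resid`, and `flagged` are illustrative names):

```r
fit <- lm(mpg ~ hp + drat, data = mtcars)

# Standardized residuals put every residual on a common scale
std_resid <- rstandard(fit)

# Cars whose standardized residual exceeds 2 in absolute value
flagged <- names(std_resid)[abs(std_resid) > 2]
flagged
```

A flagged car is only a candidate for closer inspection; it is not necessarily an error in the data.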

Model with Interaction Term and Qualitative Predictor

“General form: MPG=β0+β1⋅HP+β2⋅QSEC+β3⋅(HP⋅QSEC)+β4⋅CYL6+β5⋅CYL8+ϵ”

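“Prediction equation: MPG-hat = b0 + b1⋅HP + b2⋅QSEC + b3⋅(HP⋅QSEC) + b4⋅CYL6 + b5⋅CYL8, where b0 through b5 are the estimated values of β0 through β5. The error term ϵ does not appear in the prediction equation.”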
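CYL6 and CYL8 are indicator (dummy) variables that R creates from `factor(cyl)`, with 4 cylinders as the reference level. A brief sketch that inspects the design matrix for this formula to show those columns (`X` is an illustrative name):

```r
# Design matrix R builds for the interaction model
X <- model.matrix(mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)

# One column per coefficient in the general form
colnames(X)

# First few rows of the two cylinder indicators
head(X[, c("factor(cyl)6", "factor(cyl)8")])
```

Four-cylinder cars have 0 in both indicator columns, which is why no separate CYL4 term appears in the equation.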

model6 <- lm(mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)
summary(model6)

## 
## Call:
## lm(formula = mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0004 -1.6264 -0.2424  1.3322  5.7974 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  24.505565  13.186080   1.858   0.0745 .
## hp            0.141850   0.079164   1.792   0.0848 .
## qsec          0.531630   0.746717   0.712   0.4828  
## factor(cyl)6 -4.408372   1.627676  -2.708   0.0118 *
## factor(cyl)8 -4.580823   2.555742  -1.792   0.0847 .
## hp:qsec      -0.012526   0.005251  -2.386   0.0246 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.692 on 26 degrees of freedom
## Multiple R-squared:  0.8327, Adjusted R-squared:  0.8005 
## F-statistic: 25.88 on 5 and 26 DF,  p-value: 2.526e-09
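With an interaction term in the model, the effect of QSEC on MPG is no longer a single number: it equals β2 + β3⋅HP and so depends on the car's horsepower. A small sketch of that calculation from the fitted coefficients (`fit`, `b`, and the helper `qsec_slope` are illustrative names, and the three HP values are arbitrary examples):

```r
fit <- lm(mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)
b <- coef(fit)

# Slope of QSEC at a given horsepower: beta2 + beta3 * HP
qsec_slope <- function(hp) unname(b["qsec"] + b["hp:qsec"] * hp)

# Effect of one extra second of quarter-mile time at three horsepower levels
qsec_slope(c(100, 175, 250))
```

Because the interaction coefficient is negative, the QSEC slope decreases as HP increases.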

Display the summary to get the F-test results

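model6_summary <- summary(model6)
# fstatistic holds the F value and its two degrees of freedom, not the p-value
cat("Overall F-test (F value, numerator df, denominator df):", model6_summary$fstatistic, "\n")

## Overall F-test (F value, numerator df, denominator df): 25.88205 5 26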

T-Test

Extract coefficients table for p-values

coeff_table <- model6_summary$coefficients

Create a Normal Q-Q plot of residuals

fitted_values2 <- fitted(model6)
fitted_df2 <- data.frame(Car = names(fitted_values2), Fitted = fitted_values2)
residuals2 <- resid(model6)
residuals_df2 <- data.frame(Car = names(residuals2), Residuals = residuals2)
qqnorm(residuals2, main = "Normal Q-Q Plot")
qqline(residuals2, col = "green", lwd = 2)

plot(fitted_values2, residuals2,
     main = "Residuals vs Fitted Values for fuel economy (MPG) using Horsepower (HP), Quarter Mile Time (QSEC), interaction term for HP and QSEC, and number of cylinders",
     xlab = "Fitted Values", ylab = "Residuals",
     pch = 19, col = "brown")
abline(h = 0, lty = 2, col = "blue")
grid()

Prediction Models

Predicted fuel economy for a car with 175 horsepower, a 14.2-second quarter-mile time, and a 3.91 rear-axle ratio

new_data1 <- data.frame(hp = 175, qsec = 14.2, drat = 3.91)
# predict() reads new observations only from the argument named `newdata`.
# A misspelled name such as `newdata1` is silently ignored, and predict() then
# returns the fitted values for all 32 training cars instead of one prediction.
prediction_model3 <- predict(model1, newdata = new_data1)
prediction_model3

prediction_model3_df <- data.frame(Car = names(prediction_model3), Predict = prediction_model3)
prediction_model3_df
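`predict()` takes new observations only through the argument named `newdata`; any other argument name is absorbed silently and the fitted values for the training data come back instead. The sketch below shows the difference in the length of the result. The model formula is a hypothetical stand-in, since `model1` is defined earlier in the notebook, and `fit`, `new_car`, `one_pred`, and `all_fitted` are illustrative names.

```r
# Hypothetical stand-in for model1, which is defined earlier in the notebook
fit <- lm(mpg ~ hp + qsec + drat, data = mtcars)
new_car <- data.frame(hp = 175, qsec = 14.2, drat = 3.91)

# Correct argument name: one prediction for the new car
one_pred <- predict(fit, newdata = new_car)

# Misspelled argument name: silently ignored, fitted values for all 32 cars
all_fitted <- predict(fit, newdata1 = new_car)

c(with_newdata = length(one_pred), with_typo = length(all_fitted))
```

A result with one row per car in mtcars, rather than a single value, is a sign that the new data frame was never used.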

Predict with 95% Prediction Interval

prediction_with_interval <- predict(model1, newdata = new_data1, interval = "prediction", level = 0.95)

Convert the result to a data frame for easy viewing

prediction_with_interval_df <- data.frame(prediction_with_interval)
prediction_with_interval_df

Prediction Models

Predicted fuel economy for a car with 175 horsepower, a 14.2-second quarter-mile time, and 6 cylinders

new_data2 <- data.frame(hp = 175, qsec = 14.2, cyl = "6")
prediction_model3 <- predict(model6, newdata = new_data2)
prediction_model3

##        1 
## 21.34244

prediction_model3_df <- data.frame(Car = names(prediction_model3), Predict = prediction_model3)
prediction_model3_df

##   Car  Predict
## 1   1 21.34244

Predict with 95% Prediction Interval

prediction_with_interval1 <- predict(model6, newdata = new_data2, interval = "prediction", level = 0.95)

Convert the result to a data frame for easy viewing

# Use the interval computed from model6 (prediction_with_interval1),
# not the earlier model1 object prediction_with_interval
prediction_with_interval_df1 <- data.frame(prediction_with_interval1)
prediction_with_interval_df1
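`interval = "prediction"` gives a range for the MPG of one new car, while `interval = "confidence"` gives a range for the mean MPG of all cars with those characteristics. A self-contained sketch comparing the two for the 175 HP, 14.2 QSEC, 6-cylinder car (`fit`, `new_car`, `pi_95`, and `ci_95` are illustrative names):

```r
fit <- lm(mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)
new_car <- data.frame(hp = 175, qsec = 14.2, cyl = 6)

# Prediction interval: uncertainty about a single future car
pi_95 <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

# Confidence interval: uncertainty about the mean response only
ci_95 <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)

# Columns are fit (point prediction), lwr and upr (interval bounds)
rbind(prediction = pi_95[1, ], confidence = ci_95[1, ])
```

Both intervals share the same point prediction. The prediction interval is the wider of the two because it also accounts for the scatter of individual cars around the regression surface.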

End of Module Two Jupyter Notebook

Attach the HTML output along with your problem set report for the Module Two Problem Set. The HTML output can be downloaded by clicking File, then Download as, then HTML. Be sure to answer all of the questions in your problem set report.