Flipped Assignment 9

Fitting a Multiple Linear Regression Model

linear_model <- lm(data = fa_8_data, y~x1*x2)

Using the lm() function in R, we fitted the model \(y= \beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_1x_2+\epsilon\), which contains both main effects and their interaction.
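In R's formula syntax, `x1*x2` is shorthand for `x1 + x2 + x1:x2`, which is why the model has three slope terms. The following is a minimal sketch on simulated data; the data frame `sim` and its values are made up for illustration and are not the assignment data.

```r
# Simulated data for illustration only; not the assignment data set.
set.seed(1)
sim <- data.frame(x1 = runif(30), x2 = runif(30))
sim$y <- 1 + 2 * sim$x1 + 3 * sim$x2 + rnorm(30)

# y ~ x1 * x2 expands to both main effects plus the interaction term.
fit_star <- lm(y ~ x1 * x2, data = sim)
fit_long <- lm(y ~ x1 + x2 + x1:x2, data = sim)

# Both formulas should produce the same coefficient names and estimates.
coef_names <- names(coef(fit_star))
print(coef_names)
same_fit <- isTRUE(all.equal(coef(fit_star), coef(fit_long)))
print(same_fit)
```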

Model Adequacy

Residuals vs Fitted

The Residuals vs Fitted plot showed random scatter with no discernible pattern, which supports the constant variance assumption.

Normal QQ Plot

Based on the Normal QQ plot, we determined that the residuals are approximately normally distributed.

After validating the constant variance and normality assumptions, we determined that a transformation is not needed for the data.
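The two diagnostic plots discussed above can be requested individually through the `which` argument of `plot()` for an `lm` object. The sketch below uses simulated data (the data frame `sim` is hypothetical), and the Shapiro-Wilk test at the end is an optional numeric companion to the QQ plot that was not part of the original analysis.

```r
# Simulated data for illustration only; not the assignment data set.
set.seed(2)
sim <- data.frame(x1 = runif(40), x2 = runif(40))
sim$y <- 1 + 2 * sim$x1 + 3 * sim$x2 + rnorm(40)
fit <- lm(y ~ x1 * x2, data = sim)

# which = 1 draws Residuals vs Fitted; which = 2 draws the Normal Q-Q plot.
plot(fit, which = 1)
plot(fit, which = 2)

# Optional: Shapiro-Wilk test on the residuals (an addition, not used in the report).
sw <- shapiro.test(residuals(fit))
print(sw$p.value)
```

A large p-value from the test is consistent with, but does not prove, normally distributed residuals; the plots remain the primary check.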

Testing the Significance of the Regression

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq  F value    Pr(>F)    
## x1         1 1283.90 1283.90  58.2219 1.076e-08 ***
## x2         1 2723.17 2723.17 123.4894 1.614e-12 ***
## x1:x2      1   49.08   49.08   2.2259    0.1455    
## Residuals 32  705.66   22.05                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the anova() function on the model \(y= \beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_1x_2+\epsilon\), we determined that the interaction between \(x_1\) and \(x_2\) is not significant (p = 0.1455), while the \(x_1\) and \(x_2\) terms are highly significant. Therefore, the interaction term should be removed from the model.
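As a rough check, the interaction F statistic and its p-value can be recomputed by hand from the rounded mean squares in the ANOVA table above. The variable names are illustrative, and the results should land close to the tabulated 2.2259 and 0.1455, up to rounding.

```r
# Values copied from the full-model ANOVA table (rounded as printed).
ms_interaction <- 49.08   # Mean Sq for x1:x2
ms_residual    <- 22.05   # Mean Sq for Residuals
df_interaction <- 1
df_residual    <- 32

# F statistic is the ratio of the term's mean square to the residual mean square.
f_stat <- ms_interaction / ms_residual

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df_interaction, df_residual, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 4))
```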

Reduced Model

New_linear_model <- lm(data = fa_8_data, y~x1+x2)

Using the lm() function in R, we fitted the reduced model \(y= \beta_0+\beta_1x_1+\beta_2x_2+\epsilon\).

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## x1         1 1283.90 1283.90  56.137 1.295e-08 ***
## x2         1 2723.17 2723.17 119.066 1.742e-12 ***
## Residuals 33  754.74   22.87                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After removing the interaction term, we determined that both remaining terms, \(x_1\) and \(x_2\), are still highly significant.
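The F values differ slightly from those of the full model because dropping the interaction moves its sum of squares and its degree of freedom into the residual line. Using the printed values from the two ANOVA tables:

\[
SS_{Res} = 705.66 + 49.08 = 754.74, \qquad df_{Res} = 32 + 1 = 33, \qquad MS_{Res} = \frac{754.74}{33} \approx 22.87
\]

\[
F_{x_1} = \frac{1283.90}{22.87} \approx 56.14, \qquad F_{x_2} = \frac{2723.17}{22.87} \approx 119.07
\]

These agree with the tabulated 56.137 and 119.066 up to rounding of the mean square.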

## 
## Call:
## lm(formula = y ~ x1 + x2, data = fa_8_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7716 -4.1656  0.0802  3.8323  8.3349 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.109e+01  1.669e+00   6.642 1.48e-07 ***
## x1          3.501e+02  3.968e+01   8.823 3.38e-10 ***
## x2          1.089e-01  9.983e-03  10.912 1.74e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.782 on 33 degrees of freedom
## Multiple R-squared:  0.8415, Adjusted R-squared:  0.8319 
## F-statistic:  87.6 on 2 and 33 DF,  p-value: 6.316e-14
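The last three lines of the summary can be reproduced from the sums of squares in the reduced-model ANOVA table. The sketch below is only a consistency check under that assumption; the variable names are illustrative.

```r
# Sums of squares copied from the reduced-model ANOVA table.
ss_x1  <- 1283.90
ss_x2  <- 2723.17
ss_res <- 754.74
df_reg <- 2    # two regressors: x1 and x2
df_res <- 33   # residual degrees of freedom

ss_total <- ss_x1 + ss_x2 + ss_res

# R-squared: share of total variation explained by the regression.
r_squared <- 1 - ss_res / ss_total

# Adjusted R-squared penalizes for the number of regressors.
adj_r_squared <- 1 - (ss_res / df_res) / (ss_total / (df_reg + df_res))

# Overall F statistic: regression mean square over residual mean square.
f_overall <- ((ss_x1 + ss_x2) / df_reg) / (ss_res / df_res)

# Residual standard error: square root of the residual mean square.
rse <- sqrt(ss_res / df_res)

print(round(c(r_squared, adj_r_squared, f_overall, rse), 4))
```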

After using the anova() and summary() functions, we determined the final linear model to be \(y= \beta_0+\beta_1x_1+\beta_2x_2+\epsilon\), with fitted equation \(\hat y= 11.09+350.1x_1+0.1089x_2\).
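To show how the final model would be used, the sketch below computes a fitted value from the rounded coefficient estimates in the summary output. The function name `predict_y` and the input values are hypothetical and chosen only to show the call; they are not taken from the data set.

```r
# Rounded coefficient estimates copied from the summary() output.
b0 <- 11.09    # intercept
b1 <- 350.1    # slope for x1
b2 <- 0.1089   # slope for x2

# Hypothetical helper: fitted value for given x1 and x2.
predict_y <- function(x1, x2) {
  b0 + b1 * x1 + b2 * x2
}

# Hypothetical inputs, used only to demonstrate the call.
y_hat <- predict_y(x1 = 0.02, x2 = 100)
print(y_hat)
```

With the fitted model object available, `predict(New_linear_model, newdata = data.frame(x1 = ..., x2 = ...))` would give the same kind of result with unrounded coefficients. Predictions are only trustworthy for inputs within the range of the observed data.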

Code

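# Load the data set
fa_8_data <- read.csv('https://raw.githubusercontent.com/Rusty1299/HW_files/main/data-table-B8(4).csv')

# Full model: main effects plus the x1:x2 interaction
linear_model <- lm(data = fa_8_data, y~x1*x2)

# Diagnostic plots (includes Residuals vs Fitted and Normal QQ)
plot(linear_model)

# ANOVA table and coefficient summary for the full model
anova(linear_model)
summary(linear_model)

# Reduced model: interaction term removed
New_linear_model <- lm(data = fa_8_data, y~x1+x2)

# ANOVA table and coefficient summary for the reduced model
anova(New_linear_model)
summary(New_linear_model)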