Course Code: STAT 3206 | Course Title: Regression Analysis-II (Lab)


1 Question 1: Temperature vs Viscosity of Toluene Blends

Byers and Williams studied the impact of temperature (the regressor) on the viscosity (the response) of toluene-teraline blends. The following data are for blends with a 0.4 molar fraction of toluene.

Temperature 24.9 35.0 44.9 55.1 65.2 75.2 85.2 95.2
Viscosity 1.133 0.9772 0.8532 0.7550 0.6723 0.6021 0.5420 0.5074
Temperature <- c(24.9, 35.0, 44.9, 55.1, 65.2, 75.2, 85.2, 95.2)
Viscosity   <- c(1.133, 0.9772, 0.8532, 0.7550, 0.6723, 0.6021, 0.5420, 0.5074)
data        <- data.frame(Temperature, Viscosity)
head(data)
##   Temperature Viscosity
## 1        24.9    1.1330
## 2        35.0    0.9772
## 3        44.9    0.8532
## 4        55.1    0.7550
## 5        65.2    0.6723
## 6        75.2    0.6021

1.1 (a) Scatter Diagram

Does it seem likely that a straight line model will be adequate?

plot(Temperature, Viscosity,
     main = "Scatterplot of Temperature vs Viscosity",
     xlab = "Temperature",
     ylab = "Viscosity",
     pch  = 16,
     col  = "blue")
abline(lm(Viscosity ~ Temperature, data = data), col = "red",   lwd = 2)
lines(smooth.spline(Temperature, Viscosity),     col = "green", lwd = 2)
legend("topright",
       legend = c("Linear Fit", "Smooth Spline"),
       col    = c("red", "green"),
       lwd    = 2)

Conclusion: The scatter plot shows a non-linear (curved) decreasing trend between Temperature and Viscosity. A straight line model may not be fully adequate; an exponential or log-transformed model is likely more appropriate.

1.2 (b) Straight Line Model — Summary & Residual Plots

# Fit a linear regression model
model <- lm(Viscosity ~ Temperature, data = data)
summary(model)
## 
## Call:
## lm(formula = Viscosity ~ Temperature, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.043955 -0.035863 -0.009305  0.019900  0.069559 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.2815107  0.0468683   27.34 1.58e-07 ***
## Temperature -0.0087578  0.0007284  -12.02 2.01e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04743 on 6 degrees of freedom
## Multiple R-squared:  0.9602, Adjusted R-squared:  0.9535 
## F-statistic: 144.6 on 1 and 6 DF,  p-value: 2.007e-05
# Plot the residuals
par(mfrow = c(2, 2))
plot(model, which = 1, main = "Residuals vs Fitted")
plot(model, which = 2, main = "Normal Q-Q")
plot(model, which = 3, main = "Scale-Location")
plot(model, which = 5, main = "Residuals vs Leverage")

par(mfrow = c(1, 1))

Conclusion on Model Adequacy: The residual plots reveal a pattern (non-random curvature) in the Residuals vs Fitted plot, indicating that the linear model does not capture the true relationship adequately. The Q-Q plot may also show minor deviations. A transformation is needed.

1.3 (c) Log Transformation (Exponential Model)

Physical chemistry suggests: \[\text{Viscosity} = \beta_0 \cdot e^{\beta_1 \cdot \text{Temperature}}\]

Taking natural log of both sides: \[\log(\text{Viscosity}) = \beta_0 + \beta_1 \cdot \text{Temperature}\]

# Apply log transformation to Viscosity
data$log_Viscosity <- log(data$Viscosity)

# Fit a linear regression model on log(Viscosity) ~ Temperature
model_log <- lm(log_Viscosity ~ Temperature, data = data)

# Summary statistics of the model
summary(model_log)
## 
## Call:
## lm(formula = log_Viscosity ~ Temperature, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.02188 -0.01827 -0.01134  0.01220  0.04308 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.3761794  0.0276526   13.60 9.79e-06 ***
## Temperature -0.0115306  0.0004297  -26.83 1.77e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02799 on 6 degrees of freedom
## Multiple R-squared:  0.9917, Adjusted R-squared:  0.9904 
## F-statistic: 719.9 on 1 and 6 DF,  p-value: 1.77e-07
# Plot the residuals for log-transformed model
par(mfrow = c(2, 2))
plot(model_log, which = 1, main = "Residuals vs Fitted (log transformation)")
plot(model_log, which = 2, main = "Normal Q-Q (log transformation)")
plot(model_log, which = 3, main = "Scale-Location (log transformation)")
plot(model_log, which = 5, main = "Residuals vs Leverage (log transformation)")

par(mfrow = c(1, 1))

Conclusion: The log-transformed model shows much better residual behavior — the residuals are more randomly scattered with no systematic pattern, confirming the exponential model is a better fit than the straight line model.


2 Question 2: Solid Fuel Rocket Propellant — Polynomial Regression

A solid fuel rocket propellant loses weight after it is produced. The following data are available:

Months since Production (x) Weight Loss (y) kg
0.25 1.42
0.50 1.39
0.75 1.55
1.00 1.89
1.25 2.43
1.50 3.15
1.75 4.05
2.00 5.15
2.25 6.43
2.50 7.89
months_since_production <- c(0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50)
weight_loss             <- c(1.42, 1.39, 1.55, 1.89, 2.43, 3.15, 4.05, 5.15, 6.43, 7.89)

2.1 (a) Fit a Second-Order Polynomial Model

\[\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2\]

# Fit a second-order polynomial model (quadratic model)
model <- lm(weight_loss ~ poly(months_since_production, 2, raw = TRUE))
model
## 
## Call:
## lm(formula = weight_loss ~ poly(months_since_production, 2, raw = TRUE))
## 
## Coefficients:
##                                   (Intercept)  
##                                         1.633  
## poly(months_since_production, 2, raw = TRUE)1  
##                                        -1.232  
## poly(months_since_production, 2, raw = TRUE)2  
##                                         1.495
summary(model)
## 
## Call:
## lm(formula = weight_loss ~ poly(months_since_production, 2, raw = TRUE))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.005364 -0.002727  0.001045  0.002409  0.003273 
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                    1.633000   0.004196   389.2
## poly(months_since_production, 2, raw = TRUE)1 -1.232182   0.007010  -175.8
## poly(months_since_production, 2, raw = TRUE)2  1.494545   0.002484   601.6
##                                               Pr(>|t|)    
## (Intercept)                                    < 2e-16 ***
## poly(months_since_production, 2, raw = TRUE)1 5.09e-14 ***
## poly(months_since_production, 2, raw = TRUE)2  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003568 on 7 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.859e+06 on 2 and 7 DF,  p-value: < 2.2e-16

2.2 (b) Test for Significance of Regression

anova_table <- anova(model)
print(anova_table)
## Analysis of Variance Table
## 
## Response: weight_loss
##                                              Df Sum Sq Mean Sq F value
## poly(months_since_production, 2, raw = TRUE)  2  47.31  23.655 1858613
## Residuals                                     7   0.00   0.000        
##                                                 Pr(>F)    
## poly(months_since_production, 2, raw = TRUE) < 2.2e-16 ***
## Residuals                                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: If the overall F-statistic p-value < 0.05, the regression is statistically significant, meaning at least one predictor (linear or quadratic term) significantly explains variation in weight loss.

2.3 (c) Plot Data and Fitted Polynomial Curve

# Plot observed data
plot(months_since_production, weight_loss,
     main = "Weight Loss vs Months Since Production",
     xlab = "Months Since Production",
     ylab = "Weight Loss (kg)",
     pch  = 16,
     col  = "blue")

# Add fitted polynomial curve
x_seq    <- seq(min(months_since_production), max(months_since_production), length.out = 200)
y_fitted <- predict(model, newdata = data.frame(months_since_production = x_seq))
lines(x_seq, y_fitted, col = "red", lwd = 2)
legend("topleft", legend = c("Observed", "Fitted Quadratic"),
       col = c("blue", "red"), pch = c(16, NA), lty = c(NA, 1), lwd = 2)

2.4 (d) Test Hypothesis \(H_0: \beta_2 = 0\)

# Extract p-value for the quadratic term (beta_2)
model_summary   <- summary(model)
p_value_quadratic <- model_summary$coefficients[3, 4]   # row 3 = quadratic term
cat("p-value for quadratic term (beta_2):", p_value_quadratic, "\n")
## p-value for quadratic term (beta_2): 9.255077e-18
# Hypothesis test for H0: B2 = 0 (quadratic term significance)
if (p_value_quadratic < 0.05) {
  print("Reject H0: The quadratic term is statistically significant.")
} else {
  print("Fail to reject H0: The quadratic term is not statistically significant.")
}
## [1] "Reject H0: The quadratic term is statistically significant."

Comment: If \(\beta_2\) is significant (p < 0.05), the quadratic term is necessary and the relationship between weight loss and months is curvilinear. Dropping the quadratic term would lead to a poorly fitting model.


3 Question 3: Multiple Regression with Variable Selection

Data with response y and predictors \(x_1, x_2, x_3, x_4\):

y  <- c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4)
x1 <- c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10)
x2 <- c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68)
x3 <- c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8)
x4 <- c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)

# Create a data frame
data <- data.frame(y, x1, x2, x3, x4)

3.1 (a) Forward Selection

# Initial null model (no predictors)
null_model <- lm(y ~ 1, data = data)

# Full model with all predictors
full_model <- lm(y ~ x1 + x2 + x3 + x4, data = data)

# Perform forward selection based on AIC
stepwise_model_forward <- step(null_model,
                               scope     = list(lower = null_model, upper = full_model),
                               direction = "forward",
                               trace     = 1)
## Start:  AIC=71.44
## y ~ 1
## 
##        Df Sum of Sq     RSS    AIC
## + x4    1   1831.90  883.87 58.852
## + x2    1   1809.43  906.34 59.178
## + x1    1   1450.08 1265.69 63.519
## + x3    1    776.36 1939.40 69.067
## <none>              2715.76 71.444
## 
## Step:  AIC=58.85
## y ~ x4
## 
##        Df Sum of Sq    RSS    AIC
## + x1    1    809.10  74.76 28.742
## + x3    1    708.13 175.74 39.853
## <none>              883.87 58.852
## + x2    1     14.99 868.88 60.629
## 
## Step:  AIC=28.74
## y ~ x4 + x1
## 
##        Df Sum of Sq    RSS    AIC
## + x2    1    26.789 47.973 24.974
## + x3    1    23.926 50.836 25.728
## <none>              74.762 28.742
## 
## Step:  AIC=24.97
## y ~ x4 + x1 + x2
## 
##        Df Sum of Sq    RSS    AIC
## <none>              47.973 24.974
## + x3    1   0.10909 47.864 26.944
# Show the summary of the final selected model
summary(stepwise_model_forward)
## 
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0919 -1.8016  0.2562  1.2818  3.8982 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  71.6483    14.1424   5.066 0.000675 ***
## x4           -0.2365     0.1733  -1.365 0.205395    
## x1            1.4519     0.1170  12.410 5.78e-07 ***
## x2            0.4161     0.1856   2.242 0.051687 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.309 on 9 degrees of freedom
## Multiple R-squared:  0.9823, Adjusted R-squared:  0.9764 
## F-statistic: 166.8 on 3 and 9 DF,  p-value: 3.323e-08
# Display the final selected model
print(stepwise_model_forward)
## 
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = data)
## 
## Coefficients:
## (Intercept)           x4           x1           x2  
##     71.6483      -0.2365       1.4519       0.4161

3.2 (b) Backward Elimination

# Perform backward elimination based on AIC
stepwise_model_backward <- step(full_model,
                                direction = "backward",
                                trace     = 1)
## Start:  AIC=26.94
## y ~ x1 + x2 + x3 + x4
## 
##        Df Sum of Sq    RSS    AIC
## - x3    1    0.1091 47.973 24.974
## - x4    1    0.2470 48.111 25.011
## - x2    1    2.9725 50.836 25.728
## <none>              47.864 26.944
## - x1    1   25.9509 73.815 30.576
## 
## Step:  AIC=24.97
## y ~ x1 + x2 + x4
## 
##        Df Sum of Sq    RSS    AIC
## <none>               47.97 24.974
## - x4    1      9.93  57.90 25.420
## - x2    1     26.79  74.76 28.742
## - x1    1    820.91 868.88 60.629
# Show the summary of the final selected model
summary(stepwise_model_backward)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x4, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0919 -1.8016  0.2562  1.2818  3.8982 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  71.6483    14.1424   5.066 0.000675 ***
## x1            1.4519     0.1170  12.410 5.78e-07 ***
## x2            0.4161     0.1856   2.242 0.051687 .  
## x4           -0.2365     0.1733  -1.365 0.205395    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.309 on 9 degrees of freedom
## Multiple R-squared:  0.9823, Adjusted R-squared:  0.9764 
## F-statistic: 166.8 on 3 and 9 DF,  p-value: 3.323e-08
# Display the final selected model
print(stepwise_model_backward)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x4, data = data)
## 
## Coefficients:
## (Intercept)           x1           x2           x4  
##     71.6483       1.4519       0.4161      -0.2365

3.3 (c) Stepwise Regression (Both Directions)

# Perform stepwise regression based on AIC (both directions)
stepwise_model_both <- step(null_model,
                            scope     = list(lower = null_model, upper = full_model),
                            direction = "both",
                            trace     = 1)
## Start:  AIC=71.44
## y ~ 1
## 
##        Df Sum of Sq     RSS    AIC
## + x4    1   1831.90  883.87 58.852
## + x2    1   1809.43  906.34 59.178
## + x1    1   1450.08 1265.69 63.519
## + x3    1    776.36 1939.40 69.067
## <none>              2715.76 71.444
## 
## Step:  AIC=58.85
## y ~ x4
## 
##        Df Sum of Sq     RSS    AIC
## + x1    1    809.10   74.76 28.742
## + x3    1    708.13  175.74 39.853
## <none>               883.87 58.852
## + x2    1     14.99  868.88 60.629
## - x4    1   1831.90 2715.76 71.444
## 
## Step:  AIC=28.74
## y ~ x4 + x1
## 
##        Df Sum of Sq     RSS    AIC
## + x2    1     26.79   47.97 24.974
## + x3    1     23.93   50.84 25.728
## <none>                74.76 28.742
## - x1    1    809.10  883.87 58.852
## - x4    1   1190.92 1265.69 63.519
## 
## Step:  AIC=24.97
## y ~ x4 + x1 + x2
## 
##        Df Sum of Sq    RSS    AIC
## <none>               47.97 24.974
## - x4    1      9.93  57.90 25.420
## + x3    1      0.11  47.86 26.944
## - x2    1     26.79  74.76 28.742
## - x1    1    820.91 868.88 60.629
# Show the summary of the final selected model
summary(stepwise_model_both)
## 
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0919 -1.8016  0.2562  1.2818  3.8982 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  71.6483    14.1424   5.066 0.000675 ***
## x4           -0.2365     0.1733  -1.365 0.205395    
## x1            1.4519     0.1170  12.410 5.78e-07 ***
## x2            0.4161     0.1856   2.242 0.051687 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.309 on 9 degrees of freedom
## Multiple R-squared:  0.9823, Adjusted R-squared:  0.9764 
## F-statistic: 166.8 on 3 and 9 DF,  p-value: 3.323e-08
# Display the final selected model
print(stepwise_model_both)
## 
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = data)
## 
## Coefficients:
## (Intercept)           x4           x1           x2  
##     71.6483      -0.2365       1.4519       0.4161

3.4 (d) Comment on Final Model

Comment: All three variable selection methods (forward selection, backward elimination, and stepwise regression) are based on AIC minimization. They often converge to the same final model. The selected subset model retains only the predictors that contribute meaningfully to explaining variation in y, while discarding those that inflate variance without improving fit. The final model should be interpreted in terms of parsimony and statistical significance of retained predictors.


4 Question 4: Housing Data — Cross-Validation

A dataset of housing features including square footage (sqft), number of rooms (rooms), age (age), and price (price):

sqft  <- c(1500, 1800, 1200, 2000, 2200, 1600, 1700, 1400, 1900, 2500)
rooms <- c(3, 4, 3, 5, 5, 4, 4, 3, 4, 6)
age   <- c(10, 15, 20, 5, 2, 10, 8, 12, 4, 1)
price <- c(400000, 500000, 350000, 600000, 650000, 450000, 480000, 380000, 550000, 700000)

# Combine into a data frame
data <- data.frame(sqft, rooms, age, price)

4.1 (a) Leave-One-Out Cross-Validation (LOOCV)

# Define the model formula
formula <- price ~ sqft + rooms + age

# Initialize variables
n      <- nrow(data)
errors <- numeric(n)

# Leave-One-Out Cross-Validation
for (i in 1:n) {
  # Split the data into training and test sets
  train_data <- data[-i, ]          # all except the i-th row
  test_data  <- data[i, , drop = FALSE]  # only the i-th row

  # Fit the linear model on the training data
  model_cv <- lm(formula, data = train_data)

  # Make prediction for the test set (the i-th observation)
  predicted <- predict(model_cv, newdata = test_data)

  # Calculate the squared error for this fold
  errors[i] <- (test_data$price - predicted)^2
}

# Calculate the RMSE (Root Mean Squared Error)
rmse_loocv <- sqrt(mean(errors))
cat("LOOCV RMSE:", rmse_loocv, "\n")
## LOOCV RMSE: 25612.06

4.2 (b) 5-Fold Cross-Validation

# Set the number of folds for cross-validation
k <- 5
n <- nrow(data)

# Create indices for the folds
set.seed(123)  # Set seed for reproducibility
folds <- sample(rep(1:k, length.out = n))

# Initialize a vector to store RMSE values for each fold
rmse_values <- numeric(k)

# Loop through each fold
for (i in 1:k) {
  # Split the data into training and testing sets
  test_indices  <- which(folds == i)
  train_data    <- data[-test_indices, ]
  test_data     <- data[test_indices, ]

  # Fit a linear regression model on the training set
  model_fold <- lm(price ~ sqft + rooms + age, data = train_data)

  # Predict on the test set
  predictions <- predict(model_fold, newdata = test_data)

  # Calculate RMSE for this fold
  rmse_values[i] <- sqrt(mean((predictions - test_data$price)^2))
}

# Average RMSE across all folds
avg_rmse <- mean(rmse_values)
avg_rmse
## [1] 25129.48
# Calculate the average RMSE across all folds
print(paste("RMSE for each fold:", toString(rmse_values)))
## [1] "RMSE for each fold: 19521.66614908, 20550.0775918097, 28898.116995628, 39377.8852454853, 17299.6613988924"
cat("Average 5-Fold CV RMSE:", avg_rmse, "\n")
## Average 5-Fold CV RMSE: 25129.48

Conclusion: LOOCV uses each observation once as a test case, giving an almost unbiased estimate of prediction error but with higher variance. 5-fold CV balances bias-variance tradeoff and is computationally less expensive. Both methods provide an estimate of how well the model generalizes to unseen data.


5 Question 5: Multiple Linear Regression — ANOVA & Hypothesis Testing

Data with response y and two predictors \(x_1\) and \(x_2\):

y  <- c(16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24, 21.50, 40.33, 21.00)
x1 <- c(7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10)
x2 <- c(560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215)

# Combine into a data frame
data <- data.frame(y, x1, x2)

5.1 (a) Fit Multiple Linear Regression Model

\[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]

# Fitting the multiple linear regression model
model <- lm(y ~ x1 + x2, data = data)
model
## 
## Call:
## lm(formula = y ~ x1 + x2, data = data)
## 
## Coefficients:
## (Intercept)           x1           x2  
##     1.74124      1.86784      0.01352

5.2 (b) ANOVA Table & Significance of Regression

# Constructing the ANOVA table
anova_table <- anova(model)
print(anova_table)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq  F value    Pr(>F)    
## x1         1 4054.3  4054.3 501.6551 3.338e-09 ***
## x2         1   62.0    62.0   7.6707   0.02177 *  
## Residuals  9   72.7     8.1                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comment on Significance: If the overall F-statistic p-value < 0.05, we conclude the regression is statistically significant — i.e., at least one of \(x_1\) or \(x_2\) significantly explains variation in y.

5.3 (c) t-Statistics for \(H_0: \beta_1 = 0\) and \(H_0: \beta_2 = 0\)

# Summary of the model — gives t-statistics and p-values for each coefficient
summary(model)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7077 -1.1826  0.1314  1.3163  4.5857 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.741239   1.251702   1.391   0.1976    
## x1          1.867842   0.241182   7.745 2.87e-05 ***
## x2          0.013521   0.004882   2.770   0.0218 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.843 on 9 degrees of freedom
## Multiple R-squared:  0.9826, Adjusted R-squared:  0.9788 
## F-statistic: 254.7 on 2 and 9 DF,  p-value: 1.198e-08

Conclusion on Role of Variables:
- If the p-value for \(x_1\) (t-test for \(\beta_1\)) < 0.05, we reject \(H_0: \beta_1 = 0\), concluding \(x_1\) has a significant effect on y.
- If the p-value for \(x_2\) (t-test for \(\beta_2\)) < 0.05, we reject \(H_0: \beta_2 = 0\), concluding \(x_2\) has a significant effect on y.
- Variables with p-values ≥ 0.05 may not contribute significantly beyond the other predictor.

5.4 (d) \(R^2\) and Adjusted \(R^2\)

model_summary <- summary(model)

# R-squared
cat("R-squared: ", model_summary$r.squared,   "\n")
## R-squared:  0.9826364
# Adjusted R-squared
cat("Adjusted R-squared: ", model_summary$adj.r.squared, "\n")
## Adjusted R-squared:  0.9787778

Interpretation:
- \(R^2\) measures the proportion of total variation in y explained by the model.
- Adjusted \(R^2\) penalizes for adding unnecessary predictors; it is more reliable for comparing models with different numbers of predictors.
- A high Adjusted \(R^2\) (close to 1) indicates the model fits the data well.


End of Lab Report — STAT 3206