Author

Penelope Pooler Eisenbies

Published

February 26, 2026

Setup

Run the following chunk of R code to install and load the packages needed for this assignment.

Click the green triangle in the upper right corner of the setup chunk to run the setup code.

Note that setup code will not appear in the rendered HTML file.

House Remodel Data - Questions 1 - 6

Import and Examine Data

Question 1

Examine the R output from the chunk below to answer these questions on Blackboard.

  • The house_remodel_hw6 dataset has ____ observations.

  • There are ____ remodeled houses and ____ un-remodeled houses in this dataset.

  • I can add a note to myself here about this question.

```{r}
#|label: Question 1 - Import and examine house_remodel_hw6 data

# import and examine data
houses_hw6 <- read_csv("data/house_remodel_hw6.csv", show_col_types = FALSE) |>
  glimpse(width=75)

# examine counts for each category
houses_hw6 |> select(Remodeled) |> table()
```
Rows: 30
Columns: 3
$ Price       <dbl> 391000, 354000, 410000, 349000, 409000, 393000, 32100…
$ Square_Feet <dbl> 1846, 1820, 1794, 1768, 1752, 1719, 1676, 1668, 1646,…
$ Remodeled   <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No",…
Remodeled
 No Yes 
 14  16 

Examine Correlations within Data

Question 2

Examine the R output from the chunk below to answer these questions.

  • The overall correlation between Price and Square Feet is ____.

  • The correlation between Price and Square Feet in un-remodeled houses is ____.

  • The correlation between Price and Square Feet in remodeled houses is ____.

I might want to add a note to myself about this question.

```{r}
#|label: Question 2 - Examine Correlations

# examine correlation between Price and Square_Feet
houses_hw6 |> select(Price, Square_Feet) |> cor() |> round(2)

# examine correlation between price and square feet in un-remodeled houses
houses_hw6 |> filter(Remodeled=="No") |>
  select(Price, Square_Feet) |> cor() |> round(2)

# examine correlation between price and square feet in remodeled houses
houses_hw6 |> filter(Remodeled=="Yes") |>
  select(Price, Square_Feet) |> cor() |> round(2)
```
            Price Square_Feet
Price        1.00        0.65
Square_Feet  0.65        1.00
            Price Square_Feet
Price        1.00        0.52
Square_Feet  0.52        1.00
            Price Square_Feet
Price         1.0         0.6
Square_Feet   0.6         1.0
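
As a side note, the three correlations above can also be computed in one pass by splitting the data by category. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `x`, `y`, and `group`), not the homework data, and only base R functions.

```r
# Hypothetical toy data for illustration only (NOT the homework data)
toy <- data.frame(
  x     = c(1, 2, 3, 4, 5, 6, 7, 8),
  y     = c(2, 4, 5, 4, 5, 7, 8, 9),
  group = rep(c("No", "Yes"), each = 4)
)

# overall correlation between x and y
overall_cor <- cor(toy$x, toy$y)

# correlation within each group: split the rows by group, then apply cor()
group_cor <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

round(overall_cor, 2)
round(group_cor, 2)
```

The overall correlation and the within-group correlations can differ noticeably, which is why the question asks for all three.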

Modeling the House Remodel Data

Questions 3 - 6

  • Below are two chunks of R code.

  • The first chunk creates the interactive plot.

    • Copy and paste the ggPredict command into the R Console to view the plot more clearly in the RStudio Viewer (Lower Left Pane).
  • The second chunk creates the model and prints it.

  • Use the interactive plot and model output to answer Questions 3 - 6

Question 3. What is the SLR model equation for un-remodeled houses (Remodeled = No)?

  • Round values to two decimal places.

  • Est. Price = ___ + ___ * Square_Feet.

  • Hint for Question 3:

    • The un-remodeled houses (Remodeled = No) are the baseline category (not listed in output).

    • The baseline Intercept Beta and Square_Feet Beta are the coefficients for the baseline category SLR.

Question 4. What is the SLR model equation for remodeled houses (Remodeled = Yes)?

  • Round values to two decimal places.

  • Est. Price = ___ + ___ * Square_Feet.

Hint for Question 4:

  • The intercept for the remodeled houses (Remodeled = Yes) is calculated as: baseline Intercept Beta + RemodeledYes Beta

Hint for Questions 3 and 4

  • You can check your work by examining the model equations for each line in the interactive plot.

  • The slope is the same for both Remodeled categories, but the intercepts differ.
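
The hints above can be sketched in R. This is a minimal illustration that assumes a small made-up data frame (`toy`), not the homework data, so the numbers it prints are not the answers to Questions 3 and 4. It only shows how the two line equations are assembled from the coefficients of a model with one numeric and one categorical predictor.

```r
# Hypothetical toy data for illustration only (NOT the homework data)
toy <- data.frame(
  Price       = c(200, 220, 250, 270, 260, 290, 310, 330),
  Square_Feet = c(1000, 1200, 1400, 1600, 1000, 1200, 1400, 1600),
  Remodeled   = rep(c("No", "Yes"), each = 4)
)

# same model form as the homework: one shared slope, separate intercepts
toy_lm <- lm(Price ~ Square_Feet + Remodeled, data = toy)
b <- coef(toy_lm)

# baseline line (Remodeled = No): intercept is the model Intercept
no_intercept <- b[["(Intercept)"]]

# remodeled line: baseline Intercept + RemodeledYes coefficient
yes_intercept <- b[["(Intercept)"]] + b[["RemodeledYes"]]

# the slope is shared by both lines
slope <- b[["Square_Feet"]]

round(c(no_intercept = no_intercept,
        yes_intercept = yes_intercept,
        slope = slope), 2)
```

Because the slope is shared, the vertical gap between the two lines is the same at every square footage and equals the RemodeledYes coefficient.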

Interactive Categorical Model Plot

```{r}
#|label: Questions 3-6 - Categorical Regression Model Plot


# mlr categorical model
house_rem_cat_lm <- lm(Price ~ Square_Feet + Remodeled, data=houses_hw6)

# create interactive plot of model
# copy and paste this into console to view interactive plot
ggPredict(house_rem_cat_lm, interactive=TRUE)
```
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggiraphExtra package.
  Please report the issue to the authors.

Question 5. Fill in the blanks. Round values to 2 decimal places.

  • If a house is remodeled, the estimated price increase will be ___.

  • For both remodeled houses and un-remodeled houses, the price increase for each additional square foot is ___.

Hints for Question 5:

  • The difference due to remodeling is the RemodeledYes Beta in the model output.

  • The price increase for each additional square foot is the slope, Square_Feet Beta, common to both models.

Question 6. Based on the P-value (Sig) for the difference due to remodeling (RemodeledYes), copy and paste the correct phrase to complete this sentence:

  • After accounting for the relationship between Price and Square Feet, we see that there is ___ in price between un-remodeled and remodeled houses.

    • Copy and paste the correct phrase from these options:

      • not a significant difference

      • suggestive evidence of a significant difference

      • definitely a significant difference

Categorical Regression Model output

```{r}
#|label: Questions 3-6 - Categorical Regression Model Formal Output

# formatted regression output
# model is saved and printed to screen
(house_rem_cat_ols <- ols_regress(Price ~ Square_Feet + Remodeled, data=houses_hw6))
```
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.884       RMSE                    31720.100 
R-Squared                   0.782       MSE                1006164737.274 
Adj. R-Squared              0.765       Coef. Var                   7.883 
Pred R-Squared              0.727       AIC                       715.019 
MAE                     27454.037       SBC                       720.623 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                        Sum of                                                  
                       Squares        DF        Mean Square      F         Sig. 
--------------------------------------------------------------------------------
Regression    108020180265.937         2    54010090132.968    48.311    0.0000 
Residual       30184942118.230        27     1117960819.194                     
Total         138205122384.167        29                                        
--------------------------------------------------------------------------------

                                        Parameter Estimates                                         
---------------------------------------------------------------------------------------------------
       model          Beta    Std. Error    Std. Beta      t       Sig         lower         upper 
---------------------------------------------------------------------------------------------------
 (Intercept)    166419.209     55863.436                 2.979    0.006    51796.906    281041.511 
 Square_Feet       118.140        32.902        0.360    3.591    0.001       50.630       185.649 
RemodeledYes     90325.284     13640.194        0.664    6.622    0.000    62337.918    118312.649 
---------------------------------------------------------------------------------------------------

Diamonds Data - Questions 7 - 16

Import and Examine Data

Question 7

Examine the R output below to answer these questions on Blackboard.

  • The diamonds dataset has ____ observations.

  • In the diamonds dataset for HW6 there are:

    • ____ Colorless diamonds.

    • ____ Faint yellow diamonds.

    • ____ Nearly colorless diamonds.

Import and Examine Diamonds Dataset

```{r}
#|label: Question 7 - Import and Examine Diamonds Data

diamonds <- read_csv("data/diamonds_hw6.csv", show_col_types = FALSE) |>
  glimpse(width = 75)

diamonds |> select(Color) |> table()
```
Rows: 77
Columns: 3
$ Price  <dbl> 2995, 4482, 2796, 2798, 3337, 4583, 4439, 5190, 5190, 8464…
$ Weight <dbl> 0.71, 0.82, 0.73, 0.73, 0.76, 0.90, 0.83, 0.90, 0.90, 1.27…
$ Color  <chr> "Colorless", "Colorless", "Colorless", "Colorless", "Color…
Color
       Colorless     Faint yellow Nearly colorless 
              20               30               27 

Question 8

Use the formal regression output and/or the interactive plot below to answer this question.

  • Recall that the baseline category is the first category alphabetically, Colorless.

  • The beta values for Intercept and Weight are the coefficients of the SLR model for this baseline category.

  • For Colorless diamonds, the SLR model is (round terms to 2 decimal places):

    • Est. Price = ____ + ____ * Weight

Question 9

  • What is the estimated price in dollars of a colorless diamond that weighs 0.75 carats?

    • Round estimate to closest whole dollar.

    • DO NOT include dollar sign.

    • This calculation can be done in the R Console using values found in Question 8.
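
A calculation like this can be typed directly into the R Console. The sketch below uses placeholder coefficients (`b0` and `b1` are hypothetical values, not the ones from the model output); substitute the intercept and slope you found in Question 8.

```r
# Hypothetical coefficients for illustration only
# (NOT the values from the model output)
b0 <- -1000   # intercept
b1 <- 8000    # slope for Weight

weight <- 0.75                  # carats
est_price <- b0 + b1 * weight   # Est. Price = intercept + slope * Weight

round(est_price)                # rounds to the closest whole dollar; prints 5000
```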

Questions 10-15

  • The first code block below creates the linear model and the interactive plot (run ggPredict in Console) for the diamonds data.

  • The second code block below saves the full model output and prints both the full output and an abridged version that avoids text-wrapping.

Questions 10-12

  • Use the formal regression output to answer Questions 10-11 about Faint yellow diamonds.

  • Use the formal regression output and/or the interactive plot to answer Question 12 about Faint yellow diamonds.

Question 10

The difference in intercept from the baseline Intercept (Colorless) to the Intercept for the Faint yellow category is ____.

Question 11

The difference in slope from the baseline slope (Colorless) to the slope for the Faint yellow category is ____.

  • Hint: The numerical variable in this model is Weight so all slope terms will include Weight in their label.

Question 12

Use the answers from Questions 10 and 11 and/or the interactive plot to answer this question.

For Faint yellow diamonds, the SLR model is (round terms to 2 decimal places):

  • Est. Price = ____ + ____ * Weight
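
The way a non-baseline line is built from an interaction model can be sketched in R. The example assumes a small made-up data frame (`toy`) with hypothetical color categories A, B, and C, where A plays the role of the baseline. It is not the homework data, so its numbers are not the answers to Questions 10 - 15.

```r
# Hypothetical toy data for illustration only (NOT the homework data)
toy <- data.frame(
  Weight = rep(c(0.5, 0.7, 0.9, 1.1), times = 3),
  Color  = rep(c("A", "B", "C"), each = 4),
  Price  = c(1000, 3100, 4900, 7000,   # category A (baseline)
             1500, 2400, 3100, 3900,   # category B
             1200, 2700, 4300, 5600)   # category C
)

# interaction model: each category gets its own intercept and slope
toy_lm <- lm(Price ~ Weight * Color, data = toy)
b <- coef(toy_lm)

# category B intercept: baseline Intercept + ColorB coefficient
b_intercept <- b[["(Intercept)"]] + b[["ColorB"]]

# category B slope: baseline Weight slope + Weight:ColorB coefficient
b_slope <- b[["Weight"]] + b[["Weight:ColorB"]]

round(c(b_intercept = b_intercept, b_slope = b_slope), 2)
```

The same two additions, using the `ColorC` and `Weight:ColorC` coefficients, give the line for category C. With a full interaction model like this one, each category's line matches what a separate SLR fit to only that category's rows would give.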

Questions 13-15

  • Use the formal regression output to answer Questions 13-14 about Nearly colorless diamonds.

  • Use the formal regression output and/or the interactive plot to answer Question 15 about Nearly colorless diamonds.

Question 13

The difference in intercept from the baseline intercept (Colorless) to the intercept for the Nearly colorless category is ____.

Question 14

The difference in slope from the baseline slope (Colorless) to the slope for the Nearly colorless category is ____.

Question 15

Use the answers from Questions 13 and 14 and/or the interactive plot to answer this question.

For Nearly colorless diamonds, the SLR model is (round terms to 2 decimal places):

  • Est. Price = ____ + ____ * Weight

Question 16

Select the correct text to fill in the blanks to complete these sentences.

Based on all of the P-values (Sig column) in the model output, we can determine that:

  • The model intercepts for each of the three diamond color categories are ____.

  • The model slopes for each of the three diamond color categories are ____.

    • Copy and paste the correct phrase from these options:

      • not significantly different from each other

      • show some suggestive differences from each other

      • significantly different from each other

Interactive Model Plot

```{r}
#|label: Questions 8-15 - Categorical Regression Interaction Model Plot

# mlr interaction model
diamonds_int_lm <- lm(Price ~ Weight + Color + Weight*Color, data=diamonds)

# create interactive plot of model
# copy and paste this into console to view interactive plot
ggPredict(diamonds_int_lm, interactive=TRUE)
```

Abridged Model Output

```{r}
#|label: Questions 8-15 - Categorical Regression Interaction Model Abridged Output

# abridged formatted regression output
(diamonds_int_ols <- ols_regress(Price ~ Weight + Color + Weight*Color, data=diamonds, iterm=TRUE))

# build the abridged table directly: name and round each column as it is created
(model_out <- tibble(
    model        = diamonds_int_ols$mvars,
    Beta         = round(diamonds_int_ols$betas, 2),
    `Std. Error` = round(diamonds_int_ols$std_errors, 2),
    t            = round(diamonds_int_ols$tvalues, 2),
    Sig          = round(diamonds_int_ols$pvalues, 4)))
```
                           Model Summary                            
-------------------------------------------------------------------
R                         0.991       RMSE                 264.309 
R-Squared                 0.983       MSE                69859.371 
Adj. R-Squared            0.981       Coef. Var              6.458 
Pred R-Squared            0.979       AIC                 1091.393 
MAE                     213.089       SBC                 1107.800 
-------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                   ANOVA                                    
---------------------------------------------------------------------------
                     Sum of                                                
                    Squares        DF     Mean Square       F         Sig. 
---------------------------------------------------------------------------
Regression    304345902.717         5    60869180.543    803.416    0.0000 
Residual        5379171.595        71       75762.980                      
Total         309725074.312        76                                      
---------------------------------------------------------------------------

                                                Parameter Estimates                                                 
-------------------------------------------------------------------------------------------------------------------
                       model         Beta    Std. Error    Std. Beta       t        Sig         lower        upper 
-------------------------------------------------------------------------------------------------------------------
                 (Intercept)    -4446.564       318.593                 -13.957    0.000    -5081.819    -3811.308 
                      Weight    10476.135       314.953        1.045     33.263    0.000     9848.136    11104.134 
           ColorFaint yellow     3464.410       415.456       -0.777      8.339    0.000     2636.013     4292.806 
       ColorNearly colorless     1218.589       411.033       -0.360      2.965    0.004      399.013     2038.165 
    Weight:ColorFaint yellow    -6670.528       413.274       -0.391    -16.141    0.000    -7494.572    -5846.483 
Weight:ColorNearly colorless    -2737.150       400.984       -0.173     -6.826    0.000    -3536.689    -1937.611 
-------------------------------------------------------------------------------------------------------------------
# A tibble: 6 × 5
  model                          Beta `Std. Error`      t    Sig
  <chr>                         <dbl>        <dbl>  <dbl>  <dbl>
1 (Intercept)                  -4447.         319. -14.0  0     
2 Weight                       10476.         315.  33.3  0     
3 ColorFaint yellow             3464.         415.   8.34 0     
4 ColorNearly colorless         1219.         411.   2.96 0.0041
5 Weight:ColorFaint yellow     -6671.         413. -16.1  0     
6 Weight:ColorNearly colorless -2737.         401.  -6.83 0     

Formatted Abridged Model Output

  • Below I use R code to format the output so that it is easier to read in the HTML file.
  • The values are IDENTICAL to the unformatted output above.
  • Note: Formatted output will differ in appearance depending on where it is viewed, i.e., slides, HTML file, or .qmd file.
    • This output will not show up well on a dark screen but will appear in the HTML file.

    • Output on a Quiz is likely to look like this.

```{r}
#|label: Questions 8-15 - Categorical Regression Interaction Model Abridged Formatted Output

model_out |> kable()
```
|model                        |     Beta| Std. Error|      t|    Sig|
|:----------------------------|--------:|----------:|------:|------:|
|(Intercept)                  | -4446.56|     318.59| -13.96| 0.0000|
|Weight                       | 10476.13|     314.95|  33.26| 0.0000|
|ColorFaint yellow            |  3464.41|     415.46|   8.34| 0.0000|
|ColorNearly colorless        |  1218.59|     411.03|   2.96| 0.0041|
|Weight:ColorFaint yellow     | -6670.53|     413.27| -16.14| 0.0000|
|Weight:ColorNearly colorless | -2737.15|     400.98|  -6.83| 0.0000|

When you are done…

  1. Save your changes to this file (Ctrl + S or Cmd + S).
  2. OPTIONAL: Click the Render button to update the HTML file with your changes.
  3. Close R/RStudio on your laptop or close Posit Cloud Browser.