Author

Penelope Pooler Eisenbies

Published

February 25, 2026

Setup

Run the following chunk of R code to install and load the packages needed for this assignment.

Click the green triangle in the upper right corner of the setup chunk to run the setup code.

Note that the setup code will not appear in the rendered HTML file.

House Remodel Data - Questions 1 - 6

Import and Examine Data

Question 1

Examine the R output from the chunk below to answer these questions on Blackboard.

  • The house_remodel_hw6 dataset has ____ observations.

  • There are ____ remodeled houses and ____ un-remodeled houses in this dataset.

```{r}
#|label: Question 1 - Import and examine house_remodel_hw6 data

# import and examine data
houses_hw6 <- read_csv("data/house_remodel_hw6.csv", show_col_types = F) |>
  glimpse(width=75)

# examine counts for each category
houses_hw6 |> select(Remodeled) |> table()
```
Rows: 30
Columns: 3
$ Price       <dbl> 391000, 354000, 410000, 349000, 409000, 393000, 32100…
$ Square_Feet <dbl> 1846, 1820, 1794, 1768, 1752, 1719, 1676, 1668, 1646,…
$ Remodeled   <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No",…
Remodeled
 No Yes 
 14  16 

Examine Correlations within Data

Question 2

Examine the R output from the chunk below to answer these questions.

  • The overall correlation between Price and Square Feet is ____.

  • The correlation between Price and Square Feet in un-remodeled houses is ____.

  • The correlation between Price and Square Feet in remodeled houses is ____.

```{r}
#|label: Question 2 - Examine Correlations

# examine correlation between Price and Square_Feet
houses_hw6 |> select(Price, Square_Feet) |> cor() |> round(2)

# examine correlation between price and square feet in un-remodeled houses
houses_hw6 |> filter(Remodeled=="No") |>
  select(Price, Square_Feet) |> cor() |> round(2)

# examine correlation between price and square feet in remodeled houses
houses_hw6 |> filter(Remodeled=="Yes") |>
  select(Price, Square_Feet) |> cor() |> round(2)
```
            Price Square_Feet
Price        1.00        0.65
Square_Feet  0.65        1.00
            Price Square_Feet
Price        1.00        0.52
Square_Feet  0.52        1.00
            Price Square_Feet
Price         1.0         0.6
Square_Feet   0.6         1.0
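
As a side illustration of why the overall and within-group correlations can differ, here is a minimal base R sketch. It uses a small made-up data frame named `toy` (an assumption for illustration, not the homework data), built so that the two groups trend in opposite directions.

```r
# hypothetical toy data (NOT the homework data)
toy <- data.frame(
  Price       = c(10, 20, 30, 40, 40, 30, 20, 10),
  Square_Feet = c(1, 2, 3, 4, 1, 2, 3, 4),
  Remodeled   = rep(c("No", "Yes"), each = 4)
)

# overall correlation, ignoring the Remodeled category
overall_cor <- cor(toy$Price, toy$Square_Feet)

# correlation computed separately within each Remodeled category
group_cor <- sapply(split(toy, toy$Remodeled),
                    function(d) cor(d$Price, d$Square_Feet))

overall_cor
group_cor
```

In this toy example the within-group correlations are perfect but opposite in sign, so they cancel in the overall correlation.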

Modeling the House Remodel Data

Questions 3 - 6

  • Below are two chunks of R code.

  • The first chunk creates the interactive plot.

    • Copy and paste the ggPredict command into the R Console to view the plot more clearly in the RStudio Viewer (Lower Left Pane).
  • The second chunk creates the model and prints it.

  • Use the interactive plot and model output to answer Questions 3 - 6.

Question 3. What is the SLR model equation for un-remodeled houses (Remodeled = No)?

  • Round values to two decimal places.

  • Est. Price = ___ + ___ * Square_Feet.

  • Hint for Question 3:

    • The un-remodeled houses (Remodeled = No) are the baseline category (not listed in output).

    • The baseline Intercept Beta and Square_Feet Beta are the coefficients for the baseline category SLR.

Question 4. What is the SLR model equation for remodeled houses (Remodeled = Yes)?

  • Round values to two decimal places.

  • Est. Price = ___ + ___ * Square_Feet.

Hint for Question 4:

  • The intercept for the remodeled houses (Remodeled = Yes) is calculated as: baseline Intercept Beta + RemodeledYes Beta

Hint for Questions 3 and 4

  • You can check your work by examining the model equations for each line in the interactive plot.

  • The slope is the same for both Remodeled categories, but the intercepts differ.
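
The hints above can be sketched in R. This is a minimal illustration on a small made-up dataset (the data frame `toy` and its values are assumptions, not the homework data). It shows how the baseline intercept, the `RemodeledYes` coefficient, and the shared slope combine into two SLR equations.

```r
# hypothetical toy data (NOT the homework data): two exactly parallel lines
toy <- data.frame(
  Square_Feet = rep(c(1000, 1500, 2000), times = 2),
  Remodeled   = rep(c("No", "Yes"), each = 3)
)
toy$Price <- 100 + 2 * toy$Square_Feet + 50 * (toy$Remodeled == "Yes")

# same model form as the homework: one slope, separate intercepts
toy_lm <- lm(Price ~ Square_Feet + Remodeled, data = toy)
b <- coef(toy_lm)

# baseline category (Remodeled = No) uses the Intercept as is
intercept_no <- b[["(Intercept)"]]

# Remodeled = Yes adds the RemodeledYes coefficient to the baseline intercept
intercept_yes <- b[["(Intercept)"]] + b[["RemodeledYes"]]

# the slope is shared by both categories
slope <- b[["Square_Feet"]]

round(c(intercept_no = intercept_no, intercept_yes = intercept_yes, slope = slope), 2)
```

The same three lines of arithmetic apply to any model of this form once the coefficient names match the model output.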

Interactive Categorical Model Plot

```{r}
#|label: Questions 3-6 - Categorical Regression Model Plot


# mlr categorical model
house_rem_cat_lm <- lm(Price ~ Square_Feet + Remodeled, data=houses_hw6)

# create interactive plot of model
# copy and paste this into console to view interactive plot
ggPredict(house_rem_cat_lm, interactive=T)
```
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggiraphExtra package.
  Please report the issue to the authors.

Question 5. Fill in the blanks. Round values to 2 decimal places.

  • If a house is remodeled, the estimated price increase will be ___.

  • For both remodeled houses and un-remodeled houses, the price increase for each additional square foot is ___.

Hints for Question 5:

  • The difference due to remodeling is the RemodeledYes Beta in the model output.

  • The price increase for each additional square foot is the slope, Square_Feet Beta, common to both models.

Question 6. Based on the P-value (Sig) for the difference due to remodeling (RemodeledYes), copy and paste the correct phrase to complete this sentence:

  • After accounting for the relationship between Price and Square Feet, we see that there is ___ in price between un-remodeled and remodeled houses.

    • Copy and paste the correct phrase from these options:

      • not a significant difference

      • suggestive evidence of a significant difference

      • definitely a significant difference
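
For reference, the P-value for a single term can also be pulled out of a base R `lm` summary. The sketch below uses simulated data (the data frame `toy`, the seed, and all numeric values are assumptions for illustration, not the homework data).

```r
# hypothetical simulated data (NOT the homework data)
set.seed(1)
toy <- data.frame(
  Square_Feet = rep(seq(1000, 2000, by = 100), times = 2),
  Remodeled   = rep(c("No", "Yes"), each = 11)
)
toy$Price <- 100000 + 100 * toy$Square_Feet +
  50000 * (toy$Remodeled == "Yes") + rnorm(nrow(toy), sd = 10000)

toy_lm <- lm(Price ~ Square_Feet + Remodeled, data = toy)

# coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(toy_lm)$coefficients

# P-value for the difference due to remodeling
p_remodeled <- coef_table["RemodeledYes", "Pr(>|t|)"]
p_remodeled
```

The `Pr(>|t|)` column here plays the same role as the Sig column in the formatted regression output.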

Categorical Regression Model output

```{r}
#|label: Questions 3-6 - Categorical Regression Model Formal Output

# formatted regression output
# model is saved and printed to screen
(house_rem_cat_ols<- ols_regress(Price ~ Square_Feet + Remodeled, data=houses_hw6))
```
                              Model Summary                                
--------------------------------------------------------------------------
R                           0.884       RMSE                    31720.100 
R-Squared                   0.782       MSE                1006164737.274 
Adj. R-Squared              0.765       Coef. Var                   7.883 
Pred R-Squared              0.727       AIC                       715.019 
MAE                     27454.037       SBC                       720.623 
--------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                        Sum of                                                  
                       Squares        DF        Mean Square      F         Sig. 
--------------------------------------------------------------------------------
Regression    108020180265.937         2    54010090132.968    48.311    0.0000 
Residual       30184942118.230        27     1117960819.194                     
Total         138205122384.167        29                                        
--------------------------------------------------------------------------------

                                        Parameter Estimates                                         
---------------------------------------------------------------------------------------------------
       model          Beta    Std. Error    Std. Beta      t       Sig         lower         upper 
---------------------------------------------------------------------------------------------------
 (Intercept)    166419.209     55863.436                 2.979    0.006    51796.906    281041.511 
 Square_Feet       118.140        32.902        0.360    3.591    0.001       50.630       185.649 
RemodeledYes     90325.284     13640.194        0.664    6.622    0.000    62337.918    118312.649 
---------------------------------------------------------------------------------------------------

Diamonds Data - Questions 7 - 16

Import and Examine Data

Question 7

Examine the R output below to answer these questions on Blackboard.

  • The diamonds dataset has ____ observations.

  • In the diamonds dataset for HW6 there are

    • ____ Colorless diamonds.

    • ____ Faint yellow diamonds.

    • ____ Nearly colorless diamonds.

Import and Examine Diamonds Dataset

```{r}
#|label: Question 7 - Import and Examine Diamonds Data

diamonds <- read_csv("data/diamonds_hw6.csv", show_col_types = F) |>
  glimpse(width = 75)

diamonds |> select(Color) |> table()
```
Rows: 77
Columns: 3
$ Price  <dbl> 2995, 4482, 2796, 2798, 3337, 4583, 4439, 5190, 5190, 8464…
$ Weight <dbl> 0.71, 0.82, 0.73, 0.73, 0.76, 0.90, 0.83, 0.90, 0.90, 1.27…
$ Color  <chr> "Colorless", "Colorless", "Colorless", "Colorless", "Color…
Color
       Colorless     Faint yellow Nearly colorless 
              20               30               27 

Question 8

Use the formal regression output and/or the interactive plot below to answer this question.

  • Recall that the baseline category is the first category alphabetically, Colorless.

  • The beta values for Intercept and Weight are the SLR model for this baseline category:

  • For Colorless diamonds, the SLR model is (round terms to 2 decimal places):

    • Est. Price = ____ + ____ Weight

Question 9

  • What is the estimated price in dollars of a colorless diamond that weighs 0.75 carats?

    • Round estimate to closest whole dollar.

    • DO NOT include dollar sign.

    • This calculation can be done in the R Console using values found in Question 8.
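
A minimal sketch of this kind of Console calculation is shown below. The coefficient values `b0` and `b1` are made-up placeholders (assumptions, not the homework values); substitute the Intercept and Weight values from Question 8.

```r
# hypothetical SLR coefficients (NOT the homework values)
b0 <- -1000   # intercept
b1 <- 5000    # slope for Weight

weight <- 0.75                 # carats
est_price <- b0 + b1 * weight  # plug the weight into the SLR equation

round(est_price)               # round to the closest whole dollar
```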

Questions 10-15

  • The first code block below creates the linear model and the interactive plot (run ggPredict in Console) for the diamonds data.

  • The second code block below saves the full model output but only prints the abridged output to avoid text-wrapping.

Questions 10-12

  • Use the formal regression output to answer Questions 10-11 about Faint yellow diamonds.

  • Use the formal regression output and/or the interactive plot to answer Question 12 about Faint yellow diamonds.

Question 10

The difference in intercept from the baseline Intercept (Colorless) to the Intercept for the Faint yellow category is ____.

Question 11

The difference in slope from the baseline slope (Colorless) to the slope for the Faint yellow category is ____.

  • Hint: The numerical variable in this model is Weight so all slope terms will include Weight in their label.

Question 12

Use the answers from Questions 10 and 11 and/or the interactive plot to answer this question.

For Faint yellow diamonds, the SLR model is (round terms to 2 decimal places):

  • Est. Price = ____ + ____ Weight

Questions 13-15

  • Use the formal regression output to answer Questions 13-14 about Nearly colorless diamonds.

  • Use the formal regression output and/or the interactive plot to answer Question 15 about Nearly colorless diamonds.

Question 13

The difference in intercept from the baseline intercept (Colorless) to the intercept for the Nearly colorless category is ____.

Question 14

The difference in slope from the baseline slope (Colorless) to the slope for the Nearly colorless category is ____.

Question 15

Use the answers from Questions 13 and 14 and/or the interactive plot to answer this question.

For Nearly colorless diamonds, the SLR model is (round terms to 2 decimal places):

  • Est. Price = ____ + ____ Weight
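
The logic behind Questions 10 - 15 can be sketched in R on a small made-up dataset (the data frame `toy` and the intercepts and slopes used to build it are assumptions, not the homework data). In an interaction model, each non-baseline category's intercept is the baseline Intercept plus that category's Color term, and its slope is the baseline Weight slope plus that category's Weight:Color term. Note that in an R formula, `Weight * Color` already expands to `Weight + Color + Weight:Color`.

```r
# hypothetical toy data (NOT the homework data): three exact lines
toy <- data.frame(
  Weight = rep(c(0.5, 0.75, 1.0, 1.25), times = 3),
  Color  = rep(c("Colorless", "Faint yellow", "Nearly colorless"), each = 4)
)
true_int <- c("Colorless" = -4000, "Faint yellow" = -1000, "Nearly colorless" = -3000)
true_slp <- c("Colorless" = 10000, "Faint yellow" = 4000, "Nearly colorless" = 8000)
toy$Price <- unname(true_int[toy$Color] + true_slp[toy$Color] * toy$Weight)

# interaction model: separate intercept and slope for each Color
toy_int_lm <- lm(Price ~ Weight * Color, data = toy)
b <- coef(toy_int_lm)

# baseline category (Colorless) uses Intercept and Weight as is
colorless_intercept <- b[["(Intercept)"]]
colorless_slope     <- b[["Weight"]]

# Faint yellow: baseline value plus the difference term
faint_intercept <- b[["(Intercept)"]] + b[["ColorFaint yellow"]]
faint_slope     <- b[["Weight"]] + b[["Weight:ColorFaint yellow"]]

round(c(faint_intercept = faint_intercept, faint_slope = faint_slope), 2)
```

The Nearly colorless line follows the same pattern with the `ColorNearly colorless` and `Weight:ColorNearly colorless` terms.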

Question 16

Select the correct text to fill in the blanks to complete these sentences.

Based on all of the P-values (Sig column) in the model output, we can determine that:

  • The model intercepts for each of the three diamond color categories are ____.

  • The model slopes for each of the three diamond color categories are ____.

    • Copy and paste the correct phrase from these options:

      • not significantly different from each other

      • show some suggestive differences from each other

      • significantly different from each other

Interactive Model Plot

```{r}
#|label: Questions 8-15 - Categorical Regression Interaction Model Plot

# mlr interaction model
diamonds_int_lm <- lm(Price ~ Weight + Color + Weight*Color, data=diamonds)

# create interactive plot of model
# copy and paste this into console to view interactive plot
ggPredict(diamonds_int_lm, interactive=T)
```

Abridged Model Output

```{r}
#|label: Questions 8-15 - Categorical Regression Interaction Model Abridged Output

# abridged formatted regression output
diamonds_int_ols <- ols_regress(Price ~ Weight + Color + Weight*Color, data=diamonds, iterm=T)

(model_out <- tibble(                                    # name and round columns directly
    model        = diamonds_int_ols$mvars,               # model term labels
    Beta         = round(diamonds_int_ols$betas, 2),     # coefficient estimates
    `Std. Error` = round(diamonds_int_ols$std_errors, 2),
    t            = round(diamonds_int_ols$tvalues, 2),
    Sig          = round(diamonds_int_ols$pvalues, 4)))  # P-values
```
# A tibble: 6 × 5
  model                          Beta `Std. Error`      t    Sig
  <chr>                         <dbl>        <dbl>  <dbl>  <dbl>
1 (Intercept)                  -4447.         319. -14.0  0     
2 Weight                       10476.         315.  33.3  0     
3 ColorFaint yellow             3464.         415.   8.34 0     
4 ColorNearly colorless         1219.         411.   2.96 0.0041
5 Weight:ColorFaint yellow     -6671.         413. -16.1  0     
6 Weight:ColorNearly colorless -2737.         401.  -6.83 0     

Formatted Abridged Model Output

  • Below I use R code to format the output to make it easier to read in the HTML file.
  • The values are IDENTICAL to the unformatted output above.
  • Note: Formatted output will differ in appearance depending on where it is viewed, i.e. slides, HTML file, or .qmd file.
    • This output will not show up well on a dark screen but will appear in the HTML file.

    • Output on a Quiz is likely to look like this.

```{r}
#|label: Questions 8-15 - Categorical Regression Interaction Model Abridged Formatted Output

model_out |> kable() |> kable_styling(full_width = F)
```
model                              Beta    Std. Error         t       Sig
(Intercept)                    -4446.56        318.59    -13.96    0.0000
Weight                         10476.13        314.95     33.26    0.0000
ColorFaint yellow               3464.41        415.46      8.34    0.0000
ColorNearly colorless           1218.59        411.03      2.96    0.0041
Weight:ColorFaint yellow       -6670.53        413.27    -16.14    0.0000
Weight:ColorNearly colorless   -2737.15        400.98     -6.83    0.0000

When you are done…

  1. Save your changes to this file (Ctrl + S or Cmd + S).
  2. OPTIONAL: Click the Render button to update the HTML file with your changes.
  3. Close R/RStudio on your laptop or close Posit Cloud Browser.