Discussionwk5

Author

Nuobing Fan

library(MASS)
library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
library(car)
# Load the Boston dataset from MASS
data("Boston")

# View the structure of the dataset
str(Boston)
'data.frame':   506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
?Boston  # open the help page that documents the 14 variables

Part 1: What is the bias of an estimator? You can read online blogs or even your textbooks to answer this question.

The bias of an estimator is the difference between the expected value of the estimator and the true value of the parameter it aims to estimate. In simpler terms, it reflects how much, on average, the estimator deviates from the actual parameter. An estimator is unbiased if its expected value equals the true parameter, meaning it accurately represents the parameter without systematic error. If the estimator consistently overestimates or underestimates the parameter, it is considered biased.
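In symbols, $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. A standard example is the sample variance computed with divisor n, whose expectation is $\frac{n-1}{n}\sigma^2$, so its bias is $-\sigma^2/n$. The sketch below illustrates this by simulation. The values n = 10 and σ² = 4 are arbitrary choices for the illustration, and the Monte Carlo averages only approximate the expectations.

```r
set.seed(1)
n      <- 10      # small sample so the bias is visible
sigma2 <- 4       # true population variance (arbitrary choice)
reps   <- 20000   # number of simulated samples

# Each column is one sample of size n drawn from N(0, sigma2)
draws <- matrix(rnorm(n * reps, mean = 0, sd = sqrt(sigma2)), nrow = n)

# var() uses divisor n - 1; rescale to get the divisor-n estimator
var_unbiased <- apply(draws, 2, var)
var_biased   <- var_unbiased * (n - 1) / n

# Monte Carlo approximation of E[estimator] - true value
bias_unbiased <- mean(var_unbiased) - sigma2
bias_biased   <- mean(var_biased) - sigma2
theoretical   <- -sigma2 / n   # theoretical bias of the divisor-n estimator

cat("Bias with divisor n - 1:", round(bias_unbiased, 3), "\n")
cat("Bias with divisor n:    ", round(bias_biased, 3),
    "(theory:", theoretical, ")\n")
```

The divisor-n estimator systematically underestimates σ², while the divisor-(n − 1) estimator is centered on the true value.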

Part 2: In terms of omitted variable bias, will the bias go away if we increase the sample size or add more variables?

No. Omitted variable bias (OVB) does not go away when the sample size grows or when more unrelated variables are added. OVB arises when a variable that affects the dependent variable and is correlated with an included independent variable is left out of the model; part of its effect is then absorbed by the coefficients of the included variables, producing systematic bias. A larger sample reduces the variance of the estimator but not this bias: the estimate simply converges to the wrong value. Likewise, adding variables that are unrelated to the omitted one does not eliminate OVB; the bias remains until the omitted variable is included in the model or accounted for by other means, such as instrumental variables.
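A quick simulation makes the point that a larger sample does not remove OVB. The data-generating process below is a hypothetical assumption chosen for illustration: y = 1 + 2·x1 + 3·x2 + e with x2 = 0.5·x1 + u. Omitting x2 should shift the x1 coefficient by 3 × 0.5 = 1.5, at every sample size.

```r
set.seed(42)

# Hypothetical DGP (assumption): x2 is correlated with x1 and affects y
simulate_short_coef <- function(n) {
  x1 <- rnorm(n)
  x2 <- 0.5 * x1 + rnorm(n)
  y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
  # Short model that omits x2
  unname(coef(lm(y ~ x1))["x1"])
}

# OVB formula: short coefficient -> 2 + 3 * 0.5 = 3.5, not the true 2
sizes       <- c(100, 1000, 100000)
short_coefs <- sapply(sizes, simulate_short_coef)
print(data.frame(n = sizes,
                 short_coef_x1 = round(short_coefs, 3),
                 true_beta1 = 2))
```

As n grows, the estimate becomes more precise but settles near 3.5 rather than the true value of 2.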

Part 3: Give me 1 distinct example of OVB (on a different dataset).

I use the Boston dataset here. I tested the p-values of all variables in the previous discussion; the code below repeats that step to select the regressors for the full model.


model <- lm(medv ~ ., data = Boston)

# Get the summary of the model to check p-values
model_summary <- summary(model)

# Extract coefficients and p-values from the model summary
coefficients_df <- as.data.frame(model_summary$coefficients)

# Extract p-values
p_values <- coefficients_df[, "Pr(>|t|)"]

# Display all p-values
print("P-values of all factors:")
[1] "P-values of all factors:"
print(p_values)
 [1] 3.283438e-12 1.086810e-03 7.781097e-04 7.382881e-01 1.925030e-03
 [6] 4.245644e-06 1.979441e-18 9.582293e-01 6.013491e-13 5.070529e-06
[11] 1.111637e-03 1.308835e-12 5.728592e-04 7.776912e-23
# Filter factors with p-values <= 0.05 (exclude intercept)
significant_factors <- rownames(coefficients_df)[p_values <= 0.05 & rownames(coefficients_df) != "(Intercept)"]

# Display names of significant factors
print("Significant factors (p <= 0.05):")
[1] "Significant factors (p <= 0.05):"
print(significant_factors)
 [1] "crim"    "zn"      "chas"    "nox"     "rm"      "dis"     "rad"    
 [8] "tax"     "ptratio" "black"   "lstat"  
# Full model
full_model <- lm(medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + black + lstat, data = Boston)
summary(full_model)

Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
crim         -0.108413   0.032779  -3.307 0.001010 ** 
zn            0.045845   0.013523   3.390 0.000754 ***
chas          2.718716   0.854240   3.183 0.001551 ** 
nox         -17.376023   3.535243  -4.915 1.21e-06 ***
rm            3.801579   0.406316   9.356  < 2e-16 ***
dis          -1.492711   0.185731  -8.037 6.84e-15 ***
rad           0.299608   0.063402   4.726 3.00e-06 ***
tax          -0.011778   0.003372  -3.493 0.000521 ***
ptratio      -0.946525   0.129066  -7.334 9.24e-13 ***
black         0.009291   0.002674   3.475 0.000557 ***
lstat        -0.522553   0.047424 -11.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16
# Check for multicollinearity
vif_values <- vif(full_model)
print(vif_values)
    crim       zn     chas      nox       rm      dis      rad      tax 
1.789704 2.239229 1.059819 3.778011 1.834806 3.443420 6.861126 7.272386 
 ptratio    black    lstat 
1.757681 1.341559 2.581984 
# You can also filter out high VIF values (>10)
high_vif <- vif_values[vif_values > 10]
print(high_vif) # These variables may have multicollinearity issues
named numeric(0)
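To make the VIF check concrete, the sketch below recomputes the VIF for the key variable lstat by hand from the definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing lstat on the other regressors of the full model. It uses only base R and MASS, and the value should match the car::vif() output above (2.581984).

```r
library(MASS)   # Boston data

# Auxiliary regression: lstat on the other regressors in full_model
aux_lstat <- lm(lstat ~ crim + zn + chas + nox + rm + dis + rad + tax +
                  ptratio + black, data = Boston)
r2_lstat <- summary(aux_lstat)$r.squared

# VIF_j = 1 / (1 - R_j^2)
vif_lstat <- 1 / (1 - r2_lstat)
cat("Manual VIF for lstat:", round(vif_lstat, 6), "\n")
```

A VIF of about 2.6 means the variance of the lstat coefficient is roughly 2.6 times what it would be if lstat were uncorrelated with the other regressors, well below the common threshold of 10.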
# Plot residuals to check for homoscedasticity (constant variance) and normality
plot(full_model) # This will show 4 diagnostic plots: residuals, QQ-plot, etc.

# Extract residuals
residuals <- residuals(full_model)

# Perform Shapiro-Wilk test for normality
shapiro_test <- shapiro.test(residuals)
print(shapiro_test) # p-value < 0.05 rejects normality of the residuals; p-value > 0.05 gives no evidence against normality

    Shapiro-Wilk normality test

data:  residuals
W = 0.90131, p-value < 2.2e-16

1. You will choose a dataset, describe the variables in it, and give us the full / correct model (be sure to write out the estimating equation in R Markdown, and pay attention to the subscripts as well). Tell us what your key independent variable of interest is.

\[medv_i = \beta_0 + \beta_1 crim_i + \beta_2 zn_i + \beta_3 chas_i + \beta_4 nox_i + \beta_5 rm_i + \beta_6 dis_i + \beta_7 rad_i + \beta_8 tax_i + \beta_9 ptratio_i + \beta_{10} black_i + \beta_{11} lstat_i + \epsilon_i\]

where i = 1, ..., 506 indexes the observations.

The Boston dataset (from the MASS package) contains data on housing values in suburbs of Boston, with 506 observations and 14 variables:

- crim: per capita crime rate by town.
- zn: proportion of residential land zoned for lots over 25,000 sq.ft.
- indus: proportion of non-retail business acres per town.
- chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
- nox: nitrogen oxides concentration (parts per 10 million).
- rm: average number of rooms per dwelling.
- age: proportion of owner-occupied units built prior to 1940.
- dis: weighted mean of distances to five Boston employment centres.
- rad: index of accessibility to radial highways.
- tax: full-value property-tax rate per $10,000.
- ptratio: pupil-teacher ratio by town.
- black: 1000(Bk−0.63)^2, where Bk is the proportion of blacks by town.
- lstat: lower status of the population (percent).
- medv: median value of owner-occupied homes in $1000s (the dependent variable).

The full model keeps the 11 regressors with p ≤ 0.05 in the regression on all variables, dropping indus (p = 0.738) and age (p = 0.958).

My key independent variable is lstat (the percentage of lower-status population); I am interested in its effect on medv.

2. Now, suppose you were running the short / incorrect model where you omitted a variable “by mistake”. Write the estimating equation out as well.

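\[medv_i = \tilde{\beta}_0 + \tilde{\beta}_1 crim_i + \tilde{\beta}_2 zn_i + \tilde{\beta}_3 chas_i + \tilde{\beta}_4 nox_i + \tilde{\beta}_5 dis_i + \tilde{\beta}_6 rad_i + \tilde{\beta}_7 tax_i + \tilde{\beta}_8 ptratio_i + \tilde{\beta}_9 black_i + \tilde{\beta}_{10} lstat_i + \tilde{\epsilon}_i\]

The tildes mark the short-model coefficients, which generally differ from those of the full model because rm has been omitted.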

In the real world, a variable is usually not omitted by mistake; rather, we do not have data on it. Otherwise we would, of course, run the full model.

3. From the OVB formula, tell us whether the omitted variable will cause bias or not i.e. are the two conditions for OVB met or not?

1) Be sure to list out the 2 conditions for OVB explicitly and translate them for your example.

2) You can use the correlation functions in R to show whether the two conditions are met:

- See if the correlation between the omitted variable and y is statistically significant.
- See if the correlation between the omitted variable and the key x is statistically significant.

OVB occurs only if both of the following conditions hold:

1. The omitted variable is a determinant of the dependent variable (it is correlated with y). Here: rm must be correlated with medv.
2. The omitted variable is correlated with the key independent variable. Here: rm must be correlated with lstat.

So we need to check whether rm is correlated with medv and with lstat.

# Correlation between omitted variable 'rm' and dependent variable 'medv'
cor_rm_medv <- cor(Boston$rm, Boston$medv)
cat("Correlation between rm and medv:", cor_rm_medv, "\n")
Correlation between rm and medv: 0.6953599 
# Correlation between omitted variable 'rm' and key independent variable 'lstat'
cor_rm_lstat <- cor(Boston$rm, Boston$lstat)
cat("Correlation between rm and lstat:", cor_rm_lstat, "\n")
Correlation between rm and lstat: -0.6138083 
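The prompt asks whether these correlations are statistically significant, which cor() alone does not report. A minimal sketch using base R's cor.test() (Pearson test of H0: population correlation = 0) would be:

```r
library(MASS)   # Boston data

# Pearson correlation tests; H0: correlation = 0 (t test with n - 2 df)
test_rm_medv  <- cor.test(Boston$rm, Boston$medv)
test_rm_lstat <- cor.test(Boston$rm, Boston$lstat)

cat("rm vs medv : r =", round(test_rm_medv$estimate, 4),
    " p =", format.pval(test_rm_medv$p.value), "\n")
cat("rm vs lstat: r =", round(test_rm_lstat$estimate, 4),
    " p =", format.pval(test_rm_lstat$p.value), "\n")
```

Both p-values fall far below 0.05, so each correlation is significantly different from zero.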

Both correlations are large in magnitude, and with 506 observations correlations of this size are far from zero, so both conditions for OVB are met: omitting rm will bias the estimated effect of lstat on medv.

4. Furthermore, in what direction will the OVB go (positive/negative bias)? Which case/cell in the 2 by 2 matrix of the two OVB conditions applies?

- Correlation between rm and medv: 0.6953599. The omitted variable rm is a determinant of medv (strong positive correlation).
- Correlation between rm and lstat: -0.6138083. rm is also strongly (negatively) correlated with the key variable lstat, so omitting it biases the estimated effect of lstat on medv.

Since rm is positively correlated with medv and negatively correlated with lstat, omitting rm makes the estimated coefficient on lstat more negative in the short model. The bias is therefore negative: the short model overstates the negative effect of lstat on medv.

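2 by 2 matrix of the OVB conditions (sign of the bias on the lstat coefficient):

| | Corr(rm, lstat) > 0 | Corr(rm, lstat) < 0 |
|---|---|---|
| Corr(rm, medv) > 0 | Positive bias | Negative bias (our case) |
| Corr(rm, medv) < 0 | Negative bias | Positive bias |

Our case: the omitted variable is positively correlated with y and negatively correlated with the key x, which gives a negative bias.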
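The sign logic above is the OVB formula applied to this example. Using the full-model numbering (β_5 on rm, β_11 on lstat), writing $\tilde{\beta}_{lstat}$ for the lstat coefficient in the short model, and letting δ_lstat denote the coefficient on lstat when rm is regressed on the short model's regressors (a notation introduced here for illustration), the formula reads:

\[E[\tilde{\beta}_{lstat}] = \beta_{11} + \beta_5 \, \delta_{lstat}, \qquad \text{bias} = \underbrace{\beta_5}_{>0} \cdot \underbrace{\delta_{lstat}}_{<0} < 0\]

Strictly, the signs that matter are those of the partial coefficients β_5 and δ_lstat rather than the simple correlations; the correlations computed above serve as a guide to those signs.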

5. Show the two regressions side by side (you can use stargazer command) and confirm the bias is in the direction OVB formula predicted.

# Full model (with rm)
full_model <- lm(medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + black + lstat, data = Boston)

# Short model (without rm)
short_model <- lm(medv ~ crim + zn + chas + nox + dis + rad + tax + ptratio + black + lstat, data = Boston)

# Show both models side-by-side
stargazer(full_model, short_model, type = "text", 
          title = "Comparison of Full and Short Models")

Comparison of Full and Short Models
=======================================================================
                                    Dependent variable:                
                    ---------------------------------------------------
                                           medv                        
                               (1)                       (2)           
-----------------------------------------------------------------------
crim                        -0.108***                 -0.115***        
                             (0.033)                   (0.036)         
                                                                       
zn                          0.046***                  0.064***         
                             (0.014)                   (0.014)         
                                                                       
chas                        2.719***                  2.985***         
                             (0.854)                   (0.925)         
                                                                       
nox                        -17.376***                -19.678***        
                             (3.535)                   (3.823)         
                                                                       
rm                          3.802***                                   
                             (0.406)                                   
                                                                       
dis                         -1.493***                 -1.826***        
                             (0.186)                   (0.198)         
                                                                       
rad                         0.300***                  0.406***         
                             (0.063)                   (0.068)         
                                                                       
tax                         -0.012***                 -0.016***        
                             (0.003)                   (0.004)         
                                                                       
ptratio                     -0.947***                 -1.141***        
                             (0.129)                   (0.138)         
                                                                       
black                       0.009***                   0.007**         
                             (0.003)                   (0.003)         
                                                                       
lstat                       -0.523***                 -0.752***        
                             (0.047)                   (0.044)         
                                                                       
Constant                    36.341***                 70.451***        
                             (5.067)                   (3.815)         
                                                                       
-----------------------------------------------------------------------
Observations                   506                       506           
R2                            0.741                     0.695          
Adjusted R2                   0.735                     0.688          
Residual Std. Error     4.736 (df = 494)          5.134 (df = 495)     
F Statistic         128.206*** (df = 11; 494) 112.589*** (df = 10; 495)
=======================================================================
Note:                                       *p<0.1; **p<0.05; ***p<0.01

The coefficient on lstat is more negative in the short model (-0.752) than in the full model (-0.523), confirming the negative bias predicted by the OVB formula. R-squared drops from 0.741 in the full model to 0.695 in the short model, indicating reduced fit. All coefficients are statistically significant at the 1% level except black in the short model, which is significant at the 5% level, and the F-statistic is significant in both models, so at least one predictor is related to medv. Omitting rm exaggerates the negative effect of lstat on medv, which shows how leaving out a relevant variable distorts the estimates, and the direction of the distortion matches the OVB prediction.
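The OVB formula can also be checked numerically. For OLS, the short-model coefficient equals the long-model coefficient plus β_rm × δ_lstat exactly in-sample, where δ_lstat is the lstat coefficient from an auxiliary regression of rm on the short model's regressors. A sketch (the names `aux_rm`, `beta_rm`, and `delta_lstat` are illustrative):

```r
library(MASS)   # Boston data

full_model  <- lm(medv ~ crim + zn + chas + nox + rm + dis + rad + tax +
                    ptratio + black + lstat, data = Boston)
short_model <- lm(medv ~ crim + zn + chas + nox + dis + rad + tax +
                    ptratio + black + lstat, data = Boston)

# Auxiliary regression: omitted variable (rm) on the short model's regressors
aux_rm <- lm(rm ~ crim + zn + chas + nox + dis + rad + tax +
               ptratio + black + lstat, data = Boston)

beta_rm     <- unname(coef(full_model)["rm"])   # condition 1: rm affects medv
delta_lstat <- unname(coef(aux_rm)["lstat"])    # condition 2: rm related to lstat
predicted_bias <- beta_rm * delta_lstat
observed_gap   <- unname(coef(short_model)["lstat"] - coef(full_model)["lstat"])

cat("beta_rm =", round(beta_rm, 3), " delta_lstat =", round(delta_lstat, 4), "\n")
cat("Predicted bias:", round(predicted_bias, 4),
    " Observed gap:", round(observed_gap, 4), "\n")
```

The predicted bias and the observed gap agree, and both are negative because β_rm > 0 and δ_lstat < 0.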

6. Try to provide some intuition for why the OVB formula works, i.e. why it biases your results in a certain direction in your example.

The OVB formula works because an omitted variable does not disappear: its effect moves into the error term, and whenever it is correlated with an included regressor, OLS attributes part of that effect to the included regressor. Here, tracts with a higher share of lower-status residents (high lstat) tend to have dwellings with fewer rooms (low rm), and fewer rooms lower home values. When rm is left out, lstat picks up both its own effect and the value loss from smaller homes, so its coefficient becomes more negative than the true effect. This is why relevant variables that are correlated with the key regressor must be included to obtain unbiased estimates.

ADVANCED BONUS QUESTION (for deeper understanding of OVB) -

1. Try to add/exclude a variable in your multivariate regression that does not impact y (is uncorrelated with y) but is correlated with the key x variable, and show that your point estimate will not change (significantly).

We’ll add the variable age, which represents the proportion of owner-occupied units built prior to 1940, to our model.

# Check correlation between age and lstat
cor_age_lstat <- cor(Boston$age, Boston$lstat)
cat("Correlation between age and lstat:", cor_age_lstat, "\n")
Correlation between age and lstat: 0.6023385 
# Check correlation between age and medv
cor_age_medv <- cor(Boston$age, Boston$medv)
cat("Correlation between age and medv:", cor_age_medv, "\n")
Correlation between age and medv: -0.3769546 
# Full model without age
model_without_age <- lm(medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + black + lstat, data = Boston)
summary(model_without_age)

Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
crim         -0.108413   0.032779  -3.307 0.001010 ** 
zn            0.045845   0.013523   3.390 0.000754 ***
chas          2.718716   0.854240   3.183 0.001551 ** 
nox         -17.376023   3.535243  -4.915 1.21e-06 ***
rm            3.801579   0.406316   9.356  < 2e-16 ***
dis          -1.492711   0.185731  -8.037 6.84e-15 ***
rad           0.299608   0.063402   4.726 3.00e-06 ***
tax          -0.011778   0.003372  -3.493 0.000521 ***
ptratio      -0.946525   0.129066  -7.334 9.24e-13 ***
black         0.009291   0.002674   3.475 0.000557 ***
lstat        -0.522553   0.047424 -11.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16
# Full model with age
model_with_age <- lm(medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + black + lstat + age, data = Boston)
summary(model_with_age)

Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat + age, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.587  -2.737  -0.506   1.742  26.212 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.636e+01  5.091e+00   7.143 3.30e-12 ***
crim        -1.084e-01  3.281e-02  -3.304 0.001022 ** 
zn           4.593e-02  1.364e-02   3.368 0.000816 ***
chas         2.716e+00  8.562e-01   3.173 0.001605 ** 
nox         -1.743e+01  3.681e+00  -4.735 2.87e-06 ***
rm           3.797e+00  4.158e-01   9.132  < 2e-16 ***
dis         -1.490e+00  1.948e-01  -7.648 1.08e-13 ***
rad          2.999e-01  6.367e-02   4.710 3.22e-06 ***
tax         -1.178e-02  3.378e-03  -3.489 0.000529 ***
ptratio     -9.471e-01  1.296e-01  -7.308 1.10e-12 ***
black        9.282e-03  2.682e-03   3.461 0.000586 ***
lstat       -5.235e-01  5.052e-02 -10.361  < 2e-16 ***
age          6.971e-04  1.320e-02   0.053 0.957898    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.741 on 493 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7343 
F-statistic: 117.3 on 12 and 493 DF,  p-value: < 2.2e-16
# Compare the models side by side
stargazer(model_without_age, model_with_age, type = "text", title = "Comparison of Models With and Without Age")

Comparison of Models With and Without Age
=======================================================================
                                    Dependent variable:                
                    ---------------------------------------------------
                                           medv                        
                               (1)                       (2)           
-----------------------------------------------------------------------
crim                        -0.108***                 -0.108***        
                             (0.033)                   (0.033)         
                                                                       
zn                          0.046***                  0.046***         
                             (0.014)                   (0.014)         
                                                                       
chas                        2.719***                  2.716***         
                             (0.854)                   (0.856)         
                                                                       
nox                        -17.376***                -17.430***        
                             (3.535)                   (3.681)         
                                                                       
rm                          3.802***                  3.797***         
                             (0.406)                   (0.416)         
                                                                       
dis                         -1.493***                 -1.490***        
                             (0.186)                   (0.195)         
                                                                       
rad                         0.300***                  0.300***         
                             (0.063)                   (0.064)         
                                                                       
tax                         -0.012***                 -0.012***        
                             (0.003)                   (0.003)         
                                                                       
ptratio                     -0.947***                 -0.947***        
                             (0.129)                   (0.130)         
                                                                       
black                       0.009***                  0.009***         
                             (0.003)                   (0.003)         
                                                                       
lstat                       -0.523***                 -0.523***        
                             (0.047)                   (0.051)         
                                                                       
age                                                     0.001          
                                                       (0.013)         
                                                                       
Constant                    36.341***                 36.364***        
                             (5.067)                   (5.091)         
                                                                       
-----------------------------------------------------------------------
Observations                   506                       506           
R2                            0.741                     0.741          
Adjusted R2                   0.735                     0.734          
Residual Std. Error     4.736 (df = 494)          4.741 (df = 493)     
F Statistic         128.206*** (df = 11; 494) 117.285*** (df = 12; 493)
=======================================================================
Note:                                       *p<0.1; **p<0.05; ***p<0.01

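In this regression we added age, which is correlated with lstat (0.60). Although age has a simple correlation of -0.38 with medv, it has essentially no effect on medv once the other regressors are controlled for (coefficient 0.0007, p = 0.958), so it does not meet the first OVB condition. The coefficient on lstat stays at -0.523 in both models, so including age does not change the point estimate; only the standard error of lstat rises slightly (from 0.047 to 0.051), because age is correlated with lstat. This confirms that adding or omitting a variable that has no effect on the dependent variable, conditional on the other regressors, does not bias the estimates of the other coefficients.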
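The same OVB identity explains why the lstat estimate barely moves: the shift equals β_age × δ_lstat, and β_age is close to zero even though age is correlated with lstat. A sketch (the names `base_formula` and `aux_age` are illustrative):

```r
library(MASS)   # Boston data

base_formula <- medv ~ crim + zn + chas + nox + rm + dis + rad + tax +
  ptratio + black + lstat
model_without_age <- lm(base_formula, data = Boston)
model_with_age    <- lm(update(base_formula, . ~ . + age), data = Boston)

# Auxiliary regression of age on the regressors of the model without age
aux_age <- lm(update(base_formula, age ~ .), data = Boston)

beta_age       <- unname(coef(model_with_age)["age"])   # partial effect of age on medv
delta_lstat    <- unname(coef(aux_age)["lstat"])        # partial link between age and lstat
implied_shift  <- beta_age * delta_lstat
observed_shift <- unname(coef(model_without_age)["lstat"] -
                           coef(model_with_age)["lstat"])

cat("Simple correlation of age with medv:",
    round(cor(Boston$age, Boston$medv), 3), "\n")
cat("beta_age =", signif(beta_age, 3), " delta_lstat =", signif(delta_lstat, 3), "\n")
cat("Implied shift in lstat coefficient:", signif(implied_shift, 3), "\n")
```

Condition 2 holds (age is related to lstat), but condition 1 fails once the other regressors are included, so the product, and hence the bias, is negligible.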

2. Try to add/exclude a variable in your multivariate regression that impacts y (is correlated with y) but is not correlated with the key x variable, and show that your point estimate will not change (significantly).

# Check correlation between nox and lstat
cor_nox_lstat <- cor(Boston$nox, Boston$lstat)
cat("Correlation between nox and lstat:", cor_nox_lstat, "\n")
Correlation between nox and lstat: 0.5908789 
# Full model without nox
model_without_nox <- lm(medv ~ crim + zn + chas + rm + dis + rad + tax + ptratio + black + lstat, data = Boston)
summary(model_without_nox)

Call:
lm(formula = medv ~ crim + zn + chas + rm + dis + rad + tax + 
    ptratio + black + lstat, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-16.980  -2.869  -0.693   1.732  26.674 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22.605365   4.324998   5.227 2.55e-07 ***
crim        -0.096497   0.033446  -2.885 0.004082 ** 
zn           0.052811   0.013759   3.838 0.000140 ***
chas         2.380299   0.871150   2.732 0.006513 ** 
rm           3.940596   0.414703   9.502  < 2e-16 ***
dis         -1.054766   0.166731  -6.326 5.63e-10 ***
rad          0.282595   0.064772   4.363 1.56e-05 ***
tax         -0.015723   0.003351  -4.692 3.51e-06 ***
ptratio     -0.756520   0.125988  -6.005 3.71e-09 ***
black        0.010239   0.002729   3.753 0.000196 ***
lstat       -0.570699   0.047475 -12.021  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.846 on 495 degrees of freedom
Multiple R-squared:  0.7279,    Adjusted R-squared:  0.7224 
F-statistic: 132.4 on 10 and 495 DF,  p-value: < 2.2e-16
# Full model with nox
model_with_nox <- lm(medv ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + black + lstat, data = Boston)
summary(model_with_nox)

Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + black + lstat, data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
crim         -0.108413   0.032779  -3.307 0.001010 ** 
zn            0.045845   0.013523   3.390 0.000754 ***
chas          2.718716   0.854240   3.183 0.001551 ** 
nox         -17.376023   3.535243  -4.915 1.21e-06 ***
rm            3.801579   0.406316   9.356  < 2e-16 ***
dis          -1.492711   0.185731  -8.037 6.84e-15 ***
rad           0.299608   0.063402   4.726 3.00e-06 ***
tax          -0.011778   0.003372  -3.493 0.000521 ***
ptratio      -0.946525   0.129066  -7.334 9.24e-13 ***
black         0.009291   0.002674   3.475 0.000557 ***
lstat        -0.522553   0.047424 -11.019  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16
# Compare the models side by side
stargazer(model_without_nox, model_with_nox, type = "text", title = "Comparison of Models With and Without NOx")

Comparison of Models With and Without NOx
=======================================================================
                                    Dependent variable:                
                    ---------------------------------------------------
                                           medv                        
                               (1)                       (2)           
-----------------------------------------------------------------------
crim                        -0.096***                 -0.108***        
                             (0.033)                   (0.033)         
                                                                       
zn                          0.053***                  0.046***         
                             (0.014)                   (0.014)         
                                                                       
chas                        2.380***                  2.719***         
                             (0.871)                   (0.854)         
                                                                       
nox                                                  -17.376***        
                                                       (3.535)         
                                                                       
rm                          3.941***                  3.802***         
                             (0.415)                   (0.406)         
                                                                       
dis                         -1.055***                 -1.493***        
                             (0.167)                   (0.186)         
                                                                       
rad                         0.283***                  0.300***         
                             (0.065)                   (0.063)         
                                                                       
tax                         -0.016***                 -0.012***        
                             (0.003)                   (0.003)         
                                                                       
ptratio                     -0.757***                 -0.947***        
                             (0.126)                   (0.129)         
                                                                       
black                       0.010***                  0.009***         
                             (0.003)                   (0.003)         
                                                                       
lstat                       -0.571***                 -0.523***        
                             (0.047)                   (0.047)         
                                                                       
Constant                    22.605***                 36.341***        
                             (4.325)                   (5.067)         
                                                                       
-----------------------------------------------------------------------
Observations                   506                       506           
R2                            0.728                     0.741          
Adjusted R2                   0.722                     0.735          
Residual Std. Error     4.846 (df = 495)          4.736 (df = 494)     
F Statistic         132.416*** (df = 10; 495) 128.206*** (df = 11; 494)
=======================================================================
Note:                                       *p<0.1; **p<0.05; ***p<0.01

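In this scenario we added nox, which clearly affects medv (coefficient -17.376, significant at the 1% level). The coefficient on lstat moved from -0.571 (without nox) to -0.523 (with nox), a change of about 0.048, roughly one standard error. However, nox is not uncorrelated with lstat: their correlation is 0.59, so this example does not cleanly meet the condition in the question, and the modest shift in the lstat estimate is consistent with that. The direction of the shift also matches the OVB formula: nox lowers medv and is positively correlated with lstat, so omitting it makes the lstat coefficient more negative. A variable with near-zero correlation with lstat would be needed to show an unchanged point estimate.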
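A small helper makes this comparison repeatable for any regressor: it drops the variable from the full model and splits the change in the lstat coefficient into β_var × δ_lstat. Here chas is a hypothetical alternative, chosen on the assumption that it has a near-zero correlation with lstat (the code reports that correlation so the assumption can be checked). The function name `ovb_check` is illustrative.

```r
library(MASS)   # Boston data

full_formula <- medv ~ crim + zn + chas + nox + rm + dis + rad + tax +
  ptratio + black + lstat
full_fit <- lm(full_formula, data = Boston)

# Drop `var` from the full model and compare the lstat coefficients.
# OVB identity (exact in-sample for OLS): b_short = b_long + beta_var * delta,
# where delta is the lstat coefficient from regressing `var` on the
# remaining regressors.
ovb_check <- function(var) {
  short_formula <- update(full_formula, as.formula(paste(". ~ . -", var)))
  short_fit     <- lm(short_formula, data = Boston)
  aux_fit       <- lm(update(short_formula, as.formula(paste(var, "~ ."))),
                      data = Boston)
  data.frame(
    dropped        = var,
    cor_with_lstat = cor(Boston[[var]], Boston$lstat),
    beta_var       = unname(coef(full_fit)[var]),
    delta_lstat    = unname(coef(aux_fit)["lstat"]),
    change         = unname(coef(short_fit)["lstat"] - coef(full_fit)["lstat"])
  )
}

# nox is the variable used above; chas is a hypothetical alternative
results <- rbind(ovb_check("nox"), ovb_check("chas"))
results$implied_change <- results$beta_var * results$delta_lstat
print(results, digits = 3)
```

The `change` column reproduces the -0.048 shift for nox, and comparing it with the chas row shows how much of that shift comes from the correlation between the dropped variable and lstat.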