prpblck and income:
# Load the wooldridge package
library(wooldridge)
# Load the DISCRIM dataset
data("discrim")
# prpblck and income contain missing values, so na.rm = TRUE is needed;
# without it, mean() and sd() return NA
avg_prpblck <- mean(discrim$prpblck, na.rm = TRUE)
sd_prpblck <- sd(discrim$prpblck, na.rm = TRUE)
avg_income <- mean(discrim$income, na.rm = TRUE)
sd_income <- sd(discrim$income, na.rm = TRUE)
cat("Average of prpblck:", avg_prpblck, "\n")
cat("Standard deviation of prpblck:", sd_prpblck, "\n")
cat("Average of income:", avg_income, "\n")
cat("Standard deviation of income:", sd_income, "\n")
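The code calculates the mean and standard deviation of prpblck (proportion of the population that is Black) and income (median household income in the ZIP code). Both variables contain missing values, so mean() and sd() return NA unless na.rm = TRUE is passed.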
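The effect of missing values on these summaries can be seen on a small made-up vector. This is a minimal sketch: the vector x and its values are hypothetical and are not taken from discrim.

```r
# Hypothetical proportions with one missing entry (not from discrim)
x <- c(0.10, 0.25, NA, 0.05)

# Count the missing values before summarising
n_missing <- sum(is.na(x))

# By default a single NA propagates into the result
mean_default <- mean(x)

# na.rm = TRUE computes the statistic on the non-missing values only
mean_clean <- mean(x, na.rm = TRUE)
sd_clean <- sd(x, na.rm = TRUE)

cat("Missing values:", n_missing, "\n")
cat("Mean without na.rm:", mean_default, "\n")
cat("Mean with na.rm = TRUE:", mean_clean, "\n")
cat("SD with na.rm = TRUE:", sd_clean, "\n")
```

Checking sum(is.na(...)) first makes it clear how many observations the summary statistics are actually based on.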
model_ii <- lm(psoda ~ prpblck + income, data = discrim)
summary(model_ii)
##
## Call:
## lm(formula = psoda ~ prpblck + income, data = discrim)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.29401 -0.05242  0.00333  0.04231  0.44322
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.563e-01  1.899e-02  50.354  < 2e-16 ***
## prpblck     1.150e-01  2.600e-02   4.423 1.26e-05 ***
## income      1.603e-06  3.618e-07   4.430 1.22e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08611 on 398 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.06422, Adjusted R-squared: 0.05952
## F-statistic: 13.66 on 2 and 398 DF, p-value: 1.835e-06
coef_prpblck <- coef(model_ii)[2]
cat("Coefficient on prpblck:", coef_prpblck, "\n")
## Coefficient on prpblck: 0.1149882
The code estimates the model psoda = β0 + β1⋅prpblck + β2⋅income + u using ordinary least squares (OLS) regression. The summary() function outputs the regression results, including the estimates of β1 and β2, as well as R^2 and other statistics.
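Individual pieces of the summary() output can also be pulled out programmatically instead of being read off the printed table. The sketch below is illustrative only: it uses the built-in mtcars data as a stand-in for discrim, and the object names are hypothetical.

```r
# Illustrative only: mtcars ships with base R and stands in for discrim
fit <- lm(mpg ~ wt + hp, data = mtcars)
fit_summary <- summary(fit)

# Coefficient table as a numeric matrix: one row per term, with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef_table <- coef(fit_summary)
wt_estimate <- coef_table["wt", "Estimate"]
wt_se <- coef_table["wt", "Std. Error"]

# Goodness-of-fit statistics stored on the summary object
r_squared <- fit_summary$r.squared
adj_r_squared <- fit_summary$adj.r.squared

# Number of observations actually used in the fit
n_used <- nobs(fit)

cat("Estimate on wt:", wt_estimate, "with standard error", wt_se, "\n")
cat("R-squared:", r_squared, "based on", n_used, "observations\n")
```

Indexing the coefficient table by term name rather than by position keeps the code correct if regressors are added or reordered.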
psoda on prpblck:
model_iii <- lm(psoda ~ prpblck, data = discrim)
summary(model_iii)
##
## Call:
## lm(formula = psoda ~ prpblck, data = discrim)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.30884 -0.05963  0.01135  0.03206  0.44840
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.03740    0.00519  199.87  < 2e-16 ***
## prpblck      0.06493    0.02396    2.71  0.00702 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0881 on 399 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.01808, Adjusted R-squared: 0.01561
## F-statistic: 7.345 on 1 and 399 DF, p-value: 0.007015
Here, a simple regression of psoda on prpblck is run so that its output can be compared with the multiple regression from part (ii). The coefficient on prpblck is 0.065 in the simple regression and 0.115 once income is controlled for, so the estimated effect of prpblck is larger when income is held fixed.
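The gap between the simple and multiple regression coefficients follows an exact in-sample identity: the simple-regression slope equals the multiple-regression slope plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. With a positive coefficient on income, a smaller simple-regression coefficient is what one would expect if income and prpblck are negatively related in the sample. The sketch below checks the identity on simulated data; the variables x1, x2 and y and all parameter values are hypothetical, not taken from discrim.

```r
# Simulated data (hypothetical): x2 is negatively related to x1
set.seed(42)
n <- 500
x1 <- runif(n)
x2 <- 2 - 1.5 * x1 + rnorm(n)
y <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.5)

fit_multiple <- lm(y ~ x1 + x2)  # controls for x2
fit_simple <- lm(y ~ x1)         # omits x2
fit_aux <- lm(x2 ~ x1)           # omitted variable on included variable

b1_multiple <- unname(coef(fit_multiple)["x1"])
b2_multiple <- unname(coef(fit_multiple)["x2"])
b1_simple <- unname(coef(fit_simple)["x1"])
delta1 <- unname(coef(fit_aux)["x1"])

# Identity: simple slope = multiple slope + (coefficient on x2) * delta1
implied_simple <- b1_multiple + b2_multiple * delta1

cat("Simple-regression slope:", b1_simple, "\n")
cat("Implied by the identity:", implied_simple, "\n")
```

The identity requires that all three regressions use the same observations, which matters when variables have missing values.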
model_iv <- lm(log(psoda) ~ prpblck + log(income), data = discrim)
summary(model_iv)
##
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income), data = discrim)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.33563 -0.04695  0.00658  0.04334  0.35413
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.79377    0.17943  -4.424 1.25e-05 ***
## prpblck      0.12158    0.02575   4.722 3.24e-06 ***
## log(income)  0.07651    0.01660   4.610 5.43e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0821 on 398 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.06809, Adjusted R-squared: 0.06341
## F-statistic: 14.54 on 2 and 398 DF, p-value: 8.039e-07
coef_prpblck_log <- coef(model_iv)[2]
percentage_change <- coef_prpblck_log * 0.20 * 100
cat("Estimated percentage change in psoda:", percentage_change, "%", "\n")
## Estimated percentage change in psoda: 2.431605 %
The model log(psoda) = β0 + β1⋅prpblck + β2⋅log(income) + u is estimated using OLS. The code also calculates the estimated percentage change in psoda for a 20-percentage-point increase in prpblck (a change of 0.20 in the proportion), using the approximation 100⋅β1⋅0.20, which gives about 2.43%.
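Because the dependent variable is in logs, multiplying the coefficient by 100 gives only an approximate percentage change; the exact change implied by the model is 100⋅(exp(β1⋅Δprpblck) − 1). A minimal sketch of both calculations follows, with the coefficient typed in from the rounded value in the regression output above; the variable names are hypothetical.

```r
# Coefficient on prpblck, rounded as printed in the regression output
b1 <- 0.12158
delta_prpblck <- 0.20  # a 20-percentage-point increase

# Approximation used for log-level models: 100 * b1 * change
approx_pct <- 100 * b1 * delta_prpblck

# Exact percentage change implied by the log specification
exact_pct <- 100 * (exp(b1 * delta_prpblck) - 1)

cat("Approximate change:", round(approx_pct, 2), "%\n")  # about 2.43
cat("Exact change:", round(exact_pct, 2), "%\n")         # about 2.46
```

For a change this small the two figures are close; the gap widens as β1⋅Δprpblck grows.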
prppov to the regression:
model_v <- lm(log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
summary(model_v)
##
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.32218 -0.04648  0.00651  0.04272  0.35622
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.46333    0.29371  -4.982  9.4e-07 ***
## prpblck      0.07281    0.03068   2.373   0.0181 *
## log(income)  0.13696    0.02676   5.119  4.8e-07 ***
## prppov       0.38036    0.13279   2.864   0.0044 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08137 on 397 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.08696, Adjusted R-squared: 0.08006
## F-statistic: 12.6 on 3 and 397 DF, p-value: 6.917e-08
The regression from part (iv) is extended by adding prppov (proportion of the population in poverty). This examines how the inclusion of prppov affects the coefficient on prpblck, which falls from 0.122 to 0.073 but remains statistically significant at the 5% level.
log(income) and prppov:
# use = "complete.obs" drops rows with a missing value in either variable;
# without it, cor() returns NA
correlation <- cor(log(discrim$income), discrim$prppov, use = "complete.obs")
cat("Correlation between log(income) and prppov:", correlation, "\n")
The correlation between the logged value of income and prppov is computed to check for potential multicollinearity between these two variables.
cat("Since the correlation is", correlation, ", multicollinearity might be a concern depending on how high it is.\n")
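A brief interpretation is provided based on the correlation between log(income) and prppov. A correlation close to 1 in absolute value signals multicollinearity, which does not bias the OLS estimates but inflates their standard errors. In the regressions above, the standard error on log(income) rises from 0.0166 to 0.0268 once prppov is added.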
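One way to put a number on "how high" is the variance inflation factor (VIF), 1 / (1 − R²), where R² comes from regressing one explanatory variable on all the others. The sketch below uses simulated data with a few missing values; x1, x2, x3 and all parameter values are hypothetical and do not come from discrim.

```r
# Simulated regressors (hypothetical): x2 is strongly related to x1
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- -0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
x2[c(3, 17)] <- NA  # introduce missing values

# With missing values, cor() returns NA unless incomplete rows are dropped
cor_default <- cor(x1, x2)
r <- cor(x1, x2, use = "complete.obs")

# Auxiliary regression of x2 on the other regressors;
# lm() drops incomplete rows under the default na.action
aux <- lm(x2 ~ x1 + x3)
r2_aux <- summary(aux)$r.squared
vif_x2 <- 1 / (1 - r2_aux)

cat("Correlation without handling NA:", cor_default, "\n")
cat("Correlation on complete observations:", r, "\n")
cat("VIF for x2:", vif_x2, "\n")
```

With only two regressors the auxiliary R² equals the squared pairwise correlation, so the VIF can be read directly from the correlation; with more regressors the auxiliary regression is needed.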