hw1

(i) Finding the averages and standard deviations of `prpblck` and `income`:

# Load the wooldridge package
library(wooldridge)

# Load the DISCRIM dataset
data("discrim")

# prpblck and income contain missing values; without na.rm = TRUE,
# mean() and sd() return NA
avg_prpblck <- mean(discrim$prpblck, na.rm = TRUE)
sd_prpblck <- sd(discrim$prpblck, na.rm = TRUE)

avg_income <- mean(discrim$income, na.rm = TRUE)
sd_income <- sd(discrim$income, na.rm = TRUE)

cat("Average of prpblck:", avg_prpblck, "\n")
cat("Standard deviation of prpblck:", sd_prpblck, "\n")
cat("Average of income:", avg_income, "\n")
cat("Standard deviation of income:", sd_income, "\n")

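The code calculates the mean and standard deviation of prpblck (the proportion of the population that is Black, measured on a 0 to 1 scale) and income (median household income in the ZIP code, in dollars). Both variables contain missing values, so mean() and sd() return NA unless na.rm = TRUE is set.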
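As a minimal sketch of why missing values matter here, the toy vector below (hypothetical values, not taken from the DISCRIM data) shows how a single NA changes the result of mean() and how na.rm = TRUE restores it.

```r
# Hypothetical toy vector standing in for a column with one missing value
x <- c(0.10, 0.25, NA, 0.05)

mean_with_na <- mean(x)                 # NA: the missing value propagates
mean_no_na   <- mean(x, na.rm = TRUE)   # mean of the three observed values
sd_no_na     <- sd(x, na.rm = TRUE)     # sd of the three observed values
n_missing    <- sum(is.na(x))           # number of missing entries

cat("mean with NA:", mean_with_na, "\n")
cat("mean without NA:", mean_no_na, "\n")
cat("sd without NA:", sd_no_na, "\n")
cat("missing values:", n_missing, "\n")
```

Counting the missing entries first with sum(is.na(...)) is a quick way to see how many observations each statistic actually uses.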

(ii) Estimating the model:

model_ii <- lm(psoda ~ prpblck + income, data = discrim)
summary(model_ii)

## 
## Call:
## lm(formula = psoda ~ prpblck + income, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29401 -0.05242  0.00333  0.04231  0.44322 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.563e-01  1.899e-02  50.354  < 2e-16 ***
## prpblck     1.150e-01  2.600e-02   4.423 1.26e-05 ***
## income      1.603e-06  3.618e-07   4.430 1.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08611 on 398 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.06422,    Adjusted R-squared:  0.05952 
## F-statistic: 13.66 on 2 and 398 DF,  p-value: 1.835e-06

coef_prpblck <- coef(model_ii)["prpblck"]  # select by name rather than position
cat("Coefficient on prpblck:", coef_prpblck, "\n")

## Coefficient on prpblck: 0.1149882

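The code estimates the model psoda = β0 + β1⋅prpblck + β2⋅income + u using ordinary least squares (OLS) regression.

The summary() function outputs the regression results, including the estimates of β1 and β2, as well as R^2 and other statistics. Both slope estimates are positive and significant at the 1% level, but R^2 is only 0.064, so prpblck and income together explain a small share of the variation in soda prices. The fit uses 401 observations because 9 rows with missing values are dropped.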
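To gauge the economic size of the prpblck coefficient, the sketch below plugs the estimate reported in part (ii) into a hypothetical 0.10 increase in prpblck (10 percentage points). It assumes psoda is measured in dollars; the variable names are illustrative.

```r
# Coefficient on prpblck as reported in the part (ii) output
b_prpblck <- 0.1149882

# Hypothetical change in prpblck: 10 percentage points
delta_prpblck <- 0.10

# Predicted change in the soda price, holding income fixed
effect_dollars <- b_prpblck * delta_prpblck
effect_cents <- 100 * effect_dollars

cat("Predicted change in psoda:", round(effect_cents, 2), "cents\n")
```

Under these assumptions the predicted price difference is a little over one cent, which is statistically detectable but modest in size.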

(iii) Comparing the estimate from part (ii) with a simple regression of `psoda` on `prpblck`:

model_iii <- lm(psoda ~ prpblck, data = discrim)
summary(model_iii)

## 
## Call:
## lm(formula = psoda ~ prpblck, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30884 -0.05963  0.01135  0.03206  0.44840 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03740    0.00519  199.87  < 2e-16 ***
## prpblck      0.06493    0.02396    2.71  0.00702 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0881 on 399 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.01808,    Adjusted R-squared:  0.01561 
## F-statistic: 7.345 on 1 and 399 DF,  p-value: 0.007015

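Here psoda is regressed on prpblck alone. The coefficient on prpblck falls from 0.115 in part (ii) to 0.065, so the estimated effect is larger when income is held fixed. This is the pattern expected from omitted-variable bias: income has a positive coefficient in part (ii), and a simple-regression slope below the multiple-regression slope implies that income and prpblck are negatively correlated in the sample, so leaving income out biases the coefficient on prpblck downward.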
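The link between the two slopes can be checked numerically. The sketch below uses simulated data (not the DISCRIM sample; all variable names and parameter values are made up) to confirm the in-sample identity: simple slope = multiple slope on x1 + (coefficient on x2) × (slope from regressing x2 on x1).

```r
set.seed(1)
n <- 200

# Simulated regressors: x2 is negatively related to x1 by construction
x1 <- runif(n)
x2 <- 50 - 20 * x1 + rnorm(n)
y  <- 1 + 0.1 * x1 + 0.02 * x2 + rnorm(n, sd = 0.1)

b_multi  <- coef(lm(y ~ x1 + x2))    # multiple regression
b_simple <- coef(lm(y ~ x1))         # simple regression omitting x2
delta1   <- coef(lm(x2 ~ x1))["x1"]  # slope of omitted on included regressor

# Omitted-variable identity: holds exactly when all fits use the same rows
implied <- b_multi["x1"] + b_multi["x2"] * delta1

cat("Simple slope: ", b_simple["x1"], "\n")
cat("Implied slope:", implied, "\n")
```

Because the coefficient on x2 is positive and delta1 is negative in this simulation, the simple slope lies below the multiple-regression slope, which mirrors the direction of the change between parts (ii) and (iii).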

(iv) Estimating a log-linear model:

model_iv <- lm(log(psoda) ~ prpblck + log(income), data = discrim)
summary(model_iv)

## 
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income), data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33563 -0.04695  0.00658  0.04334  0.35413 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.79377    0.17943  -4.424 1.25e-05 ***
## prpblck      0.12158    0.02575   4.722 3.24e-06 ***
## log(income)  0.07651    0.01660   4.610 5.43e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0821 on 398 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.06809,    Adjusted R-squared:  0.06341 
## F-statistic: 14.54 on 2 and 398 DF,  p-value: 8.039e-07

coef_prpblck_log <- coef(model_iv)["prpblck"]  # select by name rather than position
percentage_change <- coef_prpblck_log * 0.20 * 100
cat("Estimated percentage change in psoda:", percentage_change, "%", "\n")

## Estimated percentage change in psoda: 2.431605 %

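The model log(psoda) = β0 + β1⋅prpblck + β2⋅log(income) + u is estimated using OLS. Because the dependent variable is in logs, 100⋅β1⋅Δprpblck approximates the percentage change in psoda. For a 20-percentage-point increase in prpblck (Δprpblck = 0.20), the code gives an estimated increase of about 2.43%. The coefficient on log(income), 0.077, is the estimated elasticity of psoda with respect to income.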
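The figure above relies on the usual log approximation. As a sketch, the exact percentage change implied by a log-level model is 100⋅(exp(β1⋅Δprpblck) − 1); the code below compares the two using the coefficient reported in the part (iv) output. The variable names are illustrative.

```r
# Coefficient on prpblck as reported in the part (iv) summary
b_prpblck <- 0.12158
delta <- 0.20   # 20-percentage-point increase in prpblck

# Log approximation: 100 * beta * delta
approx_pct <- 100 * b_prpblck * delta

# Exact change implied by the log-level form
exact_pct <- 100 * (exp(b_prpblck * delta) - 1)

cat("Approximate change:", round(approx_pct, 2), "%\n")  # about 2.43
cat("Exact change:      ", round(exact_pct, 2), "%\n")   # about 2.46
```

The two numbers are close because β1⋅Δprpblck is small; the gap widens for larger changes.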

(v) Adding `prppov` to the regression:

model_v <- lm(log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
summary(model_v)

## 
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32218 -0.04648  0.00651  0.04272  0.35622 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.46333    0.29371  -4.982  9.4e-07 ***
## prpblck      0.07281    0.03068   2.373   0.0181 *  
## log(income)  0.13696    0.02676   5.119  4.8e-07 ***
## prppov       0.38036    0.13279   2.864   0.0044 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08137 on 397 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.08696,    Adjusted R-squared:  0.08006 
## F-statistic:  12.6 on 3 and 397 DF,  p-value: 6.917e-08

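The regression from part (iv) is extended by adding prppov (proportion of the population in poverty). Including prppov lowers the coefficient on prpblck from 0.122 to 0.073, although it remains significant at the 5% level (p = 0.018). The coefficient on log(income) rises from 0.077 to 0.137.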

(vi) Correlation between `log(income)` and `prppov`:

# The data contain missing values; use = "complete.obs" keeps only rows
# where both variables are observed (otherwise cor() returns NA)
correlation <- cor(log(discrim$income), discrim$prppov, use = "complete.obs")
cat("Correlation between log(income) and prppov:", correlation, "\n")

The correlation between log(income) and prppov is computed to check for potential multicollinearity between these two variables. Because the data contain missing values, the calculation is restricted to complete observations.

(vii) Evaluating multicollinearity:

cat("Since the correlation is", round(correlation, 3), ", multicollinearity might be a concern depending on how high it is.\n")

A strong correlation between log(income) and prppov inflates the standard errors of their coefficients, but it does not bias the estimates. In part (v) both log(income) (t = 5.12) and prppov (t = 2.86) are individually significant, so the correlation has not prevented the regression from separating their effects. Dropping either variable because of the correlation would risk omitted-variable bias.
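One common way to quantify how much collinearity inflates a coefficient's variance is the variance inflation factor, VIF_j = 1 / (1 − R_j^2), where R_j^2 comes from regressing regressor j on the other regressors. The sketch below computes it in base R on simulated data; the helper name vif_one and all variables are hypothetical, and nothing here uses the DISCRIM sample.

```r
set.seed(42)
n <- 300

# Simulated regressors: x2 is strongly (negatively) related to x1, x3 is not
x1 <- rnorm(n)
x2 <- -0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)

# Hypothetical helper: VIF of one regressor given a data frame of the others
vif_one <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_x1 <- vif_one(x1, data.frame(x2, x3))
vif_x3 <- vif_one(x3, data.frame(x1, x2))

cat("VIF for x1:", round(vif_x1, 2), "\n")
cat("VIF for x3:", round(vif_x3, 2), "\n")
```

With only two regressors, the VIF reduces to 1 / (1 − r^2), so a correlation of ±0.8 corresponds to a VIF of roughly 2.8. A regressor that is unrelated to the others has a VIF close to 1.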

Oyuntuya_112035144

2024-10-17