Submit: The knitted HTML output or .Rmd file
Is the change in log(net_assets) after 2022 different for systemic banks compared to non-systemic banks?
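Let \(y_{it}\) be log(net_assets) of bank \(i\) in year \(t\), let \(treated_i = 1\) for systemic banks (sys = 1) and 0 otherwise, and let \(post_t = 1\) for years from 2022 onward and 0 otherwise.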
Estimate:
\[ y_{it} = \alpha + \beta_1 treated_i + \beta_2 post_t + \delta (treated_i \times post_t) + u_{it} \]
library(dplyr)  # %>%, mutate(), arrange()
library(psych)  # describe()
df <- read.csv("/Users/illia/Downloads/PS1.csv")
head(df)
## nkb bank ROA loans_corp loans_hh net_assets
## 1 46 АТ КБ "ПриватБанк" 4.292320e+00 14502070 42302983 386939574.0
## 2 6 АТ "Ощадбанк" 1.033015e+00 54311100 8914977 235722486.5
## 3 2 АТ "Укрексімбанк" 1.669166e-05 48449783 141701 192835491.9
## 4 274 АБ "УКРГАЗБАНК" 2.710253e-01 44330831 3186752 142780022.3
## 5 593 ПАТ "РОЗРАХУНКОВИЙ ЦЕНТР" 5.368773e-01 0 0 384649.6
## 6 36 АТ "Райффайзен Банк Аваль" 3.586104e+00 39976357 6038606 111547920.0
## ownership GB_TNA L_TNA L_NFC_TNA L_HH_TNA D_TNA TNA
## 1 1 50.47218 14.68060 3.74789 10.93271028 79.99600 386939574.0
## 2 1 44.58017 26.82225 23.04027 3.78197993 78.43384 235722486.5
## 3 1 30.63463 25.19841 25.12493 0.07348282 59.45002 192835491.9
## 4 1 29.98680 33.28027 31.04834 2.23193115 88.64743 142780022.3
## 5 1 30.36400 0.00000 0.00000 0.00000000 25.62168 384649.6
## 6 2 10.57642 41.25130 35.83783 5.41346396 79.03632 111547920.0
## TNA_p CIR sys year gdp
## 1 21.3174614 35.00666 1 2020 -3.8
## 2 12.9865368 74.84880 1 2020 -3.8
## 3 10.6237858 NA 1 2020 -3.8
## 4 7.8661058 65.73606 1 2020 -3.8
## 5 0.0211913 101.73284 0 2020 -3.8
## 6 6.1454517 51.79713 1 2020 -3.8
df <- df %>%
  mutate(
    nkb = as.character(nkb),
    year = as.integer(year),
    sys = as.integer(sys),
    post = as.integer(year >= 2022),  # 1 from 2022 onward, 0 before
    treated = sys,                    # 1 for systemic banks, 0 otherwise
    ln_net_assets = log(net_assets)   # outcome: log of net assets
  ) %>%
  arrange(nkb, year)
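Before estimating anything, it may help to confirm that the treatment indicator behaves like a bank-level characteristic and that all four treatment-period cells are populated. The following is a minimal sketch on a hypothetical toy panel (`toy`, with made-up banks and years); the same two checks could be run on the prepared `df`.

```r
library(dplyr)

# Hypothetical toy panel standing in for `df`: 3 banks observed in 2020-2023.
toy <- data.frame(
  nkb  = rep(c("1", "2", "3"), each = 4),
  year = rep(2020:2023, times = 3),
  sys  = rep(c(1L, 0L, 0L), each = 4)
) %>%
  mutate(treated = sys, post = as.integer(year >= 2022))

# Check 1: every bank should carry a single treatment status across years.
status_changes <- toy %>%
  group_by(nkb) %>%
  summarise(n_status = n_distinct(treated), .groups = "drop") %>%
  filter(n_status > 1)

# Check 2: number of bank-year observations in each treated x post cell.
cell_counts <- toy %>% count(treated, post)

print(status_changes)  # zero rows means treatment does not switch within a bank
print(cell_counts)
```

If the first check returns any rows on the real data, `treated` would vary within a bank and the \(treated_i\) notation would no longer describe it.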
describe(df)
## vars n mean sd median trimmed
## nkb* 1 335 36.91 21.36 37.00 36.88
## bank* 2 335 43.54 25.31 44.00 43.57
## ROA 3 335 0.79 3.05 0.91 1.07
## loans_corp 4 335 8044764.86 16219438.56 1053898.03 3743779.09
## loans_hh 5 335 2581825.26 8175652.63 48697.62 665238.09
## net_assets 6 335 37575226.57 89277782.81 5156326.64 16042671.76
## ownership 7 335 2.59 0.65 3.00 2.72
## GB_TNA 8 335 19.38 18.18 14.84 16.82
## L_TNA 9 335 29.12 18.59 27.81 28.10
## L_NFC_TNA 10 335 23.79 16.89 23.08 22.53
## L_HH_TNA 11 335 5.33 12.21 0.98 2.28
## D_TNA 12 335 67.88 21.06 75.00 71.35
## TNA 13 335 37575226.57 89277782.81 5156326.64 16042671.76
## TNA_p 14 335 1.50 3.42 0.23 0.66
## CIR 15 334 79.19 60.53 69.62 70.56
## sys 16 335 0.24 0.43 0.00 0.17
## year 17 335 2021.90 1.41 2022.00 2021.88
## gdp 18 335 -4.31 12.67 2.90 -2.50
## post 19 335 0.57 0.50 1.00 0.59
## treated 20 335 0.24 0.43 0.00 0.17
## ln_net_assets 21 335 15.73 1.88 15.46 15.66
## mad min max range skew kurtosis
## nkb* 28.17 1.00 73.00 72.00 0.01 -1.23
## bank* 32.62 1.00 87.00 86.00 0.00 -1.22
## ROA 1.27 -21.09 13.40 34.49 -2.00 12.28
## loans_corp 1463850.97 0.00 90004245.09 90004245.09 2.76 7.48
## loans_hh 72199.09 0.00 79836630.95 79836630.95 5.58 37.83
## net_assets 6412014.43 183562.69 771835030.30 771651467.61 4.56 26.33
## ownership 0.00 1.00 3.00 2.00 -1.32 0.49
## GB_TNA 16.47 0.00 87.39 87.39 1.14 0.88
## L_TNA 19.81 0.00 79.11 79.11 0.43 -0.42
## L_NFC_TNA 18.08 0.00 73.66 73.66 0.58 -0.20
## L_HH_TNA 1.45 0.00 78.93 78.93 3.88 15.84
## D_TNA 14.34 0.00 96.72 96.72 -1.41 1.41
## TNA 6412014.43 183562.69 771835030.30 771651467.61 4.56 26.33
## TNA_p 0.28 0.01 23.39 23.39 4.06 19.31
## CIR 25.86 13.02 674.89 661.87 6.09 52.20
## sys 0.00 0.00 1.00 1.00 1.22 -0.51
## year 1.48 2020.00 2024.00 4.00 0.09 -1.30
## gdp 3.85 -28.80 5.50 34.30 -1.28 -0.09
## post 0.00 0.00 1.00 1.00 -0.28 -1.93
## treated 0.00 0.00 1.00 1.00 1.22 -0.51
## ln_net_assets 1.87 12.12 20.46 8.34 0.36 -0.61
## se
## nkb* 1.17
## bank* 1.38
## ROA 0.17
## loans_corp 886162.59
## loans_hh 446683.62
## net_assets 4877766.34
## ownership 0.04
## GB_TNA 0.99
## L_TNA 1.02
## L_NFC_TNA 0.92
## L_HH_TNA 0.67
## D_TNA 1.15
## TNA 4877766.34
## TNA_p 0.19
## CIR 3.31
## sys 0.02
## year 0.08
## gdp 0.69
## post 0.03
## treated 0.02
## ln_net_assets 0.10
# df already contains post, treated and ln_net_assets, so it can be exported as is
df_table <- df
write.csv(df_table, "HW1_Prepared_Data.csv", row.names = FALSE)
model_did <- lm(ln_net_assets ~ treated + post + treated:post, data = df)
summary(model_did)
##
## Call:
## lm(formula = ln_net_assets ~ treated + post + treated:post, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0579 -0.8007 0.0591 0.8225 3.2493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.8542 0.1182 125.639 <2e-16 ***
## treated 3.1072 0.2508 12.389 <2e-16 ***
## post 0.1598 0.1579 1.012 0.312
## treated:post 0.2712 0.3263 0.831 0.406
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.251 on 331 degrees of freedom
## Multiple R-squared: 0.5597, Adjusted R-squared: 0.5557
## F-statistic: 140.3 on 3 and 331 DF, p-value: < 2.2e-16
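As a worked reading of the output above (a sketch that takes the default OLS standard errors at face value; they may understate uncertainty if errors are correlated within banks), the interaction term gives

\[ \hat{\delta} = 0.2712, \qquad t = \frac{0.2712}{0.3263} \approx 0.83, \qquad p = 0.406 \]

\[ e^{0.2712} - 1 \approx 0.31, \qquad \hat{\delta} \pm 1.97 \times 0.3263 \approx [-0.37,\ 0.91] \]

The point estimate says that, after 2022, log net assets of systemic banks rose by about 0.27 log points (roughly 31%) more than those of non-systemic banks. The approximate 95% interval comfortably includes zero, so on this evidence the difference in changes is not statistically distinguishable from zero.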
We now compare pooled OLS, between, and first-difference estimators.
library(plm)        # panel estimators: pooling, between, fd
library(stargazer)  # regression tables
data("wagepan", package = "wooldridge")
df_w <- wagepan
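\[ y_{it} = \beta_0 + \beta_1 educ_{it} + \beta_2 exper_{it} + \beta_3 hours_{it} + a_i + u_{it} \] where \(y_{it}\) is lwage, \(a_i\) is the unobserved individual effect, and \(u_{it}\) is the idiosyncratic error.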
df_b <- read.csv("/Users/illia/Downloads/wagepan.csv")
head(df_b)
## nr year agric black bus construc ent exper fin hisp poorhlth hours manuf
## 1 13 1980 0 0 1 0 0 1 0 0 0 2672 0
## 2 13 1981 0 0 0 0 0 2 0 0 0 2320 0
## 3 13 1982 0 0 1 0 0 3 0 0 0 2940 0
## 4 13 1983 0 0 1 0 0 4 0 0 0 2960 0
## 5 13 1984 0 0 0 0 0 5 0 0 0 3071 0
## 6 13 1985 0 0 1 0 0 6 0 0 0 2864 0
## married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 occ8 occ9 per
## 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0
## 2 0 0 0 1 0 0 0 0 0 0 0 0 1 1
## 3 0 0 0 1 0 0 0 0 0 0 0 0 1 0
## 4 0 0 0 1 0 0 0 0 0 0 0 0 1 0
## 5 0 0 0 1 0 0 0 0 1 0 0 0 0 1
## 6 0 0 0 1 0 1 0 0 0 0 0 0 0 0
## pro pub rur south educ tra trad union lwage d81 d82 d83 d84 d85 d86 d87
## 1 0 0 0 0 14 0 0 0 1.197540 0 0 0 0 0 0 0
## 2 0 0 0 0 14 0 0 1 1.853060 1 0 0 0 0 0 0
## 3 0 0 0 0 14 0 0 0 1.344462 0 1 0 0 0 0 0
## 4 0 0 0 0 14 0 0 0 1.433213 0 0 1 0 0 0 0
## 5 0 0 0 0 14 0 0 0 1.568125 0 0 0 1 0 0 0
## 6 0 0 0 0 14 0 0 0 1.699891 0 0 0 0 1 0 0
## expersq
## 1 1
## 2 4
## 3 9
## 4 16
## 5 25
## 6 36
# 1. Pooled OLS
model_pooled <- plm(lwage ~ educ + exper + hours, data = df_w, model = "pooling")
# 2. Between estimator
model_between <- plm(lwage ~ educ + exper + hours, data = df_w, model = "between")
# 3. First differences
# Note: educ drops out here because it does not vary over time; exper drops out
# as well, since it rises by exactly one each year and is absorbed by the intercept.
model_fd <- plm(lwage ~ educ + exper + hours, data = df_w, model = "fd")
# 4. Report (stargazer)
stargazer(model_pooled, model_between, model_fd, type = "text")
##
## ========================================================================================
## Dependent variable:
## ---------------------------------------------------------------------------
## lwage
## (1) (2) (3)
## ----------------------------------------------------------------------------------------
## educ 0.110*** 0.097***
## (0.005) (0.011)
##
## exper 0.059*** 0.036***
## (0.003) (0.012)
##
## hours -0.0001*** 0.00000 -0.0002***
## (0.00001) (0.00004) (0.00001)
##
## Constant 0.098 0.269 0.080***
## (0.066) (0.199) (0.007)
##
## ----------------------------------------------------------------------------------------
## Observations 4,360 545 3,815
## R2 0.146 0.134 0.059
## Adjusted R2 0.146 0.129 0.058
## F Statistic 248.918*** (df = 3; 4356) 27.938*** (df = 3; 541) 237.729*** (df = 1; 3813)
## ========================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
m_be <- plm(lwage ~ educ + exper + hours,
data=df_w,
index = c("nr","year"),
model="between")
summary(m_be)
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = lwage ~ educ + exper + hours, data = df_w, model = "between",
## index = c("nr", "year"))
##
## Balanced Panel: n = 545, T = 8, N = 4360
## Observations used in estimation: 545
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.20409033 -0.24853220 0.00038719 0.25961118 1.67324865
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 2.6861e-01 1.9857e-01 1.3527 0.176716
## educ 9.6904e-02 1.1007e-02 8.8042 < 2.2e-16 ***
## exper 3.6441e-02 1.1640e-02 3.1307 0.001838 **
## hours 1.3105e-06 4.1122e-05 0.0319 0.974589
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 83.06
## Residual Sum of Squares: 71.918
## R-Squared: 0.13414
## Adj. R-Squared: 0.12934
## F-statistic: 27.9383 on 3 and 541 DF, p-value: < 2.22e-16
m_fd <- plm(
lwage ~ educ + exper + hours,
data = df_w,
index = c("nr", "year"),
model = "fd"
)
summary(m_fd)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = lwage ~ educ + exper + hours, data = df_w, model = "fd",
## index = c("nr", "year"))
##
## Balanced Panel: n = 545, T = 8, N = 4360
## Observations used in estimation: 3815
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -4.534461 -0.135898 -0.024686 0.114997 4.661021
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 8.0454e-02 7.0220e-03 11.457 < 2.2e-16 ***
## hours -2.2272e-04 1.4445e-05 -15.418 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 751.19
## Residual Sum of Squares: 707.11
## R-Squared: 0.058688
## Adj. R-Squared: 0.058441
## F-statistic: 237.729 on 1 and 3813 DF, p-value: < 2.22e-16
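The first-difference output keeps only hours and a constant. A short sketch of the algebra shows why. Differencing the model over adjacent years removes \(a_i\):

\[ \Delta y_{it} = \beta_1 \Delta educ_{it} + \beta_2 \Delta exper_{it} + \beta_3 \Delta hours_{it} + \Delta u_{it} \]

Assuming, as the dropped coefficients suggest, that education is constant within a person (\(\Delta educ_{it} = 0\)) and experience rises by one each year (\(\Delta exper_{it} = 1\)), this collapses to

\[ \Delta y_{it} = \beta_2 + \beta_3 \Delta hours_{it} + \Delta u_{it} \]

Under that reading, \(\beta_1\) is not identified by first differences, and the constant of 0.0805 picks up \(\beta_2\) together with any common wage trend; the two cannot be separated in this specification.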
Use stargazer to report all regressions:
# 1. Load the package (if it is not loaded yet)
library(stargazer)
# 2. Print the comparison table
stargazer(model_pooled, m_be, model_fd,
type = "text",
column.labels = c("Pooled OLS", "Between", "First Diff"),
title = "Comparison of Panel Estimators for Wages",
digits = 4)
##
## Comparison of Panel Estimators for Wages
## ===========================================================================================
## Dependent variable:
## ------------------------------------------------------------------------------
## lwage
## Pooled OLS Between First Diff
## (1) (2) (3)
## -------------------------------------------------------------------------------------------
## educ 0.1097*** 0.0969***
## (0.0046) (0.0110)
##
## exper 0.0591*** 0.0364***
## (0.0029) (0.0116)
##
## hours -0.0001*** 0.000001 -0.0002***
## (0.00001) (0.00004) (0.00001)
##
## Constant 0.0977 0.2686 0.0805***
## (0.0657) (0.1986) (0.0070)
##
## -------------------------------------------------------------------------------------------
## Observations 4,360 545 3,815
## R2 0.1463 0.1341 0.0587
## Adjusted R2 0.1458 0.1293 0.0584
## F Statistic 248.9177*** (df = 3; 4356) 27.9383*** (df = 3; 541) 237.7293*** (df = 1; 3813)
## ===========================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
In a simple 2×2 Difference-in-Differences setup without additional controls, the DiD coefficient from the regression
\[ y_{it} = \alpha + \beta_1 treated_i + \beta_2 post_t + \delta(treated_i \times post_t) + u_{it} \]
must be identical (up to rounding) to the manual DiD computed from group means.
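A short sketch of why this holds: the regression implies one conditional mean per treatment-period cell,

\[ E[y_{it} \mid treated_i = 0, post_t = 0] = \alpha, \qquad E[y_{it} \mid treated_i = 0, post_t = 1] = \alpha + \beta_2 \]

\[ E[y_{it} \mid treated_i = 1, post_t = 0] = \alpha + \beta_1, \qquad E[y_{it} \mid treated_i = 1, post_t = 1] = \alpha + \beta_1 + \beta_2 + \delta \]

so the difference of the two changes is

\[ [(\alpha + \beta_1 + \beta_2 + \delta) - (\alpha + \beta_1)] - [(\alpha + \beta_2) - \alpha] = \delta \]

The model has four parameters for four cells, so OLS reproduces the four sample cell means exactly, and \(\hat{\delta}\) equals the manual DiD. As an illustration only, the rounded Part A coefficients imply cell means of about 14.8542 (non-systemic, pre), 15.0140 (non-systemic, post), 17.9614 (systemic, pre) and 18.3924 (systemic, post), which gives \((18.3924 - 17.9614) - (15.0140 - 14.8542) = 0.4310 - 0.1598 = 0.2712\). The exercise still asks for the means computed directly from the data.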
Using the dataset from Part A, compute the mean of log(net_assets) for each treatment–period cell:
Fill in the table below:
| | Pre (year < 2022) | Post (year ≥ 2022) |
|---|---|---|
| Systemic Banks (T = 1) | | |
| Non-Systemic Banks (T = 0) | | |
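One possible way to obtain the four cell means is a two-way `tapply()`. The sketch below runs on a hypothetical toy data frame (`toy`, with invented values) so that it is self-contained; substituting the prepared `df` from Part A should produce the numbers for the table.

```r
# Hypothetical toy data; replace `toy` with the prepared `df` from Part A.
toy <- data.frame(
  treated       = c(0, 0, 0, 0, 1, 1, 1, 1),
  post          = c(0, 0, 1, 1, 0, 0, 1, 1),
  ln_net_assets = c(14.0, 15.0, 14.5, 15.5, 17.0, 18.0, 18.0, 19.0)
)

# Mean of the outcome in each treatment-period cell.
# Rows: treated = 0/1; columns: post = 0/1.
cell_means <- with(
  toy,
  tapply(ln_net_assets, list(treated = treated, post = post), mean)
)

print(cell_means)
```

The row labelled `1` corresponds to systemic banks and the column labelled `1` to the post period, so the matrix maps onto the table with its rows in reverse order.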
Using the values from the table above, compute:
\[ \widehat{DiD} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre}) \]
Estimate the DiD regression from Part A:
\[ y_{it} = \alpha + \beta_1 treated_i + \beta_2 post_t + \delta (treated_i \times post_t) + u_{it} \]
Extract the interaction coefficient \(\hat{\delta}\) and compare it to \(\widehat{DiD}\).
They should be numerically identical (up to rounding).
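The comparison can be sketched as a small helper that returns both numbers side by side. The function name `compare_did` and the toy data are hypothetical; the helper assumes columns named `treated`, `post` and `ln_net_assets` coded as in Part A, and the toy cells are deliberately unbalanced to show that the equality does not rely on equal cell sizes.

```r
# Hypothetical helper: manual DiD from cell means vs. the interaction coefficient.
compare_did <- function(data) {
  # 2x2 matrix of cell means; dimnames are "0"/"1" for treated (rows) and post (cols).
  m <- with(data, tapply(ln_net_assets, list(treated, post), mean))
  manual <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

  # Same 2x2 DiD regression as in Part A, without additional controls.
  fit <- lm(ln_net_assets ~ treated + post + treated:post, data = data)

  c(manual = unname(manual), regression = unname(coef(fit)["treated:post"]))
}

# Hypothetical toy data with unequal cell sizes.
toy <- data.frame(
  treated       = c(0, 0, 0, 0, 0, 1, 1, 1),
  post          = c(0, 0, 1, 1, 1, 0, 1, 1),
  ln_net_assets = c(14.0, 15.0, 14.5, 15.5, 15.0, 17.0, 18.0, 19.0)
)

res <- compare_did(toy)
print(res)

# TRUE when the two estimates agree up to floating-point tolerance.
print(isTRUE(all.equal(res[["manual"]], res[["regression"]])))
```

Calling `compare_did(df)` on the prepared data would be expected to return the interaction coefficient from Part A in both positions.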