The Impact of the VAT reform in India - Firm level data

0. Dataset and packages

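library(did)    # att_gt(), aggte(), ggdid()
library(haven)  # read_dta() for Stata .dta files
library(dplyr)  # select(), mutate(), filter()
#Open the dataset
my_data <- read_dta("final_dataset_wthtaxes.dta")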

1. To estimate group-time average treatment effects

1.1 Let’s define the main variables (no controls yet)

lgross: the inverse hyperbolic sine (asinh) of gross sales, used as a log-like transformation. This variable needs to be created.
VAT_intro_year_02: the year in which the policy was implemented. Note that this is my own version of VAT_intro_year, created below.
year: the calendar year; this is the time variable.
id: an id number for each firm.

#Begin by selecting the variables that I am going to use, because the database is too big
my_data_01<-dplyr::select(my_data, year, id, factory_id, state_id, nic3digit, j_salegross12, j_salenet12, j_salestax12, h_valpurch12, VAT_intro_year)

#Now I need to create the dependent variable and others
my_data_02<-dplyr::mutate(my_data_01, 
                          lgross = asinh(j_salegross12), 
                          lnet = asinh(j_salenet12),
                          ltax = asinh(j_salestax12),
                          lvalue = asinh(h_valpurch12))
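
The outcome variables above use asinh() rather than log(). A minimal sketch of why that choice matters, on made-up sales values (the vector sales is hypothetical and not taken from the dataset): asinh is defined at zero, and for large values it differs from the log by roughly log(2).

```r
# Inverse hyperbolic sine versus the natural log on illustrative values.
sales <- c(0, 1, 10, 1000, 1e6)   # hypothetical values, not from the dataset

asinh_sales <- asinh(sales)       # defined at zero: asinh(0) = 0
log_sales   <- log(sales)         # log(0) = -Inf, so zero sales would be unusable

# For large x, asinh(x) is approximately log(2 * x) = log(x) + log(2)
gap <- asinh_sales - log_sales
print(data.frame(sales, asinh_sales, log_sales, gap))
```

For firms with large sales the coefficients can therefore be read much like log-point effects, while small or zero values behave differently and may deserve a robustness check.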

#Possible data issue: state 35 has a missing value in VAT_intro_year. I replace it with 0, which is the value the did package example uses for units that are never treated
my_data_02<-dplyr::mutate(my_data_02, 
                          VAT_intro_year_02 = ifelse(is.na(VAT_intro_year),0,VAT_intro_year))

my_data_02<-dplyr::select(my_data_02, -VAT_intro_year)
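
As a small sanity check on the recode above, the following sketch repeats it on a toy table and tabulates the result. The data frame toy and its values are hypothetical. Coding the group variable as 0 tells the did package that a unit is never treated, so this step assumes that state 35 really did not adopt the VAT within the sample period.

```r
# Toy state-level table; state 35 has no recorded VAT year (hypothetical values).
toy <- data.frame(state_id = c(1, 2, 35),
                  VAT_intro_year = c(2003, 2005, NA))

# Recode missing adoption years to 0 (the never-treated code for gname in did).
toy$VAT_intro_year_02 <- ifelse(is.na(toy$VAT_intro_year), 0, toy$VAT_intro_year)

# Tabulate the new variable; useNA = "ifany" would reveal any NA that is left.
print(table(toy$VAT_intro_year_02, useNA = "ifany"))
```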

#Keep only the firms with nic3digit<900. This filter is disabled here because it did not work at this point; the same call is used in section 2.1. TODO: try a different way to filter
#my_data_02<-dplyr::filter(my_data_02, nic3digit<900)

#Filter the missing values
my_data_03<-na.exclude(my_data_02)

1.2 Now we can use the did package to estimate the ATT

out <- att_gt(yname = "lgross", 
              gname = "VAT_intro_year_02", 
              idname = "id",
              tname = "year",
              xformla = ~1,
              data = my_data_03,
              est_method = "reg",
              control_group = "notyettreated"  
              )

summary(out)

Call:
att_gt(yname = "lgross", tname = "year", idname = "id", gname = "VAT_intro_year_02", xformla = ~1, data = my_data_03, control_group = "notyettreated", est_method = "reg")

Reference: Callaway, Brantly and Pedro H.C. Sant’Anna. “Difference-in-Differences with Multiple Time Periods.” Forthcoming at the Journal of Econometrics https://arxiv.org/abs/1803.09015, 2020.

Group-Time Average Treatment Effects:
Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
2003 1999 0.0353 0.0723 -0.1689 0.2395
2003 2000 -0.0216 0.0486 -0.1588 0.1156
2003 2001 0.0352 0.0345 -0.0621 0.1326
2003 2002 0.0052 0.0413 -0.1113 0.1217
2003 2003 0.0733 0.0381 -0.0342 0.1808
2003 2004 0.1349 0.0483 -0.0014 0.2712
2003 2005 0.1550 0.0715 -0.0469 0.3569
2003 2006 0.2358 0.0667 0.0475 0.4242 *
2003 2007 0.7901 0.4917 -0.5983 2.1785
2003 2008 0.5324 0.4377 -0.7034 1.7683
2003 2009 0.5577 0.4771 -0.7896 1.9050
2003 2010 0.8250 0.6697 -1.0660 2.7159
2003 2011 0.9459 0.7127 -1.0664 2.9581
2003 2012 1.0288 0.8470 -1.3627 3.4203
2005 1999 -0.0682 0.0874 -0.3152 0.1787
2005 2000 0.0066 0.0548 -0.1483 0.1614
2005 2001 0.0230 0.0475 -0.1110 0.1571
2005 2002 0.0158 0.0407 -0.0992 0.1307
2005 2003 0.0062 0.0413 -0.1104 0.1228
2005 2004 -0.0676 0.0454 -0.1958 0.0606
2005 2005 0.0152 0.0449 -0.1114 0.1419
2005 2006 0.1136 0.0429 -0.0075 0.2348
2005 2007 0.1942 0.4266 -1.0103 1.3986
2005 2008 -0.0120 0.3006 -0.8608 0.8368
2005 2009 -0.0624 0.3183 -0.9613 0.8365
2005 2010 0.2276 0.4490 -1.0402 1.4954
2005 2011 0.1544 0.4678 -1.1664 1.4752
2005 2012 0.3250 0.6183 -1.4209 2.0709
2006 1999 0.0534 0.1047 -0.2422 0.3490
2006 2000 -0.0337 0.0733 -0.2405 0.1732
2006 2001 -0.0583 0.0587 -0.2241 0.1074
2006 2002 -0.0360 0.0510 -0.1801 0.1081
2006 2003 -0.0094 0.0486 -0.1466 0.1278
2006 2004 0.0576 0.0580 -0.1062 0.2215
2006 2005 -0.0082 0.0607 -0.1797 0.1634
2006 2006 0.1381 0.0580 -0.0256 0.3018
2006 2007 -0.0843 0.0575 -0.2465 0.0779
2006 2008 -0.3097 0.1149 -0.6341 0.0147
2006 2009 -0.4073 0.1071 -0.7097 -0.1050 *
2006 2010 -0.0114 0.1539 -0.4458 0.4231
2006 2011 0.0883 0.5062 -1.3409 1.5176
2006 2012 0.2539 0.4584 -1.0404 1.5483
2007 1999 0.0546 0.1681 -0.4201 0.5292
2007 2000 0.0790 0.0430 -0.0425 0.2006
2007 2001 0.0635 0.0406 -0.0512 0.1782
2007 2002 0.0406 0.0317 -0.0490 0.1302
2007 2003 0.0086 0.0312 -0.0795 0.0966
2007 2004 0.0499 0.0316 -0.0393 0.1391
2007 2005 0.0136 0.0605 -0.1572 0.1845
2007 2006 -0.4866 0.1620 -0.9442 -0.0291 *
2007 2007 0.2756 0.1600 -0.1760 0.7273
2007 2008 0.0652 0.0750 -0.1464 0.2769
2007 2009 0.0201 0.1190 -0.3160 0.3562
2007 2010 0.0824 0.3049 -0.7786 0.9434
2007 2011 0.2331 0.5114 -1.2109 1.6772
2007 2012 0.4052 0.5063 -1.0246 1.8349
---
Signif. codes: `*' confidence band does not cover 0

P-value for pre-test of parallel trends assumption: 0
Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Outcome Regression
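
To read the table above, it can help to flag which simultaneous bands exclude zero and whether each flagged cell is a pre-treatment or a post-treatment period. The sketch below does this for four rows copied from the output; the data frame res and its column names are only illustrative. A pre-treatment cell whose band excludes zero is in line with the pre-test p-value of 0, which rejects parallel pre-trends and calls for caution when interpreting the post-treatment estimates.

```r
# Four rows copied from the summary(out) table above.
res <- data.frame(
  group = c(2003, 2003, 2006, 2007),
  time  = c(2004, 2006, 2009, 2006),
  att   = c(0.1349, 0.2358, -0.4073, -0.4866),
  lower = c(-0.0014, 0.0475, -0.7097, -0.9442),
  upper = c(0.2712, 0.4242, -0.1050, -0.0291)
)

# A band "does not cover 0" when both limits have the same sign.
res$excludes_zero <- res$lower > 0 | res$upper < 0

# Cells with time < group are pre-treatment: they test the assumption,
# they do not measure the effect of the reform.
res$pre_treatment <- res$time < res$group

print(res)
```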

1.3. Group-time average treatment effects

ggdid(out)
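
Each point plotted by ggdid(out) is one ATT(g,t). As a rough illustration of what that number is in a post-treatment period, the sketch below computes it by hand on a toy panel: the change in the outcome since the last pre-treatment year for group g, minus the same change for units not yet treated by period t. The panel, the built-in effect of 0.5 and the helper att_gt_manual are all hypothetical, and this plain comparison of means is a simplification of what att_gt does (no covariates, no standard errors).

```r
# Toy balanced panel: 6 firms, 3 years (all numbers hypothetical).
panel <- expand.grid(id = 1:6, year = 2002:2004)
panel$g <- ifelse(panel$id <= 2, 2003,        # firms 1-2 treated in 2003
           ifelse(panel$id <= 4, 2004, 0))    # firms 3-4 in 2004, firms 5-6 never

# Outcome: firm effect + common trend + an effect of 0.5 once treated.
panel$treated <- panel$g > 0 & panel$year >= panel$g
panel$y <- panel$id / 10 + 0.1 * (panel$year - 2002) + 0.5 * panel$treated

# Hypothetical helper: ATT(g, t) for a post-treatment period t >= g.
att_gt_manual <- function(df, g, t) {
  base <- g - 1                               # last pre-treatment year of group g
  wide <- merge(df[df$year == t, c("id", "g", "y")],
                df[df$year == base, c("id", "y")],
                by = "id", suffixes = c("_t", "_base"))
  wide$dy <- wide$y_t - wide$y_base           # change since the base year
  is_group   <- wide$g == g
  is_control <- wide$g == 0 | wide$g > t      # never treated or not yet treated at t
  mean(wide$dy[is_group]) - mean(wide$dy[is_control])
}

att_2003_2003 <- att_gt_manual(panel, g = 2003, t = 2003)
print(att_2003_2003)
```

Because the toy outcome is built with parallel trends, the function recovers the effect that was put in; with real data the same difference is only as credible as the parallel trends assumption.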

1.4. Event Studies

A main type of aggregation is into an event study plot.

es <- aggte(out, type = "dynamic")
#summary(es)
ggdid(es)

The figure here is very similar to the group-time average treatment effects. Red dots are pre-treatment periods, blue dots are post-treatment periods. The difference is that the x-axis is in event time.
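
The change of axis can be sketched as follows: event time is the calendar year minus the adoption year of the group. The four rows below are copied from the summary(out) table in section 1.2. The final step takes a simple unweighted mean by event time, which is an assumption made for illustration only; aggte applies its own group weights.

```r
# Four group-time estimates copied from the summary(out) table.
res <- data.frame(group = c(2003, 2003, 2005, 2005),
                  time  = c(2002, 2004, 2004, 2006),
                  att   = c(0.0052, 0.1349, -0.0676, 0.1136))

# Event time: years relative to VAT adoption (negative = before the reform).
res$event_time <- res$time - res$group
res$period <- ifelse(res$event_time < 0, "pre", "post")

# Unweighted mean of ATT(g,t) at each event time (illustrative weights only).
avg_by_event_time <- tapply(res$att, res$event_time, mean)

print(res)
print(avg_by_event_time)
```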

1.5. Overall Effect of Participating in the Treatment

The event study above reported an overall effect of participating in the treatment. This was computed by averaging the average effects computed at each length of exposure.

In many cases, a more general purpose overall treatment effect parameter is given by computing the average treatment effect for each group, and then averaging across groups. This sort of procedure provides an average treatment effect parameter with a very similar interpretation to the Average Treatment Effect on the Treated (ATT) in the two period and two group case.
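
The second averaging step can be sketched in a few lines. The group effects below are the ones reported by the group aggregation in this section, but the firm counts in n_firms are HYPOTHETICAL placeholders, since the group sizes are not printed in this document. The result therefore illustrates the mechanics only and is not expected to match the reported overall ATT.

```r
# Group-specific average effects as reported by aggte(out, type = "group").
group_att <- c("2003" = 0.5279, "2005" = 0.1194, "2006" = -0.0475, "2007" = 0.1803)

# HYPOTHETICAL number of treated firms per group (not taken from the data).
n_firms <- c("2003" = 100, "2005" = 400, "2006" = 1500, "2007" = 600)

# Weight each group by its share of treated firms and average.
weights <- n_firms / sum(n_firms)
overall_att <- sum(weights * group_att)

print(round(weights, 3))
print(overall_att)
```

A weighted average of this kind always lies between the smallest and the largest group effect, so a large group with a small effect can pull the overall number well below the unweighted mean.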

To compute this overall average treatment effect parameter, use

group_effects <- aggte(out, type = "group")
summary(group_effects)

Call:
aggte(MP = out, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant’Anna. “Difference-in-Differences with Multiple Time Periods.” Forthcoming at the Journal of Econometrics https://arxiv.org/abs/1803.09015, 2020.

Overall ATT:
ATT Std. Error [95% Conf. Int.]
0.0917 0.2208 -0.341 0.5244

Group Effects:
group ATT Std. Error [95% Simult. Conf. Band]
2003 0.5279 0.3687 -0.1636 1.2193
2005 0.1194 0.4050 -0.6400 0.8789
2006 -0.0475 0.1645 -0.3560 0.2610
2007 0.1803 0.2917 -0.3667 0.7272
---
Signif. codes: `*' confidence band does not cover 0

Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Outcome Regression

Of particular interest is the Overall ATT in the results. Here, the estimated overall effect of the VAT reform on (asinh) gross sales is 0.0917, with a standard error of 0.2208. The 95% confidence interval, [-0.341, 0.5244], includes zero, so the effect is not statistically significant.
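
The interval reported for the overall ATT can be checked by hand from the estimate and its standard error. This is a minimal sketch that assumes a pointwise normal interval; the variable names are only illustrative.

```r
# Overall ATT and standard error as reported in the summary above.
att <- 0.0917
se  <- 0.2208

# Pointwise 95% interval with the normal critical value (about 1.96).
z  <- qnorm(0.975)
ci <- c(lower = att - z * se, upper = att + z * se)
print(round(ci, 4))

# The effect is significant at the 5% level only if the interval excludes zero.
covers_zero <- ci[["lower"]] < 0 && ci[["upper"]] > 0
print(covers_zero)
```

Small differences in the last digit relative to the printed interval can arise because the inputs used here are already rounded.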

2.1 Let’s define the controls

igrowth_rate_gsdp_constantprices
logitotal_foodgrains_ag_year
logipop
logihighways
logitotalipccrimes
state_election_year
state_aligned_with_center
bjp_seat_share
inc_seat_share
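
Since the same list of controls is typed several times below, one option is to store the names once and build the one-sided formula for xformla from them. This is only a sketch; the object names controls and x_formula are my own and are not used elsewhere in this document.

```r
# Names of the state-level controls listed above.
controls <- c("igrowth_rate_gsdp_constantprices",
              "logitotal_foodgrains_ag_year",
              "logipop",
              "logihighways",
              "logitotalipccrimes",
              "state_election_year",
              "state_aligned_with_center",
              "bjp_seat_share",
              "inc_seat_share")

# reformulate() with no response builds a one-sided formula: ~ x1 + x2 + ...
x_formula <- reformulate(controls)
print(x_formula)
```

The resulting object could be passed as xformla = x_formula, and the same vector could be reused when selecting columns, which keeps the two lists from drifting apart.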

#The following are the controls that have been used so far (taken from the Stata specification)

#_Ieventtime_1 _Ieventtime_2 _Ieventtime_3 _Ieventtime_4 _Ieventtime_5 _Ieventtime_6 _Ieventtime_7 _Ieventtime_8 _Ieventtime_9 _Ieventtime_10 _Ieventtime_11 _Ieventtime_12 _Ieventtime_13 _Ieventtime_14 _Ieventtime_15 _Ieventtime_16 _Ieventtime_17 _Ieventtime_18 _Ieventtime_19 igrowth_rate_gsdp_constantprices logitotal_foodgrains_ag_year  logipop  logihighways logitotalipccrimes state_election_year state_aligned_with_center   bjp_seat_share  inc_seat_share [pw = mult] if unb == 1 & nic3digit < 900 , absorb(STATE year) cluster(state_id ) con

#Begin by selecting the variables that I am going to use, because the database is too big
my_data_21<-dplyr::select(my_data, year, id, factory_id, state_id, nic3digit, j_salegross12, j_salenet12, j_salestax12, h_valpurch12, VAT_intro_year, igrowth_rate_gsdp_constantprices, logitotal_foodgrains_ag_year,  logipop,  logihighways, logitotalipccrimes, state_election_year, state_aligned_with_center,  bjp_seat_share,  inc_seat_share)

#Now I need to create the dependent variable and others
my_data_22<-dplyr::mutate(my_data_21, 
                          lgross = asinh(j_salegross12), 
                          lnet = asinh(j_salenet12),
                          ltax = asinh(j_salestax12),
                          lvalue = asinh(h_valpurch12))

#Possible data issue: state 35 has a missing value in VAT_intro_year. I replace it with 0, which is the value the did package example uses for units that are never treated
my_data_22<-dplyr::mutate(my_data_22, 
                          VAT_intro_year_02 = ifelse(is.na(VAT_intro_year),0,VAT_intro_year))

my_data_22<-dplyr::select(my_data_22, -VAT_intro_year)

#Keep only the firms with nic3digit<900 (this drops the firms with nic3digit>=900)
my_data_22<-dplyr::filter(my_data_22, nic3digit<900)

#Filter the missing values
my_data_23<-na.exclude(my_data_22)
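
na.exclude() drops every row that has a missing value in any selected column, including the state-level controls. If a control is missing for whole years, those years disappear from the sample, so it may be worth counting the rows that survive per year before estimating. The sketch below shows the check on a toy data frame; all values are hypothetical, and whether this is what happens in my_data_23 is something to verify, not a finding.

```r
# Toy panel: 3 firms, 3 years; the control is missing for every firm in 2002.
toy <- data.frame(
  id      = rep(1:3, each = 3),
  year    = rep(2001:2003, times = 3),
  lgross  = c(1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.0, 3.1, 3.2),
  logipop = c(5, NA, 5.2, 6, NA, 6.2, 7, NA, 7.2)
)

# Same step as above: drop rows with any missing value.
kept <- na.exclude(toy)

# Count surviving rows per year, keeping years that lost all their rows.
rows_per_year <- table(factor(kept$year, levels = 2001:2003))
print(rows_per_year)
```

On the real data the analogous call would be table(my_data_23$year), ideally split by VAT_intro_year_02 as well.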

2.2 Now we can use the did package to estimate the ATT

out_2 <- att_gt(yname = "lgross", 
              gname = "VAT_intro_year_02", 
              idname = "id",
              tname = "year",
              xformla = ~igrowth_rate_gsdp_constantprices + logitotal_foodgrains_ag_year  + logipop + logihighways + logitotalipccrimes + state_election_year + state_aligned_with_center + bjp_seat_share +  inc_seat_share,
              data = my_data_23,
              est_method = "reg",
              control_group = "notyettreated"  
              )

# No pre-treatment periods to test 
# Drop group-periods that have variance equal to zero (singularity problems)

summary(out_2)

Call:
att_gt(yname = "lgross", tname = "year", idname = "id", gname = "VAT_intro_year_02", xformla = ~igrowth_rate_gsdp_constantprices + logitotal_foodgrains_ag_year + logipop + logihighways + logitotalipccrimes + state_election_year + state_aligned_with_center + bjp_seat_share + inc_seat_share, data = my_data_23, control_group = "notyettreated", est_method = "reg")

Reference: Callaway, Brantly and Pedro H.C. Sant’Anna. “Difference-in-Differences with Multiple Time Periods.” Forthcoming at the Journal of Econometrics https://arxiv.org/abs/1803.09015, 2020.

Group-Time Average Treatment Effects:
Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
2003 1999 0.1514 0.1427 -0.2000 0.5028
2003 2000 -0.0146 0.0814 -0.2150 0.1858
2003 2001 -0.0449 0.0890 -0.2641 0.1743
2003 2002 NA NA NA NA
2003 2003 -0.0150 0.1185 -0.3068 0.2768
2003 2004 0.1690 0.1390 -0.1730 0.5111
2003 2005 NA NA NA NA
2003 2006 NA NA NA NA
2003 2007 NA NA NA NA
2003 2008 NA NA NA NA
2003 2009 NA NA NA NA
2003 2010 NA NA NA NA
2003 2011 NA NA NA NA
2003 2012 NA NA NA NA
2005 1999 NA NA NA NA
2005 2000 NA NA NA NA
2005 2001 NA NA NA NA
2005 2002 NA NA NA NA
2005 2003 NA NA NA NA
2005 2004 NA NA NA NA
2005 2005 NA NA NA NA
2005 2006 NA NA NA NA
2005 2007 NA NA NA NA
2005 2008 NA NA NA NA
2005 2009 NA NA NA NA
2005 2010 NA NA NA NA
2005 2011 NA NA NA NA
2005 2012 NA NA NA NA
2006 1999 -0.0136 0.1487 -0.3796 0.3523
2006 2000 -0.2996 0.3536 -1.1701 0.5709
2006 2001 0.0931 0.1165 -0.1936 0.3798
2006 2002 NA NA NA NA
2006 2003 0.0424 0.0846 -0.1659 0.2507
2006 2004 0.0355 0.0805 -0.1625 0.2336
2006 2005 NA NA NA NA
2006 2006 NA NA NA NA
2006 2007 NA NA NA NA
2006 2008 NA NA NA NA
2006 2009 NA NA NA NA
2006 2010 NA NA NA NA
2006 2011 NA NA NA NA
2006 2012 NA NA NA NA
2007 1999 0.1089 0.2008 -0.3855 0.6033
2007 2000 0.1785 0.1031 -0.0752 0.4322
2007 2001 0.1781 0.1429 -0.1735 0.5298
2007 2002 NA NA NA NA
2007 2003 0.1228 0.0996 -0.1224 0.3680
2007 2004 -0.0137 0.0960 -0.2499 0.2226
2007 2005 NA NA NA NA
2007 2006 NA NA NA NA
2007 2007 NA NA NA NA
2007 2008 NA NA NA NA
2007 2009 NA NA NA NA
2007 2010 NA NA NA NA
2007 2011 NA NA NA NA
2007 2012 NA NA NA NA
---
Signif. codes: `*' confidence band does not cover 0

P-value for pre-test of parallel trends assumption: 0.10497
Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Outcome Regression

2.3. Group-time average treatment effects

ggdid(out_2)

2.4. Event Studies

A main type of aggregation is into an event study plot.

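#Aggregate the estimates with controls (out_2); na.rm = TRUE drops the NA group-time cells
es_2 <- aggte(out_2, type = "dynamic", na.rm = TRUE)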
#summary(es_2)
ggdid(es_2)

2.5. Overall Effect of Participating in the Treatment

The event study above reported an overall effect of participating in the treatment. This was computed by averaging the average effects computed at each length of exposure.

To compute the group-based overall average treatment effect parameter described in section 1.5, use

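#Aggregate the estimates with controls (out_2); na.rm = TRUE drops the NA group-time cells
#Note: the output printed below was generated from out (no controls), as its Call line shows
group_effects_2 <- aggte(out_2, type = "group", na.rm = TRUE)
summary(group_effects_2)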

Call:
aggte(MP = out, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant’Anna. “Difference-in-Differences with Multiple Time Periods.” Forthcoming at the Journal of Econometrics https://arxiv.org/abs/1803.09015, 2020.

Overall ATT:
ATT Std. Error [95% Conf. Int.]
0.0917 0.2211 -0.3418 0.5251

Group Effects:
group ATT Std. Error [95% Simult. Conf. Band]
2003 0.5279 0.4013 -0.1397 1.1954
2005 0.1194 0.2822 -0.3500 0.5889
2006 -0.0475 0.1512 -0.2990 0.2040
2007 0.1803 0.2619 -0.2555 0.6160
---
Signif. codes: `*' confidence band does not cover 0

Control Group: Not Yet Treated, Anticipation Periods: 0
Estimation Method: Outcome Regression

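Of particular interest is the Overall ATT in the results. The Call line above shows that this output was produced from out, the specification without controls, so it reproduces the point estimates of section 1.5: an overall ATT of 0.0917 with a 95% confidence interval of [-0.3418, 0.5251], which includes zero. It is therefore not yet an estimate with controls; for that, the aggregation has to be run on out_2.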

DiD_taxes

Aguiar28D
