In this step we import all the data needed for the analysis and clean the dataset.
First, we keep observations from January 2000 onward and exclude the District of Columbia (DC) and Puerto Rico (PR). We then remove every state that has missing observations (NA).
Education Services (ES) is one of many sectors affected by Hurricane Katrina in August 2005. In this analysis we explore the effect of Hurricane Katrina on the Education Services sector in the state of Louisiana (LA), using synthetic difference-in-differences (SDID).
library(ggplot2)
library(synthdid)
library(dplyr)
library(readxl)
# Read the ES sheet, drop DC and PR, and keep observations from January 2000 onward
ES_State <- read_excel("ES_State.xls", sheet = "ES")
ES_State <- filter(ES_State, STATE != "DC", STATE != "PR", DATE >= "2000-01-01")
#ggplot(ES_State)+geom_line(aes(y=ES,x=DATE,col=STATE))
# Build a complete STATE x month grid so that months absent from the sheet show up as NA
dumy_date <- seq.Date(from = as.Date("2000-01-01"), to = as.Date("2022-01-01"), by = "month")
# tibble() replaces the deprecated data_frame()
Fix_df <- tibble(STATE = rep(unique(ES_State$STATE), length(dumy_date)))
Fix_df <- arrange(Fix_df, STATE)
Fix_df$DATE <- rep(dumy_date, length(unique(ES_State$STATE)))
ES_State <- left_join(Fix_df, ES_State, by = c("STATE", "DATE"))
# Remove NA: keep a state only if none of its ES values is missing
ES <- ES_State %>% select(STATE, DATE, ES)
ES <- ES %>% group_by(STATE) %>% filter(!any(is.na(ES)))
head(ES)
## # A tibble: 6 × 3
## # Groups: STATE [1]
## STATE DATE ES
## <chr> <dttm> <dbl>
## 1 AL 2000-01-01 00:00:00 29.8
## 2 AL 2000-02-01 00:00:00 28.9
## 3 AL 2000-03-01 00:00:00 29.8
## 4 AL 2000-04-01 00:00:00 29.5
## 5 AL 2000-05-01 00:00:00 30.3
## 6 AL 2000-06-01 00:00:00 32.6
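The grid step matters because a month that is simply absent from the spreadsheet has no row, so it would never be counted as NA. The sketch below illustrates this on a small made-up panel using base R only. The `toy` data, the state labels "A" and "B", and the helper names `grid`, `full`, `has_na` and `balanced` are assumptions for illustration, not part of the original analysis.

```r
# Toy panel: state "B" has no row at all for 2000-02-01 (a gap, not an NA).
toy <- data.frame(
  STATE = c("A", "A", "A", "B", "B"),
  DATE  = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01",
                    "2000-01-01", "2000-03-01")),
  ES    = c(10, 11, 12, 20, 22),
  stringsAsFactors = FALSE
)

# Without the grid, no NA is visible even though B is incomplete.
sum(is.na(toy$ES))

# Full STATE x DATE grid; merging with all.x = TRUE turns the gap into an NA row.
months <- seq(as.Date("2000-01-01"), as.Date("2000-03-01"), by = "month")
grid   <- expand.grid(STATE = unique(toy$STATE), DATE = months,
                      stringsAsFactors = FALSE)
full   <- merge(grid, toy, by = c("STATE", "DATE"), all.x = TRUE)

# Keep only states with no missing ES value
# (a base-R analogue of the grouped filter used above).
has_na   <- tapply(is.na(full$ES), full$STATE, any)
balanced <- full[!has_na[as.character(full$STATE)], ]
unique(as.character(balanced$STATE))
```

Dropping whole states rather than single rows keeps the panel balanced, which the matrix form used for SDID requires.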
Here we keep observations up to January 2008, to focus on the period shortly after Hurricane Katrina in 2005.
LA_ES<-filter(ES,DATE<="2008-01-01")
LA_ES$DATE<-as.Date(LA_ES$DATE)
#Check NA
sum(is.na(LA_ES))
## [1] 0
To run SDID we need to encode the treatment as a dummy variable. Since we are analysing the post-Katrina effect, the treatment period starts in September 2005 and runs through the last observation (January 2008 here), with LA as the treated state.
# Generate the treatment dummy: 1 for LA from September 2005 through January 2008, 0 otherwise
LA_ES$Treat <- ifelse(LA_ES$DATE > as.Date("2005-08-01") &
                      LA_ES$DATE <= as.Date("2008-01-01") &
                      LA_ES$STATE == "LA", 1, 0)
LA_ES<-as.data.frame(LA_ES)
LA_ES$STATE<-factor(LA_ES$STATE)
LA_ES$DATE<-as.Date(LA_ES$DATE)
set_ES<-panel.matrices(LA_ES,unit = 1,time = 2,outcome = 3,treatment = 4)
SDID_ES<-synthdid_estimate(set_ES$Y,set_ES$N0,set_ES$T0)
plot(SDID_ES)
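`panel.matrices` reshapes the long panel into the pieces passed to `synthdid_estimate`: an outcome matrix `Y` (units in rows, periods in columns, treated units last), the number of control units `N0`, and the number of pre-treatment periods `T0`. The base-R sketch below rebuilds these pieces on a small made-up panel to show what they contain. It is an illustration only and not the package's implementation; the `panel` data and the state labels "A", "B" and "C" are assumptions.

```r
# Toy long-format panel: two control states (A, B) and one treated state (C),
# four periods, with treatment switching on in period 3 for C only.
panel <- data.frame(
  STATE = rep(c("A", "B", "C"), each = 4),
  TIME  = rep(1:4, times = 3),
  ES    = c(10, 11, 12, 13,  20, 21, 22, 23,  30, 31, 25, 26),
  Treat = c(0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Units x time matrices of outcomes and treatment indicators
# (one observation per cell, so sum() just picks that value).
Y <- tapply(panel$ES,    list(panel$STATE, panel$TIME), sum)
W <- tapply(panel$Treat, list(panel$STATE, panel$TIME), sum)

# Put never-treated units first and treated units last.
treated <- rowSums(W) > 0
Y <- Y[order(treated), , drop = FALSE]
W <- W[order(treated), , drop = FALSE]

N0 <- sum(!treated)          # number of control units
T0 <- sum(colSums(W) == 0)   # number of pre-treatment periods
```

In this toy case `Y` is a 3 x 4 matrix with state C in the last row, which mirrors how LA ends up as the last row of `set_ES$Y`.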
For comparison, we repeat the same estimation with Florida (FL) as the treated state:
# Generate the treatment dummy: 1 for FL from September 2005 through January 2008, 0 otherwise
LA_ES$Treat <- ifelse(LA_ES$DATE > as.Date("2005-08-01") &
                      LA_ES$DATE <= as.Date("2008-01-01") &
                      LA_ES$STATE == "FL", 1, 0)
LA_ES<-as.data.frame(LA_ES)
LA_ES$STATE<-factor(LA_ES$STATE)
LA_ES$DATE<-as.Date(LA_ES$DATE)
set_ES<-panel.matrices(LA_ES,unit = 1,time = 2,outcome = 3,treatment = 4)
SDID_ES<-synthdid_estimate(set_ES$Y,set_ES$N0,set_ES$T0)
plot(SDID_ES)
For Education Services, SDID does not work well for LA: even before the treatment, the LA series fluctuates considerably, whereas the control series tends to be close to linear. This results in a large pre-treatment MSE.
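The pre-treatment fit can be made concrete with a small calculation. The sketch below uses a made-up outcome matrix and hypothetical uniform unit weights `omega` (SDID would estimate these weights itself), so the numbers say nothing about the LA data. The level difference between the treated unit and the weighted controls is removed before squaring, because SDID's unit weights allow an intercept shift and only the shape of the pre-treatment path matters.

```r
# Toy outcome matrix: rows = units (controls first, treated last), columns = periods.
Y <- rbind(
  ctrl1   = c(10, 11, 12, 13, 14),
  ctrl2   = c(20, 21, 22, 23, 24),
  treated = c(15, 19, 14, 12, 13)
)
N0 <- 2   # number of control units
T0 <- 3   # number of pre-treatment periods

# Hypothetical unit weights (uniform); an assumption for illustration only.
omega <- rep(1 / N0, N0)

# Pre-treatment paths of the treated unit and of the weighted controls.
pre_treated   <- Y[N0 + 1, 1:T0]
pre_synthetic <- as.numeric(omega %*% Y[1:N0, 1:T0])

# Demean the gap so a constant level difference does not count as misfit.
gap     <- pre_treated - pre_synthetic
pre_mse <- mean((gap - mean(gap))^2)
pre_mse
```

A treated series that swings around a nearly linear control path, as in this toy case, keeps `pre_mse` large however the level is shifted.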