Advanced quantitative data analysis

class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Cross-sectional data analysis II
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

#Let's get ready

```r
#install one new package
install.packages("oaxaca") #for Blinder-Oaxaca Decomposition
```

```r
library(tidyverse) #for recoding
library(haven) #for importing data
library(janitor) #for tabulation
library(estimatr) #robust standard error OLS
library(texreg) #export regression result
library(ggplot2) #for plotting 
library(oaxaca) #for Blinder-Oaxaca Decomposition
```
Prepare a dataset of age, sex,  relationship status, life satisfaction
- DV: life satisfaction (sat6)
- IV: 
  - age(age), 
  - sex(sex_gen), 
  - relationship status(relstat), 
  - number of children (nkids), 
  - health (hlt1)
  - religion (sd30)

---
#Prepare data

```r
wave1 <- read_dta("anchor1_50percent_Eng.dta")
# sample size =6201
```

```r
tabyl(wave1,age) #no missing
tabyl(wave1,sex_gen) #no missing
tabyl(wave1,relstat) # 34 cases reporting -7, needs cleaning
tabyl(wave1, nkids) #no missing
tabyl(wave1, hlt1) #10 cases reporting -1 or -2, needs cleaning
tabyl(wave1,sd30) #28 cases reporting -1 or -2, needs cleaning
```

---
#Prepare data

```r
wave1b <- wave1 %>% 
  transmute(
    age, 
    nkids,
    sat6=case_when(sat6<0 ~ as.numeric(NA), #specify when sat should be considered missing
                   TRUE ~ as.numeric(sat6)),
    gender=as_factor(sex_gen) %>% fct_drop(), #treat sex_gen as categorical, and drop unused level
    relstat=as_factor(relstat), #treat relationship status as categorical
    relstat_new1=case_when(
      relstat=="-7 Incomplete data" ~ as.character(NA),#specify when it should be missing
      TRUE ~ as.character(relstat)
                         ) %>% as_factor() %>% fct_drop(),
      #make relstat as a factor, and then drop unused levels in relstat_new1
    health=case_when(
      hlt1<0 ~ as.numeric(NA),#specify when it should be missing
      TRUE ~ hlt1),
    religion=case_when(
      sd30<0 ~ as.character(NA), #specify when it should be missing
      sd30==7 ~ "No",#specify when it should be "no religion"
      sd30 %in% c(1:6) ~ "Yes" #specify when it should be "have religion"
           ) %>% as_factor()%>%fct_relevel("No", "Yes") #use "No" as reference level
           )%>%
  drop_na() #drop all observations with missing values in the sample
# sample size change from 6201 to 6132
```
---
#The relationship between having a partner and life satisfaction
- Generate a new variable for relationship status

```r
wave1c <- wave1b %>% 
  mutate(
    relstat_new2=case_when(
      relstat_new1 %in% c("1 Never married single") ~ "single",
      #treat 'never married single' as 'single'
      relstat_new1 %in% c("2 Never married LAT", 
                     "3 Never married COHAB",
                     "4 Married COHAB",
                     "5 Married noncohabiting") ~ 'partnered',
      #treat the 4 situations as "partnered"
      relstat_new1 %in% c("6 Divorced/separated single",
                     "7 Divorced/separated LAT",
                     "8 Divorced/separated COHAB") ~ 'separated',
      #treat the 3 situations as "separated"
      relstat_new1 %in% c("9 Widowed single",
                     "10 Widowed LAT") ~ 'widowed'
      #treat the 2 situations as "widowed"
                          ) %>% as_factor()# make relstat_new2 as factor
         ) %>% 
  filter(relstat_new2!= "widowed" &  relstat_new2!= "separated") #only 4 widowed and 284 separated, dropping.
# sample size change to 5845 after dropping widowed and separated
```

---
#Exploring the relationship between having a partner and life satisfaction
First, we can estimate the average sat6 across relationship status to have a first look
.pull-left[

```r
tabyl(wave1c,relstat_new2) 
```

```
##  relstat_new2    n   percent
##        single 2435 0.4165954
##     partnered 3410 0.5834046
##     separated    0 0.0000000
##       widowed    0 0.0000000
```

```r
#check the frequency of each category
```
]
.pull-right[

```
## # A tibble: 2 × 2
##   relstat_new2 mean_sat6
##   <fct>            <dbl>
## 1 single            7.51
## 2 partnered         7.78
```
]

---
#Why those partnered are happier than those singles

```r
wave1c$relstat_new2 <- fct_drop(wave1c$relstat_new2) #drop unused levels
compare<- wave1c %>% 
  group_by(relstat_new2) %>% 
  dplyr::summarise(
    mean_sat6=mean(sat6), #calculate the mean of sat6 by marital1
    mean_age=mean(age),
    mean_nkids=mean(nkids),
    mean_health=mean(health),
    prop_religion=sum(religion=="Yes")/n() #the no. of Yes/the total size of the group
          ) 
compare
```

```
## # A tibble: 2 × 6
##   relstat_new2 mean_sat6 mean_age mean_nkids mean_health prop_religion
##   <fct>            <dbl>    <dbl>      <dbl>       <dbl>         <dbl>
## 1 single            7.51     20.3     0.0583        3.90         0.744
## 2 partnered         7.78     29.1     0.969         3.79         0.725
```

```r
#compare to the single group, the partnered group are older, had more children, slightly lower in health and religion.
```
---
#Why those partnered are happier than those singles

```r
OLS_single <- lm_robust(formula = sat6 ~ age + nkids + health + religion, subset=(relstat_new2=="single"), data = wave1c)
OLS_partner <- lm_robust(formula = sat6 ~ age + nkids + health + religion, subset=(relstat_new2=="partnered"), data = wave1c)
#subset=() here is to specify which sample for the OLS

screenreg(list(OLS_single,OLS_partner),include.ci = FALSE, digits = 3, 
                  custom.model.names = c("OLS for single","OLS for partnered"),#rename the models
                  single.row =TRUE #to make the coef. and standard error in the same row
          )
```

```
## 
## =======================================================
##              OLS for single        OLS for partnered   
## -------------------------------------------------------
## (Intercept)     7.038 (0.225) ***     5.947 (0.184) ***
## age            -0.070 (0.006) ***    -0.013 (0.005) ** 
## nkids           0.076 (0.144)         0.060 (0.033)    
## health          0.464 (0.042) ***     0.502 (0.031) ***
## religionYes     0.107 (0.082)         0.337 (0.061) ***
## -------------------------------------------------------
## R^2             0.147                 0.100            
## Adj. R^2        0.146                 0.099            
## Num. obs.    2435                  3410                
## RMSE            1.661                 1.557            
## =======================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

```r
#screen reg is to show the result in your screen in a formatted way
```

---
#Why those partnered are happier than those singles

```r
#screen reg is to show the result in your screen in a formatted way
```

---
#Why those partnered are happier than those singles

- Blinder-Oaxaca Decomposition
  - an econometric method
  
  - often used to explain the difference in the means of the dependent variable between two groups 
  
  - A good example is the gender difference in wage: why men's average wage is higher than women's wage
      - because men are more educated, more experienced, working long hours than women? (composition difference)
      - or the return of education, experience, labor is higher for men than women? (discrimination)
- today, we learn the so-call two-fold Blinder-Oaxaca decomposition

---
# OLS regression

`$$y_i= \alpha+\beta*x_i  + e_i$$`
--

- Group A (treated)=Those partnered, the life satisfaction can be modelled as the following
`$$y_i^a= \alpha^a +\beta_1^a*age_i^a +... +\beta_4^a*religion_i^a+ e_i^a$$`
--

- Then, Group A (treated) =Those partnered, the average life satisfaction is

`$$\overline{y^a}= \alpha^a +\beta_1^a*\overline{age_i^a} + ...+\beta_4^a*\overline{religion_i^a}$$`

- Group B (control)=Those singled, the life satisfaction can be modeled as the following
`$$y_i^b= \alpha^b +\beta_1^b*age_i^b + ... + \beta_4^b*religion_i^b + e_i^b$$`

- Then,Group B (control)=Those singled, the average life satisfaction is 
`$$\overline{y^b}= \alpha^b +\beta_1^b*\overline{age_i^b} + ...+\beta_4^b*\overline{religion_i^b}$$`
---
#Blinder-Oaxaca Decomposition
The difference between the average life satisfaction between those singles and partnered can be described:
`$$\overline{y^a}-\overline{y^b}= 
\alpha^a - \alpha^b +
\beta_1^a*\overline{age_i^a}- \beta_1^b*\overline{age_i^b} +
...+
\beta_4^a*\overline{religion_i^a}-\beta_4^b*\overline{religion_i^b}$$`
some transformation
`$$\overline{y^a}-\overline{y^b}=$$` 
`$$\alpha^a - \alpha^b +$$`

`$$\color{red}\beta_\color{red}1^\color{red}b*(\overline{age_i^a}-\overline{age_i^b})+ \overline{age_i^a}*(\beta_1^a-\beta_1^b)+$$` 
`$$...+$$`
`$$\color{red}\beta_\color{red}4^\color{red}b*(\overline{religion_i^a}-\overline{religion_i^b})+ \overline{religion_i^a}*(\beta_4^a-\beta_4^b)$$` 
*note: `$a$` represents "partnered group"; `$b$` represents "single group"

---
#Blinder-Oaxaca Decomposition
<img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f5.png?raw=true" width="100%" style="display: block; margin: auto;" >

---
#Blinder-Oaxaca Decomposition
- explained part: attribute to cross-group differences in the explanatory variables. often called the composition effect, or endowment effect;

- unexplained part: attribute to difference in the intercept + different in the regression coefficients.often called the coefficient effect;
<img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f6.png?raw=true" width="80%" style="display: block; margin: auto;" >

---
#Blinder-Oaxaca Decomposition: coding
- first step: identify group, here is "relstat_new3"

```r
wave1c <- wave1c %>% 
  mutate(
    relstat_new3=case_when(
      relstat_new2 %in% c("single") ~ TRUE, #oaxaca treat those TRUE as the control group,i.e. group B
      relstat_new2 %in% c("partnered") ~ FALSE) #oaxaca treat those FALSE as the treated group, i.e. group A
         ) 
```

- second step: decompose. Remember, to use regression coefficients of Group B (control,i.e. single) as reference, i.e. `$\color{red}\beta^\color{red}b$`, we need to specific group.weights=0

```r
#oaxaca(#formula = y ~ x1 + x2.. | your group variable, data=your data)
result <- oaxaca(formula = sat6 ~ age +  nkids + health + religion | relstat_new3 , group.weights = 0, data = wave1c)
```

```
## oaxaca: oaxaca() performing analysis. Please wait.
```

```
## 
## Bootstrapping standard errors:
```

```
## 1 / 100 (1%)
```

```
## 10 / 100 (10%)
```

```
## 20 / 100 (20%)
```

```
## 30 / 100 (30%)
```

```
## 40 / 100 (40%)
```

```
## 50 / 100 (50%)
```

```
## 60 / 100 (60%)
```

```
## 70 / 100 (70%)
```

```
## 80 / 100 (80%)
```

```
## 90 / 100 (90%)
```

```
## 100 / 100 (100%)
```

---
#Blinder-Oaxaca Decomposition: understand results

```r
result #look at what the result of decomposition looks 
```

```
## $beta
## $beta$beta.A
## (Intercept)         age       nkids      health religionYes 
##  5.94657309 -0.01288393  0.05987523  0.50224933  0.33670445 
## 
## $beta$beta.B
## (Intercept)         age       nkids      health religionYes 
##  7.03838495 -0.07037895  0.07564373  0.46434948  0.10704506 
## 
## $beta$beta.diff
## (Intercept)         age       nkids      health religionYes 
## -1.09181186  0.05749502 -0.01576851  0.03789985  0.22965939 
## 
## $beta$beta.R
##                 (Intercept)         age      nkids    health religionYes
## [1,]  0.0000000    7.038385 -0.07037895 0.07564373 0.4643495   0.1070451
## [2,]  1.0000000    5.946573 -0.01288393 0.05987523 0.5022493   0.3367045
## [3,]  0.5000000    6.492479 -0.04163144 0.06775948 0.4832994   0.2218748
## [4,]  0.5834046    6.401417 -0.03683609 0.06644431 0.4864604   0.2410294
## [5,] -1.0000000    6.109504 -0.02582568 0.19722216 0.4957332   0.2645122
## [6,] -2.0000000    6.705597 -0.03890896 0.14254406 0.4953289   0.2609984
## [7,]  0.0000000    7.038385 -0.07037895 0.07564373 0.4643495   0.1070451
## 
## 
## $call
## oaxaca(formula = sat6 ~ age + nkids + health + religion | relstat_new3, 
##     data = wave1c, group.weights = 0)
## 
## $n
## $n$n.A
## [1] 3410
## 
## $n$n.B
## [1] 2435
## 
## $n$n.pooled
## [1] 5845
## 
## 
## $R
## [1] 100
## 
## $reg
## $reg$reg.A
## 
## Call:
## NULL
## 
## Coefficients:
## (Intercept)          age        nkids       health  religionYes  
##     5.94657     -0.01288      0.05988      0.50225      0.33670  
## 
## 
## $reg$reg.B
## 
## Call:
## NULL
## 
## Coefficients:
## (Intercept)          age        nkids       health  religionYes  
##     7.03838     -0.07038      0.07564      0.46435      0.10705  
## 
## 
## $reg$reg.pooled.1
## 
## Call:
## NULL
## 
## Coefficients:
## (Intercept)          age        nkids       health  religionYes  
##     6.10950     -0.02583      0.19722      0.49573      0.26451  
## 
## 
## $reg$reg.pooled.2
## 
## Call:
## NULL
## 
## Coefficients:
##      (Intercept)               age             nkids            health  
##          6.70560          -0.03891           0.14254           0.49533  
##      religionYes  relstat_new3TRUE  
##          0.26100          -0.54465  
## 
## 
## 
## $threefold
## $threefold$overall
##   coef(endowments)     se(endowments) coef(coefficients)   se(coefficients) 
##        -0.60740111         0.13557500         0.39187202         0.05264004 
##  coef(interaction)    se(interaction) 
##         0.48570318         0.13776063 
## 
## $threefold$variables
##             coef(endowments) se(endowments) coef(coefficients) se(coefficients)
## (Intercept)      0.000000000    0.000000000      -1.0918118635      0.253885841
## age             -0.622738578    0.054574435       1.1658856925      0.144769249
## nkids            0.068858905    0.142748998      -0.0009195598      0.009373424
## health          -0.051432520    0.012033382       0.1478172132      0.193272198
## religionYes     -0.002088922    0.002529319       0.1709005412      0.083773119
##             coef(interaction) se(interaction)
## (Intercept)       0.000000000     0.000000000
## age               0.508736888     0.065393110
## nkids            -0.014354159     0.145256582
## health           -0.004197883     0.005454114
## religionYes      -0.004481668     0.003987870
## 
## 
## $twofold
## $twofold$overall
##      group.weight coef(explained) se(explained) coef(unexplained)
## [1,]    0.0000000      -0.6074011    0.13557500         0.8775752
## [2,]    1.0000000      -0.1216979    0.03378946         0.3918720
## [3,]    0.5000000      -0.3645495    0.07082835         0.6347236
## [4,]    0.5834046      -0.3240396    0.08115709         0.5942137
## [5,]   -1.0000000      -0.1090531    0.02572224         0.3792272
## [6,]   -2.0000000      -0.2744792    0.03222091         0.5446532
## [7,]    0.0000000      -0.6074011    0.13557500         0.8775752
##      se(unexplained) coef(unexplained A) se(unexplained A) coef(unexplained B)
## [1,]      0.14207581        8.775752e-01      1.420758e-01           0.0000000
## [2,]      0.05264004        0.000000e+00      0.000000e+00           0.3918720
## [3,]      0.08205947        4.387876e-01      7.103790e-02           0.1959360
## [4,]      0.09120432        3.655938e-01      8.288768e-02           0.2286199
## [5,]      0.03733074        1.579843e-01      1.587636e-02           0.2212429
## [6,]      0.05173187       -1.841235e-13      1.438947e-13           0.5446532
## [7,]      0.14207581        8.775752e-01      1.420758e-01           0.0000000
##      se(unexplained B)
## [1,]        0.00000000
## [2,]        0.05264004
## [3,]        0.02632002
## [4,]        0.02192960
## [5,]        0.02193859
## [6,]        0.05173187
## [7,]        0.00000000
## 
## $twofold$variables
## $twofold$variables[[1]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)            0     0.000000000   0.000000000       -1.09181186
## age                    0    -0.622738578   0.054574435        1.67462258
## nkids                  0     0.068858905   0.142748998       -0.01527372
## health                 0    -0.051432520   0.012033382        0.14361933
## religionYes            0    -0.002088922   0.002529319        0.16641887
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)      0.25388584         -1.09181186        0.25388584
## age              0.20919778          1.67462258        0.20919778
## nkids            0.15458849         -0.01527372        0.15458849
## health           0.18799208          0.14361933        0.18799208
## religionYes      0.08166684          0.16641887        0.08166684
##             coef(unexplained B) se(unexplained B)
## (Intercept)                   0                 0
## age                           0                 0
## nkids                         0                 0
## health                        0                 0
## religionYes                   0                 0
## 
## $twofold$variables[[2]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)            1      0.00000000   0.000000000     -1.0918118635
## age                    1     -0.11400169   0.034992847      1.1658856925
## nkids                  1      0.05450475   0.028104487     -0.0009195598
## health                 1     -0.05563040   0.012540701      0.1478172132
## religionYes            1     -0.00657059   0.004763773      0.1709005412
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)     0.253885841                   0                 0
## age             0.144769249                   0                 0
## nkids           0.009373424                   0                 0
## health          0.193272198                   0                 0
## religionYes     0.083773119                   0                 0
##             coef(unexplained B) se(unexplained B)
## (Intercept)       -1.0918118635       0.253885841
## age                1.1658856925       0.144769249
## nkids             -0.0009195598       0.009373424
## health             0.1478172132       0.193272198
## religionYes        0.1709005412       0.083773119
## 
## $twofold$variables[[3]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)          0.5     0.000000000    0.00000000      -1.091811864
## age                  0.5    -0.368370134    0.03213051       1.420254137
## nkids                0.5     0.061681825    0.07286083      -0.008096639
## health               0.5    -0.053531462    0.01198328       0.145718271
## religionYes          0.5    -0.004329756    0.00325111       0.168659707
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)      0.25388584        -0.545905932        0.12694292
## age              0.17689505         0.837311290        0.10459889
## nkids            0.08196257        -0.007636859        0.07729424
## health           0.19063092         0.071809665        0.09399604
## religionYes      0.08270265         0.083209437        0.04083342
##             coef(unexplained B) se(unexplained B)
## (Intercept)       -0.5459059318       0.126942920
## age                0.5829428463       0.072384625
## nkids             -0.0004597799       0.004686712
## health             0.0739086066       0.096636099
## religionYes        0.0854502706       0.041886559
## 
## $twofold$variables[[4]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)    0.5834046     0.000000000    0.00000000      -1.091811864
## age            0.5834046    -0.325939128    0.03476201       1.377823130
## nkids          0.5834046     0.060484622    0.08419713      -0.006899436
## health         0.5834046    -0.053881584    0.01194847       0.146068394
## religionYes    0.5834046    -0.004703547    0.00305306       0.169033499
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)      0.25388584        -0.454843779        0.14811817
## age              0.18227368         0.697640031        0.12204695
## nkids            0.09407699        -0.006362961        0.09018764
## health           0.19019056         0.059831149        0.10967545
## religionYes      0.08252742         0.069329334        0.04764481
##             coef(unexplained B) se(unexplained B)
## (Intercept)       -0.6369680846       0.105767668
## age                0.6801830986       0.060310200
## nkids             -0.0005364754       0.003904925
## health             0.0862372450       0.080516305
## religionYes        0.0997041652       0.034899494
## 
## $twofold$variables[[5]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)           -1      0.00000000   0.000000000        -1.0918119
## age                   -1     -0.22851506   0.028544991         1.2803991
## nkids                 -1      0.17953241   0.028338986        -0.1259472
## health                -1     -0.05490866   0.012016393         0.1470955
## religionYes           -1     -0.00516180   0.003687676         0.1694918
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)      0.25388584         -0.16293056        0.12247469
## age              0.17061320          0.37694667        0.09199642
## nkids            0.02158975         -0.13303722        0.01556796
## health           0.19109404          0.02469249        0.07927063
## religionYes      0.08289376          0.05231292        0.03395066
##             coef(unexplained B) se(unexplained B)
## (Intercept)        -0.928881301       0.142790443
## age                 0.903452391       0.091166905
## nkids               0.007089994       0.008933804
## health              0.122402984       0.114545031
## religionYes         0.117178832       0.050120768
## 
## $twofold$variables[[6]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)           -2     0.000000000   0.000000000       -1.09181186
## age                   -2    -0.344280679   0.030421263        1.39616468
## nkids                 -2     0.129758640   0.028520507       -0.07617345
## health                -2    -0.054863884   0.012282192        0.14705069
## religionYes           -2    -0.005093231   0.003751362        0.16942318
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)      0.25388584         -0.75902411        0.10827295
## age              0.17246244          0.75801540        0.09251316
## nkids            0.02057542         -0.08007483        0.01372315
## health           0.19106955          0.02622441        0.07907386
## religionYes      0.08290507          0.05485913        0.03371728
##             coef(unexplained B) se(unexplained B)
## (Intercept)        -0.332787750       0.154066808
## age                 0.638149284       0.084030240
## nkids               0.003901374       0.008849679
## health              0.120826281       0.113495881
## religionYes         0.114564050       0.049932253
## 
## $twofold$variables[[7]]
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)            0     0.000000000   0.000000000       -1.09181186
## age                    0    -0.622738578   0.054574435        1.67462258
## nkids                  0     0.068858905   0.142748998       -0.01527372
## health                 0    -0.051432520   0.012033382        0.14361933
## religionYes            0    -0.002088922   0.002529319        0.16641887
##             se(unexplained) coef(unexplained A) se(unexplained A)
## (Intercept)      0.25388584         -1.09181186        0.25388584
## age              0.20919778          1.67462258        0.20919778
## nkids            0.15458849         -0.01527372        0.15458849
## health           0.18799208          0.14361933        0.18799208
## religionYes      0.08166684          0.16641887        0.08166684
##             coef(unexplained B) se(unexplained B)
## (Intercept)                   0                 0
## age                           0                 0
## nkids                         0                 0
## health                        0                 0
## religionYes                   0                 0
## 
## 
## 
## $x
## $x$x.mean.A
## (Intercept)         age       nkids      health religionYes 
##   1.0000000  29.1263930   0.9686217   3.7894428   0.7246334 
## 
## $x$x.mean.B
## (Intercept)         age       nkids      health religionYes 
##  1.00000000 20.27802875  0.05831622  3.90020534  0.74414784 
## 
## $x$x.mean.diff
## (Intercept)         age       nkids      health religionYes 
##  0.00000000  8.84836421  0.91030548 -0.11076252 -0.01951441 
## 
## 
## $y
## $y$y.A
## [1] 7.77654
## 
## $y$y.B
## [1] 7.506366
## 
## $y$y.diff
## [1] 0.2701741
## 
## 
## attr(,"class")
## [1] "oaxaca"
```

---
#Blinder-Oaxaca Decomposition: understand results
how to read the result? you need to know what is [list in R](https://www.youtube.com/watch?v=X8lNTDeiKiE&t=171s)

```r
#I want to see the resutl that uses the equally weighted average coefficent to decompose
variable <- result$twofold$variables[[1]]
overall <- result$twofold$overall[1,]
result2<- rbind(variable, overall)[,c(1:5)] 
#rbind is to bind rows of "variable" and "overall" dataset. [,c(1:5)] after rbind is to select column 1 to 5.
result2
```

```
##             group.weight coef(explained) se(explained) coef(unexplained)
## (Intercept)            0     0.000000000   0.000000000       -1.09181186
## age                    0    -0.622738578   0.054574435        1.67462258
## nkids                  0     0.068858905   0.142748998       -0.01527372
## health                 0    -0.051432520   0.012033382        0.14361933
## religionYes            0    -0.002088922   0.002529319        0.16641887
## overall                0    -0.607401115   0.135574997        0.87757520
##             se(unexplained)
## (Intercept)      0.25388584
## age              0.20919778
## nkids            0.15458849
## health           0.18799208
## religionYes      0.08166684
## overall          0.14207581
```
---
#Blinder-Oaxaca Decomposition: visualize results
.pull-left[

```r
plot(result, decomposition = "twofold", group.weight = 0, 
     type="overall", title = "Overall")
```
<img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f2.png?raw=true" width="80%" style="display: block; margin: auto;" >
]
.pull-right[

```r
plot(result, decomposition = "twofold", group.weight =0, 
     type="variables", title="by variable")
```
<img src="https://github.com/fancycmn/25-Session3/blob/main/s5_f3.png?raw=true" width="80%" style="display: block; margin: auto;" >
]
---
#Take home
1. Exploratory analysis: two-fold oaxaca decomposition
2. Two-fold Oaxaca decomposition
    - explained part
    - unexplained part
    - reference set of coefficient in OB decomposition

4. Important code
    - oaxaca(formula = y ~ x1 +  x2 + x3 | your group var , data = your dataset)
    - plot(result, decomposition = "twofold", type="", group.weight =, title = "")

---
class: center, middle
#[Exercise](https://rpubs.com/fancycmn/1348401)