title: “Problem Set 3”
author: Madison Burke
date: 04-03-2025

Question 1: The central question this paper seeks to address is whether the rise in women’s income following the post-Mao reforms in China influences how households invest in their children’s education and survival, particularly with regard to gender. Specifically, it examines how a greater share of household income earned by women might affect the distribution of resources between sons and daughters.

Question 2: Comparing households with identical total incomes but different shares of income earned by women is unlikely to identify a causal effect. Women’s share of household income is not randomly assigned, so the comparison is subject to selection bias. For instance, households where women earn more might also hold more progressive values, which could independently influence how they invest in their children. The comparison also fails to account for reverse causality: households that already treat sons and daughters equally may be more likely to support female labor force participation, thereby raising women’s income share. More generally, confounders such as parental preferences, cultural norms, and other household characteristics can affect both women’s income share and gender differences in survival and educational investment, so the resulting estimates would be biased.

Question 3: 3.1) The difference-in-differences approach estimates the causal impact by comparing changes in outcomes over time between a treatment and a control group. The treatment group includes households where women’s share of income rises due to external influences, such as the post-Mao economic reforms. The control group comprises households unaffected by these external changes. This method relies on the assumption that, prior to the treatment, both groups followed similar trends in child survival rates and educational investments. By examining how these outcomes diverge after the treatment, researchers can isolate and estimate the causal effect of increased female income share.

3.2) For this approach to yield an unbiased estimate of the causal effect, the parallel trends assumption must hold. This means that, in the absence of treatment—that is, without the external increase in women’s income—both the treatment and control groups would have experienced similar trends in survival rates and educational investments. This assumption is critical because if the groups were already on different trajectories before the treatment, any observed differences afterward could reflect those pre-existing disparities rather than the effect of the treatment itself. Such baseline differences could stem from variations in household wealth, educational factors, or differences in markets.
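
In the notation of a standard two-group, two-period comparison (the symbols here are introduced for illustration and are not taken from the paper), the estimator described in 3.1 and the assumption in 3.2 can be sketched as:

\[ \hat{\delta}_{DiD} = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right) - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right) \]

\[ E\left[Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid T\right] = E\left[Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid C\right] \]

Here \(\bar{Y}\) is a group mean of the outcome, \(T\) and \(C\) denote the treatment and control groups, and \(Y(0)\) is the outcome that would occur without treatment. The second line is the parallel trends assumption. It cannot be tested directly, because the untreated post-period outcome of the treatment group is never observed; similar pre-period trends only make it more plausible.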

Question 4:

library(readr)
library(dplyr)

rm(list=ls())
data <- read.csv("/Users/madisonlaptop/Desktop/ChinaTea.csv")
head(data)

##   county_id birth_year    slope   tea_sown orchard_sown cashcrop_sown
## 1         1       1970 17.48357 0.04258456    0.3518503     0.1056768
## 2         1       1971 17.48357 0.04258456    0.3518503     0.1056768
## 3         1       1972 17.48357 0.04258456    0.3518503     0.1056768
## 4         1       1973 17.48357 0.04258456    0.3518503     0.1056768
## 5         1       1974 17.48357 0.04258456    0.3518503     0.1056768
## 6         1       1975 17.48357 0.04258456    0.3518503     0.1056768
##   fraction_male  edu_all edu_female edu_male   edu_gap
## 1     0.5001040 6.303439   5.525352 6.506328 0.9809763
## 2     0.5087421 6.402443   5.739398 6.160680 0.4212815
## 3     0.5105572 5.777255   5.777053 6.479817 0.7027642
## 4     0.5209419 5.854408   5.184813 6.177602 0.9927887
## 5     0.4930754 6.369262   5.760632 6.265294 0.5046621
## 6     0.5252955 6.505504   5.144099 6.755950 1.6118505

Question 5:

median_tea <- median(data$tea_sown, na.rm = TRUE)
median_orchard <- median(data$orchard_sown, na.rm = TRUE)
tea <- ifelse(data$tea_sown > median_tea, 1, 0)
orchard <- ifelse(data$orchard_sown > median_orchard, 1, 0)
table(tea)

## tea
##    0    1 
## 1050 1050

table(orchard)

## orchard
##    0    1 
## 1050 1050

5.1)

mean_male_tea <- mean(data$fraction_male[tea == 1], na.rm = TRUE)
mean_male_orchard <- mean(data$fraction_male[orchard == 1], na.rm = TRUE)
print(mean_male_tea)

## [1] 0.5061103

print(mean_male_orchard)

## [1] 0.5106231

The mean fraction of males in counties with above-median tea sown area is 0.506, and the mean fraction of males in counties with above-median orchard sown area is 0.511.

5.2) No, we cannot interpret the difference in the fraction of males between tea-intensive and orchard-intensive regions as causal evidence that increased female income improves survival rates for girls. Correlation does not imply causation—while the amount of tea or orchard sown may be associated with gender outcomes, this does not mean it directly affects girls’ survival rates. There may also be selection bias, as households in orchard-dominant areas might hold different gender norms or place varying importance on investing in girls’ education. These differences could influence outcomes independently of income effects. Although randomization could address this issue, it is not present in this context. Finally, examining only the fraction of males does not isolate the impact of women’s income on female child survival, making it an inadequate measure for establishing causality.

Question 6 and 6.1:

post <- ifelse(data$birth_year > 1979, 1, 0)
mean_male_pre79 <- mean(data$fraction_male[post == 0], na.rm = TRUE)
mean_male_post79 <- mean(data$fraction_male[post == 1], na.rm = TRUE)
print(mean_male_pre79)

## [1] 0.5104829

print(mean_male_post79)

## [1] 0.5074845

The mean fraction of males among cohorts born in or before 1979 is 0.510, and the mean among cohorts born after 1979 is 0.507.

6.2) No, we cannot interpret the change in the fraction of males born before and after 1979 as causal evidence. Several confounding factors—such as policy shifts, urbanization, improvements in healthcare, and other social changes—could influence survival rates independently of women’s income. For instance, as highlighted in the article, the introduction of the One Child Policy in China could significantly affect gender ratios and household birth decisions. Additionally, relying on average male birth fractions does not provide a direct measure of female survival rates, as there is no clear causal link between the two.

Question 7

7.1)

library(dplyr)
library(tidyr)
median_tea <- median(data$tea_sown, na.rm = TRUE)
median_orchard <- median(data$orchard_sown, na.rm = TRUE)

data <- data %>%
  mutate(
    tea = ifelse(tea_sown > median_tea, 1, 0),
    orchard = ifelse(orchard_sown > median_orchard, 1, 0),
    post = ifelse(birth_year > 1979, 1, 0)
  )

generate_did_table <- function(df, treatment_var, outcome_var = "fraction_male") {
  table_data <- df %>%
    group_by(!!sym(treatment_var), post) %>%
    summarise(mean_outcome = mean(.data[[outcome_var]], na.rm = TRUE)) %>%
    pivot_wider(names_from = post, values_from = mean_outcome, names_prefix = "post_")

  treated_pre  <- table_data %>% filter(!!sym(treatment_var) == 1) %>% pull(post_0)
  treated_post <- table_data %>% filter(!!sym(treatment_var) == 1) %>% pull(post_1)
  control_pre  <- table_data %>% filter(!!sym(treatment_var) == 0) %>% pull(post_0)
  control_post <- table_data %>% filter(!!sym(treatment_var) == 0) %>% pull(post_1)

  DiD <- (treated_post - treated_pre) - (control_post - control_pre)

  cat("\n----------------------------------------\n")
  cat(paste("Difference-in-Differences Table:", treatment_var))
  cat("\n----------------------------------------\n")
  cat(sprintf("                | Pre-1979 | Post-1979\n"))
  cat("----------------------------------------\n")
  cat(sprintf("Treated group   |  %.4f |  %.4f\n", treated_pre, treated_post))
  cat(sprintf("Control group   |  %.4f |  %.4f\n", control_pre, control_post))
  cat("----------------------------------------\n")
  cat(sprintf("DiD estimate    |          |  %.4f\n", DiD))
  cat("----------------------------------------\n")
}

generate_did_table(data, "tea")

## `summarise()` has grouped output by 'tea'. You can override using the `.groups`
## argument.

## 
## ----------------------------------------
## Difference-in-Differences Table: tea
## ----------------------------------------
##                 | Pre-1979 | Post-1979
## ----------------------------------------
## Treated group   |  0.5109 |  0.5017
## Control group   |  0.5100 |  0.5133
## ----------------------------------------
## DiD estimate    |          |  -0.0125
## ----------------------------------------

generate_did_table(data, "orchard")

## `summarise()` has grouped output by 'orchard'. You can override using the
## `.groups` argument.

## 
## ----------------------------------------
## Difference-in-Differences Table: orchard
## ----------------------------------------
##                 | Pre-1979 | Post-1979
## ----------------------------------------
## Treated group   |  0.5109 |  0.5104
## Control group   |  0.5101 |  0.5046
## ----------------------------------------
## DiD estimate    |          |  0.0050
## ----------------------------------------

7.2) The results are not consistent with the unitary household model, which assumes that only total household income matters, regardless of who earns it. The DiD estimate for tea counties (where women’s income increased) is -0.0125: the male fraction fell by about 1.25 percentage points relative to control counties, which is consistent with improved survival of girls. In contrast, the DiD estimate for orchard counties (where men’s income increased) is +0.0050, a slight relative increase in the male fraction and no comparable improvement for girls. Because the two income shocks move the sex ratio in opposite directions, the identity of the income earner appears to affect household decisions, which supports a non-unitary model.
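
As a cross-check on the tea table, the same DiD estimate can be recovered as the interaction coefficient in a saturated 2x2 regression. The sketch below is illustrative only: it uses the four rounded cell means printed in the table rather than the raw data, so it yields no standard errors, and the names `cells`, `treat`, `did_reg`, and `did_manual` are hypothetical.

```r
# Rounded cell means copied from the tea DiD table (not the raw data).
cells <- data.frame(
  treat = c(1, 1, 0, 0),
  post  = c(0, 1, 0, 1),
  fraction_male = c(0.5109, 0.5017, 0.5100, 0.5133)
)

# Saturated 2x2 regression: the treat:post coefficient is the DiD estimate.
fit <- lm(fraction_male ~ treat * post, data = cells)
did_reg <- unname(coef(fit)["treat:post"])

# The same quantity computed directly from the four cell means.
did_manual <- (0.5017 - 0.5109) - (0.5133 - 0.5100)

print(round(c(regression = did_reg, manual = did_manual), 4))
```

Both numbers should match the -0.0125 reported in the table. On the full data set, a regression of `fraction_male` on `tea * post` would give the same point estimate along with a standard error, which the table of means alone does not provide.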

Question 8:

generate_did_education_table <- function(df, treatment_var, outcome_var, label) {
  table_data <- df %>%
    group_by(!!sym(treatment_var), post) %>%
    summarise(mean_outcome = mean(.data[[outcome_var]], na.rm = TRUE), .groups = "drop") %>%
    pivot_wider(names_from = post, values_from = mean_outcome, names_prefix = "post_")
  
  treated_pre  <- table_data %>% filter(!!sym(treatment_var) == 1) %>% pull(post_0)
  treated_post <- table_data %>% filter(!!sym(treatment_var) == 1) %>% pull(post_1)
  control_pre  <- table_data %>% filter(!!sym(treatment_var) == 0) %>% pull(post_0)
  control_post <- table_data %>% filter(!!sym(treatment_var) == 0) %>% pull(post_1)
  
  DiD <- (treated_post - treated_pre) - (control_post - control_pre)
  
  cat("\n------------------------------------------------------\n")
  cat(paste("Difference-in-Differences Table:", label))
  cat("\n------------------------------------------------------\n")
  cat(sprintf("                | Pre-1979 | Post-1979\n"))
  cat("------------------------------------------------------\n")
  cat(sprintf("Treated group   |  %.4f |  %.4f\n", treated_pre, treated_post))
  cat(sprintf("Control group   |  %.4f |  %.4f\n", control_pre, control_post))
  cat("------------------------------------------------------\n")
  cat(sprintf("DiD estimate    |          |  %.4f\n", DiD))
  cat("------------------------------------------------------\n")
}

generate_did_education_table(data, "tea", "edu_female", "Girls' Education (Tea Treatment)")

## 
## ------------------------------------------------------
## Difference-in-Differences Table: Girls' Education (Tea Treatment)
## ------------------------------------------------------
##                 | Pre-1979 | Post-1979
## ------------------------------------------------------
## Treated group   |  5.4880 |  5.5913
## Control group   |  5.4940 |  5.3618
## ------------------------------------------------------
## DiD estimate    |          |  0.2355
## ------------------------------------------------------

generate_did_education_table(data, "tea", "edu_male", "Boys' Education (Tea Treatment)")

## 
## ------------------------------------------------------
## Difference-in-Differences Table: Boys' Education (Tea Treatment)
## ------------------------------------------------------
##                 | Pre-1979 | Post-1979
## ------------------------------------------------------
## Treated group   |  6.5092 |  6.6895
## Control group   |  6.4894 |  6.5190
## ------------------------------------------------------
## DiD estimate    |          |  0.1507
## ------------------------------------------------------

8.2) The results show that girls’ education improved more than boys’ in response to increased female income from tea cultivation: the DiD estimate for girls is 0.2355, compared to 0.1507 for boys. This suggests that when women control more resources, households allocate relatively more toward educating daughters. As a researcher advising the Chinese government, I would recommend targeting cash transfers to women, as doing so is more likely to improve educational outcomes for girls. Additionally, unconditional transfers tied to regions or sectors that employ women may help reduce gender disparities in education by shifting intra-household priorities.
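
The comparison between the two estimates can be made explicit as a change in the gender gap. The sketch below assumes that `edu_gap` equals `edu_male - edu_female`, which holds for the rows printed by `head(data)` but is not documented in the source, and that no missing values differ across the three columns. Under those assumptions, the DiD on the gap is the boys’ estimate minus the girls’ estimate. All object names are hypothetical.

```r
# First three rows of head(data), copied from the printed output.
edu_female <- c(5.525352, 5.739398, 5.777053)
edu_male   <- c(6.506328, 6.160680, 6.479817)
edu_gap    <- c(0.9809763, 0.4212815, 0.7027642)

# Check the assumed definition of edu_gap, up to display rounding.
gap_matches <- all(abs((edu_male - edu_female) - edu_gap) < 1e-5)

# DiD estimates reported in the two education tables.
did_girls <- 0.2355
did_boys  <- 0.1507

# Implied DiD for the male-minus-female education gap.
gap_change <- did_boys - did_girls

print(gap_matches)
print(round(gap_change, 4))
```

A negative `gap_change` means the education gap between boys and girls narrowed in tea counties relative to control counties after 1979.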

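Question 9: Utility functions:

\[ U_w(E_1, s_1) = m_w\left(\ln(E_1) + b \cdot s_1\right) - \frac{h(s_1)}{2} \]

\[ U_h(E_1, s_1) = m_h\left(\ln(E_1) + b \cdot s_1\right) - \frac{h(s_1)}{2} \]

Cost function:

\[ h(s_1) = r \cdot s_1 + \frac{\psi \cdot s_1^2}{2} \]

9.1) First-order condition for a parent with weight \(m\):

\[ \frac{dU}{ds_1} = 0 \;\Rightarrow\; m \cdot b - \frac{r + \psi \cdot s_1}{2} = 0 \]

Solving for the optimal educational investment:

\[ s_1^* = \frac{2 \cdot m \cdot b - r}{\psi} \]

Substituting the given parameters \(b = 1\), \(r = 0.5\), \(\psi = 0.6\), \(m_w = 1.5\), \(m_h = 1.0\):

\[ s_1^*(\text{wife}) = \frac{2 \cdot 1.5 \cdot 1 - 0.5}{0.6} = 4.1667 \]

\[ s_1^*(\text{husband}) = \frac{2 \cdot 1.0 \cdot 1 - 0.5}{0.6} = 2.5 \]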

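This shows that the wife would choose a higher level of educational investment than the husband (about 4.17 versus 2.5). In the model, the gap comes entirely from \(m_w > m_h\): the wife places a higher weight on the child’s education. This is consistent with the idea that women tend to place a higher value on their children’s education, particularly for daughters.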
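
The two optima can be checked numerically. The sketch below assumes that the cost enters utility as \(h(s_1)/2\), which is the reading that reproduces the reported values of 4.1667 and 2.5, and it fixes \(E_1 = 1\) because \(E_1\) does not affect the maximizing \(s_1\). The function names are hypothetical.

```r
# Parameters as given in 9.1.
b <- 1
r <- 0.5
psi <- 0.6

# Cost of schooling.
h <- function(s) r * s + psi * s^2 / 2

# Utility as a function of s for a parent with weight m.
# Assumptions: cost enters as h(s) / 2, and E1 is held fixed at 1.
utility <- function(s, m, E1 = 1) m * (log(E1) + b * s) - h(s) / 2

# Numerical maximizer over a wide interval.
s_star <- function(m) {
  optimize(function(s) utility(s, m), interval = c(0, 20),
           maximum = TRUE, tol = 1e-10)$maximum
}

# Closed-form solution of the first-order condition under the same assumption.
closed_form <- function(m) (2 * m * b - r) / psi

print(round(c(wife = s_star(1.5), husband = s_star(1.0)), 4))
print(round(c(wife = closed_form(1.5), husband = closed_form(1.0)), 4))
```

The numerical and closed-form values should agree, which confirms that the reported optima correspond to a first-order condition with the cost term halved.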

9.2) According to the results in Question 8, girls’ education increased more than boys’ when income was controlled by women (in tea-intensive counties). Since the optimal level of education depends on the coefficient \(m\), which reflects how much a parent values their child’s future income, the higher investment in girls’ education implies that \(m_w > m_h\). That is, mothers place a greater weight on their daughters’ future earnings than fathers do.