Econ 3210 — Assignment #1 Ryan Ciacoi-Pop

Use this .rmd template to answer the 3 questions below. Once you have done this, you can knit the document to HTML and upload this document to moodle/e-class. I will only accept HTML documents, no exceptions.

Please read all of the assignment before attempting it.

Introduction

This assignment is meant to be a tutorial on using R Markdown to “knit” together text and statistical input and output. A short video accompanies this document to help get you started.

Load packages and data

# Load tidyverse for data manipulation, AER for the textbook data
if (!requireNamespace('AER', quietly = TRUE)) install.packages('AER')
if (!requireNamespace('lmtest', quietly = TRUE)) install.packages('lmtest')
if (!requireNamespace('sandwich', quietly = TRUE)) install.packages('sandwich')
library(tidyverse)
library(AER)
library(lmtest)
library(sandwich)

# Load dataset
data('CASchools')
# Make a working data frame and create variables as described
df <- CASchools %>% 
  filter(complete.cases(.)) %>% 
  mutate(score = (math + read) / 2,
         str   = students / teachers,
         D = ifelse(str < 19, 1, 0))

# Quick glimpse
glimpse(df)

## Rows: 420
## Columns: 17
## $ district    <chr> "75119", "61499", "61549", "61457", "61523", "62042", "685…
## $ school      <chr> "Sunol Glen Unified", "Manzanita Elementary", "Thermalito …
## $ county      <fct> Alameda, Butte, Butte, Butte, Butte, Fresno, San Joaquin, …
## $ grades      <fct> KK-08, KK-08, KK-08, KK-08, KK-08, KK-08, KK-08, KK-08, KK…
## $ students    <dbl> 195, 240, 1550, 243, 1335, 137, 195, 888, 379, 2247, 446, …
## $ teachers    <dbl> 10.90, 11.15, 82.90, 14.00, 71.50, 6.40, 10.00, 42.50, 19.…
## $ calworks    <dbl> 0.5102, 15.4167, 55.0323, 36.4754, 33.1086, 12.3188, 12.90…
## $ lunch       <dbl> 2.0408, 47.9167, 76.3226, 77.0492, 78.4270, 86.9565, 94.62…
## $ computer    <dbl> 67, 101, 169, 85, 171, 25, 28, 66, 35, 0, 86, 56, 25, 0, 3…
## $ expenditure <dbl> 6384.911, 5099.381, 5501.955, 7101.831, 5235.988, 5580.147…
## $ income      <dbl> 22.690001, 9.824000, 8.978000, 8.978000, 9.080333, 10.4150…
## $ english     <dbl> 0.000000, 4.583333, 30.000002, 0.000000, 13.857677, 12.408…
## $ read        <dbl> 691.6, 660.5, 636.3, 651.9, 641.8, 605.7, 604.5, 605.5, 60…
## $ math        <dbl> 690.0, 661.9, 650.9, 643.5, 639.9, 605.4, 609.0, 612.5, 61…
## $ score       <dbl> 690.80, 661.20, 643.60, 647.70, 640.85, 605.55, 606.75, 60…
## $ str         <dbl> 17.88991, 21.52466, 18.69723, 17.35714, 18.67133, 21.40625…
## $ D           <dbl> 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0…

Difference-in-means calculations

Compute the sample mean of score for small (D==1) and large (D==0) classes, and the difference.

# Conditional means
Y_1 <- mean(df$score[df$D == 1], na.rm = TRUE)
Y_0 <- mean(df$score[df$D == 0], na.rm = TRUE)
mu_1 <- Y_1 - Y_0

# Print results
Y_1

## [1] 660.0421

Y_0

## [1] 651.2452

mu_1

## [1] 8.796891

Standard error of the difference and t-statistic

Compute the standard error using the formula in the assignment (equation 3.19 style), then compute the t-statistic for testing H0: μ1 = 0.
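
Written out, the calculation below is assumed to follow the usual unequal-variance formula for a difference in means. The symbols \(s_1^2, s_0^2\) (sample variances of score in each group) and \(n_1, n_0\) (group sizes) are introduced here for illustration and may not match the textbook's notation exactly:

\[
SE(\bar{Y}_1 - \bar{Y}_0) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_0^2}{n_0}}, \qquad t = \frac{\bar{Y}_1 - \bar{Y}_0}{SE(\bar{Y}_1 - \bar{Y}_0)}
\]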

# Sample variance of each group divided by its group size
# (i.e. the squared standard error of each group mean)
var_1 <- var(df$score[df$D == 1], na.rm = TRUE) / sum(df$D == 1)
var_0 <- var(df$score[df$D == 0], na.rm = TRUE) / sum(df$D == 0)
se_difference <- sqrt(var_1 + var_0)

# t-stat
t_stat <- mu_1 / se_difference

# Print
var_1

## [1] 3.032092

var_0

## [1] 1.102584

se_difference

## [1] 2.03339

t_stat

## [1] 4.326218
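
As a cross-check on the hand calculation, the same t-statistic can be obtained from base R's `t.test()`, whose default (Welch, unequal variances) uses the same standard error. The sketch below runs on simulated data so that it is self-contained; `diff_means_test` is a hypothetical helper, not part of the assignment template.

```r
# Hypothetical helper: difference in means, its standard error, and the
# t-statistic for an outcome y and a 0/1 indicator d.
diff_means_test <- function(y, d) {
  y1 <- y[d == 1]
  y0 <- y[d == 0]
  diff <- mean(y1) - mean(y0)
  se   <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
  c(diff = diff, se = se, t = diff / se)
}

# Simulated data, used only so the sketch runs on its own
set.seed(1)
d <- rbinom(200, 1, 0.4)
y <- 650 + 8 * d + rnorm(200, sd = 19)

manual <- diff_means_test(y, d)
welch  <- t.test(y[d == 1], y[d == 0])  # var.equal = FALSE is the default

manual
unname(welch$statistic)  # should agree with manual["t"]
```

With the assignment data, the equivalent call would be `diff_means_test(df$score, df$D)`.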

Question 1

Interpret the t-statistic above:

The t-statistic measures how many standard errors the sample difference in means lies from zero. A larger absolute value means stronger evidence against the null hypothesis. At conventional significance levels (for example 5%, with a critical value of about 1.96), the t-statistic reported above (about 4.33) is large enough in absolute value to reject the null hypothesis that there is no difference in mean scores between small and large classes.
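
The decision rule can be made explicit with a two-sided p-value. This is a minimal sketch that assumes the large-sample normal approximation is acceptable; the value of `t_stat` is copied from the output above.

```r
t_stat <- 4.326218                     # t-statistic printed above

# Two-sided p-value under the standard normal approximation
p_value <- 2 * pnorm(-abs(t_stat))

# 5% two-sided critical value (about 1.96)
crit_5 <- qnorm(0.975)

p_value
abs(t_stat) > crit_5                   # TRUE means reject H0 at the 5% level
```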

Question 2

State the assumption required about the relationship between u and D to interpret the difference in means as causal.

To interpret the difference in means as causal, the error term u must be mean-independent of D: \[E(u\mid D=1) = E(u\mid D=0).\] In other words, schools with D = 1 and D = 0 should not differ systematically in the unobserved factors that affect score. Assignment to small versus large classes must be as good as random with respect to the unobserved determinants of scores.
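
A short derivation may make the role of this condition concrete. It assumes a model of the form score = \(\mu_0 + \mu_1 D + u\); this notation is an assumption chosen to match `mu_1` in the code above, not something quoted from the assignment.

\[
E(\text{score}\mid D=1) - E(\text{score}\mid D=0) = \mu_1 + \bigl[E(u\mid D=1) - E(u\mid D=0)\bigr]
\]

The bracketed term is zero exactly when the condition holds, in which case the difference in means equals \(\mu_1\). If the bracketed term is not zero, the difference in means mixes the effect of D with differences in u between the two groups.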

Question 3

Using the same methods as above, test whether schools with smaller class sizes are in higher income districts. What do you think this tells you about the causal interpretation of the difference in means comparison in test scores between schools with smaller and larger classes above?

# Compute mean income by D
income_Y1 <- mean(df$income[df$D == 1], na.rm = TRUE)
income_Y0 <- mean(df$income[df$D == 0], na.rm = TRUE)

income_diff <- income_Y1 - income_Y0

# Compute standard error for income difference (same formula)
var_inc_1 <- var(df$income[df$D == 1], na.rm = TRUE) / sum(df$D == 1)
var_inc_0 <- var(df$income[df$D == 0], na.rm = TRUE) / sum(df$D == 0)
se_inc_diff <- sqrt(var_inc_1 + var_inc_0)

t_income <- income_diff / se_inc_diff

# Print
income_Y1

## [1] 17.48257

income_Y0

## [1] 14.24516

income_diff

## [1] 3.237412

se_inc_diff

## [1] 0.9064769

t_income

## [1] 3.571422

# Cross-check with a regression of income on D, using
# heteroskedasticity-robust (sandwich) standard errors
coeftest(lm(income ~ D, data = df), vcov = sandwich)

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 14.24516    0.29342 48.5482 < 2.2e-16 ***
## D            3.23741    0.90338  3.5836 0.0003788 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
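
The regression reproduces the difference in means because the slope on a 0/1 regressor is the difference between the two group means. The sketch below illustrates this on simulated data using base R only. It also compares two versions of the standard error: the small gap between 0.9065 (hand calculation) and 0.9034 (regression output) is consistent with the robust estimator applying no degrees-of-freedom correction, though that explanation is an assumption and is not verified against the `sandwich` package here.

```r
# Simulated data, used only so the sketch runs without the AER package
set.seed(7)
d <- rbinom(300, 1, 0.3)
y <- 14 + 3 * d + rnorm(300, sd = ifelse(d == 1, 9, 5))

y1 <- y[d == 1]
y0 <- y[d == 0]

diff_means <- mean(y1) - mean(y0)
slope      <- unname(coef(lm(y ~ d))["d"])

# SE with the usual n - 1 divisor in each group variance (what var() uses)
se_n_minus_1 <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))

# SE with divisor n in each group variance (no degrees-of-freedom correction)
pop_var <- function(x) mean((x - mean(x))^2)
se_n <- sqrt(pop_var(y1) / length(y1) + pop_var(y0) / length(y0))

c(diff_means = diff_means, slope = slope)      # the two should coincide
c(se_n_minus_1 = se_n_minus_1, se_n = se_n)    # se_n is slightly smaller
```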

These results show the sample mean of income by class-size group, the difference between them, and the t-statistic for that difference. In the California schools data, mean district income is higher for small-class schools (D = 1, about 17.48) than for large-class schools (D = 0, about 14.25), and the difference of about 3.24 is statistically significant (t of about 3.57). Schools with smaller classes are therefore located in higher-income districts on average. Because district income is plausibly related to other determinants of student performance that are contained in u, the condition E(u | D = 1) = E(u | D = 0) is unlikely to hold. The simple difference in mean score between small and large classes is therefore likely confounded by income and should not be read as the causal effect of class size.
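
The confounding argument can be illustrated with a small simulation. This is a sketch on made-up data, not an analysis of CASchools: income raises both the probability of a small class and the score, while the true effect of D on score is set to zero by construction. All parameter values are arbitrary choices for illustration.

```r
set.seed(42)
n <- 5000

# Simulated district income
income <- rnorm(n, mean = 15, sd = 5)

# Higher-income districts are more likely to have small classes
D <- rbinom(n, 1, plogis((income - 15) / 5))

# Score depends on income only: the true effect of D is zero by construction
score <- 600 + 3 * income + rnorm(n, sd = 10)

# Naive difference in means picks up the income gap between the groups
naive <- mean(score[D == 1]) - mean(score[D == 0])

# Controlling for income removes most of the spurious difference
adjusted <- unname(coef(lm(score ~ D + income))["D"])

naive
adjusted
```

In this setup the naive difference is clearly positive even though D has no effect, while the coefficient on D is close to zero once income is included.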
