Part I: Basic fixed effects model

The goal of this question is to gain a fundamental understanding of implementing the fixed effects model. You will be working with the rent.dta dataset, which includes data on rental prices and town characteristics for the years 1980 and 1990. Specifically, the focus is determining the impact of the student population proportion on rental prices. To accomplish this, the fixed effects model will be utilized and presented as:

ln(rent_it) = β_0 + δ_0 y90_t + β_1 ln(population_it) + β_2 ln(inc_it) + β_3 studentpop_it + α_i + u_it,

where population is city population, inc is the average income, and studentpop is the percentage of the city population that consists of students.

#Reading in and viewing data
rent_df <- read_dta("rent.dta")

str(rent_df)
## tibble [128 × 16] (S3: tbl_df/tbl/data.frame)
##  $ city        : num [1:128] 1 1 2 2 3 3 4 4 5 5 ...
##   ..- attr(*, "label")= chr "city label, 1 to 64"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ year        : num [1:128] 80 90 80 90 80 90 80 90 80 90 ...
##   ..- attr(*, "label")= chr "80 or 90"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ population  : num [1:128] 75211 77759 106743 141865 36608 ...
##   ..- attr(*, "label")= chr "city population"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ enroll      : num [1:128] 15303 18017 22462 29769 11847 ...
##   ..- attr(*, "label")= chr "# college students enrolled"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ rent        : num [1:128] 197 342 323 496 216 351 267 588 475 925 ...
##   ..- attr(*, "label")= chr "average rent"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ rnthsg      : num [1:128] 13475 15660 14580 26895 7026 ...
##   ..- attr(*, "label")= chr "renter occupied units"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ tothsg      : num [1:128] 26167 29467 37277 55540 13482 ...
##   ..- attr(*, "label")= chr "occupied housing units"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ inc         : num [1:128] 11537 19568 19841 31885 11455 ...
##   ..- attr(*, "label")= chr "per capita income"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lenroll     : num [1:128] 9.64 9.8 10.02 10.3 9.38 ...
##   ..- attr(*, "label")= chr "ln(enroll)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lnpopulation: num [1:128] 11.2 11.3 11.6 11.9 10.5 ...
##   ..- attr(*, "label")= chr "ln(population)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lnrent      : num [1:128] 5.28 5.83 5.78 6.21 5.38 ...
##   ..- attr(*, "label")= chr "ln(rent)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ ltothsg     : num [1:128] 10.17 10.29 10.53 10.92 9.51 ...
##   ..- attr(*, "label")= chr "ln(tothsg)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lrnthsg     : num [1:128] 9.51 9.66 9.59 10.2 8.86 ...
##   ..- attr(*, "label")= chr "ln(rnthsg)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lninc       : num [1:128] 9.35 9.88 9.9 10.37 9.35 ...
##   ..- attr(*, "label")= chr "ln(inc)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ studentpop  : num [1:128] 20.3 23.2 21 21 32.4 ...
##   ..- attr(*, "label")= chr "percent of population students"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ y90         : num [1:128] 0 1 0 1 0 1 0 1 0 1 ...
##   ..- attr(*, "label")= chr "=1 if year == 90"
##   ..- attr(*, "format.stata")= chr "%9.0g"
summary(rent_df)
##       city            year      population         enroll           rent      
##  Min.   : 1.00   Min.   :80   Min.   : 25728   Min.   : 5645   Min.   :186.0  
##  1st Qu.:16.75   1st Qu.:80   1st Qu.: 40927   1st Qu.:11827   1st Qu.:233.5  
##  Median :32.50   Median :85   Median : 62094   Median :16912   Median :324.0  
##  Mean   :32.50   Mean   :85   Mean   : 84645   Mean   :18459   Mean   :331.4  
##  3rd Qu.:48.25   3rd Qu.:90   3rd Qu.:101221   3rd Qu.:22752   3rd Qu.:401.5  
##  Max.   :64.00   Max.   :90   Max.   :632910   Max.   :74047   Max.   :925.0  
##      rnthsg           tothsg            inc           lenroll      
##  Min.   :  4062   Min.   :  7130   Min.   : 9262   Min.   : 8.639  
##  1st Qu.:  8218   1st Qu.: 14669   1st Qu.:13508   1st Qu.: 9.378  
##  Median : 11062   Median : 22685   Median :17416   Median : 9.736  
##  Mean   : 16699   Mean   : 35667   Mean   :18912   Mean   : 9.704  
##  3rd Qu.: 21224   3rd Qu.: 38863   3rd Qu.:23213   3rd Qu.:10.032  
##  Max.   :137242   Max.   :560011   Max.   :56307   Max.   :11.212  
##   lnpopulation       lnrent         ltothsg          lrnthsg      
##  Min.   :10.16   Min.   :5.226   Min.   : 8.872   Min.   : 8.309  
##  1st Qu.:10.62   1st Qu.:5.453   1st Qu.: 9.594   1st Qu.: 9.014  
##  Median :11.04   Median :5.781   Median :10.029   Median : 9.311  
##  Mean   :11.11   Mean   :5.746   Mean   :10.118   Mean   : 9.469  
##  3rd Qu.:11.53   3rd Qu.:5.995   3rd Qu.:10.568   3rd Qu.: 9.963  
##  Max.   :13.36   Max.   :6.830   Max.   :13.236   Max.   :11.830  
##      lninc          studentpop          y90     
##  Min.   : 9.134   Min.   : 9.941   Min.   :0.0  
##  1st Qu.: 9.511   1st Qu.:16.983   1st Qu.:0.0  
##  Median : 9.765   Median :23.629   Median :0.5  
##  Mean   : 9.788   Mean   :27.415   Mean   :0.5  
##  3rd Qu.:10.052   3rd Qu.:35.541   3rd Qu.:1.0  
##  Max.   :10.939   Max.   :71.210   Max.   :1.0

I.1

Run the pooled OLS version [meaning that there are no city fixed effects and thus all observations are “pooled” as if they were all in one city] of the fixed effects model above (no need to report individual coefficients). Report how you interpret the coefficient of studentpop (β_3).

#Run the pooled OLS
model_rent_ols <- rent_df %>% lm(lnrent~ y90 + lnpopulation + lninc + studentpop,.)
summary(model_rent_ols)
## 
## Call:
## lm(formula = lnrent ~ y90 + lnpopulation + lninc + studentpop, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24233 -0.07824 -0.01642  0.04389  0.48082 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.568807   0.534881  -1.063   0.2897    
## y90           0.262227   0.034763   7.543 8.78e-12 ***
## lnpopulation  0.040686   0.022515   1.807   0.0732 .  
## lninc         0.571446   0.053098  10.762  < 2e-16 ***
## studentpop    0.005044   0.001019   4.949 2.40e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1259 on 123 degrees of freedom
## Multiple R-squared:  0.8613, Adjusted R-squared:  0.8568 
## F-statistic: 190.9 on 4 and 123 DF,  p-value: < 2.2e-16

Holding the other regressors fixed, a one percentage point increase in the student share of the population is associated with an estimated 0.5% increase in rent.

I.2

Based on the results in #1, interpret the coefficient of y90 (δ_0).

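The coefficient on y90 suggests that, holding population, income, and student share fixed, rents in 1990 were roughly 26 log points higher than in 1980 (about 30% in exact terms, since exp(0.262) - 1 ≈ 0.30). This change is essentially the time trend: the part of the change in rents over the decade that is common to all cities.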
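Because the dependent variable is in logs, a coefficient b corresponds to an exact percentage change of 100 * (exp(b) - 1). The short sketch below applies this to the two pooled OLS estimates reported in I.1; the helper name `pct_change` is my own and not part of the assignment.

```r
# Coefficients copied from the pooled OLS output in I.1
b_y90 <- 0.262227
b_studentpop <- 0.005044

# Exact percentage change implied by a coefficient in a log-linear model
pct_change <- function(b) 100 * (exp(b) - 1)

pct_y90 <- pct_change(b_y90)
pct_studentpop <- pct_change(b_studentpop)

round(c(y90 = pct_y90, studentpop = pct_studentpop), 2)
```

For small coefficients such as studentpop the approximation and the exact value are nearly identical; for y90 the gap is noticeable.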

I.3

3. Do you think that the estimated standard errors in the results in #1 are valid? Explain. Hint: Estimated standard errors can be biased if there is serial correlation in the error term. Consider factors included in the error term and whether you expect the presence of serial correlation or not.

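No. In the pooled model the unobserved city effect α_i is left in the error term, so the composite error α_i + u_it is correlated across the two observations for the same city. Persistent city characteristics (location, amenities, housing stock) push rent above or below its predicted value in both 1980 and 1990. The usual OLS standard errors assume uncorrelated errors, so they are likely biased.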
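To illustrate the concern, here is a small simulation, not the rent data. The data-generating process (the variable `x`, the effect sizes, the seed) is entirely assumed. It leaves a city effect in the error term and checks how strongly the pooled OLS residuals for the same city are correlated across the two years.

```r
set.seed(1)
n_city <- 64
city <- rep(seq_len(n_city), each = 2)      # two rows per city: 1980, 1990
y90  <- rep(c(0, 1), times = n_city)

alpha <- rnorm(n_city, sd = 0.3)[city]      # unobserved, time-invariant city effect
x <- rnorm(2 * n_city)                      # hypothetical regressor
y <- 1 + 0.3 * y90 + 0.5 * x + alpha + rnorm(2 * n_city, sd = 0.1)

# Pooled OLS ignores alpha, so it ends up in the residual
res <- residuals(lm(y ~ y90 + x))

# Correlation between each city's 1980 and 1990 residuals
rho <- cor(res[y90 == 0], res[y90 == 1])
rho
```

A value of `rho` far from zero is the within-city serial correlation that the classical standard errors ignore.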

I.4

Estimate the demeaned version of the fixed effects model. That is, estimate: Δln(rent_it) = δ_0 Δy90_t + β_1 Δln(population_it) + β_2 Δln(inc_it) + β_3 Δstudentpop_it + u_it, where ΔX_it = (X_it - X̄_i) denotes demeaned values. Note that the intercept drops out of this model.

#Create DF with mean data by city
citymean_rent_df <- rent_df %>% aggregate(. ~ city, mean)

#Merge initial data with mean data by city then create demeaned variables
demeaned_rent_df <- merge(rent_df, citymean_rent_df, by = "city", all.x = TRUE)
demeaned_rent_df$delta_lnrent <- demeaned_rent_df$lnrent.x - demeaned_rent_df$lnrent.y
demeaned_rent_df$delta_y90 <- demeaned_rent_df$y90.x - demeaned_rent_df$y90.y
demeaned_rent_df$delta_lnpopulation <- demeaned_rent_df$lnpopulation.x - demeaned_rent_df$lnpopulation.y
demeaned_rent_df$delta_lninc <- demeaned_rent_df$lninc.x - demeaned_rent_df$lninc.y
demeaned_rent_df$delta_studentpop <- demeaned_rent_df$studentpop.x - demeaned_rent_df$studentpop.y
head(demeaned_rent_df)
##   city year.x population.x enroll.x rent.x rnthsg.x tothsg.x inc.x lenroll.x
## 1    1     80        75211    15303    197    13475    26167 11537  9.635804
## 2    1     90        77759    18017    342    15660    29467 19568  9.799071
## 3    2     80       106743    22462    323    14580    37277 19841 10.019580
## 4    2     90       141865    29769    496    26895    55540 31885 10.301223
## 5    3     80        36608    11847    216     7026    13482 11455  9.379830
## 6    3     90        42099    10265    351     9557    16894 21202  9.236495
##   lnpopulation.x lnrent.x ltothsg.x lrnthsg.x   lninc.x studentpop.x y90.x
## 1       11.22805 5.283204 10.172255  9.508592  9.353314     20.34676     0
## 2       11.26137 5.834811 10.291026  9.658865  9.881651     23.17031     1
## 3       11.57818 5.777652 10.526132  9.587406  9.895506     21.04307     0
## 4       11.86263 6.206576 10.924859 10.199696 10.369891     20.98403     1
## 5       10.50802 5.375278  9.509110  8.857373  9.346182     32.36178     0
## 6       10.64778 5.860786  9.734714  9.165030  9.961851     24.38300     1
##   year.y population.y enroll.y rent.y rnthsg.y tothsg.y   inc.y lenroll.y
## 1     85      76485.0  16660.0  269.5  14567.5  27817.0 15552.5  9.717438
## 2     85      76485.0  16660.0  269.5  14567.5  27817.0 15552.5  9.717438
## 3     85     124304.0  26115.5  409.5  20737.5  46408.5 25863.0 10.160401
## 4     85     124304.0  26115.5  409.5  20737.5  46408.5 25863.0 10.160401
## 5     85      39353.5  11056.0  283.5   8291.5  15188.0 16328.5  9.308163
## 6     85      39353.5  11056.0  283.5   8291.5  15188.0 16328.5  9.308163
##   lnpopulation.y lnrent.y ltothsg.y lrnthsg.y   lninc.y studentpop.y y90.y
## 1       11.24471 5.559007 10.231640  9.583728  9.617483     21.75853   0.5
## 2       11.24471 5.559007 10.231640  9.583728  9.617483     21.75853   0.5
## 3       11.72041 5.992114 10.725495  9.893551 10.132699     21.01355   0.5
## 4       11.72041 5.992114 10.725495  9.893551 10.132699     21.01355   0.5
## 5       10.57790 5.618032  9.621912  9.011201  9.654016     28.37239   0.5
## 6       10.57790 5.618032  9.621912  9.011201  9.654016     28.37239   0.5
##   delta_lnrent delta_y90 delta_lnpopulation delta_lninc delta_studentpop
## 1   -0.2758036      -0.5        -0.01665831  -0.2641683      -1.41177559
## 2    0.2758036       0.5         0.01665831   0.2641683       1.41177559
## 3   -0.2144618      -0.5        -0.14222574  -0.2371926       0.02951622
## 4    0.2144618       0.5         0.14222574   0.2371926      -0.02951622
## 5   -0.2427540      -0.5        -0.06987858  -0.3078346       3.98938847
## 6    0.2427540       0.5         0.06987858   0.3078346      -3.98938847
demeaned_rent_df_model <-demeaned_rent_df %>% 
  mutate(across(.cols=c(delta_lnrent:delta_studentpop),~ifelse(y90.x == 0, NA, .)))
demeaned_rent_df_model <- na.omit(demeaned_rent_df_model)
head(demeaned_rent_df_model)
##    city year.x population.x enroll.x rent.x rnthsg.x tothsg.x inc.x lenroll.x
## 2     1     90        77759    18017    342    15660    29467 19568  9.799071
## 4     2     90       141865    29769    496    26895    55540 31885 10.301223
## 6     3     90        42099    10265    351     9557    16894 21202  9.236495
## 8     4     90        46209    18173    588    10617    17926 29044  9.807693
## 10    5     90       110330    18205    925    15112    40257 56307  9.809451
## 12    6     90       132605    15192    630    26972    50199 35103  9.628524
##    lnpopulation.x lnrent.x ltothsg.x lrnthsg.x   lninc.x studentpop.x y90.x
## 2        11.26137 5.834811 10.291026  9.658865  9.881651     23.17031     1
## 4        11.86263 6.206576 10.924859 10.199696 10.369891     20.98403     1
## 6        10.64778 5.860786  9.734714  9.165030  9.961851     24.38300     1
## 8        10.74093 6.376727  9.794007  9.270212 10.276567     39.32784     1
## 10       11.61123 6.829794 10.603039  9.623244 10.938574     16.50050     1
## 12       11.79513 6.445720 10.823750 10.202555 10.466042     11.45658     1
##    year.y population.y enroll.y rent.y rnthsg.y tothsg.y   inc.y lenroll.y
## 2      85      76485.0  16660.0  269.5  14567.5  27817.0 15552.5  9.717438
## 4      85     124304.0  26115.5  409.5  20737.5  46408.5 25863.0 10.160401
## 6      85      39353.5  11056.0  283.5   8291.5  15188.0 16328.5  9.308163
## 8      85      41424.5  15999.0  427.5   9256.5  15983.5 21863.0  9.670963
## 10     85      86232.0  13190.0  700.0  10454.0  30797.0 43803.5  9.409143
## 12     85     125577.5  13707.5  453.0  26267.0  48627.5 25697.0  9.519799
##    lnpopulation.y lnrent.y ltothsg.y lrnthsg.y   lninc.y studentpop.y y90.y
## 2        11.24471 5.559007 10.231640  9.583728  9.617483     21.75853   0.5
## 4        11.72041 5.992114 10.725495  9.893551 10.132699     21.01355   0.5
## 6        10.57790 5.618032  9.621912  9.011201  9.654016     28.37239   0.5
## 8        10.62491 5.981988  9.671872  9.122162  9.935472     38.52991   0.5
## 10       11.32414 6.496554 10.285618  9.144084 10.644974     14.82877   0.5
## 12       11.73911 6.033060 10.791422 10.175708 10.082205     10.88350   0.5
##    delta_lnrent delta_y90 delta_lnpopulation delta_lninc delta_studentpop
## 2     0.2758036       0.5         0.01665831   0.2641683       1.41177559
## 4     0.2144618       0.5         0.14222574   0.2371926      -0.02951622
## 6     0.2427540       0.5         0.06987858   0.3078346      -3.98938847
## 8     0.3947392       0.5         0.11601686   0.3410950       0.79792404
## 10    0.3332396       0.5         0.28709126   0.2936001       1.67172527
## 12    0.4126594       0.5         0.05601978   0.3838367       0.57308197
#Model effects
model_rent_demean <- demeaned_rent_df_model %>% lm(delta_lnrent~0+ delta_y90 + delta_lnpopulation + delta_lninc + delta_studentpop,.)
summary(model_rent_demean)
## 
## Call:
## lm(formula = delta_lnrent ~ 0 + delta_y90 + delta_lnpopulation + 
##     delta_lninc + delta_studentpop, data = .)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.093486 -0.031080 -0.007192  0.027591  0.118915 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## delta_y90          0.385521   0.036824  10.469 3.66e-15 ***
## delta_lnpopulation 0.072246   0.088343   0.818  0.41671    
## delta_lninc        0.309961   0.066477   4.663 1.79e-05 ***
## delta_studentpop   0.011203   0.004132   2.711  0.00873 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04506 on 60 degrees of freedom
## Multiple R-squared:  0.9765, Adjusted R-squared:  0.975 
## F-statistic: 624.1 on 4 and 60 DF,  p-value: < 2.2e-16

How do you compare the estimated coefficient of studentpop (β_3) with the estimated coefficient of studentpop in #1?

The fixed effects estimate for student population is larger than the pooled OLS estimate in #1 (about 1.1% vs 0.5%). After controlling for time-invariant city effects, a one percentage point increase in the student share of the population is associated with roughly a 1.1% increase in rent.
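The demeaning step can also be written more compactly with base R's `ave()`, which returns group means aligned with the original rows. The sketch below uses simulated data with hypothetical names (`panel`, `demean`), not rent.dta. It also shows a point that is easy to miss: running `lm()` on all demeaned rows gives the right coefficients, but the standard errors need a degrees-of-freedom correction because the city means were estimated too.

```r
set.seed(42)
n_city <- 64
panel <- data.frame(city = rep(seq_len(n_city), each = 2),
                    y90  = rep(c(0, 1), times = n_city))
alpha <- rnorm(n_city)[panel$city]                 # city effect
panel$x <- 0.5 * alpha + rnorm(nrow(panel))        # regressor correlated with it
panel$y <- 0.3 * panel$y90 + 0.8 * panel$x + alpha + rnorm(nrow(panel), sd = 0.2)

# Subtract the group mean from each observation
demean <- function(v, g) v - ave(v, g)
within_df <- data.frame(dy   = demean(panel$y,   panel$city),
                        dy90 = demean(panel$y90, panel$city),
                        dx   = demean(panel$x,   panel$city))

fit_within <- lm(dy ~ 0 + dy90 + dx, data = within_df)
fit_lsdv   <- lm(y ~ y90 + x + factor(city), data = panel)  # dummy-variable version

b_within <- unname(coef(fit_within))
b_lsdv   <- unname(coef(fit_lsdv)[c("y90", "x")])

# lm() on demeaned data does not count the 64 estimated city means
se_within <- summary(fit_within)$coefficients[, "Std. Error"]
se_adj    <- unname(se_within * sqrt(df.residual(fit_within) / df.residual(fit_lsdv)))
se_lsdv   <- unname(summary(fit_lsdv)$coefficients[c("y90", "x"), "Std. Error"])
```

Here `b_within` matches `b_lsdv`, and `se_adj` matches `se_lsdv` once the residual degrees of freedom are corrected.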

I.5

5. Do you think that the estimated coefficient of studentpop estimates the causal impact of student population on rental prices?

I think the estimated coefficient in this model may still be confounded by omitted city characteristics that change over time, which city fixed effects do not remove. For example, if local policy changes how many students can reside in a single residence, or if local universities build new dorms that reduce demand for off-campus housing, those city-specific changes over time have not been controlled for.

I.6

Estimate the fixed effects model using the plm package in R.

#Run panel model with fixed effects by city
rent_model_plm <- rent_df %>% plm(lnrent~ y90 + lnpopulation + lninc + studentpop,., index = "city")
summary(rent_model_plm)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = lnrent ~ y90 + lnpopulation + lninc + studentpop, 
##     data = ., index = "city")
## 
## Balanced Panel: n = 64, T = 2, N = 128
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -0.118915 -0.029559  0.000000  0.029559  0.118915 
## 
## Coefficients:
##               Estimate Std. Error t-value  Pr(>|t|)    
## y90          0.3855214  0.0368245 10.4692 3.661e-15 ***
## lnpopulation 0.0722456  0.0883426  0.8178  0.416714    
## lninc        0.3099605  0.0664771  4.6627 1.788e-05 ***
## studentpop   0.0112033  0.0041319  2.7114  0.008726 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    10.383
## Residual Sum of Squares: 0.24368
## R-Squared:      0.97653
## Adj. R-Squared: 0.95032
## F-statistic: 624.146 on 4 and 60 DF, p-value: < 2.22e-16

The plm fixed effects model produces the same coefficient estimates and standard errors as the demeaned model in I.4.
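One reason the I.4 regression on the 1990 rows alone reproduces the plm output: with only two periods, each city's demeaned 1990 value is exactly half of its 1990 minus 1980 difference, so the fixed effects estimator coincides with a first-difference regression. The check below is a sketch on simulated data (all names and parameter values are assumptions), using base R only.

```r
set.seed(7)
n_city <- 64
city <- rep(seq_len(n_city), each = 2)   # rows ordered 1980, 1990 within city
y90  <- rep(c(0, 1), times = n_city)
alpha <- rnorm(n_city)[city]
x <- 0.5 * alpha + rnorm(2 * n_city)
y <- 0.3 * y90 + 0.8 * x + alpha + rnorm(2 * n_city, sd = 0.2)

# Fixed effects via city dummies
fit_lsdv <- lm(y ~ y90 + x + factor(city))

# First differences: one row per city; the intercept plays the role of y90
dy <- y[y90 == 1] - y[y90 == 0]
dx <- x[y90 == 1] - x[y90 == 0]
fit_fd <- lm(dy ~ dx)

b_fe <- unname(coef(fit_lsdv)[c("y90", "x")])
b_fd <- unname(coef(fit_fd))
rbind(fixed_effects = b_fe, first_difference = b_fd)
```

Both rows of the printed matrix agree, and both regressions have the same residual degrees of freedom (number of cities minus number of slopes).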

Part II: Applying the fixed effects model

Davis (2004) estimates marginal willingness to pay (MWTP) for changes in health risk. In particular, he focuses on a sudden and sharp increase in the incidence of pediatric leukemia that occurred only in Churchill County; Lyon County, which shares similar characteristics, did not experience the shock to health risks. Although MWTP for health risk is not directly observed in the market, if the level of risk varies across locations and households are mobile, then demand will be capitalized into property values. That is, housing values are a function of the value of the individual components of the house (e.g., square footage, number of bedrooms, year built) as well as local environmental amenities (e.g., air quality and health risks). Thus, the gradient of this function with respect to health risk is equal to household MWTP for an incremental change in risk. This is known as the hedonic pricing method.

Key Variables

The data (davis2004.dta) contain 10,204 house transactions in Churchill County and Lyon County between 1990 and 2002.

lnprice = log(real sales price)
cancerRisk = cancer risk (0 before 2000, when the first case of leukemia was recorded, then increasing to 1 in 2002 in Churchill County)
houseID = the unique identifier for a house
churchill = a dummy for Churchill County (= 0 for Lyon County)

II.1 What are the general issues in applying housing values in the hedonic pricing method for estimating the value of health risks, e.g., a simple OLS of regressing house prices onto health risks?

Estimating the MWTP for health risks from housing values raises several internal validity concerns. First, the method assumes that households are mobile and well informed, but housing is illiquid, mobility depends on macroeconomic conditions, and information about risk is likely asymmetric; households also sort into locations non-randomly based on their own preferences. Second, it is hard to separate the effect of health risk from the other amenities of a location. For example, more industrial areas may have lower values because industry makes the area less attractive, but industry can also worsen health outcomes through air pollution, so it is unclear which effect a simple OLS coefficient captures. Third, I suspect potential reverse causality: individuals in poor health may be priced out of better locations because poor health reduces their earning potential, so worse health outcomes would accumulate in lower-value areas.
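The amenity concern is an omitted variable problem, which a small simulation can make concrete. Everything below is assumed for illustration (the variable `amenity`, the true effect of -0.10, the correlation between risk and amenity); it is not based on the Davis data.

```r
set.seed(5)
n <- 5000
amenity <- rnorm(n)                       # unobserved neighborhood quality
risk    <- -0.5 * amenity + rnorm(n)      # riskier areas tend to have fewer amenities
lnprice <- 11 - 0.10 * risk + 0.30 * amenity + rnorm(n, sd = 0.3)

# Simple OLS of price on risk: amenity is left in the error term
b_short <- unname(coef(lm(lnprice ~ risk))["risk"])

# Infeasible regression that observes amenity
b_long  <- unname(coef(lm(lnprice ~ risk + amenity))["risk"])

c(true = -0.10, omitting_amenity = b_short, controlling_amenity = b_long)
```

In this setup the short regression overstates the price penalty of risk because it also picks up the effect of low amenities.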

#Reading in and viewing data
davis_df <- read_dta("davis2004.dta")

str(davis_df)
## tibble [10,204 × 24] (S3: tbl_df/tbl/data.frame)
##  $ houseID   : num [1:10204] 2e+08 2e+08 2e+08 2e+08 2e+08 ...
##   ..- attr(*, "label")= chr "id that uniquely identifies house"
##   ..- attr(*, "format.stata")= chr "%12.0g"
##  $ churchill : num [1:10204] 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "dummy for Churchill County (1=churchill county, 0=Lyon County)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ year      : num [1:10204] 1995 1991 1992 1994 1998 ...
##   ..- attr(*, "label")= chr "year of sale"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ month     : num [1:10204] 5 12 3 12 11 3 11 9 12 10 ...
##   ..- attr(*, "label")= chr "month of sale"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ day       : num [1:10204] 19 27 19 30 4 29 22 25 18 15 ...
##   ..- attr(*, "label")= chr "day of sale"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lotsize   : num [1:10204] 0.19 0 0.16 0.22 0.22 ...
##   ..- attr(*, "label")= chr "lot size in acres"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ floorsize : num [1:10204] 11.52 8.88 14.77 17.23 17.23 ...
##   ..- attr(*, "label")= chr "floor space of house in square foot"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ age       : num [1:10204] 20 49 8 9 13 9 6 11 14 5 ...
##   ..- attr(*, "label")= chr "age of house in years"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ price     : num [1:10204] 88456 70167 126822 142022 129887 ...
##   ..- attr(*, "label")= chr "real sales price"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ cancerRisk: num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "cancer risk (0 before 2000, than linear increase to 1 in 2002)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ class     : num [1:10204] 2 1 2 2.5 2.5 2 2 2 2 2 ...
##   ..- attr(*, "label")= chr "mean class (range 1-5)"
##   ..- attr(*, "format.stata")= chr "%10.0g"
##  $ lnprice   : num [1:10204] 11.4 11.2 11.8 11.9 11.8 ...
##   ..- attr(*, "label")= chr "log(real sales price)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ group     : num [1:10204] 218 177 180 213 260 204 176 222 261 163 ...
##   ..- attr(*, "label")= chr "group(churchill year month)"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ agesq     : num [1:10204] 400 2401 64 81 169 ...
##   ..- attr(*, "label")= chr "age of house in years, squared"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ lotsizesq : num [1:10204] 0.0361 0 0.0256 0.0484 0.0484 ...
##   ..- attr(*, "label")= chr "lot size in acres, squared"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ cl_1      : num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     0.0000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_2      : num [1:10204] 0 1 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     1.0000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_3      : num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     1.5000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_4      : num [1:10204] 1 0 1 0 0 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "class==     2.0000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_5      : num [1:10204] 0 0 0 1 1 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     2.5000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_6      : num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     3.0000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_7      : num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     3.5000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_8      : num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     4.0000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
##  $ cl_9      : num [1:10204] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "class==     4.5000"
##   ..- attr(*, "format.stata")= chr "%8.0g"
summary(davis_df)
##     houseID            churchill           year          month       
##  Min.   :200101119   Min.   :0.0000   Min.   :1990   Min.   : 1.000  
##  1st Qu.:200832208   1st Qu.:0.0000   1st Qu.:1994   1st Qu.: 4.000  
##  Median :301710306   Median :0.0000   Median :1997   Median : 6.000  
##  Mean   :266042635   Mean   :0.3524   Mean   :1997   Mean   : 6.505  
##  3rd Qu.:302001507   3rd Qu.:1.0000   3rd Qu.:2000   3rd Qu.: 9.000  
##  Max.   :302150133   Max.   :1.0000   Max.   :2002   Max.   :12.000  
##       day           lotsize          floorsize          age        
##  Min.   : 1.00   Min.   :  0.000   Min.   : 1.08   Min.   :  0.00  
##  1st Qu.: 9.00   1st Qu.:  0.150   1st Qu.:12.28   1st Qu.:  0.00  
##  Median :17.00   Median :  0.210   Median :14.40   Median :  4.00  
##  Mean   :16.87   Mean   :  1.242   Mean   :15.10   Mean   : 12.38  
##  3rd Qu.:25.00   3rd Qu.:  1.000   3rd Qu.:17.22   3rd Qu.: 18.00  
##  Max.   :31.00   Max.   :557.580   Max.   :49.26   Max.   :106.00  
##      price          cancerRisk          class         lnprice     
##  Min.   :  2252   Min.   :0.00000   Min.   :0.00   Min.   : 7.72  
##  1st Qu.: 92095   1st Qu.:0.00000   1st Qu.:2.00   1st Qu.:11.43  
##  Median :111338   Median :0.00000   Median :2.00   Median :11.62  
##  Mean   :119966   Mean   :0.05065   Mean   :2.12   Mean   :11.62  
##  3rd Qu.:137984   3rd Qu.:0.00000   3rd Qu.:2.50   3rd Qu.:11.83  
##  Max.   :895760   Max.   :1.00000   Max.   :4.50   Max.   :13.71  
##      group           agesq           lotsizesq              cl_1        
##  Min.   :  1.0   Min.   :    0.0   Min.   :     0.00   Min.   :0.00000  
##  1st Qu.: 82.0   1st Qu.:    0.0   1st Qu.:     0.02   1st Qu.:0.00000  
##  Median :134.5   Median :   16.0   Median :     0.04   Median :0.00000  
##  Mean   :142.0   Mean   :  468.6   Mean   :    55.78   Mean   :0.04263  
##  3rd Qu.:204.0   3rd Qu.:  324.0   3rd Qu.:     1.00   3rd Qu.:0.00000  
##  Max.   :306.0   Max.   :11236.0   Max.   :310895.47   Max.   :1.00000  
##       cl_2              cl_3              cl_4             cl_5       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.00000   Median :0.0000   Median :0.0000  
##  Mean   :0.07595   Mean   :0.03802   Mean   :0.4243   Mean   :0.2659  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##       cl_6             cl_7              cl_8              cl_9         
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.0000   Median :0.00000   Median :0.00000   Median :0.000000  
##  Mean   :0.1345   Mean   :0.01098   Mean   :0.00539   Mean   :0.002352  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.000000  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.000000

II.2 Run the following model using the samples from both Churchill and Lyon County.

ln (price) =α+βcancerRisk+ε

#Model II.2 price as a function of cancer risk
davis_df$lnprice <- log(davis_df$price)
model_2.2 <- davis_df %>% lm(lnprice~cancerRisk,.)
summary(model_2.2, vcov. = vcovCL(), cluster = davis_df$group)
## 
## Call:
## lm(formula = lnprice ~ cancerRisk, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9049 -0.1911  0.0010  0.2171  2.0807 
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 11.624693   0.004014 2895.770  < 2e-16 ***
## cancerRisk  -0.108468   0.019709   -5.503 3.81e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3928 on 10202 degrees of freedom
## Multiple R-squared:  0.00296,    Adjusted R-squared:  0.002862 
## F-statistic: 30.29 on 1 and 10202 DF,  p-value: 3.815e-08

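Without accounting for county effects, going from no cancer risk (0) to the full risk level (1) is associated with house prices that are about 10.8% lower (a coefficient of -0.108 in log points).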
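One caveat on the standard errors: as far as I can tell, `summary()` for an `lm` object does not use `vcov.` or `cluster` arguments, so the table above reports classical rather than clustered standard errors. Below is a minimal base R sketch of the cluster-robust (sandwich) variance on simulated data. The function name `cluster_se` and the simulated data frame are hypothetical, and the small-sample adjustment shown is one common convention; packages such as sandwich and lmtest provide tested implementations.

```r
# Cluster-robust standard errors for an lm fit (sketch, base R only)
cluster_se <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread  <- solve(crossprod(X))                 # (X'X)^{-1}
  scores <- rowsum(X * u, group = cluster)      # sum of x_i * u_i within each cluster
  meat   <- crossprod(scores)
  G <- nrow(scores); N <- nrow(X); K <- ncol(X)
  adj <- (G / (G - 1)) * ((N - 1) / (N - K))    # assumed small-sample adjustment
  sqrt(diag(adj * bread %*% meat %*% bread))
}

# Simulated example: regressor and error both vary at the cluster level
set.seed(2)
G <- 50; m <- 20
sim <- data.frame(g = rep(seq_len(G), each = m))
sim$x <- rnorm(G)[sim$g]
sim$y <- 1 + 0.5 * sim$x + rnorm(G)[sim$g] + rnorm(G * m)

fit <- lm(y ~ x, data = sim)
se_classical <- sqrt(diag(vcov(fit)))
se_cluster   <- cluster_se(fit, sim$g)
rbind(se_classical, se_cluster)
```

With the real data the call would look like `cluster_se(model_2.2, davis_df$group)`, assuming no rows were dropped for missing values.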

II.3 Run the following model using the samples for Churchill County only.

ln (price) =α+βcancerRisk+ε

#Model II.3 price as a function of cancer risk, Churchill County only
model_2.3 <- lm(lnprice~cancerRisk,data = subset(davis_df, churchill ==1))
summary(model_2.3, vcov. = vcovCL(), cluster = davis_df$group)
## 
## Call:
## lm(formula = lnprice ~ cancerRisk, data = subset(davis_df, churchill == 
##     1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8684 -0.1957  0.0292  0.2407  1.9814 
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 11.588258   0.007488 1547.617  < 2e-16 ***
## cancerRisk  -0.063982   0.021824   -2.932  0.00339 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4077 on 3594 degrees of freedom
## Multiple R-squared:  0.002386,   Adjusted R-squared:  0.002108 
## F-statistic: 8.595 on 1 and 3594 DF,  p-value: 0.003392

In the Churchill County sample, full cancer risk is associated with house prices that are about 6.4% lower.

II.4 Run the following model using the samples from both Churchill and Lyon County.

ln (price) =α+β_1 cancerRisk+β_2 churchill+ε

#Model II.4 price as a function of cancer risk and the Churchill County dummy
model_2.4 <- davis_df %>% lm(lnprice~cancerRisk + churchill,.)
summary(model_2.4, vcov. = vcovCL(), cluster = davis_df$group)
## 
## Call:
## lm(formula = lnprice ~ cancerRisk + churchill, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8684 -0.1916 -0.0004  0.2191  2.0644 
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 11.641041   0.004823 2413.537  < 2e-16 ***
## cancerRisk  -0.063982   0.020987   -3.049   0.0023 ** 
## churchill   -0.052783   0.008667   -6.090 1.17e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3921 on 10201 degrees of freedom
## Multiple R-squared:  0.006572,   Adjusted R-squared:  0.006377 
## F-statistic: 33.74 on 2 and 10201 DF,  p-value: 2.475e-15

Churchill County has lower-priced housing by about 5.3%, and cancer risk lowers values by a further 6.4%, the same estimate as in #3.

Explain what the role of county fixed effects (i.e., the dummy for Churchill County) is, and why the estimated β_1 equals the β estimated in Question 3.

The Churchill dummy controls for time-invariant differences in price levels between the two counties, so β_1 no longer picks up the fact that Churchill is simply a cheaper county (which is why the estimate in #2 is larger in magnitude). The estimate equals the one in #3 because cancerRisk is 0 for every Lyon County sale: with a separate intercept for each county, the Lyon observations only determine the Lyon intercept, and β_1 is identified entirely from variation in cancerRisk within Churchill County. In #3 this is done by regressing on Churchill sales only, and in #4 by controlling for the county effect.
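The equality of the two slopes is mechanical whenever the regressor is zero throughout the comparison group. A quick simulated check, with made-up data and parameter values and variable names chosen only to mirror the question:

```r
set.seed(3)
n <- 2000
churchill <- rbinom(n, 1, 0.35)
# Risk is positive only for some Churchill observations and zero everywhere in "Lyon"
risk <- ifelse(churchill == 1, pmax(0, runif(n, -2, 1)), 0)
lnprice <- 11.6 - 0.05 * churchill - 0.06 * risk + rnorm(n, sd = 0.4)
sim <- data.frame(lnprice, risk, churchill)

# Churchill-only regression (as in II.3)
b_sub <- unname(coef(lm(lnprice ~ risk, data = subset(sim, churchill == 1)))["risk"])

# Pooled regression with the county dummy (as in II.4)
b_dummy <- unname(coef(lm(lnprice ~ risk + churchill, data = sim))["risk"])

c(churchill_only = b_sub, with_dummy = b_dummy)
```

The two slope estimates are identical up to floating-point error; only their standard errors differ, because the pooled model estimates the error variance from both counties.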

II.5 Run the following model using the samples from both Churchill and Lyon County.

ln (price) =α+β_1 cancerRisk+βX+β_2 Churchill+μ_y+γ_m+ε, where μ_y is year fixed effects, γ_m is month fixed effects, and X includes a set of house characteristics (lotsize lotsizesq floorsize age agesq cl_*).

model_2.5 <- plm(lnprice ~ 
                   cancerRisk + lotsize + lotsizesq + floorsize + age + agesq + 
                   cl_1 + cl_2 + cl_3 + cl_4 + cl_5 + cl_6 + cl_7 + cl_8 + cl_9  + churchill,
                 data = davis_df,
                 index = c("month", "year"),
                 model = "within")
## Warning in pdata.frame(data, index): duplicate couples (id-time) in resulting pdata.frame
##  to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
summary(model_2.5, vcov. = vcovCL(), cluster = davis_df$group)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = lnprice ~ cancerRisk + lotsize + lotsizesq + floorsize + 
##     age + agesq + cl_1 + cl_2 + cl_3 + cl_4 + cl_5 + cl_6 + cl_7 + 
##     cl_8 + cl_9 + churchill, data = davis_df, model = "within", 
##     index = c("month", "year"))
## 
## Unbalanced Panel: n = 12, T = 689-1057, N = 10204
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -3.2780766 -0.0883443  0.0028484  0.0952478  2.3193355 
## 
## Coefficients: (1 dropped because of singularities)
##               Estimate  Std. Error  t-value  Pr(>|t|)    
## cancerRisk -1.9633e-01  1.3404e-02 -14.6468 < 2.2e-16 ***
## lotsize     1.2301e-02  6.7392e-04  18.2522 < 2.2e-16 ***
## lotsizesq  -2.0209e-05  1.5182e-06 -13.3107 < 2.2e-16 ***
## floorsize   4.4615e-02  6.7446e-04  66.1490 < 2.2e-16 ***
## age        -6.3259e-03  3.9723e-04 -15.9251 < 2.2e-16 ***
## agesq       1.5228e-05  6.0134e-06   2.5324  0.011343 *  
## cl_1       -4.4766e-01  5.1686e-02  -8.6613 < 2.2e-16 ***
## cl_2       -5.8332e-01  5.2584e-02 -11.0932 < 2.2e-16 ***
## cl_3       -5.9703e-01  5.2642e-02 -11.3413 < 2.2e-16 ***
## cl_4       -4.7461e-01  5.0670e-02  -9.3666 < 2.2e-16 ***
## cl_5       -3.8008e-01  5.0365e-02  -7.5464 4.857e-14 ***
## cl_6       -2.6385e-01  5.0230e-02  -5.2529 1.527e-07 ***
## cl_7       -1.9697e-01  5.4265e-02  -3.6299  0.000285 ***
## cl_8       -1.0497e-01  5.9021e-02  -1.7785  0.075350 .  
## churchill   7.4555e-02  6.4162e-03  11.6199 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    1576
## Residual Sum of Squares: 587.25
## R-Squared:      0.62738
## Adj. R-Squared: 0.62643
## F-statistic: 1142.34 on 15 and 10177 DF, p-value: < 2.22e-16

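After accounting for month fixed effects, the county dummy, and observable home attributes, cancer risk lowers house value by about 0.196 log points, or roughly 19.6%. Note that the output reports a one-way (individual) effect with n = 12, so the specification as run absorbs month effects only; year effects are not separately controlled.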
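Because the dependent variable is in logs, the coefficient is a change in log points, and reading it directly as a percentage is an approximation that loosens as the estimate grows. The short sketch below compares the two readings for the cancerRisk estimate copied from the output above; the helper name log_to_pct is made up for this illustration.

```r
# Convert a coefficient from a log-price regression into a percent change.
# log_to_pct is a hypothetical helper, not part of any package.
log_to_pct <- function(b) 100 * (exp(b) - 1)

b_risk <- -0.19633                  # cancerRisk estimate from the output above

approx_pct <- 100 * b_risk          # log-point approximation
exact_pct  <- log_to_pct(b_risk)    # exact implied percent change

print(round(c(approx = approx_pct, exact = exact_pct), 2))
```

For an estimate of this size the exact implied change is closer to 17.8% than 19.6%, so the approximation overstates the drop by a little under two percentage points.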

Explain the roles of year fixed effects and month fixed effects.

The month fixed effects control for seasonality in housing prices, since homes likely sell for more in certain months. Likewise, the year fixed effects account for market-wide changes in home values from one year to the next. We would expect housing values to mostly increase year over year, depending on macro conditions.
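A small simulation can make the two roles concrete. The sketch below uses base R lm() with factor() dummies rather than plm, and every name and number in it is an assumption for illustration: prices trend upward by year, vary seasonally by month, and the risk indicator only switches on in later years. With month dummies alone, the risk coefficient absorbs part of the price trend; adding year dummies brings it back to the value used to generate the data.

```r
# Simulated sales (hypothetical data-generating process, base R only)
set.seed(2)
n <- 4000
year  <- sample(1990:2002, n, replace = TRUE)
month <- sample(1:12, n, replace = TRUE)
# Assumption: the risk exists only from 1998 on, so it is correlated with year
risk  <- as.numeric(year >= 1998 & runif(n) < 0.5)
lnprice <- 11 + 0.04 * (year - 1990) +          # upward trend in prices
  0.05 * sin(2 * pi * month / 12) -             # seasonality
  0.10 * risk + rnorm(n, sd = 0.1)              # true risk effect is -0.10

fit_month_only <- lm(lnprice ~ risk + factor(month))
fit_two_way    <- lm(lnprice ~ risk + factor(month) + factor(year))

b_month_only <- unname(coef(fit_month_only)["risk"])
b_two_way    <- unname(coef(fit_two_way)["risk"])

print(c(month_only = b_month_only, month_and_year = b_two_way))
```

The month-only estimate is pushed upward because risky sales happen in years when all prices are higher, which is the kind of confounding the year effects are meant to remove.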

Run the following model using the samples from both Churchill and Lyon County, restricted to houses that were sold multiple times in the dataset.

ln (price) = α + β_1 cancerRisk + μ_y + γ_m + π_h + ε, where π_h is the house fixed effects. Explain the role of house fixed effects, and why the county dummy and other house characteristics are not included in the model, whereas year and month fixed effects remain in the model.

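# Count how many times each house appears in the sales data
davis_df_multiple <- davis_df %>% 
  group_by(houseID) %>% 
  mutate(salefreq = n()) %>% 
  ungroup()

# Keep only houses sold more than once (repeat sales)
davis_df_pi <- subset(davis_df_multiple, salefreq > 1)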

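# Note: with a three-element index, plm reads the entries as
# (individual, time, group), so "month" is again the only effect swept out
# (the output shows n = 12) and houseID is not treated as a fixed effect.
# Using index = "houseID" with factor(year) + factor(month) in the formula
# would be one way to estimate the house fixed effects model as written.
model_2.6 <- plm(lnprice ~ 
                   cancerRisk,
                 data = davis_df_pi,
                 index = c("month", "year", "houseID"),
                 model = "within")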
## Warning in pdata.frame(data, index): duplicate couples (id-time) in resulting pdata.frame
##  to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
# Cluster variable taken from the same data frame the model was fit on
summary(model_2.6, vcov. = vcovCL(), cluster = davis_df_pi$group)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = lnprice ~ cancerRisk, data = davis_df_pi, model = "within", 
##     index = c("month", "year", "houseID"))
## 
## Unbalanced Panel: n = 12, T = 309-528, N = 4922
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -2.6351815 -0.1928940  0.0050042  0.2263949  2.0733092 
## 
## Coefficients:
##             Estimate Std. Error t-value  Pr(>|t|)    
## cancerRisk -0.129788   0.026108 -4.9712 6.881e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    763.4
## Residual Sum of Squares: 759.58
## R-Squared:      0.0050089
## Adj. R-Squared: 0.0025767
## F-statistic: 24.7127 on 1 and 4909 DF, p-value: 6.881e-07

The house fixed effects account for everything about a house that does not change between its sales, which covers the observed characteristics (lotsize lotsizesq floorsize age agesq cl) as well as unobserved ones. Because these characteristics are constant within a house, they are perfectly collinear with π_h and cannot be estimated separately. Likewise, a house does not change counties, so π_h also covers the effect of the county itself as a characteristic of the house, and there is no need for the Churchill dummy variable. However, π_h cannot absorb time-variant factors that are common across houses and counties, like the general real estate market, more lucrative selling months, or the general trend in prices, which is why the year and month fixed effects remain in the model.
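Both points can be seen in a simulated repeat-sales panel. The sketch below is base R only, and all names and magnitudes are assumptions for illustration: each house is sold twice, has an unobserved time-invariant quality that is correlated with the risk indicator, and has a time-invariant floorsize. House dummies leave no variation for floorsize (its coefficient comes back NA), and with two sales per house the dummy-variable estimate of the risk effect equals the first-difference estimate.

```r
# Simulated repeat sales (hypothetical data-generating process)
set.seed(3)
n_house <- 500
house <- rep(seq_len(n_house), each = 2)        # each house sold twice
sale  <- rep(1:2, times = n_house)              # sale period 1 or 2
quality   <- rnorm(n_house)[house]              # unobserved, fixed per house
floorsize <- round(runif(n_house, 8, 30))[house]  # observed, fixed per house
# Assumption: risk appears at the second sale, more often for low-quality houses
risk <- as.numeric(sale == 2 & runif(2 * n_house) < plogis(-quality))
lnprice <- 11 + 0.3 * quality + 0.02 * floorsize + 0.05 * (sale == 2) -
  0.10 * risk + rnorm(2 * n_house, sd = 0.05)   # true risk effect is -0.10

# Pooled OLS ignores house effects; LSDV adds one dummy per house
fit_pooled <- lm(lnprice ~ risk + floorsize + factor(sale))
fit_lsdv   <- lm(lnprice ~ risk + factor(sale) + factor(house) + floorsize)

# First differences within each house (rows are ordered sale 1, sale 2)
d_lnprice <- lnprice[sale == 2] - lnprice[sale == 1]
d_risk    <- risk[sale == 2] - risk[sale == 1]
fit_fd    <- lm(d_lnprice ~ d_risk)

b_pooled <- unname(coef(fit_pooled)["risk"])
b_lsdv   <- unname(coef(fit_lsdv)["risk"])
b_fd     <- unname(coef(fit_fd)["d_risk"])
floorsize_dropped <- is.na(coef(fit_lsdv)["floorsize"])

print(c(pooled = b_pooled, house_fe = b_lsdv, first_diff = b_fd))
```

The pooled estimate is too negative because low-quality houses are both cheaper and more exposed, while the house-effects estimate stays close to the value used to generate the data. The sale-period dummy plays the part of the year and month effects here: it varies within a house, so the house effects cannot stand in for it.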

Part III: Evaluation of Papers

1. Kuziemko and Werker (2006)

Kuziemko and Werker conduct a cross-country analysis of the foreign aid benefit from serving on the UN Security Council. The underlying theory is that the flow of aid to developing nations increases when they hold a temporary seat on the Security Council, as a sort of vote purchase. Any pooled OLS would be biased by country-specific characteristics, both in terms of the likelihood of serving on the council and other sources of leverage for aid. For example, developing countries with valuable natural resources or a key strategic location may have a greater ability to extract aid from the US. So country fixed effects most certainly need to be controlled for. Timing is also a key component, as certain years have a more heated geopolitical climate, which may result in larger aid. Another threat to internal validity is that Security Council seats are not randomly assigned, with some specific geographic stratification. The point is that countries that already have some power to get aid may also have the power to attain seats. There is also the risk of reverse causality, wherein Security Council members have some influence as voting members over UN aid decisions. I also suspect that countries may be selected in response to some need for more attention or aid, wherein the aid itself is why the country has a higher profile for selection. Some of the timing concerns were addressed by looking at the years before and after a council term to rule out potential timing trends, and then by creating a proxy variable to isolate the effect of significant years that may result in more aid because of the challenges faced.

In the case where US aid is the response variable, I wonder to what extent US political factors may play a role. For example, does the party in power affect the priority given to foreign aid, or are there certain strings attached around cultural values? This would suggest omitted variables.

The external validity of the study’s findings is limited by several distinctive characteristics. First, term limits are set, so unlike a congressman who might face political blowback from charging rents, nations holding a rotating seat have only a set time for extraction. Also, the legal ramifications of bribery may be murkier relative to the benefits. I think it’s also worth noting that national representatives have an air of cover, as opposed to a singular political leader, so there is some moral hazard here. Finally, the cross-country study was specific to developing countries, so it can’t tell us as much about the behavior of richer countries when they serve on the council (e.g., Japan).

2. Brown and Goolsbee (2002)

Brown and Goolsbee conceived of a clever approach to understanding the impact of internet-disseminated knowledge on the market for life insurance. The hypothesis is that increased access to comparison pricing would lower premiums on term and whole life policies. The graphical representation of the time-series data suggests that for term insurance there was already a downward trend, so there are serial correlation concerns to consider right off the bat. Restricting the study to 5-year term policies may also introduce omitted variable bias. For example, younger people are likely looking for longer-term policies, yet they are likely the most internet savvy, so age could be a confounder. Because the data did not directly address whether insurance sites were checked, the authors proceeded probabilistically based on relative state internet usage. However, those who are more active online may be more likely to search for good deals regardless of the advent of online price comparisons. Further, it may not hold equally in all states that internet usage and insurance price checking mirror each other.

State fixed effects do address policy differences, which are more local than national, and the year fixed effects control for larger macro shocks. I do think, though, that this assumes state policies are stable, when this was likely a period of shifting regulation as the internet became more ubiquitous, so it is likely that some time-variant effects are omitted. Probably the biggest challenge is that individual-level usage is unknown, so individual characteristics that may drive internet use and affect pricing are not considered.

Personally, I literally just purchased a life insurance policy today using PolicyGenius, an online market tool, so this feels oddly relevant. The importance of the question, however, is that it’s an application of the internet’s ability to lower the cost of information as a means of shifting market pricing. We see this play out in so many places now, from ride sharing to vacation rentals to flight ticket purchasing. Given that the analysis is based on state-level propensity for internet usage, I would suspect the same underlying data could be applied to other markets. One interesting note from the paper challenges the internal validity but also raises an interesting question: does comparison pricing lead individuals to select lower-quality coverage at a higher rate? This could also explain the drop in pricing.