By: Maria Isabel Olivera

I. Introduction

The NY Energy Star program is a New York State program which helps certain NYS homeowners pay for home improvements which make their homes more energy efficient. Eligibility for a subsidy is based on a homeowners income. For this weeks analysis I will be utizing data on the NY Energy Star program which I obtained from Open Data NY. The data for this dataset was provided by NYS contractors who implemented energy-efficiency projects for homeowners throughout the state.

II. Methodology

For this analysis, my dependent variable will be “savings”, which is the dollar amount that homeowners saved on their utility bills during the first year after the completion of their energy-efficiency home renovation project.

My independent variable is “Customer” which is binary. Since not all homeowners were eligible to receive a subsidy through the NY Energy Star program due to their income, some homeowners received assistance while other homeowners paid market price for their projects. Homeowner who received assistance are classified as “Assisted” while those who paid market price where classified as “Market”. I will see if subsized or “Assisted” homeowners saw any difference in their first years savings compared to those homeowners who paid market price.

Finally, because contacting costs and homeowner incomes vary from county to county, I will group my data by County and see if different counties saw differences in savings rates.

III. Results

A. Complete Pooling

library(nlme)
library(dplyr)
library(magrittr)
library(tidyr)
library(lmerTest)
library(ggplot2)
library(texreg)
library(lme4)
library(merTools)
cpooling <- lm(Savings ~ Customer, data = nyestar)
summary(cpooling)

Call:
lm(formula = Savings ~ Customer, data = nyestar)

Residuals:
    Min      1Q  Median      3Q     Max 
-1335.3  -471.1  -235.3   182.9 23648.7 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     613.062      5.308  115.49   <2e-16 ***
CustomerMarket  134.190      6.938   19.34   <2e-16 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 755.7 on 48869 degrees of freedom
Multiple R-squared:  0.007596,  Adjusted R-squared:  0.007576 
F-statistic:   374 on 1 and 48869 DF,  p-value: < 2.2e-16

When we pool all homeowners together, regardless of county, it appears that homeowners who paid market price saved on average $134.00 compared to those homeowners who received subsidies.

B. No Pooling

For the next analysis we have run models for each individual county to see if there are differences in the intercept between counties.

dcoef <- nyestar %>% 
    group_by(County) %>% 
    do(mod = lm(Savings ~ Customer, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()

As you can see there is a large amount of variation between the 62 counties of New York with some counties seeing no increase in savings amongst those customers who paid market price for their home improvements.

The below graph shows the changes in slope between the counties:

dcoef <- nyestar %>% 
    group_by(County) %>% 
    do(mod = lm(Savings ~ Customer, data = .))
coef <- dcoef %>% do(data.frame(Customerc = coef(.$mod)[2]))
ggplot(coef, aes(x = Customerc)) + geom_histogram()

Once again we see a significant amount of variation between counties with regards to changes in the slopes of this models.

C. Random Models

Random Intercept and Slope

m1_lme <- lme(Savings ~ Customer, data = nyestar, random = ~1|County, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: nyestar 
     AIC      BIC  logLik
  776100 776135.2 -388046

Random effects:
 Formula: ~1 | County
        (Intercept) Residual
StdDev:    248.4022 677.8631

Fixed effects: Savings ~ Customer 
                  Value Std.Error    DF   t-value p-value
(Intercept)    833.6524  32.65232 48808 25.531181  0.0000
CustomerMarket   1.9269   6.65599 48808  0.289498  0.7722
 Correlation: 
               (Intr)
CustomerMarket -0.121

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.5436993 -0.5153256 -0.1947324  0.2347675 34.5550366 

Number of Observations: 48871
Number of Groups: 62 
m2_lme <- lme(Savings ~ Customer, data = nyestar, random = ~ Customer|County, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: nyestar 
       AIC      BIC    logLik
  775835.7 775888.5 -387911.9

Random effects:
 Formula: ~Customer | County
 Structure: General positive-definite, Log-Cholesky parametrization
               StdDev   Corr  
(Intercept)    245.4908 (Intr)
CustomerMarket 179.7925 -0.289
Residual       675.2749       

Fixed effects: Savings ~ Customer 
                  Value Std.Error    DF   t-value p-value
(Intercept)    818.8808  32.99093 48808 24.821394  0.0000
CustomerMarket  38.6675  26.44093 48808  1.462411  0.1436
 Correlation: 
               (Intr)
CustomerMarket -0.354

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.6601575 -0.5087687 -0.1861220  0.2284160 34.1485973 

Number of Observations: 48871
Number of Groups: 62 
AIC(cpooling, m1_lme, m2_lme)
htmlreg(list(m1_lme, m2_lme))
Statistical models
Model 1 Model 2
(Intercept) 833.65*** 818.88***
(32.65) (32.99)
CustomerMarket 1.93 38.67
(6.66) (26.44)
AIC 776100.00 775835.70
BIC 776135.19 775888.48
Log Likelihood -388046.00 -387911.85
Num. obs. 48871 48871
Num. groups 62 62
p < 0.001, p < 0.01, p < 0.05

Looking at the random intercept and random slope models, while we see variations in both models, the random slope model shows a larger degree of variation between counties and appears to be a slightly better fit.

D. Confidence Intervals with MerTools

fm1 <- lmer(Savings ~ Customer + (1|County), data=nyestar)
library(merTools)
fastdisp(fm1)
lme4::lmer(formula = Savings ~ Customer + (1 | County), data = nyestar)
               coef.est coef.se
(Intercept)    833.74    32.91 
CustomerMarket   1.92     6.66 

Error terms:
 Groups   Name        Std.Dev.
 County   (Intercept) 250.50  
 Residual             677.87  
---
number of obs: 48871, groups: County, 62
AIC = 776086
Customer1 <- draw(fm1, type = 'random')
head(Customer1)
Preds <- predictInterval(fm1, newdata = Customer1,
                         level = 0.95, n.sims = 1000,
                        stat = "mean", type = "linear.prediction", include.resid.var = TRUE)
summary(Preds)
      fit            upr            lwr        
 Min.   :1200   Min.   :2460   Min.   :-119.5  
 1st Qu.:1200   1st Qu.:2460   1st Qu.:-119.5  
 Median :1200   Median :2460   Median :-119.5  
 Mean   :1200   Mean   :2460   Mean   :-119.5  
 3rd Qu.:1200   3rd Qu.:2460   3rd Qu.:-119.5  
 Max.   :1200   Max.   :2460   Max.   :-119.5  

Based on the upper and lower values of our confidence interval, it appears that the effect of the variable “Customer” may not be significant. This is most likely due to the fact that there are other variables (such as Fuel type), which have a greater effect on savings.

LS0tDQp0aXRsZTogIlRoZSBOWSBFbmVyZ3kgU3RhciBQcm9ncmFtIGFuZCBDb3VudHkgTGV2ZWwgVmFyaWF0aW9ucyBpbiBGaXJzdCBZZWFyIFNhdmluZ3MiDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KQnk6IE1hcmlhIElzYWJlbCBPbGl2ZXJhDQoNCiMjIEkuIEludHJvZHVjdGlvbg0KDQogIFRoZSBOWSBFbmVyZ3kgU3RhciBwcm9ncmFtIGlzIGEgTmV3IFlvcmsgU3RhdGUgcHJvZ3JhbSB3aGljaCBoZWxwcyBjZXJ0YWluIE5ZUyBob21lb3duZXJzIHBheSBmb3IgaG9tZSBpbXByb3ZlbWVudHMgd2hpY2ggbWFrZSB0aGVpciBob21lcyBtb3JlIGVuZXJneSBlZmZpY2llbnQuIEVsaWdpYmlsaXR5IGZvciBhIHN1YnNpZHkgaXMgYmFzZWQgb24gYSBob21lb3duZXJzIGluY29tZS4gRm9yIHRoaXMgd2Vla3MgYW5hbHlzaXMgSSB3aWxsIGJlIHV0aXppbmcgZGF0YSBvbiB0aGUgTlkgRW5lcmd5IFN0YXIgcHJvZ3JhbSB3aGljaCBJIG9idGFpbmVkIGZyb20gT3BlbiBEYXRhIE5ZLiBUaGUgZGF0YSBmb3IgdGhpcyBkYXRhc2V0IHdhcyBwcm92aWRlZCBieSBOWVMgY29udHJhY3RvcnMgd2hvIGltcGxlbWVudGVkIGVuZXJneS1lZmZpY2llbmN5IHByb2plY3RzIGZvciBob21lb3duZXJzIHRocm91Z2hvdXQgdGhlIHN0YXRlLiAgDQogIA0KIyMgSUkuIE1ldGhvZG9sb2d5DQoNCiAgRm9yIHRoaXMgYW5hbHlzaXMsIG15IGRlcGVuZGVudCB2YXJpYWJsZSB3aWxsIGJlICJzYXZpbmdzIiwgd2hpY2ggaXMgdGhlIGRvbGxhciBhbW91bnQgdGhhdCBob21lb3duZXJzIHNhdmVkIG9uIHRoZWlyIHV0aWxpdHkgYmlsbHMgZHVyaW5nIHRoZSBmaXJzdCB5ZWFyIGFmdGVyIHRoZSBjb21wbGV0aW9uIG9mIHRoZWlyIGVuZXJneS1lZmZpY2llbmN5IGhvbWUgcmVub3ZhdGlvbiBwcm9qZWN0LiANCiAgDQogIE15IGluZGVwZW5kZW50IHZhcmlhYmxlIGlzICJDdXN0b21lciIgd2hpY2ggaXMgYmluYXJ5LiBTaW5jZSBub3QgYWxsIGhvbWVvd25lcnMgd2VyZSBlbGlnaWJsZSB0byByZWNlaXZlIGEgc3Vic2lkeSB0aHJvdWdoIHRoZSBOWSBFbmVyZ3kgU3RhciBwcm9ncmFtIGR1ZSB0byB0aGVpciBpbmNvbWUsIHNvbWUgaG9tZW93bmVycyByZWNlaXZlZCBhc3Npc3RhbmNlIHdoaWxlIG90aGVyIGhvbWVvd25lcnMgcGFpZCBtYXJrZXQgcHJpY2UgZm9yIHRoZWlyIHByb2plY3RzLiBIb21lb3duZXIgd2hvIHJlY2VpdmVkIGFzc2lzdGFuY2UgYXJlIGNsYXNzaWZpZWQgYXMgIkFzc2lzdGVkIiB3aGlsZSB0aG9zZSB3aG8gcGFpZCBtYXJrZXQgcHJpY2Ugd2hlcmUgY2xhc3NpZmllZCBhcyAiTWFya2V0Ii4gSSB3aWxsIHNlZSBpZiBzdWJzaXplZCBvciAiQXNzaXN0ZWQiIGhvbWVvd25lcnMgc2F3IGFueSBkaWZmZXJlbmNlIGluIHRoZWlyIGZpcnN0IHllYXJzIHNhdmluZ3MgY29tcGFyZWQgdG8gdGhvc2UgaG9tZW93bmVycyB3aG8gcGFpZCBtYXJrZXQgcHJpY2UuIA0KICANCiAgRmluYWxseSwgYmVjYXVzZSBjb250YWN0aW5nIGNvc3RzIGFuZCBob21lb3duZXIgaW5jb21lcyB2YXJ5IGZyb20gY291bnR5IHRvIGNvdW50eSwgSSB3aWxsIGdyb3VwIG15IGRhdGEgYnkgQ291bnR5IGFuZCBzZWUgaWYgZGlmZmVyZW50IGNvdW50aWVzIHNhdyBkaWZmZXJlbmNlcyBpbiBzYXZpbmdzIHJhdGVzLiANCg0KIyMgSUlJLiBSZXN1bHRzDQoNCiMjIyBBLiBDb21wbGV0ZSBQb29saW5nDQoNCmBgYHtyfQ0KbGlicmFyeShubG1lKQ0KbGlicmFyeShkcGx5cikNCmxpYnJhcnkobWFncml0dHIpDQpsaWJyYXJ5KHRpZHlyKQ0KbGlicmFyeShsbWVyVGVzdCkNCmxpYnJhcnkoZ2dwbG90MikNCmxpYnJhcnkodGV4cmVnKQ0KbGlicmFyeShsbWU0KQ0KbGlicmFyeShtZXJUb29scykNCg0KY3Bvb2xpbmcgPC0gbG0oU2F2aW5ncyB+IEN1c3RvbWVyLCBkYXRhID0gbnllc3RhcikNCnN1bW1hcnkoY3Bvb2xpbmcpDQpgYGANCiAgV2hlbiB3ZSBwb29sIGFsbCBob21lb3duZXJzIHRvZ2V0aGVyLCByZWdhcmRsZXNzIG9mIGNvdW50eSwgaXQgYXBwZWFycyB0aGF0IGhvbWVvd25lcnMgd2hvIHBhaWQgbWFya2V0IHByaWNlIHNhdmVkIG9uIGF2ZXJhZ2UgJDEzNC4wMCBjb21wYXJlZCB0byB0aG9zZSBob21lb3duZXJzIHdobyByZWNlaXZlZCBzdWJzaWRpZXMuDQogIA0KDQojIyMgQi4gTm8gUG9vbGluZw0KDQogIEZvciB0aGUgbmV4dCBhbmFseXNpcyB3ZSBoYXZlIHJ1biBtb2RlbHMgZm9yIGVhY2ggaW5kaXZpZHVhbCBjb3VudHkgdG8gc2VlIGlmIHRoZXJlIGFyZSBkaWZmZXJlbmNlcyBpbiB0aGUgaW50ZXJjZXB0IGJldHdlZW4gY291bnRpZXMuDQoNCg0KYGBge3J9DQpkY29lZiA8LSBueWVzdGFyICU+JSANCiAgICBncm91cF9ieShDb3VudHkpICU+JSANCiAgICBkbyhtb2QgPSBsbShTYXZpbmdzIH4gQ3VzdG9tZXIsIGRhdGEgPSAuKSkNCmNvZWYgPC0gZGNvZWYgJT4lIGRvKGRhdGEuZnJhbWUoaW50YyA9IGNvZWYoLiRtb2QpWzFdKSkNCmdncGxvdChjb2VmLCBhZXMoeCA9IGludGMpKSArIGdlb21faGlzdG9ncmFtKCkNCmBgYA0KDQogIEFzIHlvdSBjYW4gc2VlIHRoZXJlIGlzIGEgbGFyZ2UgYW1vdW50IG9mIHZhcmlhdGlvbiBiZXR3ZWVuIHRoZSA2MiBjb3VudGllcyBvZiBOZXcgWW9yayB3aXRoIHNvbWUgY291bnRpZXMgc2VlaW5nIG5vIGluY3JlYXNlIGluIHNhdmluZ3MgYW1vbmdzdCB0aG9zZSBjdXN0b21lcnMgd2hvIHBhaWQgbWFya2V0IHByaWNlIGZvciB0aGVpciBob21lIGltcHJvdmVtZW50cy4gDQogIA0KICANCiAgVGhlIGJlbG93IGdyYXBoIHNob3dzIHRoZSBjaGFuZ2VzIGluIHNsb3BlIGJldHdlZW4gdGhlIGNvdW50aWVzOg0KDQpgYGB7cn0NCmRjb2VmIDwtIG55ZXN0YXIgJT4lIA0KICAgIGdyb3VwX2J5KENvdW50eSkgJT4lIA0KICAgIGRvKG1vZCA9IGxtKFNhdmluZ3MgfiBDdXN0b21lciwgZGF0YSA9IC4pKQ0KY29lZiA8LSBkY29lZiAlPiUgZG8oZGF0YS5mcmFtZShDdXN0b21lcmMgPSBjb2VmKC4kbW9kKVsyXSkpDQpnZ3Bsb3QoY29lZiwgYWVzKHggPSBDdXN0b21lcmMpKSArIGdlb21faGlzdG9ncmFtKCkNCmBgYA0KICBPbmNlIGFnYWluIHdlIHNlZSBhIHNpZ25pZmljYW50IGFtb3VudCBvZiB2YXJpYXRpb24gYmV0d2VlbiBjb3VudGllcyB3aXRoIHJlZ2FyZHMgdG8gY2hhbmdlcyBpbiB0aGUgc2xvcGVzIG9mIHRoaXMgbW9kZWxzLg0KDQojIyMgQy4gUmFuZG9tIE1vZGVscw0KDQojIyMjIFJhbmRvbSBJbnRlcmNlcHQgYW5kIFNsb3BlDQpgYGB7cn0NCm0xX2xtZSA8LSBsbWUoU2F2aW5ncyB+IEN1c3RvbWVyLCBkYXRhID0gbnllc3RhciwgcmFuZG9tID0gfjF8Q291bnR5LCBtZXRob2QgPSAiTUwiKQ0Kc3VtbWFyeShtMV9sbWUpDQpgYGANCg0KYGBge3J9DQptMl9sbWUgPC0gbG1lKFNhdmluZ3MgfiBDdXN0b21lciwgZGF0YSA9IG55ZXN0YXIsIHJhbmRvbSA9IH4gQ3VzdG9tZXJ8Q291bnR5LCBtZXRob2QgPSAiTUwiKQ0Kc3VtbWFyeShtMl9sbWUpDQpgYGANCg0KYGBge3J9DQpBSUMoY3Bvb2xpbmcsIG0xX2xtZSwgbTJfbG1lKQ0KYGBgDQoNCmBgYHtyIHJlc3VsdHM9J2FzaXMnfQ0KaHRtbHJlZyhsaXN0KG0xX2xtZSwgbTJfbG1lKSkNCmBgYA0KDQogIExvb2tpbmcgYXQgdGhlIHJhbmRvbSBpbnRlcmNlcHQgYW5kIHJhbmRvbSBzbG9wZSBtb2RlbHMsIHdoaWxlIHdlIHNlZSB2YXJpYXRpb25zIGluIGJvdGggbW9kZWxzLCB0aGUgcmFuZG9tIHNsb3BlIG1vZGVsIHNob3dzIGEgbGFyZ2VyIGRlZ3JlZSBvZiB2YXJpYXRpb24gYmV0d2VlbiBjb3VudGllcyBhbmQgYXBwZWFycyB0byBiZSBhIHNsaWdodGx5IGJldHRlciBmaXQuDQogIA0KDQojIyMgRC4gQ29uZmlkZW5jZSBJbnRlcnZhbHMgd2l0aCBNZXJUb29scw0KDQoNCg0KYGBge3J9DQpmbTEgPC0gbG1lcihTYXZpbmdzIH4gQ3VzdG9tZXIgKyAoMXxDb3VudHkpLCBkYXRhPW55ZXN0YXIpDQoNCmxpYnJhcnkobWVyVG9vbHMpDQpmYXN0ZGlzcChmbTEpDQpgYGANCg0KYGBge3J9DQpDdXN0b21lcjEgPC0gZHJhdyhmbTEsIHR5cGUgPSAncmFuZG9tJykNCmhlYWQoQ3VzdG9tZXIxKQ0KYGBgDQoNCmBgYHtyfQ0KUHJlZHMgPC0gcHJlZGljdEludGVydmFsKGZtMSwgbmV3ZGF0YSA9IEN1c3RvbWVyMSwNCiAgICAgICAgICAgICAgICAgICAgICAgICBsZXZlbCA9IDAuOTUsIG4uc2ltcyA9IDEwMDAsDQogICAgICAgICAgICAgICAgICAgICAgICBzdGF0ID0gIm1lYW4iLCB0eXBlID0gImxpbmVhci5wcmVkaWN0aW9uIiwgaW5jbHVkZS5yZXNpZC52YXIgPSBUUlVFKQ0Kc3VtbWFyeShQcmVkcykNCmBgYA0KDQoNCkJhc2VkIG9uIHRoZSB1cHBlciBhbmQgbG93ZXIgdmFsdWVzIG9mIG91ciBjb25maWRlbmNlIGludGVydmFsLCBpdCBhcHBlYXJzIHRoYXQgdGhlIGVmZmVjdCBvZiB0aGUgdmFyaWFibGUgIkN1c3RvbWVyIiBtYXkgbm90IGJlIHNpZ25pZmljYW50LiBUaGlzIGlzIG1vc3QgbGlrZWx5IGR1ZSB0byB0aGUgZmFjdCB0aGF0IHRoZXJlIGFyZSBvdGhlciB2YXJpYWJsZXMgKHN1Y2ggYXMgRnVlbCB0eXBlKSwgd2hpY2ggaGF2ZSBhIGdyZWF0ZXIgZWZmZWN0IG9uIHNhdmluZ3Mu