Set-up

## ── Attaching packages ────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.7.2     ✔ stringr 1.3.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'stringr' was built under R version 3.4.3
## ── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level,
## NA generated
## 
Read 0.0% of 6141494 rows
Read 2.0% of 6141494 rows
Read 3.7% of 6141494 rows
Read 5.4% of 6141494 rows
Read 7.2% of 6141494 rows
Read 9.0% of 6141494 rows
Read 10.7% of 6141494 rows
Read 12.5% of 6141494 rows
Read 14.2% of 6141494 rows
Read 15.8% of 6141494 rows
Read 17.6% of 6141494 rows
Read 18.6% of 6141494 rows
Read 20.5% of 6141494 rows
Read 22.1% of 6141494 rows
Read 24.1% of 6141494 rows
Read 26.1% of 6141494 rows
Read 26.9% of 6141494 rows
Read 28.7% of 6141494 rows
Read 30.4% of 6141494 rows
Read 32.2% of 6141494 rows
Read 33.9% of 6141494 rows
Read 35.5% of 6141494 rows
Read 37.5% of 6141494 rows
Read 39.2% of 6141494 rows
Read 40.9% of 6141494 rows
Read 42.5% of 6141494 rows
Read 44.3% of 6141494 rows
Read 45.8% of 6141494 rows
Read 46.7% of 6141494 rows
Read 48.5% of 6141494 rows
Read 50.2% of 6141494 rows
Read 51.6% of 6141494 rows
Read 53.4% of 6141494 rows
Read 55.0% of 6141494 rows
Read 56.7% of 6141494 rows
Read 58.3% of 6141494 rows
Read 59.9% of 6141494 rows
Read 61.5% of 6141494 rows
Read 63.2% of 6141494 rows
Read 64.8% of 6141494 rows
Read 66.4% of 6141494 rows
Read 68.1% of 6141494 rows
Read 69.7% of 6141494 rows
Read 71.3% of 6141494 rows
Read 72.9% of 6141494 rows
Read 74.6% of 6141494 rows
Read 75.9% of 6141494 rows
Read 77.7% of 6141494 rows
Read 79.3% of 6141494 rows
Read 80.9% of 6141494 rows
Read 82.6% of 6141494 rows
Read 84.2% of 6141494 rows
Read 85.8% of 6141494 rows
Read 87.6% of 6141494 rows
Read 89.4% of 6141494 rows
Read 91.2% of 6141494 rows
Read 92.8% of 6141494 rows
Read 94.6% of 6141494 rows
Read 95.1% of 6141494 rows
Read 96.7% of 6141494 rows
Read 98.2% of 6141494 rows
Read 99.8% of 6141494 rows
Read 6141494 rows and 55 (of 55) columns from 2.376 GB file in 00:01:20
## 
Read 0.0% of 6141494 rows
Read 1.8% of 6141494 rows
Read 3.4% of 6141494 rows
Read 5.0% of 6141494 rows
Read 6.7% of 6141494 rows
Read 8.3% of 6141494 rows
Read 9.9% of 6141494 rows
Read 11.6% of 6141494 rows
Read 12.9% of 6141494 rows
Read 14.5% of 6141494 rows
Read 16.1% of 6141494 rows
Read 17.7% of 6141494 rows
Read 19.4% of 6141494 rows
Read 20.8% of 6141494 rows
Read 22.5% of 6141494 rows
Read 23.9% of 6141494 rows
Read 25.4% of 6141494 rows
Read 26.7% of 6141494 rows
Read 28.2% of 6141494 rows
Read 29.5% of 6141494 rows
Read 30.8% of 6141494 rows
Read 32.1% of 6141494 rows
Read 33.5% of 6141494 rows
Read 34.7% of 6141494 rows
Read 36.0% of 6141494 rows
Read 37.1% of 6141494 rows
Read 38.3% of 6141494 rows
Read 39.4% of 6141494 rows
Read 40.7% of 6141494 rows
Read 42.0% of 6141494 rows
Read 43.3% of 6141494 rows
Read 44.6% of 6141494 rows
Read 45.9% of 6141494 rows
Read 47.1% of 6141494 rows
Read 48.4% of 6141494 rows
Read 49.5% of 6141494 rows
Read 50.6% of 6141494 rows
Read 51.8% of 6141494 rows
Read 52.9% of 6141494 rows
Read 54.1% of 6141494 rows
Read 55.2% of 6141494 rows
Read 56.3% of 6141494 rows
Read 57.5% of 6141494 rows
Read 58.6% of 6141494 rows
Read 60.1% of 6141494 rows
Read 60.4% of 6141494 rows
Read 61.7% of 6141494 rows
Read 62.9% of 6141494 rows
Read 64.0% of 6141494 rows
Read 65.1% of 6141494 rows
Read 66.3% of 6141494 rows
Read 67.7% of 6141494 rows
Read 69.0% of 6141494 rows
Read 70.2% of 6141494 rows
Read 71.3% of 6141494 rows
Read 72.6% of 6141494 rows
Read 73.8% of 6141494 rows
Read 74.9% of 6141494 rows
Read 76.0% of 6141494 rows
Read 76.7% of 6141494 rows
Read 77.8% of 6141494 rows
Read 79.1% of 6141494 rows
Read 80.3% of 6141494 rows
Read 81.4% of 6141494 rows
Read 82.6% of 6141494 rows
Read 83.9% of 6141494 rows
Read 85.2% of 6141494 rows
Read 86.6% of 6141494 rows
Read 87.9% of 6141494 rows
Read 89.2% of 6141494 rows
Read 90.4% of 6141494 rows
Read 91.5% of 6141494 rows
Read 92.6% of 6141494 rows
Read 93.8% of 6141494 rows
Read 94.9% of 6141494 rows
Read 96.1% of 6141494 rows
Read 97.2% of 6141494 rows
Read 98.5% of 6141494 rows
Read 100.0% of 6141494 rows
Read 6141494 rows and 85 (of 85) columns from 3.292 GB file in 00:01:48
##  [1] "RMSect"                        "rt0000"                       
##  [3] "rt0030"                        "rt0100"                       
##  [5] "rt0130"                        "rt0200"                       
##  [7] "rt0230"                        "rt0300"                       
##  [9] "rt0330"                        "rt0400"                       
## [11] "rt0430"                        "rt0500"                       
## [13] "rt0530"                        "rt0600"                       
## [15] "rt0630"                        "rt0700"                       
## [17] "rt0730"                        "rt0800"                       
## [19] "rt0830"                        "rt0900"                       
## [21] "rt0930"                        "rt1000"                       
## [23] "rt1030"                        "rt1100"                       
## [25] "rt1130"                        "rt1200"                       
## [27] "rt1230"                        "rt1300"                       
## [29] "rt1330"                        "rt1400"                       
## [31] "rt1430"                        "rt1500"                       
## [33] "rt1530"                        "rt1600"                       
## [35] "rt1630"                        "rt1700"                       
## [37] "rt1730"                        "rt1800"                       
## [39] "rt1830"                        "rt1900"                       
## [41] "rt1930"                        "rt2000"                       
## [43] "rt2030"                        "rt2100"                       
## [45] "rt2130"                        "rt2200"                       
## [47] "rt2230"                        "rt2300"                       
## [49] "rt2330"                        "row_mean"                     
## [51] "ACCOM_unshared_house_detached" "ACCOM_unshared_house_semi"    
## [53] "ACCOM_unshared_house_terrace"  "ACCOM_flat"                   
## [55] "ACCOM_shared"                  "ECOACT_active_pt"             
## [57] "ECOACT_active_ft"              "ECOACT_student"               
## [59] "ECOACT_selfemp"                "ECOACT_unemployed"            
## [61] "ECOACT_inactive"               "CENTHEAT_NONE"                
## [63] "CENTHEAT_GAS"                  "CENTHEAT_ELECTRIC"            
## [65] "CENTHEAT_OTHER"                "CENTHEAT_TWOORMORE"           
## [67] "SINGLE"                        "MARRIED"                      
## [69] "SEPARATED"                     "POP_in_households"            
## [71] "POP_in_communal"               "HH_under35"                   
## [73] "HH_35_54"                      "HH_55_64"                     
## [75] "HH_over_64"                    "TENURE_owned"                 
## [77] "TENURE_rented"                 "High_income"                  
## [79] "Middle_income"                 "Low_income"                   
## [81] "daily_usage"
##      PCS      0      1   rowtot
## 1: AL100 1.0000 0.0000 5615.070
## 2: AL108 1.0000 0.0000 5211.093
## 3: AL109 1.0000 0.0000 6143.659
## 4:  AL11 0.6250 0.3750 5849.630
## 5:  AL12 1.0000 0.0000 5216.465
## 6:  AL13 0.9694 0.0307 5119.803
td_lookup <- read.csv("~/Google_Drive/PhD/Data/DEP/Output_data/townsend_scores.csv")
TD_DT <- as.data.table(td_lookup)
TD_DT <- TD_DT[, sum(Total_headcount), by=c("OA_CODE", "PCS", "quint")]


#Then i'll be able to calculate a proportion on the proper groupings
TD_DT <- merge(TD_DT, PCS_count, by="PCS", na.rm=TRUE)
TD_DF <- as.data.frame(TD_DT)%>%
  mutate(prop_PCS = round(V1/PCS_tot, 4))

#need to get z scores added first 
TD_SPREAD <- melt(data=TD_DF, id.vars=c("PCS", "quint", "prop_PCS", "PCS_tot"), na.rm=TRUE)%>%
  filter(variable == "V1")
## Warning: attributes are not identical across measure variables; they will
## be dropped
TD_TEST <- as.data.table(TD_SPREAD)
TD_TEST <- TD_TEST[, sum(prop_PCS), by=c("PCS", "quint")]

TD_WIDE <- dcast(TD_TEST, formula = PCS~quint, value.var="V1")
TD_WIDE[is.na(TD_WIDE)] <- 0

TD_PROP <- merge(TD_WIDE, Weighted_variables[,c(1,50)], by.x="PCS", by.y = "RMSect")
head(TD_PROP)
##      PCS      1      2 3 4   rowtot
## 1: AL100 0.0030 0.9969 0 0 5615.070
## 2: AL108 0.5800 0.4200 0 0 5211.093
## 3: AL109 0.2228 0.7772 0 0 6143.659
## 4:  AL11 0.9000 0.1000 0 0 5849.630
## 5:  AL12 0.0000 1.0000 0 0 5216.465
## 6:  AL13 0.7128 0.2874 0 0 5119.803
TD_PROP <- as.data.frame(TD_PROP)
for(colnum in c(2:4)){
  TD_PROP[,colnum] <- round((TD_PROP[,colnum]*TD_PROP$rowtot),2)
}

Regression outputs

OAC groups

## 
## Call:
## lm(formula = rowtot ~ . - 1, data = GRP_PROP1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -188.866   -3.041   -0.784    1.997  143.425 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)    
## `1a` 0.9995825  0.0004525  2209.2   <2e-16 ***
## `1b` 1.0002436  0.0003845  2601.2   <2e-16 ***
## `1c` 0.9993516  0.0008722  1145.7   <2e-16 ***
## `2a` 1.0001849  0.0008264  1210.3   <2e-16 ***
## `2b` 0.9997305  0.0009917  1008.1   <2e-16 ***
## `2c` 0.9994937  0.0012436   803.7   <2e-16 ***
## `2d` 0.9997045  0.0006693  1493.8   <2e-16 ***
## `3a` 0.9992983  0.0008922  1120.0   <2e-16 ***
## `3b` 0.9995200  0.0012154   822.4   <2e-16 ***
## `3c` 1.0000115  0.0025822   387.3   <2e-16 ***
## `3d` 1.0001974  0.0008932  1119.7   <2e-16 ***
## `4a` 0.9994329  0.0005110  1955.9   <2e-16 ***
## `4b` 1.0003121  0.0005127  1951.2   <2e-16 ***
## `4c` 1.0004220  0.0006280  1593.2   <2e-16 ***
## `5a` 1.0008356  0.0003835  2609.8   <2e-16 ***
## `5b` 0.9996035  0.0004373  2286.0   <2e-16 ***
## `6a` 1.0002408  0.0004109  2434.5   <2e-16 ***
## `6b` 1.0009161  0.0004083  2451.2   <2e-16 ***
## `7a` 0.9999622  0.0007348  1360.8   <2e-16 ***
## `7b` 1.0018847  0.0039741   252.1   <2e-16 ***
## `7c` 0.9994673  0.0013119   761.9   <2e-16 ***
## `7d` 1.0014103  0.0019407   516.0   <2e-16 ***
## `8a` 1.0000605  0.0007714  1296.5   <2e-16 ***
## `8b` 1.0000875  0.0009665  1034.7   <2e-16 ***
## `8c` 1.0002616  0.0007218  1385.8   <2e-16 ***
## `8d` 1.0003335  0.0007411  1349.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39.05 on 7677 degrees of freedom
##   (104 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.049e+06 on 26 and 7677 DF,  p-value: < 2.2e-16

OAC groups and accomodation census variables

##  [1] "PCS"         "1a"          "1b"          "1c"          "2a"         
##  [6] "2b"          "2c"          "2d"          "3a"          "3b"         
## [11] "3c"          "3d"          "4a"          "4b"          "4c"         
## [16] "5a"          "5b"          "6a"          "6b"          "7a"         
## [21] "7b"          "7c"          "7d"          "8a"          "8b"         
## [26] "8c"          "8d"          "rowtot"      "daily_usage" "detached_pc"
## [31] "semi_pc"     "terrace_pc"  "flat_pc"     "shared_pc"
## 
## Call:
## lm(formula = rowtot ~ . - 1, data = OAC_ACCOM)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -187.905   -4.267   -1.540    3.133  145.836 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## `1a`         9.972e-01  6.600e-04 1510.839  < 2e-16 ***
## `1b`         9.977e-01  5.946e-04 1678.122  < 2e-16 ***
## `1c`         9.966e-01  1.023e-03  973.828  < 2e-16 ***
## `2a`         9.974e-01  1.057e-03  944.028  < 2e-16 ***
## `2b`         9.980e-01  1.244e-03  802.370  < 2e-16 ***
## `2c`         9.974e-01  1.512e-03  659.571  < 2e-16 ***
## `2d`         9.980e-01  9.305e-04 1072.528  < 2e-16 ***
## `3a`         9.971e-01  1.118e-03  891.913  < 2e-16 ***
## `3b`         9.973e-01  1.528e-03  652.589  < 2e-16 ***
## `3c`         9.978e-01  2.784e-03  358.363  < 2e-16 ***
## `3d`         9.981e-01  1.123e-03  888.494  < 2e-16 ***
## `4a`         9.966e-01  7.746e-04 1286.546  < 2e-16 ***
## `4b`         9.978e-01  7.443e-04 1340.686  < 2e-16 ***
## `4c`         9.978e-01  8.092e-04 1233.130  < 2e-16 ***
## `5a`         9.981e-01  6.423e-04 1553.947  < 2e-16 ***
## `5b`         9.968e-01  6.723e-04 1482.677  < 2e-16 ***
## `6a`         9.974e-01  7.160e-04 1392.968  < 2e-16 ***
## `6b`         9.977e-01  7.311e-04 1364.665  < 2e-16 ***
## `7a`         9.972e-01  9.864e-04 1011.019  < 2e-16 ***
## `7b`         9.988e-01  4.067e-03  245.553  < 2e-16 ***
## `7c`         9.965e-01  1.473e-03  676.349  < 2e-16 ***
## `7d`         9.987e-01  2.067e-03  483.203  < 2e-16 ***
## `8a`         9.966e-01  1.011e-03  986.036  < 2e-16 ***
## `8b`         9.969e-01  1.244e-03  801.284  < 2e-16 ***
## `8c`         9.972e-01  9.275e-04 1075.083  < 2e-16 ***
## `8d`         9.972e-01  9.851e-04 1012.208  < 2e-16 ***
## detached_pc  1.939e+01  4.733e+00    4.097 4.23e-05 ***
## semi_pc      1.877e+01  4.647e+00    4.039 5.42e-05 ***
## terrace_pc   1.567e+01  4.701e+00    3.334  0.00086 ***
## flat_pc      1.283e+01  4.888e+00    2.626  0.00867 ** 
## shared_pc   -6.260e+01  5.463e+01   -1.146  0.25183    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39.07 on 7530 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 5.801e+06 on 31 and 7530 DF,  p-value: < 2.2e-16

IMD Tenth Decile

## 
## Call:
## lm(formula = rowtot ~ . - 1, data = IMD_PROP1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -10.8   -3.5   -3.1   -2.5 8497.3 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)    
## `0` 1.0005722  0.0005439  1839.7   <2e-16 ***
## `1` 0.9996420  0.0020306   492.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 252.4 on 7225 degrees of freedom
## Multiple R-squared:  0.9982, Adjusted R-squared:  0.9982 
## F-statistic: 2.05e+06 on 2 and 7225 DF,  p-value: < 2.2e-16

Census accomodation

## 
## Call:
## lm(formula = daily_usage ~ . - 1, data = subset1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6033.8  -509.3   -94.3   317.3 11183.3 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## detached_pc  7401.23      48.35 153.074   <2e-16 ***
## semi_pc      5418.60      65.12  83.210   <2e-16 ***
## terrace_pc   5205.67      64.82  80.305   <2e-16 ***
## flat_pc      5473.11      56.75  96.442   <2e-16 ***
## shared_pc    2895.32    1324.19   2.186   0.0288 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 974.6 on 7556 degrees of freedom
## Multiple R-squared:  0.9738, Adjusted R-squared:  0.9738 
## F-statistic: 5.628e+04 on 5 and 7556 DF,  p-value: < 2.2e-16

IMD and accomodation

##  [1] "PCS"         "0"           "1"           "rowtot"      "daily_usage"
##  [6] "detached_pc" "semi_pc"     "terrace_pc"  "flat_pc"     "shared_pc"
## 
## Call:
## lm(formula = rowtot ~ . - 1, data = IMD_ACCOM)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -417.2  -46.2  -12.4   18.6 8040.5 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## `0`         9.391e-01  2.956e-03 317.688   <2e-16 ***
## `1`         9.304e-01  3.699e-03 251.535   <2e-16 ***
## detached_pc 5.483e+02  2.541e+01  21.575   <2e-16 ***
## semi_pc     3.014e+02  2.360e+01  12.769   <2e-16 ***
## terrace_pc  3.138e+02  2.311e+01  13.581   <2e-16 ***
## flat_pc     3.300e+02  2.176e+01  15.169   <2e-16 ***
## shared_pc   2.842e+02  3.373e+02   0.843    0.399    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 246.4 on 7084 degrees of freedom
## Multiple R-squared:  0.9983, Adjusted R-squared:  0.9983 
## F-statistic: 6.033e+05 on 7 and 7084 DF,  p-value: < 2.2e-16

Townsend quantile

## 
## Call:
## lm(formula = rowtot ~ . - 1, data = TD_PROP1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8922 -0.5500 -0.0103  0.5400  5.1876 
## 
## Coefficients:
##      Estimate Std. Error  t value Pr(>|t|)    
## `1` 1.000e+00  2.140e-06 467288.2   <2e-16 ***
## `2` 1.000e+00  3.241e-06 308521.0   <2e-16 ***
## `3` 1.000e+00  2.647e-05  37782.0   <2e-16 ***
## `4` 4.776e+03  4.856e+00    983.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7357 on 7699 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.291e+11 on 4 and 7699 DF,  p-value: < 2.2e-16

Townsend method - Created Townsend score for each OA and then assigned them to quintiles. Then used the proportions of each quintile in each PCS to apportion the overall energy usage.

Have I gone wrong with the methodology somewhere? I’m not sure the results are very clear but also unsure how I would do the regression any other way?