## ── Attaching packages ────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.7.2 ✔ stringr 1.3.0
## ✔ readr 1.1.1 ✔ forcats 0.2.0
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'stringr' was built under R version 3.4.3
## ── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level,
## NA generated
##
Read 0.0% of 6141494 rows
Read 2.0% of 6141494 rows
Read 3.7% of 6141494 rows
Read 5.4% of 6141494 rows
Read 7.2% of 6141494 rows
Read 9.0% of 6141494 rows
Read 10.7% of 6141494 rows
Read 12.5% of 6141494 rows
Read 14.2% of 6141494 rows
Read 15.8% of 6141494 rows
Read 17.6% of 6141494 rows
Read 18.6% of 6141494 rows
Read 20.5% of 6141494 rows
Read 22.1% of 6141494 rows
Read 24.1% of 6141494 rows
Read 26.1% of 6141494 rows
Read 26.9% of 6141494 rows
Read 28.7% of 6141494 rows
Read 30.4% of 6141494 rows
Read 32.2% of 6141494 rows
Read 33.9% of 6141494 rows
Read 35.5% of 6141494 rows
Read 37.5% of 6141494 rows
Read 39.2% of 6141494 rows
Read 40.9% of 6141494 rows
Read 42.5% of 6141494 rows
Read 44.3% of 6141494 rows
Read 45.8% of 6141494 rows
Read 46.7% of 6141494 rows
Read 48.5% of 6141494 rows
Read 50.2% of 6141494 rows
Read 51.6% of 6141494 rows
Read 53.4% of 6141494 rows
Read 55.0% of 6141494 rows
Read 56.7% of 6141494 rows
Read 58.3% of 6141494 rows
Read 59.9% of 6141494 rows
Read 61.5% of 6141494 rows
Read 63.2% of 6141494 rows
Read 64.8% of 6141494 rows
Read 66.4% of 6141494 rows
Read 68.1% of 6141494 rows
Read 69.7% of 6141494 rows
Read 71.3% of 6141494 rows
Read 72.9% of 6141494 rows
Read 74.6% of 6141494 rows
Read 75.9% of 6141494 rows
Read 77.7% of 6141494 rows
Read 79.3% of 6141494 rows
Read 80.9% of 6141494 rows
Read 82.6% of 6141494 rows
Read 84.2% of 6141494 rows
Read 85.8% of 6141494 rows
Read 87.6% of 6141494 rows
Read 89.4% of 6141494 rows
Read 91.2% of 6141494 rows
Read 92.8% of 6141494 rows
Read 94.6% of 6141494 rows
Read 95.1% of 6141494 rows
Read 96.7% of 6141494 rows
Read 98.2% of 6141494 rows
Read 99.8% of 6141494 rows
Read 6141494 rows and 55 (of 55) columns from 2.376 GB file in 00:01:20
##
Read 0.0% of 6141494 rows
Read 1.8% of 6141494 rows
Read 3.4% of 6141494 rows
Read 5.0% of 6141494 rows
Read 6.7% of 6141494 rows
Read 8.3% of 6141494 rows
Read 9.9% of 6141494 rows
Read 11.6% of 6141494 rows
Read 12.9% of 6141494 rows
Read 14.5% of 6141494 rows
Read 16.1% of 6141494 rows
Read 17.7% of 6141494 rows
Read 19.4% of 6141494 rows
Read 20.8% of 6141494 rows
Read 22.5% of 6141494 rows
Read 23.9% of 6141494 rows
Read 25.4% of 6141494 rows
Read 26.7% of 6141494 rows
Read 28.2% of 6141494 rows
Read 29.5% of 6141494 rows
Read 30.8% of 6141494 rows
Read 32.1% of 6141494 rows
Read 33.5% of 6141494 rows
Read 34.7% of 6141494 rows
Read 36.0% of 6141494 rows
Read 37.1% of 6141494 rows
Read 38.3% of 6141494 rows
Read 39.4% of 6141494 rows
Read 40.7% of 6141494 rows
Read 42.0% of 6141494 rows
Read 43.3% of 6141494 rows
Read 44.6% of 6141494 rows
Read 45.9% of 6141494 rows
Read 47.1% of 6141494 rows
Read 48.4% of 6141494 rows
Read 49.5% of 6141494 rows
Read 50.6% of 6141494 rows
Read 51.8% of 6141494 rows
Read 52.9% of 6141494 rows
Read 54.1% of 6141494 rows
Read 55.2% of 6141494 rows
Read 56.3% of 6141494 rows
Read 57.5% of 6141494 rows
Read 58.6% of 6141494 rows
Read 60.1% of 6141494 rows
Read 60.4% of 6141494 rows
Read 61.7% of 6141494 rows
Read 62.9% of 6141494 rows
Read 64.0% of 6141494 rows
Read 65.1% of 6141494 rows
Read 66.3% of 6141494 rows
Read 67.7% of 6141494 rows
Read 69.0% of 6141494 rows
Read 70.2% of 6141494 rows
Read 71.3% of 6141494 rows
Read 72.6% of 6141494 rows
Read 73.8% of 6141494 rows
Read 74.9% of 6141494 rows
Read 76.0% of 6141494 rows
Read 76.7% of 6141494 rows
Read 77.8% of 6141494 rows
Read 79.1% of 6141494 rows
Read 80.3% of 6141494 rows
Read 81.4% of 6141494 rows
Read 82.6% of 6141494 rows
Read 83.9% of 6141494 rows
Read 85.2% of 6141494 rows
Read 86.6% of 6141494 rows
Read 87.9% of 6141494 rows
Read 89.2% of 6141494 rows
Read 90.4% of 6141494 rows
Read 91.5% of 6141494 rows
Read 92.6% of 6141494 rows
Read 93.8% of 6141494 rows
Read 94.9% of 6141494 rows
Read 96.1% of 6141494 rows
Read 97.2% of 6141494 rows
Read 98.5% of 6141494 rows
Read 100.0% of 6141494 rows
Read 6141494 rows and 85 (of 85) columns from 3.292 GB file in 00:01:48
## [1] "RMSect" "rt0000"
## [3] "rt0030" "rt0100"
## [5] "rt0130" "rt0200"
## [7] "rt0230" "rt0300"
## [9] "rt0330" "rt0400"
## [11] "rt0430" "rt0500"
## [13] "rt0530" "rt0600"
## [15] "rt0630" "rt0700"
## [17] "rt0730" "rt0800"
## [19] "rt0830" "rt0900"
## [21] "rt0930" "rt1000"
## [23] "rt1030" "rt1100"
## [25] "rt1130" "rt1200"
## [27] "rt1230" "rt1300"
## [29] "rt1330" "rt1400"
## [31] "rt1430" "rt1500"
## [33] "rt1530" "rt1600"
## [35] "rt1630" "rt1700"
## [37] "rt1730" "rt1800"
## [39] "rt1830" "rt1900"
## [41] "rt1930" "rt2000"
## [43] "rt2030" "rt2100"
## [45] "rt2130" "rt2200"
## [47] "rt2230" "rt2300"
## [49] "rt2330" "row_mean"
## [51] "ACCOM_unshared_house_detached" "ACCOM_unshared_house_semi"
## [53] "ACCOM_unshared_house_terrace" "ACCOM_flat"
## [55] "ACCOM_shared" "ECOACT_active_pt"
## [57] "ECOACT_active_ft" "ECOACT_student"
## [59] "ECOACT_selfemp" "ECOACT_unemployed"
## [61] "ECOACT_inactive" "CENTHEAT_NONE"
## [63] "CENTHEAT_GAS" "CENTHEAT_ELECTRIC"
## [65] "CENTHEAT_OTHER" "CENTHEAT_TWOORMORE"
## [67] "SINGLE" "MARRIED"
## [69] "SEPARATED" "POP_in_households"
## [71] "POP_in_communal" "HH_under35"
## [73] "HH_35_54" "HH_55_64"
## [75] "HH_over_64" "TENURE_owned"
## [77] "TENURE_rented" "High_income"
## [79] "Middle_income" "Low_income"
## [81] "daily_usage"
## PCS 0 1 rowtot
## 1: AL100 1.0000 0.0000 5615.070
## 2: AL108 1.0000 0.0000 5211.093
## 3: AL109 1.0000 0.0000 6143.659
## 4: AL11 0.6250 0.3750 5849.630
## 5: AL12 1.0000 0.0000 5216.465
## 6: AL13 0.9694 0.0307 5119.803
td_lookup <- read.csv("~/Google_Drive/PhD/Data/DEP/Output_data/townsend_scores.csv")
TD_DT <- as.data.table(td_lookup)
TD_DT <- TD_DT[, sum(Total_headcount), by=c("OA_CODE", "PCS", "quint")]
#Then i'll be able to calculate a proportion on the proper groupings
TD_DT <- merge(TD_DT, PCS_count, by="PCS", na.rm=TRUE)
TD_DF <- as.data.frame(TD_DT)%>%
mutate(prop_PCS = round(V1/PCS_tot, 4))
#need to get z scores added first
TD_SPREAD <- melt(data=TD_DF, id.vars=c("PCS", "quint", "prop_PCS", "PCS_tot"), na.rm=TRUE)%>%
filter(variable == "V1")
## Warning: attributes are not identical across measure variables; they will
## be dropped
TD_TEST <- as.data.table(TD_SPREAD)
TD_TEST <- TD_TEST[, sum(prop_PCS), by=c("PCS", "quint")]
TD_WIDE <- dcast(TD_TEST, formula = PCS~quint, value.var="V1")
TD_WIDE[is.na(TD_WIDE)] <- 0
TD_PROP <- merge(TD_WIDE, Weighted_variables[,c(1,50)], by.x="PCS", by.y = "RMSect")
head(TD_PROP)
## PCS 1 2 3 4 rowtot
## 1: AL100 0.0030 0.9969 0 0 5615.070
## 2: AL108 0.5800 0.4200 0 0 5211.093
## 3: AL109 0.2228 0.7772 0 0 6143.659
## 4: AL11 0.9000 0.1000 0 0 5849.630
## 5: AL12 0.0000 1.0000 0 0 5216.465
## 6: AL13 0.7128 0.2874 0 0 5119.803
TD_PROP <- as.data.frame(TD_PROP)
for(colnum in c(2:4)){
TD_PROP[,colnum] <- round((TD_PROP[,colnum]*TD_PROP$rowtot),2)
}
##
## Call:
## lm(formula = rowtot ~ . - 1, data = GRP_PROP1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -188.866 -3.041 -0.784 1.997 143.425
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## `1a` 0.9995825 0.0004525 2209.2 <2e-16 ***
## `1b` 1.0002436 0.0003845 2601.2 <2e-16 ***
## `1c` 0.9993516 0.0008722 1145.7 <2e-16 ***
## `2a` 1.0001849 0.0008264 1210.3 <2e-16 ***
## `2b` 0.9997305 0.0009917 1008.1 <2e-16 ***
## `2c` 0.9994937 0.0012436 803.7 <2e-16 ***
## `2d` 0.9997045 0.0006693 1493.8 <2e-16 ***
## `3a` 0.9992983 0.0008922 1120.0 <2e-16 ***
## `3b` 0.9995200 0.0012154 822.4 <2e-16 ***
## `3c` 1.0000115 0.0025822 387.3 <2e-16 ***
## `3d` 1.0001974 0.0008932 1119.7 <2e-16 ***
## `4a` 0.9994329 0.0005110 1955.9 <2e-16 ***
## `4b` 1.0003121 0.0005127 1951.2 <2e-16 ***
## `4c` 1.0004220 0.0006280 1593.2 <2e-16 ***
## `5a` 1.0008356 0.0003835 2609.8 <2e-16 ***
## `5b` 0.9996035 0.0004373 2286.0 <2e-16 ***
## `6a` 1.0002408 0.0004109 2434.5 <2e-16 ***
## `6b` 1.0009161 0.0004083 2451.2 <2e-16 ***
## `7a` 0.9999622 0.0007348 1360.8 <2e-16 ***
## `7b` 1.0018847 0.0039741 252.1 <2e-16 ***
## `7c` 0.9994673 0.0013119 761.9 <2e-16 ***
## `7d` 1.0014103 0.0019407 516.0 <2e-16 ***
## `8a` 1.0000605 0.0007714 1296.5 <2e-16 ***
## `8b` 1.0000875 0.0009665 1034.7 <2e-16 ***
## `8c` 1.0002616 0.0007218 1385.8 <2e-16 ***
## `8d` 1.0003335 0.0007411 1349.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39.05 on 7677 degrees of freedom
## (104 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 7.049e+06 on 26 and 7677 DF, p-value: < 2.2e-16
## [1] "PCS" "1a" "1b" "1c" "2a"
## [6] "2b" "2c" "2d" "3a" "3b"
## [11] "3c" "3d" "4a" "4b" "4c"
## [16] "5a" "5b" "6a" "6b" "7a"
## [21] "7b" "7c" "7d" "8a" "8b"
## [26] "8c" "8d" "rowtot" "daily_usage" "detached_pc"
## [31] "semi_pc" "terrace_pc" "flat_pc" "shared_pc"
##
## Call:
## lm(formula = rowtot ~ . - 1, data = OAC_ACCOM)
##
## Residuals:
## Min 1Q Median 3Q Max
## -187.905 -4.267 -1.540 3.133 145.836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## `1a` 9.972e-01 6.600e-04 1510.839 < 2e-16 ***
## `1b` 9.977e-01 5.946e-04 1678.122 < 2e-16 ***
## `1c` 9.966e-01 1.023e-03 973.828 < 2e-16 ***
## `2a` 9.974e-01 1.057e-03 944.028 < 2e-16 ***
## `2b` 9.980e-01 1.244e-03 802.370 < 2e-16 ***
## `2c` 9.974e-01 1.512e-03 659.571 < 2e-16 ***
## `2d` 9.980e-01 9.305e-04 1072.528 < 2e-16 ***
## `3a` 9.971e-01 1.118e-03 891.913 < 2e-16 ***
## `3b` 9.973e-01 1.528e-03 652.589 < 2e-16 ***
## `3c` 9.978e-01 2.784e-03 358.363 < 2e-16 ***
## `3d` 9.981e-01 1.123e-03 888.494 < 2e-16 ***
## `4a` 9.966e-01 7.746e-04 1286.546 < 2e-16 ***
## `4b` 9.978e-01 7.443e-04 1340.686 < 2e-16 ***
## `4c` 9.978e-01 8.092e-04 1233.130 < 2e-16 ***
## `5a` 9.981e-01 6.423e-04 1553.947 < 2e-16 ***
## `5b` 9.968e-01 6.723e-04 1482.677 < 2e-16 ***
## `6a` 9.974e-01 7.160e-04 1392.968 < 2e-16 ***
## `6b` 9.977e-01 7.311e-04 1364.665 < 2e-16 ***
## `7a` 9.972e-01 9.864e-04 1011.019 < 2e-16 ***
## `7b` 9.988e-01 4.067e-03 245.553 < 2e-16 ***
## `7c` 9.965e-01 1.473e-03 676.349 < 2e-16 ***
## `7d` 9.987e-01 2.067e-03 483.203 < 2e-16 ***
## `8a` 9.966e-01 1.011e-03 986.036 < 2e-16 ***
## `8b` 9.969e-01 1.244e-03 801.284 < 2e-16 ***
## `8c` 9.972e-01 9.275e-04 1075.083 < 2e-16 ***
## `8d` 9.972e-01 9.851e-04 1012.208 < 2e-16 ***
## detached_pc 1.939e+01 4.733e+00 4.097 4.23e-05 ***
## semi_pc 1.877e+01 4.647e+00 4.039 5.42e-05 ***
## terrace_pc 1.567e+01 4.701e+00 3.334 0.00086 ***
## flat_pc 1.283e+01 4.888e+00 2.626 0.00867 **
## shared_pc -6.260e+01 5.463e+01 -1.146 0.25183
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39.07 on 7530 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 5.801e+06 on 31 and 7530 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = rowtot ~ . - 1, data = IMD_PROP1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8 -3.5 -3.1 -2.5 8497.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## `0` 1.0005722 0.0005439 1839.7 <2e-16 ***
## `1` 0.9996420 0.0020306 492.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 252.4 on 7225 degrees of freedom
## Multiple R-squared: 0.9982, Adjusted R-squared: 0.9982
## F-statistic: 2.05e+06 on 2 and 7225 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = daily_usage ~ . - 1, data = subset1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6033.8 -509.3 -94.3 317.3 11183.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## detached_pc 7401.23 48.35 153.074 <2e-16 ***
## semi_pc 5418.60 65.12 83.210 <2e-16 ***
## terrace_pc 5205.67 64.82 80.305 <2e-16 ***
## flat_pc 5473.11 56.75 96.442 <2e-16 ***
## shared_pc 2895.32 1324.19 2.186 0.0288 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 974.6 on 7556 degrees of freedom
## Multiple R-squared: 0.9738, Adjusted R-squared: 0.9738
## F-statistic: 5.628e+04 on 5 and 7556 DF, p-value: < 2.2e-16
## [1] "PCS" "0" "1" "rowtot" "daily_usage"
## [6] "detached_pc" "semi_pc" "terrace_pc" "flat_pc" "shared_pc"
##
## Call:
## lm(formula = rowtot ~ . - 1, data = IMD_ACCOM)
##
## Residuals:
## Min 1Q Median 3Q Max
## -417.2 -46.2 -12.4 18.6 8040.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## `0` 9.391e-01 2.956e-03 317.688 <2e-16 ***
## `1` 9.304e-01 3.699e-03 251.535 <2e-16 ***
## detached_pc 5.483e+02 2.541e+01 21.575 <2e-16 ***
## semi_pc 3.014e+02 2.360e+01 12.769 <2e-16 ***
## terrace_pc 3.138e+02 2.311e+01 13.581 <2e-16 ***
## flat_pc 3.300e+02 2.176e+01 15.169 <2e-16 ***
## shared_pc 2.842e+02 3.373e+02 0.843 0.399
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 246.4 on 7084 degrees of freedom
## Multiple R-squared: 0.9983, Adjusted R-squared: 0.9983
## F-statistic: 6.033e+05 on 7 and 7084 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = rowtot ~ . - 1, data = TD_PROP1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8922 -0.5500 -0.0103 0.5400 5.1876
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## `1` 1.000e+00 2.140e-06 467288.2 <2e-16 ***
## `2` 1.000e+00 3.241e-06 308521.0 <2e-16 ***
## `3` 1.000e+00 2.647e-05 37782.0 <2e-16 ***
## `4` 4.776e+03 4.856e+00 983.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7357 on 7699 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 1.291e+11 on 4 and 7699 DF, p-value: < 2.2e-16
Townsend method - Created Townsend score for each OA and then assigned them to quintiles. Then used the proportions of each quintile in each PCS to apportion the overall energy usage.
Have I gone wrong with the methodology somewhere? I’m not sure the results are very clear but also unsure how I would do the regression any other way?