require(Matrix)
## Loading required package: Matrix
library(PMEDMrcpp)
## Loading required package: Rcpp
## Loading required package: RcppEigen
## Warning: package 'RcppEigen' was built under R version 3.1.1
load('Davison02201.Rdata')
The data have already been processed. The final format is discussed here. All the data wrangling to get from the raw data to this form is omitted. ## Summary File Data The ACS summary files have published estimates and published margin of errors. ### The Estimate Data There are two levels of geography: tracts and block groups. There are 25 tracts, and 60 block groups. Y[[1]] has the tract level data, and Y[[2]] has the block group data.
The tract variables are: POP, HU, UNITS1D, UNITS1A, UNITS2_9, UNITS10_49, UNITS50P, UNITSRV, TENO, INC1, INC2, RACEHHBlack, TENO:INC1, TENO:INC2, TENO:RACEHHBlack, INC1:RACEHHBlack, INC2:RACEHHBlack
and the block group variables are POP, HU, UNITS1D, UNITS1A, UNITS2_9, UNITS10_49, UNITS50P, UNITSRV, TENO, INC1, INC2, RACEHHBlack, TENO:RACEHHBlack
The tract data look like this:
head(Y[[1]])
## POP HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV TENO
## 015401 5245 2322 1770 237 205 94 0 16 1748
## 015612 6935 3281 887 605 970 747 41 31 1077
## 010502 5033 2202 1760 164 91 70 0 117 1510
## 015402 4433 1768 1453 19 110 176 10 0 1258
## 015404 3981 1956 938 113 401 98 406 0 960
## 015502 4592 2131 1385 86 216 444 0 0 1304
## INC1 INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack
## 015401 359 438 112 173 291 53
## 015612 789 1179 830 139 320 170
## 010502 380 619 212 218 416 197
## 015402 287 544 190 164 380 93
## 015404 457 499 130 135 239 79
## 015502 357 821 225 254 391 67
## INC1:RACEHHBlack INC2:RACEHHBlack
## 015401 11 41
## 015612 321 357
## 010502 26 100
## 015402 34 91
## 015404 30 40
## 015502 8 130
The variances of the tract data look like this.
head(V[[1]])
## POP HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV
## 015401 47893 1375.1 5682 2078.7 5020 911.3 36.95 306.35
## 015612 206762 4802.6 9225 8538.0 17625 13286.6 591.27 515.89
## 010502 27947 10805.9 16142 2733.2 1449 1157.8 36.95 1899.83
## 015402 81980 1609.7 5321 269.4 2914 2534.0 94.60 73.91
## 015404 297340 2861.8 5500 2545.8 21415 1745.7 7243.10 73.91
## 015502 63339 478.9 4472 1038.1 3414 6395.4 36.95 73.91
## TENO INC1 INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack
## 015401 5774 8249 7173 1038 4041 3443 651.9
## 015612 9579 18283 28588 10680 2568 6434 3405.7
## 010502 10933 4174 8437 2608 2579 6115 2545.8
## 015402 5867 3418 10062 2608 1793 6914 999.3
## 015404 11973 9471 25666 2546 2243 3894 1658.9
## 015502 6537 4593 12962 2425 2600 4694 478.9
## INC1:RACEHHBlack INC2:RACEHHBlack
## 015401 244.3 795.6
## 015612 8879.4 9112.3
## 010502 386.2 1777.1
## 015402 719.1 1998.9
## 015404 563.6 944.9
## 015502 194.0 2902.8
Integer numbers have been rounded. You can recover the 90 percent MOEs by taking the square root and multiplying by 1.645.
head(sqrt(V[[1]])*1.645)
## POP HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV TENO
## 015401 360 61 124 75 116.55 49.66 10 28.79 125
## 015612 748 114 158 152 218.39 189.62 40 37.36 161
## 010502 275 171 209 86 62.62 55.97 10 71.70 172
## 015402 471 66 120 27 88.80 82.81 16 14.14 126
## 015404 897 88 122 83 240.73 68.73 140 14.14 180
## 015502 414 36 110 53 96.11 131.55 10 14.14 133
## INC1 INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack
## 015401 149.41 139.3 53 104.57 96.52 42
## 015612 222.43 278.1 170 83.36 131.95 96
## 010502 106.27 151.1 84 83.54 128.63 83
## 015402 96.18 165.0 84 69.66 136.78 52
## 015404 160.09 263.5 83 77.91 102.65 67
## 015502 111.49 187.3 81 83.87 112.70 36
## INC1:RACEHHBlack INC2:RACEHHBlack
## 015401 25.71 46.40
## 015612 155.01 157.03
## 010502 32.33 69.35
## 015402 44.11 73.55
## 015404 39.05 50.57
## 015502 22.91 88.63
The object X.allCols contains the microdata. There are 3062 microdata observations.
The data look like this:
head(X.allCols)
## POP HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV TENO INC1
## 3 2 1 1 0 0 0 0 0 1 1
## 18 4 1 0 0 0 1 0 0 0 0
## 31 2 1 0 0 0 1 0 0 0 0
## 33 3 1 1 0 0 0 0 0 1 0
## 42 1 1 0 1 0 0 0 0 1 0
## 49 2 1 1 0 0 0 0 0 1 0
## INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack INC1:RACEHHBlack
## 3 0 0 1 0 0 0
## 18 0 1 0 0 0 0
## 31 0 0 0 0 0 0
## 33 0 1 0 0 1 0
## 42 1 0 0 1 0 0
## 49 1 0 0 1 0 0
## INC2:RACEHHBlack
## 3 0
## 18 0
## 31 0
## 33 0
## 42 0
## 49 0
The design weights are in the vector wt. They look like:
head(wt)
## [1] 13 23 12 13 19 18
We also need some matrices that map between source regions and target regions. These are called A (for aggregation) matrices. Here is a corner of the tract-block group aggregation matrix, so that you can see what they look like. Rows and colums are reordered for clarity.
A[[1]][order(dimnames(A[[1]])[[1]]), order(dimnames(A[[1]])[[2]])][1:9,1:9]
## 9 x 9 sparse Matrix of class "dgCMatrix"
## 0105011 0105012 0105013 0105014 0105015 0105016 0105021 0105022
## 010501 1 1 1 1 1 1 . .
## 010502 . . . . . . 1 1
## 015401 . . . . . . . .
## 015402 . . . . . . . .
## 015404 . . . . . . . .
## 015405 . . . . . . . .
## 015501 . . . . . . . .
## 015502 . . . . . . . .
## 015608 . . . . . . . .
## 0105023
## 010501 .
## 010502 1
## 015401 .
## 015402 .
## 015404 .
## 015405 .
## 015501 .
## 015502 .
## 015608 .
The block groups are the target regions, so the block group-block group aggregation matrix is just the diagonal ones matrix.
First, some notation.
The “soft” pycnophylactic constraints are then: \[ Y_T=A_Tw'X_T + e_T\] and \[ Y_B=A_Bw'X_B + e_B,\] where \(e_T\) and \(e_B\) are error terms with variance \(V_T\) and \(V_B\)
Instead of weights \(w\), the likelihood is usually specified as a probability \(p=w/N\). If we deal with \(A'pX\) instead of \(A'wX\), then we need to deal with \(Y:=Y/N\) and \(V:=V/N^2\)
Solving for w is a pain. There are \(nJ_B=\) 183720 unknowns. This would be much worse if target regions are pixels. The dual format has \((J_B K_B + J_T K_T)=\) 425+780=1205 unknowns in the dual formulation. Let the dual variables be \(\lambda_B\) and \(\lambda_T\). The dual objective function is \[ \log(\sum_{ib} d_{ib} \exp(-\sum_{k\in (B,T)}X_k \lambda_k'A_k)) + \sum_{k\in(B,T)} \sum \lambda_k\odot (Y_k - .5 nV_k \odot\lambda_k) \] The probability weights $P = N^{-1} w$, can be found from the dual parameters by: \[ p_{ib} = \frac{d_{ib} \exp(-\sum_{k\in (B,T)}X_k \lambda_k'A_k)}{\sum_{ib} d_{ib} \exp(-\sum_{k\in (B,T)}X_k \lambda_k'A_k)} \]
The predictions are: \(A_T w' X_T\) and \(A_B w' X_B\)
The gradient is equal to \(Y_T - A_T w' X_T + V_T\odot \lambda_T\) and \(Y_B - A_B w' X_B +V_B\odot \lambda_B\)