require(Matrix)

## Loading required package: Matrix

library(PMEDMrcpp)

## Loading required package: Rcpp
## Loading required package: RcppEigen

## Warning: package 'RcppEigen' was built under R version 3.1.1

load('Davison02201.Rdata')

Data Description

The data have already been processed. The final format is discussed here. All the data wrangling to get from the raw data to this form is omitted. ## Summary File Data The ACS summary files have published estimates and published margin of errors. ### The Estimate Data There are two levels of geography: tracts and block groups. There are 25 tracts, and 60 block groups. Y[[1]] has the tract level data, and Y[[2]] has the block group data.

The tract variables are: POP, HU, UNITS1D, UNITS1A, UNITS2_9, UNITS10_49, UNITS50P, UNITSRV, TENO, INC1, INC2, RACEHHBlack, TENO:INC1, TENO:INC2, TENO:RACEHHBlack, INC1:RACEHHBlack, INC2:RACEHHBlack

and the block group variables are POP, HU, UNITS1D, UNITS1A, UNITS2_9, UNITS10_49, UNITS50P, UNITSRV, TENO, INC1, INC2, RACEHHBlack, TENO:RACEHHBlack

The tract data look like this:

head(Y[[1]])

##         POP   HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV TENO
## 015401 5245 2322    1770     237      205         94        0      16 1748
## 015612 6935 3281     887     605      970        747       41      31 1077
## 010502 5033 2202    1760     164       91         70        0     117 1510
## 015402 4433 1768    1453      19      110        176       10       0 1258
## 015404 3981 1956     938     113      401         98      406       0  960
## 015502 4592 2131    1385      86      216        444        0       0 1304
##        INC1 INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack
## 015401  359  438         112       173       291               53
## 015612  789 1179         830       139       320              170
## 010502  380  619         212       218       416              197
## 015402  287  544         190       164       380               93
## 015404  457  499         130       135       239               79
## 015502  357  821         225       254       391               67
##        INC1:RACEHHBlack INC2:RACEHHBlack
## 015401               11               41
## 015612              321              357
## 010502               26              100
## 015402               34               91
## 015404               30               40
## 015502                8              130

Variance

The variances of the tract data look like this.

head(V[[1]])

##           POP      HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV
## 015401  47893  1375.1    5682  2078.7     5020      911.3    36.95  306.35
## 015612 206762  4802.6    9225  8538.0    17625    13286.6   591.27  515.89
## 010502  27947 10805.9   16142  2733.2     1449     1157.8    36.95 1899.83
## 015402  81980  1609.7    5321   269.4     2914     2534.0    94.60   73.91
## 015404 297340  2861.8    5500  2545.8    21415     1745.7  7243.10   73.91
## 015502  63339   478.9    4472  1038.1     3414     6395.4    36.95   73.91
##         TENO  INC1  INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack
## 015401  5774  8249  7173        1038      4041      3443            651.9
## 015612  9579 18283 28588       10680      2568      6434           3405.7
## 010502 10933  4174  8437        2608      2579      6115           2545.8
## 015402  5867  3418 10062        2608      1793      6914            999.3
## 015404 11973  9471 25666        2546      2243      3894           1658.9
## 015502  6537  4593 12962        2425      2600      4694            478.9
##        INC1:RACEHHBlack INC2:RACEHHBlack
## 015401            244.3            795.6
## 015612           8879.4           9112.3
## 010502            386.2           1777.1
## 015402            719.1           1998.9
## 015404            563.6            944.9
## 015502            194.0           2902.8

Integer numbers have been rounded. You can recover the 90 percent MOEs by taking the square root and multiplying by 1.645.

head(sqrt(V[[1]])*1.645)

##        POP  HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV TENO
## 015401 360  61     124      75   116.55      49.66       10   28.79  125
## 015612 748 114     158     152   218.39     189.62       40   37.36  161
## 010502 275 171     209      86    62.62      55.97       10   71.70  172
## 015402 471  66     120      27    88.80      82.81       16   14.14  126
## 015404 897  88     122      83   240.73      68.73      140   14.14  180
## 015502 414  36     110      53    96.11     131.55       10   14.14  133
##          INC1  INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack
## 015401 149.41 139.3          53    104.57     96.52               42
## 015612 222.43 278.1         170     83.36    131.95               96
## 010502 106.27 151.1          84     83.54    128.63               83
## 015402  96.18 165.0          84     69.66    136.78               52
## 015404 160.09 263.5          83     77.91    102.65               67
## 015502 111.49 187.3          81     83.87    112.70               36
##        INC1:RACEHHBlack INC2:RACEHHBlack
## 015401            25.71            46.40
## 015612           155.01           157.03
## 010502            32.33            69.35
## 015402            44.11            73.55
## 015404            39.05            50.57
## 015502            22.91            88.63

PUMS data

The object X.allCols contains the microdata. There are 3062 microdata observations.

The data look like this:

head(X.allCols)

##    POP HU UNITS1D UNITS1A UNITS2_9 UNITS10_49 UNITS50P UNITSRV TENO INC1
## 3    2  1       1       0        0          0        0       0    1    1
## 18   4  1       0       0        0          1        0       0    0    0
## 31   2  1       0       0        0          1        0       0    0    0
## 33   3  1       1       0        0          0        0       0    1    0
## 42   1  1       0       1        0          0        0       0    1    0
## 49   2  1       1       0        0          0        0       0    1    0
##    INC2 RACEHHBlack TENO:INC1 TENO:INC2 TENO:RACEHHBlack INC1:RACEHHBlack
## 3     0           0         1         0                0                0
## 18    0           1         0         0                0                0
## 31    0           0         0         0                0                0
## 33    0           1         0         0                1                0
## 42    1           0         0         1                0                0
## 49    1           0         0         1                0                0
##    INC2:RACEHHBlack
## 3                 0
## 18                0
## 31                0
## 33                0
## 42                0
## 49                0

The design weights are in the vector wt. They look like:

head(wt)

## [1] 13 23 12 13 19 18

Geography matrices

We also need some matrices that map between source regions and target regions. These are called A (for aggregation) matrices. Here is a corner of the tract-block group aggregation matrix, so that you can see what they look like. Rows and colums are reordered for clarity.

A[[1]][order(dimnames(A[[1]])[[1]]), order(dimnames(A[[1]])[[2]])][1:9,1:9]

## 9 x 9 sparse Matrix of class "dgCMatrix"
##        0105011 0105012 0105013 0105014 0105015 0105016 0105021 0105022
## 010501       1       1       1       1       1       1       .       .
## 010502       .       .       .       .       .       .       1       1
## 015401       .       .       .       .       .       .       .       .
## 015402       .       .       .       .       .       .       .       .
## 015404       .       .       .       .       .       .       .       .
## 015405       .       .       .       .       .       .       .       .
## 015501       .       .       .       .       .       .       .       .
## 015502       .       .       .       .       .       .       .       .
## 015608       .       .       .       .       .       .       .       .
##        0105023
## 010501       .
## 010502       1
## 015401       .
## 015402       .
## 015404       .
## 015405       .
## 015501       .
## 015502       .
## 015608       .

The block groups are the target regions, so the block group-block group aggregation matrix is just the diagonal ones matrix.

Constraints:

First, some notation.

The matrix names and dimensions of the tract and block group data are $Y_T\ :\ J_T \times K_T$ and $Y_B\ : J_B\times K_B$
The tract and block group variance matrices are $V_T\ :\ J_T \times K_T$ and $V_B\ : J_B\times K_B$
The matrix names and dimensions of the microdata are $X_T\ :\ n \times K_T$ and $X_B\ : N\times K_B$. These matrices are redundant. The only reason for this is that some variables may have tract constraints, but not block group constraints. So not every column of $X_T$ is in $X_B$.
The geographic aggregation matrices are $A_T\ :\ J_T \times J_B$ and $A_B\ : J_B\times J_B$.
The unknown weight matrix is $w: n \times J_B$.

The “soft” pycnophylactic constraints are then: \[ Y_T=A_Tw'X_T + e_T\] and \[ Y_B=A_Bw'X_B + e_B,\] where $e_T$ and $e_B$ are error terms with variance $V_T$ and $V_B$

Renormalizing as a probability, instead of a weight.

Instead of weights $w$, the likelihood is usually specified as a probability $p=w/N$. If we deal with $A'pX$ instead of $A'wX$, then we need to deal with $Y:=Y/N$ and $V:=V/N^2$

Entropy Equation

Solving for w is a pain. There are $nJ_B=$ 183720 unknowns. This would be much worse if target regions are pixels. The dual format has $(J_B K_B + J_T K_T)=$ 425+780=1205 unknowns in the dual formulation. Let the dual variables be $\lambda_B$ and $\lambda_T$. The dual objective function is \[ \log(\sum_{ib} d_{ib} \exp(-\sum_{k\in (B,T)}X_k \lambda_k'A_k)) + \sum_{k\in(B,T)} \sum \lambda_k\odot (Y_k - .5 nV_k \odot\lambda_k) \] The probability weights $P = N^{-1} w$, can be found from the dual parameters by: \[ p_{ib} = \frac{d_{ib} \exp(-\sum_{k\in (B,T)}X_k \lambda_k'A_k)}{\sum_{ib} d_{ib} \exp(-\sum_{k\in (B,T)}X_k \lambda_k'A_k)} \]

The predictions are: $A_T w' X_T$ and $A_B w' X_B$

The gradient is equal to $Y_T - A_T w' X_T + V_T\odot \lambda_T$ and $Y_B - A_B w' X_B +V_B\odot \lambda_B$