This lab will have you replicate the work you did for the previous lab, but instead of using the Phoenix data you will select a city of your choice to use for this lab, subsequent labs, and the final project.

You want to select a dense city that has a lot of census tracts, so I suggest you select from the 50 largest cities in the US. The Census defines Metropolitan Statistical Areas (MSAs) as “one or more adjacent counties or county equivalents that have at least one urban core area of at least 50,000 population, plus adjacent territory that has a high degree of social and economic integration with the core as measured by commuting ties.” There are currently 384 MSAs in the US.

Wikipedia provides a convenient list of MSAs sorted by size.

library( geojsonio )   # read shapefiles
library( sp )          # work with shapefiles
library( sf )          # work with shapefiles - simple features format
library( mclust )      # cluster analysis 
library( tmap )        # theme maps
library( ggplot2 )     # graphing 
library( ggthemes )    # nice formats for ggplots
library( dplyr )       # data wrangling 
library( pander )      # formatting RMD tables
library( tidycensus )

library( cartogram )  # spatial maps w/ tract size bias reduction
library( maptools )   # spatial object manipulation

Step 1: Select Your MSA

You can select a city from the list of large MSAs.

To get Census data on the city you will first need to identify all of the counties that comprise the MSA. You can look this information up through MSA to FIPS crosswalks provided by the National Bureau for Economic Research (NBER): https://www.nber.org/data/cbsa-fips-county-crosswalk.html

I have added the file to GitHub for ease of access.

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

# search for citie names by strings, use the ^ anchor for "begins with" 

grep( "^SAN D", crosswalk$msaname, value=TRUE )

## [1] "SAN DIEGO, CA"

grep( "^CALI", crosswalk$msaname, value=TRUE )

##  [1] "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA"
##  [6] "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA"
## [11] "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA"
## [16] "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA"
## [21] "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA" "CALIFORNIA"

Select all of your county fips. To use them in the TidyCenss package you will need to split the state and county:

these.sdp <- crosswalk$msaname == "CALIFORNIA"
these.fips <- crosswalk$fipscounty[ these.sdp ]
these.fips <- na.omit( these.fips )

Step 2: Download a Shapefile with Population Data

To create a Dorling cartogram we need a shapefile and a population count. We can get both through the Census download that includes simple features.

library( tidycensus )
#census_api_key("404739a4c1d1baf6067b937fe42425b977ef145d", install = TRUE)

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

cbind( these.fips, state.fips, county.fips ) %>% pander()

these.fips	state.fips	county.fips
06003	06	003
06005	06	005
06009	06	009
06011	06	011
06015	06	015
06021	06	021
06023	06	023
06025	06	025
06027	06	027
06031	06	031
06033	06	033
06035	06	035
06043	06	043
06045	06	045
06049	06	049
06051	06	051
06057	06	057
06063	06	063
06069	06	069
06091	06	091
06093	06	093
06103	06	103
06105	06	105
06109	06	109
06990	06	990

sd.pop <-
get_acs( geography = "tract", variables = "B01003_001",
         state = "06", county = county.fips[state.fips=="06"], geometry = TRUE ) %>% 
         select( GEOID, estimate ) %>%
         rename( POP=estimate )

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |==                                                                    |   4%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |==========================================================            |  84%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |======================================================================| 100%

Step 3: Add Census Data

URL <- "https://github.com/DS4PS/cpp-529-master/raw/master/data/ltdb_std_2010_sample.rds"
census.dat <- readRDS(gzcon(url( URL )))

# can merge an sf object and data.frame
sd.pop$GEOID<-sub('.', '', sd.pop$GEOID)
sdp <- merge( sd.pop, census.dat, by.x="GEOID", by.y="tractid" )

# make sure there are no empty polygons
sdp <- sdp[ ! st_is_empty( sdp ) , ]

head(sdp)

## Simple feature collection with 6 features and 177 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -121.0277 ymin: 38.21795 xmax: -119.5425 ymax: 38.93332
## Geodetic CRS:  NAD83
##        GEOID  POP statea countya tracta pnhwht12 pnhblk12 phisp12 pntv12
## 1 6003010000 1344     06     003 010000    69.76     0.00    6.52  19.21
## 2 6005000101 4088     06     005 000101    90.35     0.98    3.44   3.59
## 3 6005000102 2054     06     005 000102    83.05     0.00    7.75   7.21
## 4 6005000301 5152     06     005 000301    52.74    13.74   28.46   1.99
## 5 6005000303 6346     06     005 000303    84.77     0.09    8.13   1.38
## 6 6005000304 5186     06     005 000304    79.95     0.00   15.95   0.23
##   pasian12 phaw12 pindia12 pchina12 pfilip12 pjapan12 pkorea12 pviet12 p15wht12
## 1     4.51   0.00        0     1.42     0.00     0.00        0    1.25    10.54
## 2     0.00   0.00        0     0.00     0.20     0.00        0    0.00     6.32
## 3     0.00   0.00        0     0.00     0.00     0.00        0    0.00    18.31
## 4     2.23   0.05        0     0.00     0.26     0.47        0    0.28    10.63
## 5     2.34   0.00        0     1.89     0.09     0.00        0    0.00    16.18
## 6     1.03   1.03        0     0.00     0.00     0.00        0    0.00    14.53
##   p65wht12 p15blk12 p65blk12 p15hsp12 p65hsp12 p15ntv12 p65ntv12 p15asn12
## 1    13.65 18.75882 10.08296    29.49     6.41    35.17    10.17   20.370
## 2    36.31  0.00000  0.00000    64.23    10.22     0.00     0.00    0.000
## 3    25.79 18.75882 10.08296     4.49    21.15     0.00    23.45   15.255
## 4    13.35  1.25000  2.95000     2.58     2.00     3.33     5.33    0.000
## 5    18.26  0.00000 50.00000    37.30     3.97    16.47     0.00   40.370
## 6    16.53 18.75882 10.08296    17.48     8.09     0.00     0.00    0.000
##   p65asn12 pmex12 pcuban12 ppr12 pruanc12 pitanc12 pgeanc12 piranc12 pscanc12
## 1  7.41000   1.92     0.00  0.50     0.00     8.27    11.19    12.11     6.18
## 2  0.00000   1.81     0.00  0.00     2.74     3.49     9.75    12.67     5.20
## 3 10.63872   3.43     0.00  0.94     6.91     3.98    17.99     5.62     5.72
## 4  2.76000  25.09     0.47  1.66     0.24     3.34     9.63     9.47     2.38
## 5  3.67000   4.80     0.00  0.00     0.00     6.37    14.52     9.78     4.00
## 6  0.00000  11.48     0.00  0.00     0.83     4.62    14.43     6.86     6.40
##   pfb12 pnat12 p10imm12 prufb12 pitfb12 pgefb12 pirfb12 pscfb12 polang12 plep12
## 1  4.76   2.51     1.50    0.00    0.00    0.33     0.0    0.00    12.84   4.13
## 2  5.15   3.90     0.33    2.74    0.00    0.30     0.3    0.00     5.10   1.02
## 3 12.48   9.64     0.00    6.91    0.00    2.58     0.6    0.00     8.38   3.69
## 4  8.98   2.62     1.37    0.00    0.00    0.11     0.0    0.75    23.62   2.36
## 5  4.34   2.58     1.27    0.00    0.15    0.00     0.0    0.00     5.61   0.78
## 6  3.28   0.62     1.14    0.00    0.00    0.00     0.0    0.00     8.26   1.73
##   phs12 pcol12 punemp12 pflabf12 pprof12 pmanuf12 psemp12 pvet12 p65pov12
## 1 35.21  31.71    10.40    60.40   42.80     5.72   14.58  13.27     1.12
## 2 31.53  23.16    19.87    41.52   43.01     1.99   26.47  16.89     0.68
## 3 35.73  24.53    20.74    47.28   12.88     5.98   14.42  21.47     0.00
## 4 60.92   9.93    14.33    46.86   36.16     5.28   10.68  13.60     0.91
## 5 33.27  23.13     9.41    54.13   32.06     5.59   14.88  10.42     0.60
## 6 36.98  10.58    17.68    57.59   22.40     6.15    8.80  15.43     0.00
##   ppov12 pwpov12 pnapov12 pfmpov12  pbpov12 phpov12  papov12 pvac12 pown12
## 1  13.71   10.64    14.01     0.80 25.50688   46.15  11.1100  78.79  80.78
## 2   7.59    5.40    75.52     2.37  0.00000    0.00   0.0000  28.46  89.68
## 3   9.99    7.60     5.52     6.61 25.50688   22.44  14.0471  51.81  81.37
## 4   5.96    6.10     0.00     0.65  0.00000   10.13   0.0000   8.82  74.17
## 5  12.28   11.26     0.00     9.87 50.00000   14.29  65.1400  11.50  68.44
## 6  16.34   10.13     0.00     3.52 25.50688   42.56 100.0000  21.36  73.12
##   pmulti12 p30old12 p18und12 p60up12 p75up12 pmar12 pwds12 pfhh12 p10yrs12
## 1    37.47    45.07    22.56   19.97    5.01  51.00  10.82   4.40    41.30
## 2     0.00    43.00    10.26   51.26   12.54  68.30  24.19   3.60    48.12
## 3     4.13    63.44    16.90   33.20   13.97  67.76  21.89   3.22    57.24
## 4     5.80    43.00    10.60   12.52    2.93  29.93  29.11   1.94    51.79
## 5    11.08    44.51    24.09   22.67    5.96  56.46  22.46  17.16    72.63
## 6     2.62    32.01    20.23   20.05    8.00  51.65  19.04   3.43    78.51
##   ageblk12 agentv12 agewht12 agehsp12 india12 filip12 japan12 korea12 viet12
## 1        0      236      835       78       0       0       0       0     15
## 2       39      143     3594      137       0       8       0       0      0
## 3        0      145     1671      156       0       0       0       0      0
## 4      881      150     3235     1746       0      16      29       0     17
## 5        4       85     3942      378       0       4       0       0      0
## 6        0        9     3098      618       0       0       0       0      0
##   pop12 nhwht12 nhblk12 ntv12 hisp12 asian12 haw12 china12 a15wht12 a65wht12
## 1  1197     835       0   230     78      54     0      17       88      114
## 2  3978    3594      39   143    137       0     0       0      227     1305
## 3  2012    1671       0   145    156       0     0       0      306      431
## 4  6134    3235     843   122   1746     137     3       0      344      432
## 5  4650    3942       4    64    378     109     0      88      638      720
## 6  3875    3098       0     9    618      40    40       0      450      512
##   a15blk12 a65blk12 a15hsp12 a65hsp12 a15ntv12 a65ntv12 ageasn12 a15asn12
## 1        0        0       23        5       83       24       54       11
## 2        0        0       88       14        0        0        8        0
## 3        0        0        7       33        0       34        0        0
## 4       11       26       45       35        5        8      145        0
## 5        0        2      141       15       14        0      109       44
## 6        0        0      108       50        0        0       40        0
##   a65asn12 mex12 pr12 cuban12 geanc12 iranc12 itanc12 ruanc12 fb12 nat12 itfb12
## 1        4    23    6       0     134     145      99       0   57    30      0
## 2        0    72    0       0     388     504     139     109  205   155      0
## 3        0    69   19       0     362     113      80     139  251   194      0
## 4        4  1539  102      29     591     581     205      15  551   161      0
## 5        4   223    0       0     675     455     296       0  202   120      7
## 6        0   445    0       0     559     266     179      32  127    24      0
##   rufb12 ag5up12 irfb12 gefb12 scanc12 n10imm12 olang12 lep12 scfb12 ag25up12
## 1      0    1137      0      4      74       18     146    47      0      801
## 2    109    3922     12     12     207       13     200    40      0     3527
## 3    139    1922     12     52     115        0     161    71      0     1598
## 4      0    5973      0      7     146       84    1411   141     46     4854
## 5      0    4352      0      0     186       59     244    34      0     3234
## 6      0    3704      0      0     248       44     306    64      0     2750
##   dfmpov12 hh12 hinc12  hincb12 hincw12  hinch12 incpc12 ag18cv12 vet12
## 1      250  385  59931 49940.72   73162 38750.00   27135      927   123
## 2     1306 1968  56136 49940.72   56917 55629.36   35314     3570   603
## 3      590  746  47875 49940.72   48438 43098.00   23797     1672   359
## 4      619  755  60391 53409.00   59613 88661.00   10165     5484   746
## 5     1317 1885  56120 49940.72   57109 72727.00   29481     3530   368
## 6     1107 1410  54045 49940.72   53500 22286.00   24651     3091   477
##   empclf12 dpov12 npov12 dbpov12 nbpov12 dnapov12 nnapov12 dwpov12 nwpov12
## 1      542   1160    159       0       0      207       29     827      88
## 2     1360   3978    302      39       0      143      108    3594     194
## 3      652   2012    201       0       0      145        8    1671     127
## 4      777   1979    118      47       0       25        0    1802     110
## 5     1915   4650    571       4       2       85        0    3942     444
## 6     1625   3868    632       0       0        9        0    3091     313
##   dhpov12 nhpov12 hhb12 hhw12 hhh12 hs12 col12 clf12 unemp12 dflabf12 flabf12
## 1      78      36     0   307     6  282   254   596      62      452     273
## 2     137       0    27  1829    27 1112   817  1631     324     2016     837
## 3     156      35     0   640    63  571   392   786     163      789     373
## 4      79       8    14   667    53 2957   482   900     129      828     388
## 5     378      54     0  1737    80 1076   748  2105     198     1888    1022
## 6     618     263     0  1220   136 1017   291  1974     349     1554     895
##   prof12 manuf12 semp12 hha12  hinca12 n65pov12 nfmpov12 napov12 dapov12
## 1    232      31     79    11 36079.55       13        2       6      54
## 2    585      27    360     0 79531.26       27       31       0       8
## 3     84      39     94     0 79531.26        0       39       0       0
## 4    281      41     83     0 79531.26       18        4       0       4
## 5    614     107    285    35 53015.00       28      130      71     109
## 6    364     100    143     0 79531.26        0       39      40      40
##   family12 hu12 vac12 ohu12 own12 rent12 dmulti12 mrent12 mhmval12 multi12
## 1      250 1815  1430   385   311     74     1815     696   371300     680
## 2     1306 2751   783  1968  1765    203     2751     925   277600       0
## 3      590 1548   802   746   607    139     1548     839   273100      64
## 4      619  828    73   755   560    195      828     724   278200      48
## 5     1317 2130   245  1885  1290    595     2130    1058   281900     236
## 6     1107 1793   383  1410  1031    379     1793    1112   229100      47
##   h30old12 h10yrs12 a18und12 a60up12 a75up12 ag15up12 X12.Mar wds12 fhh12
## 1      818      159      270     239      60      998     509   108    11
## 2     1183      947      408    2039     499     3663    2502   886    47
## 3      982      427      340     668     281     1681    1139   368    19
## 4      356      391      650     768     180     5716    1711  1664    12
## 5      948     1369     1120    1054     277     3700    2089   831   226
## 6      574     1107      784     777     310     3282    1695   625    38
##                         geometry
## 1 MULTIPOLYGON (((-120.0724 3...
## 2 MULTIPOLYGON (((-120.632 38...
## 3 MULTIPOLYGON (((-120.6093 3...
## 4 MULTIPOLYGON (((-121.0274 3...
## 5 MULTIPOLYGON (((-121.0272 3...
## 6 MULTIPOLYGON (((-121.0114 3...

#head(sd) %>% pander()

Data Dictionary

This exercise uses Census data from the 2012 American Communities Survey made available through the Diversity and Disparities Project.

dd.URL <- "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/data-dictionary.csv"
data.dictionary <- read.csv( dd.URL, stringsAsFactors=F ) 
data.dictionary %>% pander()

LABEL	VARIABLE
tractid	GEOID
pnhwht12	Percent white, non-Hispanic
pnhblk12	Percent black, non-Hispanic
phisp12	Percent Hispanic
pntv12	Percent Native American race
pfb12	Percent foreign born
polang12	Percent speaking other language at home, age 5 plus
phs12	Percent with high school degree or less
pcol12	Percent with 4-year college degree or more
punemp12	Percent unemployed
pflabf12	Percent female labor force participation
pprof12	Percent professional employees
pmanuf12	Percent manufacturing employees
pvet12	Percent veteran
psemp12	Percent self-employed
hinc12	Median HH income, total
incpc12	Per capita income
ppov12	Percent in poverty, total
pown12	Percent owner-occupied units
pvac12	Percent vacant units
pmulti12	Percent multi-family units
mrent12	Median rent
mhmval12	Median home value
p30old12	Percent structures more than 30 years old
p10yrs12	Percent HH in neighborhood 10 years or less
p18und12	Percent 17 and under, total
p60up12	Percent 60 and older, total
p75up12	Percent 75 and older, total
pmar12	Percent currently married, not separated
pwds12	Percent widowed, divorced and separated
pfhh12	Percent female-headed families with children

Step 4: Transform the Shapefile into A Dorling Cartogram

# convert sf map object to an sp version

sd.sp <- as_Spatial( sdp )

class( sd.sp )

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
plot( sd.sp )

# project map and remove empty tracts
sd.sp <- spTransform( sd.sp, CRS("+init=epsg:3395"))
sd.sp <- sd.sp[ sd.sp$POP != 0 & (! is.na( sd.sp$POP )) , ]

# convert census tract polygons to dorling cartogram
# no idea why k=0.03 works, but it does - default is k=5
sd.sp$pop.w <- sd.sp$POP / 9000 # max(msp.sp$POP)   # standardizes it to max of 1.5
sd_dorling <- cartogram_dorling( x=sd.sp, weight="pop.w", k=0.05 )
plot( sd_dorling )
axis(side=2)

tm_shape( sd_dorling ) + 
  tm_polygons( size="POP", col="hinc12", n=7, style="quantile", palette="Spectral" )

Use a bounding box: improve the aestetics.

# user-defined bounding box to move slocer to subjects 
bb <- st_bbox( c( xmin =  -14000000, xmax = -13000000, 
                  ymax = 5165123, ymin = 4156123 ), 
                  crs = st_crs("+init=epsg:3395"))

tm_shape( sd_dorling, bbox = bb) + 
  tm_polygons( col="hinc12", n=10, style="quantile", palette="Spectral" ) +
  tm_layout( "Dorling Cartogram", title.position=c("right","top") )

Clustering

We will use the same set of variables as last week. The data is transformed into z-score so that they are all on similar scales.

keep.these <- c("pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12", "polang12", 
"phs12", "pcol12", "punemp12", "pflabf12", "pprof12", "pmanuf12", 
"pvet12", "psemp12", "hinc12", "incpc12", "ppov12", "pown12", 
"pvac12", "pmulti12", "mrent12", "mhmval12", "p30old12", "p10yrs12", 
"p18und12", "p60up12", "p75up12", "pmar12", "pwds12", "pfhh12")

d1 <- sd_dorling@data
d2 <- select( d1, keep.these )
d3 <- apply( d2, 2, scale )
head( d3[,1:6] ) %>% pander()

pnhwht12	pnhblk12	phisp12	pntv12	pfb12	polang12
0.2577	-0.5154	-0.814	2.338	-0.7104	-0.4819
1.051	-0.2854	-0.9353	0.09011	-0.6705	-0.8322
0.7698	-0.5154	-0.7655	0.6112	0.07956	-0.6837
-0.398	2.709	0.05019	-0.1402	-0.2786	0.006068
0.8361	-0.4943	-0.7505	-0.228	-0.7533	-0.8091
0.6504	-0.5154	-0.4425	-0.3935	-0.8618	-0.6892

Prepare Data for Clustering

We transform all of the variables to z scorse so they are on the same scale while clustering. This ensures that each census variable has equal weight. Z-scores typically range from about -3 to +3 with a mean of zero.

keep.these <- c("pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12", "polang12", 
"phs12", "pcol12", "punemp12", "pflabf12", "pprof12", "pmanuf12", 
"pvet12", "psemp12", "hinc12", "incpc12", "ppov12", "pown12", 
"pvac12", "pmulti12", "mrent12", "mhmval12", "p30old12", "p10yrs12", 
"p18und12", "p60up12", "p75up12", "pmar12", "pwds12", "pfhh12")

d2 <- select( d1, keep.these )
d3 <- apply( d2, 2, scale )

head( d3[,1:6] ) %>% pander()

pnhwht12	pnhblk12	phisp12	pntv12	pfb12	polang12
0.2577	-0.5154	-0.814	2.338	-0.7104	-0.4819
1.051	-0.2854	-0.9353	0.09011	-0.6705	-0.8322
0.7698	-0.5154	-0.7655	0.6112	0.07956	-0.6837
-0.398	2.709	0.05019	-0.1402	-0.2786	0.006068
0.8361	-0.4943	-0.7505	-0.228	-0.7533	-0.8091
0.6504	-0.5154	-0.4425	-0.3935	-0.8618	-0.6892

Perform Cluster Analysis

For more details on cluster analysis visit the mclust tutorial.

# library( mclust )
set.seed( 1234 )
fit <- Mclust( d3 )
sd_dorling$cluster <- as.factor( fit$classification )
summary( fit )

## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VVE (ellipsoidal, equal orientation) model with 2 components: 
## 
##  log-likelihood   n  df       BIC       ICL
##       -5105.704 209 556 -13181.75 -13182.02
## 
## Clustering table:
##   1   2 
## 162  47

## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VVE (ellipsoidal, equal orientation) model with 7 components: 
## 
##  log-likelihood   n  df       BIC       ICL
##       -15370.41 771 861 -36464.47 -36533.72
## 
## Clustering table:
##   1   2   3   4   5   6   7 
##  74 151 112 237  89  63  45

Some visuals of model fit statistics (doesn’t work well with a lot of variables):

plot( fit, what = "classification" )

Identifying Neighborhood Clusters

Build the charts to compare census characteristics across the groups.

df.pct <- sapply( d2, ntile, 100 )
d4 <- as.data.frame( df.pct )
d4$cluster <- as.factor( paste0("GROUP-",fit$classification) )

num.groups <- length( unique( fit$classification ) )

stats <- 
d4 %>% 
  group_by( cluster ) %>% 
  summarise_each( funs(mean) )

t <- data.frame( t(stats), stringsAsFactors=F )
names(t) <- paste0( "GROUP.", 1:num.groups )
t <- t[-1,]



for( i in 1:num.groups )
{
  z <- t[,i]
  plot( rep(1,30), 1:30, bty="n", xlim=c(-75,100), 
        type="n", xaxt="n", yaxt="n",
        xlab="Percentile", ylab="",
        main=paste("GROUP",i) )
  abline( v=seq(0,100,25), lty=3, lwd=1.5, col="gray90" )
  segments( y0=1:30, x0=0, x1=100, col="gray70", lwd=2 )
  text( -0.2, 1:30, data.dictionary$VARIABLE[-1], cex=0.85, pos=2 )
  points( z, 1:30, pch=19, col="firebrick", cex=1.5 )
  axis( side=1, at=c(0,50,100), col.axis="gray", col="gray" )
}

Note that in order to add variables together we need to make sure that they all have the same cardinality, i.e. positive scores should represent high scores on each scale, negative scores should represent low scores.

We can determine this by looking at the correlation structure and checking to make sure we have all positive correlations:

Transitivity Index

library( corrplot )

d3 <- as.data.frame(d3)

df.dim1 <- dplyr::select( d3, pown12, pmulti12, p10yrs12, pwds12, pfhh12 )

corrplot( cor(df.dim1, use="complete.obs"), 
          order = "hclust", tl.col='black', tl.cex=.75 )

We can see that home ownership goes in the opposite direction as all of the other variables. We can fix it by flipping the sign of home ownership, but note that this means that high on the “community stability” scale now means low home ownership, higher rates of transitivity, higher rates of multi-family homes. So we would have to rename the variable to “community instability”.

Alternatively, we can flip the signs of the other variables so high is good on the scale (which is slightly easier to interpret).

# flip the signs 
df.dim1$pmulti12  <-  - df.dim1$pmulti12
df.dim1$p10yrs12  <-  - df.dim1$p10yrs12
df.dim1$pwds12    <-  - df.dim1$pwds12
df.dim1$pfhh12    <-  - df.dim1$pfhh12

corrplot( cor(df.dim1, use="complete.obs"), 
          order = "hclust", tl.col='black', tl.cex=.75 )

Diversity Index

Need to flip percent white - low proportion of white population corresponds with higher levels of diversity.

df.dim2 <- d3[ c("pnhwht12", "pnhblk12", "phisp12", "pfb12", "polang12") ]

# Check direction
# Need to flip percent white 

corrplot( cor(df.dim2, use="complete.obs"), 
          order = "hclust", tl.col='black', tl.cex=.75 )

# flip the signs 
df.dim2$pnhblk12  <-  - df.dim2$pnhblk12
df.dim2$pfb12     <-  - df.dim2$pfb12
df.dim2$phisp12   <-  - df.dim2$phisp12
df.dim2$polang12  <-  - df.dim2$polang12

corrplot( cor(df.dim2, use="complete.obs"), 
          order = "hclust", tl.col='black', tl.cex=.75 )

Human Capital

df.dim3 <- select( d3, pcol12, phs12, pprof12, hinc12, mhmval12 )

# Check direction
# Need to flip high school graduation rates 

corrplot( cor(df.dim3, use="complete.obs"), 
          order = "hclust", tl.col='black', tl.cex=.75 )

# flip the signs 
df.dim3$pcol12    <-  - df.dim3$pcol12
df.dim3$pprof12   <-  - df.dim3$pprof12
df.dim3$hinc12    <-  - df.dim3$hinc12
df.dim3$mhmval12  <-  - df.dim3$mhmval12

corrplot( cor(df.dim3, use="complete.obs"), 
          order = "hclust", tl.col='black', tl.cex=.75 )

dim1 <- d3$pown12 - d3$pmulti12 - d3$p10yrs12 - d3$pwds12 - d3$pfhh12
dim2 <- - d3$pnhwht12 + d3$pnhblk12 + d3$phisp12 + d3$pfb12 + d3$polang12
dim3 <- d3$pcol12 - d3$phs12 + d3$pprof12 + d3$hinc12 + d3$mhmval12

df.nhood.metrics <- data.frame( dim1, dim2, dim3 )
summary( df.nhood.metrics )

##       dim1               dim2             dim3        
##  Min.   :-11.3690   Min.   :-4.642   Min.   :-9.4133  
##  1st Qu.: -1.8842   1st Qu.:-3.342   1st Qu.:-2.3401  
##  Median :  0.4666   Median :-1.955   Median :-0.3951  
##  Mean   :  0.0000   Mean   : 0.000   Mean   : 0.0000  
##  3rd Qu.:  2.6991   3rd Qu.: 2.663   3rd Qu.: 2.3319  
##  Max.   :  6.4182   Max.   :11.568   Max.   :11.1834

corrplot( cor( df.nhood.metrics, use="complete.obs" ), 
          order = "hclust", tl.col='black', tl.cex=.75 )

I arbitrarily selected three census variables to compare methods. Here we are using % of population 18 and under, % of female labor force participation, and household income.

# cluster with data set of three indices
fit2 <- Mclust( df.nhood.metrics )
summary( fit2 )

## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 2
## components: 
## 
##  log-likelihood   n df       BIC     ICL
##       -1594.944 209 19 -3291.393 -3317.4
## 
## Clustering table:
##   1   2 
## 115  94

sd_dorling$cluster2 <- as.factor( fit2$classification )

# cluster with dataset of three census variables 
d33 <- data.frame( d3$p18und12, d3$pflabf12, d3$hinc12 )
fit3 <- Mclust( d33 )
summary( fit3 )

## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VVE (ellipsoidal, equal orientation) model with 2 components: 
## 
##  log-likelihood   n df      BIC       ICL
##       -828.9165 209 16 -1743.31 -1771.503
## 
## Clustering table:
##   1   2 
##  25 184

sd_dorling$cluster3 <- as.factor( fit3$classification )



tmap_mode("plot")
tmap_style("cobalt")

tm1 <- 
tm_shape( sd_dorling ) + 
  tm_polygons( col="cluster", palette="Accent"  )

tm2 <- 
tm_shape( sd_dorling ) + 
  tm_polygons( col="cluster2", palette="Accent"  )

tm3 <- 
tm_shape( sd_dorling ) + 
  tm_polygons( col="cluster3", palette="Accent"  )


tmap_arrange( tm1, tm2, tm3 )

# only 3 neighborhood indices
plot( fit2, what = "classification" )

Part Two

Part 2 explores how the input data impacts your clusters.

You will use your full model with 30 census variables as the reference point, then do the following:

Question 1:

Compare that set of groups to the groups identified by the model using only the three indices above. Are they identifying the same groups? Which group is missing?

ANSWER: What three indices?

Question 2:

Select three census variables from the full list of 30, and create a new clustering model. How many groups are generated? Are they similar groups to the full model? Are our inferences significantly different if we have a limited amount of data to draw from?

Report your three variables and describe the groups that are identified.

ANSWER:

Reflection (not graded, but please answer):

Reflect on how the data we use in these models defines how we are defining and describing the neighborhoods (groups) within these models?

Are these clusters valid constructs? Do we believe that they are accurately describing a concrete reality about the world? Would knowing which group that a tract is part of help us predict something about the tract, such as outcomes that children would experience living there, or how the tract might change over time?

How do we know which is the right set of groups to use?