Introduction

A number of datasets have been identified that can be used to calculate dissimilarity scores. These datasets are derived mainly from the 2001 and 2011 censuses and the Scottish Neighbourhood Statistics (SNS) website. Where possible, I have used data relating to 2001 and 2011 for comparability with the census variables. The tables have been prepared mainly for the 2196 datazones that comprise a definition of Greater Glasgow based on health boards. However, in principle, all tables could be prepared for the whole of Scotland.

Census-based variables: country of origin, ethnic group, and religion

The country of origin, ethnic group and religion tables were first found and used as part of the dissimilarity inference app, and can be selected within the latest version of the Shiny app.

Country of origin

require(readr)

## Loading required package: readr

## Warning: package 'readr' was built under R version 3.1.3

require(dplyr)

## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

require(tidyr)

## Loading required package: tidyr

dta_coo <- read_csv("data/derived/coo.csv")
dta_coo

## Source: local data frame [4,392 x 10]
## 
##    [EMPTY]  datazone year scotland england northern_ireland wales
## 1        1 S01000758 2001      943     174                5     2
## 2        2 S01001423 2001      923      37                5     1
## 3        3 S01001424 2001     1123      35                7     0
## 4        4 S01001425 2001     1400      66                0     1
## 5        5 S01001426 2001      899      40                4     0
## 6        6 S01001427 2001      851      24                4     0
## 7        7 S01001428 2001      959      31                1     1
## 8        8 S01001429 2001      578      16                3     0
## 9        9 S01001430 2001      877      43                1     2
## 10      10 S01001431 2001      962      19                8     0
## ..     ...       ...  ...      ...     ...              ...   ...
## Variables not shown: rep_ireland (int), other_eu (int), elsewhere (int)

dta_coo %>% group_by(year) %>% tally

## Source: local data frame [2 x 2]
## 
##   year    n
## 1 2001 2196
## 2 2011 2196

dta_coo %>% select(-1) %>% gather(key="coo", value="count", -datazone, -year) %>% 
    group_by(year, coo) %>% summarise(count=sum(count))

## Source: local data frame [14 x 3]
## Groups: year
## 
##    year              coo   count
## 1  2001         scotland 1782533
## 2  2001          england   79618
## 3  2001 northern_ireland   11064
## 4  2001            wales    3354
## 5  2001      rep_ireland   10865
## 6  2001         other_eu   12218
## 7  2001        elsewhere   45347
## 8  2011         scotland 1583216
## 9  2011          england   77173
## 10 2011 northern_ireland   10225
## 11 2011            wales    3096
## 12 2011      rep_ireland    8319
## 13 2011         other_eu   30997
## 14 2011        elsewhere   75648

Ethnic group

dta_eg <- read_csv("data/derived/ethnicity.csv")
dta_eg %>% group_by(year) %>% tally

## Source: local data frame [2 x 2]
## 
##   year    n
## 1 2001 2196
## 2 2011 2196

dta_eg %>% gather(key= "eg", value="count", -datazone, -year) %>% xtabs(count ~ year + eg, data= . )

##       eg
## year   wt_scot wt_nonscot     acb   asian   mixed   other
##   2001 1772730     117872    2826   43615    4476    3480
##   2011 1560354     128296   17580   71020    5968    5456

Religion

dta_rel <- read_csv("data/derived/rel.csv")
dta_rel

## Source: local data frame [4,392 x 8]
## 
##     datazone year cos catholic other_christian none other_religion
## 1  S01000758 2001 624      134              90  276              7
## 2  S01001423 2001 302      364              48  230              7
## 3  S01001424 2001 414      365              55  265              9
## 4  S01001425 2001 525      494              36  355              8
## 5  S01001426 2001 368      296              45  201             14
## 6  S01001427 2001 341      268              30  179              5
## 7  S01001428 2001 403      268              26  177              2
## 8  S01001429 2001 282      143              23  110              3
## 9  S01001430 2001 318      343              28  197             15
## 10 S01001431 2001 425      335              37  168             13
## ..       ...  ... ...      ...             ...  ...            ...
## Variables not shown: not_announced (int)

dta_rel %>% gather(key= "rel", value="count", -datazone, -year) %>% xtabs(count ~ year + rel, data= . )

##       rel
## year      cos catholic other_christian   none other_religion not_announced
##   2001 729342   543572           88379 409128          13317        124651
##   2011 527455   491980           69727 513576          67839        118097
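
As an illustration of how a table like this can feed a dissimilarity calculation, here is a minimal sketch. It assumes the standard index of dissimilarity, D = 0.5 * sum(|a_i/A - b_i/B|), which may not be exactly what the app computes. The function name `dissimilarity` and the object `rel_sample` are hypothetical, and the data are only the ten 2001 rows printed above, so the result is not a Greater Glasgow estimate. It uses base R only, so it runs without extra packages.

```r
# Index of dissimilarity between two groups across areal units:
#   D = 0.5 * sum_i | a_i / A - b_i / B |
# a_i and b_i are the counts of each group in unit i;
# A and B are the group totals across all units.
# (Assumed formula: the standard index, not necessarily the app's.)
dissimilarity <- function(a, b) {
  stopifnot(length(a) == length(b), sum(a) > 0, sum(b) > 0)
  0.5 * sum(abs(a / sum(a) - b / sum(b)))
}

# The ten 2001 rows of dta_rel printed above (illustration only;
# a real calculation would use all 2196 datazones for one year).
rel_sample <- data.frame(
  datazone = c("S01000758", "S01001423", "S01001424", "S01001425",
               "S01001426", "S01001427", "S01001428", "S01001429",
               "S01001430", "S01001431"),
  cos      = c(624, 302, 414, 525, 368, 341, 403, 282, 318, 425),
  catholic = c(134, 364, 365, 494, 296, 268, 268, 143, 343, 335),
  stringsAsFactors = FALSE
)

d_cos_catholic <- dissimilarity(rel_sample$cos, rel_sample$catholic)
d_cos_catholic
```

The index is 0 when the two groups are distributed identically across datazones and 1 when they never share a datazone.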

Additional census variables

These variables were identified more recently.

Highest qualification

dta_qual <- read_csv("data/derived/highest_qual.csv")
dta_qual

## Source: local data frame [4,396 x 7]
## 
##     datazone year none lvl_1 lvl_2 lvl_3 lvl_4
## 1  S01005669 2001  163   123    52    30    86
## 2  S01005669 2011  129    99    56    47   102
## 3  S01005740 2001  139   132    77    44   170
## 4  S01005740 2011  122   125    83    56   234
## 5  S01004599 2001  184    90    53    34    84
## 6  S01004599 2011  196   174   104    84   165
## 7  S01005843 2001  133   100    70    32    93
## 8  S01005843 2011  127   116    74    60   143
## 9  S01004837 2001  319   156    71    31    45
## 10 S01004837 2011  250   198    80    77   129
## ..       ...  ...  ...   ...   ...   ...   ...

dta_qual %>% gather(key="hq", value="count", -datazone, -year) %>% xtabs(count ~ year + hq, data = .)

##       hq
## year     none  lvl_1  lvl_2  lvl_3  lvl_4
##   2001 475472 304377 197507  92691 222539
##   2011 442367 327327 209479 148518 347752
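
Because the totals differ between the two years, within-year shares can be easier to compare than raw counts. The sketch below is one way to get them, using the totals printed above rather than the full table. The object names `qual_totals` and `qual_shares` are hypothetical.

```r
# Totals by year and highest qualification, copied from the
# cross-tabulation printed above.
qual_totals <- matrix(
  c(475472, 304377, 197507,  92691, 222539,
    442367, 327327, 209479, 148518, 347752),
  nrow = 2, byrow = TRUE,
  dimnames = list(year = c("2001", "2011"),
                  hq   = c("none", "lvl_1", "lvl_2", "lvl_3", "lvl_4"))
)

# Row-wise proportions: each year's categories sum to 1.
qual_shares <- prop.table(qual_totals, margin = 1)

# Shares as percentages, rounded for reading.
round(100 * qual_shares, 1)
```

The same call works directly on the object returned by `xtabs()`, since that is also a table with year as its first dimension.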

Industry

dta_ind <- read_csv("data/derived/industry.csv")
dta_ind

## Source: local data frame [4,396 x 16]
## 
##     datazone year mining_and_quarrying manufacturing
## 1  S01000758 2001                    1            42
## 2  S01001423 2001                    0            54
## 3  S01001424 2001                    0            48
## 4  S01001425 2001                    1            31
## 5  S01001426 2001                    3            39
## 6  S01001427 2001                    1            36
## 7  S01001428 2001                    2            31
## 8  S01001429 2001                    1            20
## 9  S01001430 2001                    1            25
## 10 S01001431 2001                    3            62
## ..       ...  ...                  ...           ...
## Variables not shown: electricity_gas_water_supply (int), construction
##   (int), wholesale_retail_trade_repairs (int), hotels_restaurants (int),
##   transport_storage_communications (int), financial_intermediaries (int),
##   real_estate_renting_business_activities (int),
##   public_admin_defence_social_security (int), education (int),
##   health_and_social_work (int), other (int), fish_agg_hunt_forestry (int)

dta_ind %>% gather(key="industry", value="count", -datazone, -year) %>% xtabs(count ~ industry + year, data = .)

##                                          year
## industry                                    2001   2011
##   mining_and_quarrying                      2127   2389
##   manufacturing                           101695  63270
##   electricity_gas_water_supply              7689  13281
##   construction                             55249  65686
##   wholesale_retail_trade_repairs          108903 127302
##   hotels_restaurants                       36990  46463
##   transport_storage_communications         56492  71350
##   financial_intermediaries                 35953  41057
##   real_estate_renting_business_activities  84442  50038
##   public_admin_defence_social_security     50887  96897
##   education                                54316  69915
##   health_and_social_work                   94241 128723
##   other                                    38164  39642
##   fish_agg_hunt_forestry                    5519   4249

Economic activity

dta_econ <- read_csv("data/derived/economic_activity.csv")
dta_econ

## Source: local data frame [4,396 x 17]
## 
##     datazone year all_people_16_74 employees_part_time employees_full_time
## 1  S01005669 2001              454                  46                 185
## 2  S01005669 2011              405                  54                 150
## 3  S01005740 2001              562                  50                 245
## 4  S01005740 2011              588                  65                 231
## 5  S01004599 2001              445                  31                 160
## 6  S01004599 2011              685                  75                 268
## 7  S01005843 2001              428                  37                 193
## 8  S01005843 2011              493                  52                 210
## 9  S01004837 2001              622                  53                 226
## 10 S01004837 2011              698                  78                 312
## ..       ...  ...              ...                 ...                 ...
## Variables not shown: self_employed (int), unemployed (int),
##   full_time_students (int), retired (int), student (int),
##   looking_after_family_home (int), permanently_sick_disabled (int), other
##   (int), unemployed_16_24 (int), unemployed_50_over (int),
##   unemployed_never_worked (int), unemployed_long_term_unemployed (int)

dta_econ %>% gather(key="econ", value="count", -datazone, -year) %>% xtabs(count ~ econ + year, data = .)

##                                  year
## econ                                 2001    2011
##   all_people_16_74                1292586 1346000
##   employees_part_time              127870  169698
##   employees_full_time              506129  527201
##   self_employed                     63882   81275
##   unemployed                        57797   75792
##   full_time_students                40238   54548
##   retired                          173140  188719
##   student                           57490   77029
##   looking_after_family_home         74628   49932
##   permanently_sick_disabled        128566   92269
##   other                             62846   29537
##   unemployed_16_24                  16928   22526
##   unemployed_50_over                 9155   13380
##   unemployed_never_worked            6499   11810
##   unemployed_long_term_unemployed   19893   30384

Socioeconomic class

dta_sec <- read_csv("data/derived/sec_by_dz.csv")
dta_sec

## Source: local data frame [4,396 x 8]
## 
##    [EMPTY]  datazone year   I  II III IV   X
## 1        1 S01000758 2001 206 265  19 29 197
## 2        2 S01001423 2001 153 234  19 26 138
## 3        3 S01001424 2001  69 269  47 53 305
## 4        4 S01001425 2001  62 227  39 53 303
## 5        5 S01001426 2001  80 222  29 34 157
## 6        6 S01001427 2001 104 281  38 34 142
## 7        7 S01001428 2001  69 259  40 53 242
## 8        8 S01001429 2001  48 142  12 30 194
## 9        9 S01001430 2001 203 206  15 32 212
## 10      10 S01001431 2001 126 350  39 43 136
## ..     ...       ...  ... ... ... ... .. ...

dta_sec %>% select(-1) %>% gather(key="sec", value = "count", -datazone, -year) %>% xtabs(count ~ year + sec, data = .)

##       sec
## year        I     II    III     IV      X
##   2001 166910 404828  71653  89276 468475
##   2011 107357 444293 395829 181644 216877

Tables from the SNS and elsewhere

A number of tables refer to dwellings rather than to individuals or households. These are mainly taken from the SNS website.

Dwellings by size (number of rooms) - the earliest available year is 2006, unfortunately.

dta_dwelsize <- read_csv("data/derived/dwellings_by_size.csv")
dta_dwelsize

## Source: local data frame [520,400 x 4]
## 
##     datazone year num_of_rooms count
## 1  S01000001 2006            1    10
## 2  S01000002 2006            1     0
## 3  S01000003 2006            1     0
## 4  S01000004 2006            1     0
## 5  S01000005 2006            1     0
## 6  S01000006 2006            1     8
## 7  S01000007 2006            1     0
## 8  S01000008 2006            1     0
## 9  S01000009 2006            1     0
## 10 S01000010 2006            1     0
## ..       ...  ...          ...   ...

dta_dwelsize %>% xtabs(count ~ num_of_rooms + year, data = .)

##             year
## num_of_rooms   2006   2007   2008   2009   2010   2011   2012   2013
##           1   17833  18433  18685  18664  19231  19901  21172  21638
##           2  294549 300883 301083 301275 301326 301410 301936 302018
##           3  706754 723064 727002 730865 733858 736361 739883 741849
##           4  643915 659167 661959 664419 666438 668329 670383 673791
##           5  386144 401899 405449 407842 410329 412425 414283 414947
##           6  160801 171001 175114 177686 180564 183067 185410 188535
##           7   70557  76233  78622  80085  81610  82888  84125  84796
##           8   31688  34093  35170  35797  36610  37253  37891  38617
##           9   13731  14637  15025  15330  15638  15887  16133  16355
##           10  12456  12890  13149  13326  13512  13707  13876  14007
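
Judging by its row count and datazone codes, this table appears to cover more than the 2196 Greater Glasgow datazones. A minimal sketch of restricting a long table like this to a set of study datazones follows. The counts in `dwel_toy` are made up, and `study_dz` is a hypothetical vector that in practice might be taken from the `datazone` column of one of the census tables.

```r
# Toy rows in the layout of dta_dwelsize (counts are made up).
dwel_toy <- data.frame(
  datazone     = c("S01000001", "S01000001", "S01000758", "S01000758"),
  year         = 2006,
  num_of_rooms = c(1, 2, 1, 2),
  count        = c(10, 20, 3, 40),
  stringsAsFactors = FALSE
)

# Hypothetical list of study-area datazones, e.g. unique(dta_coo$datazone).
study_dz <- c("S01000758", "S01001423")

# Keep only rows whose datazone is in the study area.
dwel_study <- dwel_toy[dwel_toy$datazone %in% study_dz, ]

# Cross-tabulate to one row per datazone, one column per room count.
dwel_wide <- xtabs(count ~ datazone + num_of_rooms, data = dwel_study)
dwel_wide
```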

Dwellings by council tax band

dta_band <- read_csv("data/derived/dwellings_by_band.csv")
dta_band

## Source: local data frame [24,200 x 10]
## 
##     datazone year  A   B   C   D   E   F   G H
## 1  S01000758 2003  5  10  21  33 121  94 113 7
## 2  S01001423 2003  1  30  57  94  88  73   6 0
## 3  S01001424 2003 58 159 262  59  31   1   0 0
## 4  S01001425 2003 29  50 204 103   9  10   3 0
## 5  S01001426 2003  0   2  88  83  91  13   5 0
## 6  S01001427 2003  0   0   3 130 186   7   0 0
## 7  S01001428 2003  0 128 198  11  54   3   0 0
## 8  S01001429 2003 70 114  94  35  27   5   3 0
## 9  S01001430 2003  0   2  43  35  76 107 105 0
## 10 S01001431 2003  0   0   0  97 201  34   2 0
## ..       ...  ... .. ... ... ... ... ... ... .

dta_band %>% gather(key="band", value="count", -datazone, -year) %>% xtabs(count ~ year + band, data = .)

##       band
## year        A      B      C      D      E      F      G      H
##   2003 223434 197661 132589  93880  85733  40777  26766   2503
##   2004 221129 198234 135587  96333  88716  42991  27630   2620
##   2005 217624 198617 137526  98331  91249  45242  28620   2673
##   2006 214868 198765 139307 100253  93068  47065  29436   2722
##   2007 213218 201133 139328 102105  94806  48613  29961   2749
##   2008 209969 201780 140745 104123  95663  49760  30451   2795
##   2009 207133 202348 142549 105506  96486  50551  30790   2828
##   2010 204677 202655 143860 106820  97318  51244  31070   2886
##   2011 203279 202934 145337 108107  98046  51983  31425   2963
##   2012 203264 202835 147147 108884  98756  52623  31816   3002
##   2013 200679 203048 148119 110412  99134  53226  32310   3021

Dwellings by type

dta_type <- read_csv("data/derived/dwellings_by_type.csv")
dta_type

## Source: local data frame [208,160 x 4]
## 
##     datazone year type count
## 1  S01000001 2006 flat   269
## 2  S01000002 2006 flat     0
## 3  S01000003 2006 flat    46
## 4  S01000004 2006 flat    52
## 5  S01000005 2006 flat   255
## 6  S01000006 2006 flat    95
## 7  S01000007 2006 flat    12
## 8  S01000008 2006 flat    88
## 9  S01000009 2006 flat    12
## 10 S01000010 2006 flat    16
## ..       ...  ...  ...   ...

dta_type %>% xtabs(count ~ year + type, data = .)

##       type
## year   detached   flat semidetached terraced
##   2006   490978 933084       481828   503681
##   2007   500771 940350       486071   506800
##   2008   509049 945163       489286   509695
##   2009   514480 948153       492320   512411
##   2010   519525 951654       494987   514959
##   2011   524418 954533       497263   517103
##   2012   529359 959024       499347   519579
##   2013   534294 961141       501564   521992

Demographic variables

I have data on population by five-year age group and gender at datazone level, based on a combination of the 2001 and 2011 censuses and small area population estimates (SAPE) for non-census years. I have combined these into a single, very large file, and have used this file to group people into categories based on age and gender.

demo_big <- read_csv("data/derived/populations_by_age_year_sex.csv")
demo_big

## Source: local data frame [4,085,140 x 7]
## 
##     datazone year  sex age_range lower_age upper_age count
## 1  S01000001 1996 male       0_0         0         0     2
## 2  S01000002 1996 male       0_0         0         0     0
## 3  S01000003 1996 male       0_0         0         0    13
## 4  S01000004 1996 male       0_0         0         0     0
## 5  S01000005 1996 male       0_0         0         0    10
## 6  S01000006 1996 male       0_0         0         0     1
## 7  S01000007 1996 male       0_0         0         0     4
## 8  S01000008 1996 male       0_0         0         0     6
## 9  S01000009 1996 male       0_0         0         0     5
## 10 S01000010 1996 male       0_0         0         0     6
## ..       ...  ...  ...       ...       ...       ...   ...

demo_big %>% xtabs(count ~ year + age_range + sex, data = .)

## , , sex = female
## 
##       age_range
## year      0_0    0_4    1_4  10_12  10_14  10_15  13_14  15_15  15_19
##   1996  28495      0 123077  94510      0      0  62150  32318      0
##   1997  28992      0 118815  95842      0      0  61049  31265      0
##   1998  28040      0 116433  95610      0      0  62429  30561      0
##   1999  27451      0 113954  94350      0      0  64017  30393      0
##   2000  25835      0 111922  93579      0      0  63629  32125      0
##   2001      0 134514      0      0 157287      0      0      0 156338
##   2002      0 131069      0      0      0 188933      0      0      0
##   2003      0 128948      0      0      0 188062      0      0      0
##   2004      0 128617      0      0      0 186891      0      0      0
##   2005      0 129659      0      0      0 185283      0      0      0
##   2006      0 131068      0      0      0 182827      0      0      0
##   2007      0 134052      0      0      0 179978      0      0      0
##   2008      0 137820      0      0      0 176353      0      0      0
##   2009      0 140964      0      0      0 173332      0      0      0
##   2010      0 143656      0      0      0 169400      0      0      0
##   2011      0 145766      0      0      0 166683      0      0      0
##       age_range
## year    16_19  20_24  25_29  30_34  35_39  40_44  45_49    5_9  50_54
##   1996 120822 170987 198148 208113 194993 171653 178707 157866 150416
##   1997 124534 161928 192860 208070 198364 175847 171353 157868 160918
##   1998 126513 156062 185245 206727 201876 180496 168388 156698 166541
##   1999 126653 154856 177069 203848 204986 185035 167774 155695 170435
##   2000 125008 155625 169648 199590 207731 189629 168467 153159 174284
##   2001      0 157271 163191 197420 208336 193734 170544 150108 176989
##   2002 124736 159045 152332 191954 208554 197772 175329 145823 169908
##   2003 126357 160256 147171 185233 207594 201927 179977 143278 167460
##   2004 129045 161118 146876 178091 205836 206066 185256 141242 167362
##   2005 128221 164408 149237 171218 201996 209485 189987 138514 168226
##   2006 127791 168046 153891 163497 199628 210261 194619 136209 170649
##   2007 128046 171506 160211 156046 195444 210810 198548 133626 175257
##   2008 128597 174164 164884 152061 188718 209627 202651 131789 179686
##   2009 128790 176188 168375 152297 180864 206824 206174 131027 184353
##   2010 128754 177415 172523 154953 173843 203033 209099 132146 188853
##   2011 126492 180665 176112 159172 166275 200441 209974 133542 193527
##       age_range
## year    55_59  60_64  65_69  70_74  75_79  80_84 85_101  85_89 90_101
##   1996 141327 136725 130752 119202  93274  70234  61401      0      0
##   1997 141149 136071 130969 117576  97171  68085  62292      0      0
##   1998 142735 136893 130651 116942 100500  65328  63411      0      0
##   1999 144492 137680 129767 116571 103461  62795  64133      0      0
##   2000 145221 137906 129230 116918 100346  66141  65021      0      0
##   2001 147164 137082 129107 116864  99466  68634      0  42580  22888
##   2002 158653 136906 128750 117883  98146  72395      0  41440  23367
##   2003 164311 138713 129682 118134  97724  74937      0  39426  23644
##   2004 168876 141004 130619 117861  97855  77304      0  38112  24121
##   2005 172797 141859 130840 117675  98599  75369      0  40752  24566
##   2006 175422 144542 129719 117673  98732  74907      0  43228  24784
##   2007 168594 155044 129905 117522 100080  74272      0  45468  24192
##   2008 165859 160388 131669 118326 100765  74167      0  47133  23638
##   2009 165060 164333 133497 119075 100775  74604      0  48537  23643
##   2010 165873 167917 134380 119364 101222  75654      0  47745  25955
##   2011 168046 170514 137089 118572 101768  76306      0  48100  27556
## 
## , , sex = male
## 
##       age_range
## year      0_0    0_4    1_4  10_12  10_14  10_15  13_14  15_15  15_19
##   1996  30016      0 128495  98326      0      0  64181  33876      0
##   1997  30648      0 124250  99859      0      0  63419  32101      0
##   1998  29525      0 122393 100604      0      0  64741  31551      0
##   1999  28762      0 120144  99644      0      0  66608  31623      0
##   2000  27576      0 117880  98994      0      0  66682  33199      0
##   2001      0 142360      0      0 165583      0      0      0 160935
##   2002      0 137399      0      0      0 198649      0      0      0
##   2003      0 134880      0      0      0 197427      0      0      0
##   2004      0 134505      0      0      0 195527      0      0      0
##   2005      0 135541      0      0      0 193739      0      0      0
##   2006      0 137389      0      0      0 191119      0      0      0
##   2007      0 141148      0      0      0 188333      0      0      0
##   2008      0 145152      0      0      0 184729      0      0      0
##   2009      0 148025      0      0      0 181974      0      0      0
##   2010      0 149864      0      0      0 178816      0      0      0
##   2011      0 151975      0      0      0 175519      0      0      0
##       age_range
## year    16_19  20_24  25_29  30_34  35_39  40_44  45_49    5_9  50_54
##   1996 123431 169460 193276 198514 187139 168162 176466 165674 146191
##   1997 127334 160124 186130 197710 190198 171565 168794 165907 157294
##   1998 129691 153381 178086 195554 192441 174603 165810 164004 163547
##   1999 129006 152526 168959 192661 194100 178079 164792 162755 167562
##   2000 127564 153980 160324 187143 195327 181675 164889 160183 171579
##   2001      0 157116 154112 184674 194618 184176 166925 157030 174118
##   2002 129604 160995 146544 178204 194097 187643 170170 153336 166531
##   2003 132112 163465 142612 172497 192863 190336 173054 150645 163565
##   2004 135357 164301 144343 165731 191353 193056 176703 148674 162835
##   2005 135149 167139 149480 159418 187175 195029 180366 146258 162923
##   2006 135165 171019 156422 153686 185147 194867 183306 143221 164736
##   2007 135910 175304 163781 149250 179844 194658 186904 139814 168082
##   2008 135510 179889 169947 147211 174309 193261 189434 137691 170891
##   2009 135397 182709 173773 150237 167415 190907 191390 137018 174089
##   2010 134724 183869 178099 156089 161215 186661 192850 137912 177390
##   2011 131962 186473 182321 162928 155436 184202 192578 139832 180504
##       age_range
## year    55_59  60_64  65_69  70_74  75_79  80_84 85_101  85_89 90_101
##   1996 131672 121988 108452  89326  57959  34741  19675      0      0
##   1997 132063 121993 108940  88470  61235  33908  20380      0      0
##   1998 133705 123608 109114  88777  64003  32486  21367      0      0
##   1999 136279 124690 108942  89140  66703  31557  22003      0      0
##   2000 137772 125330 109408  89656  65831  34294  22640      0      0
##   2001 140835 124651 110009  90053  66057  36355      0  16661   6226
##   2002 152814 124921 110695  91259  66273  39473      0  16592   6606
##   2003 159154 126440 112326  92124  66969  41266      0  15955   6876
##   2004 163713 129468 113886  92710  67834  43239      0  15907   7106
##   2005 167535 131064 114503  93679  68789  42992      0  17853   7477
##   2006 169377 135028 113650  94702  69739  43643      0  19562   7629
##   2007 162448 145949 114360  95589  71300  44315      0  20997   7613
##   2008 159342 151994 116127  97242  72582  45182      0  22159   7553
##   2009 158114 155859 118616  98803  73474  46151      0  23425   7912
##   2010 158082 159572 120391  99722  74873  47282      0  23672   9232
##   2011 160001 161473 124444  99208  76231  48539      0  24332  10242

As you can see, the age groupings are not entirely consistent across years. I have produced a version with some consistent groupings, just for 2001 and 2011:

demo_grps <- read_csv("data/derived/demographic_groupings.csv")
demo_grps

## Source: local data frame [4,400 x 16]
## 
##      dz_2001 year  f1 f2  f3  f4  f5 f6 f7  m1 m2  m3  m4  m5 m6 m7
## 1  S01000758 2001  75 54  80 132  74 65 20  84 46  79 131  76 64 12
## 2  S01000758 2011 101 47  72 170 131 95 38 102 62  77 157 120 91 30
## 3  S01001423 2001  88 50 133  75  42 40  9  80 37 112  86  33 23  4
## 4  S01001423 2011  72 52  86 105  37 42 16  88 30  90 113  42 24  9
## 5  S01001424 2001  87 65 148 134  58 75 32 104 69 110 100  51 40 12
## 6  S01001424 2011  91 56 108 144  59 70 33 119 79  92  93  42 46 14
## 7  S01001425 2001  93 70 121 103  57 77 15  90 53 108  87  45 49 12
## 8  S01001425 2011 214 85 167 236  87 69 27 188 87 101 216  65 53 10
## 9  S01001426 2001  65 38  90  80  46 45 43  79 35  92  77  39 38 20
## 10 S01001426 2011  58 31  60  89  47 64 36  58 40  52  74  54 46 20
## ..       ...  ... ... .. ... ... ... .. .. ... .. ... ... ... .. ..

demo_grps %>% gather(key="grp", value="count", -dz_2001, -year) %>% xtabs(count ~ year + grp, data = .)

##       grp
## year       f1     f2     f3     f4     f5     f6     f7     m1     m2
##   2001 154712 114836 205295 184612  96175 118928  43198 163866 113696
##   2011 153793 104959 180282 207995 108624 115699  48244 160325 110443
##       grp
## year       m3     m4     m5     m6     m7
##   2001 186314 176185  87259  87198  17872
##   2011 180888 186988 100827  90584  24462

# The code used to produce the groupings:
#         f1 = female_0_4 + female_5_9 + female_10_14 + female_10_15,
#         f2 = female_15_19 + female_16_19 + female_20_24,
#         f3 = female_25_29 + female_30_34 + female_35_39,
#         f4 = female_40_44 + female_45_49 + female_50_54,
#         f5 = female_55_59 + female_60_64,
#         f6 = female_65_69 + female_70_74 + female_75_79,
#         f7 = female_80_84 + female_85_89 + female_90_101,
#         
#         m1 = male_0_4 + male_5_9 + male_10_14 + male_10_15,
#         m2 = male_15_19 + male_16_19 + male_20_24,
#         m3 = male_25_29 + male_30_34 + male_35_39,
#         m4 = male_40_44 + male_45_49 + male_50_54,
#         m5 = male_55_59 + male_60_64,
#         m6 = male_65_69 + male_70_74 + male_75_79,
#         m7 = male_80_84 + male_85_89 + male_90_101
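
The same groupings could also be derived from the `lower_age` column of the long file instead of summing named columns. The following is a sketch under that assumption, not the code actually used. The cut points are inferred from the commented definitions above, the rows in `demo_toy` are made up, and `age_grp` is a hypothetical column name.

```r
# Toy rows in the layout of demo_big (counts are made up).
demo_toy <- data.frame(
  datazone  = "S01000001",
  year      = 2001,
  sex       = c("female", "female", "female", "male", "male", "male"),
  lower_age = c(0, 5, 15, 80, 85, 90),
  count     = c(5, 4, 7, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Left-closed intervals on the lower bound of each age range:
# [0,15) -> 1, [15,25) -> 2, [25,40) -> 3, [40,55) -> 4,
# [55,65) -> 5, [65,80) -> 6, [80,Inf) -> 7
demo_toy$age_grp <- cut(
  demo_toy$lower_age,
  breaks = c(0, 15, 25, 40, 55, 65, 80, Inf),
  right  = FALSE,
  labels = 1:7
)

# Prefix with the first letter of sex to get f1..f7 and m1..m7.
demo_toy$grp <- paste0(substr(demo_toy$sex, 1, 1), demo_toy$age_grp)

# Sum counts within datazone, year and group.
grp_counts <- aggregate(count ~ datazone + year + grp, data = demo_toy, FUN = sum)
grp_counts
```

Note that the source ranges still differ by year: 2001 uses 10_14 and 15_19, while 2011 uses 10_15 and 16_19, so the boundary between groups 1 and 2 falls at age 15 in one year and 16 in the other whichever approach is used.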

Quirkier tables

I have also extracted data from the Postcode Address File (PAF) on different types of building use. The PAF distinguishes between:

address points: residential
delivery points: any use
small business points: small businesses

The number of delivery points less the number of address points and small business points should indicate the number of large businesses. In practice, however, I think the categories are not mutually exclusive, so using the PAF correctly can be tricky.

bld_use <- read_csv("data/derived/building_use.csv")
bld_use

## Source: local data frame [4,400 x 5]
## 
##     datazone year address deliverypoint smallbus
## 1  S01005669 2001     325           344       19
## 2  S01005669 2010     325           344       19
## 3  S01005740 2001     336           350       14
## 4  S01005740 2010     336           350       14
## 5  S01004599 2001     286           294        9
## 6  S01004599 2010     286           294        9
## 7  S01005843 2001     269           275        6
## 8  S01005843 2010     269           275        6
## 9  S01004837 2001     318           327        9
## 10 S01004837 2010     318           327        9
## ..       ...  ...     ...           ...      ...

bld_use %>% gather(key="type", value="count", -datazone, -year) %>% xtabs(count ~ year + type, data = . )

##       type
## year   address deliverypoint smallbus
##   2001  862834        893111    30328
##   2010  862834        893111    30328
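
The subtraction described above can be checked against the values already printed. This is a small sketch: `bld_sample` holds the five distinct datazones shown above, and `implied_large` is a hypothetical column name.

```r
# The five datazones printed above (2001 and 2010 rows are identical).
bld_sample <- data.frame(
  datazone      = c("S01005669", "S01005740", "S01004599",
                    "S01005843", "S01004837"),
  address       = c(325, 336, 286, 269, 318),
  deliverypoint = c(344, 350, 294, 275, 327),
  smallbus      = c(19, 14, 9, 6, 9),
  stringsAsFactors = FALSE
)

# Delivery points less address points and small business points.
bld_sample$implied_large <- with(bld_sample, deliverypoint - address - smallbus)
bld_sample$implied_large
# 0 0 -1 0 0

# The same subtraction on the totals from the cross-tabulation.
total_implied <- 893111 - 862834 - 30328
total_implied
# -51
```

A negative result, as for S01004599 and for the overall total, cannot be a count of businesses. That is consistent with the suspicion that the categories overlap.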

Other variables

A very large number of variables have been extracted and rearranged from the SNS. These are available on Dropbox.

Data for dissimilarity calculations

Jon Minton

16 June 2015