A number of datasets have been identified that can be used to calculate dissimilarity scores. These datasets are derived mainly from the 2001 and 2011 censuses and the Scottish Neighbourhood Statistics (SNS) website. Where possible, I have used data relating to 2001 and 2011 for comparability with census variables. The tables have been prepared mainly for the 2196 datazones that comprise a definition of Greater Glasgow based on health boards. However, in principle all tables could be prepared for the whole of Scotland.
These tables were first found and used as part of the dissimilarity inference app, and are available to select within the latest version of the Shiny app.
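As a rough illustration of how a dissimilarity score can be calculated from any of these tables, here is a minimal sketch. It assumes the score meant is the standard index of dissimilarity, D = 0.5 * sum(|a_i / A - b_i / B|), where a_i and b_i are the counts of two groups in area i, and A and B are their totals. The function name `dissimilarity` is hypothetical and is not taken from the app. The counts are the first three 2001 rows of the country of origin table.

```r
# Index of dissimilarity between two groups counted over the same areas.
# a, b: numeric vectors of counts, one element per area (e.g. datazone).
dissimilarity <- function(a, b) {
  stopifnot(length(a) == length(b), sum(a) > 0, sum(b) > 0)
  0.5 * sum(abs(a / sum(a) - b / sum(b)))
}

# Counts for three datazones in 2001, copied from the coo table.
scotland <- c(943, 923, 1123)
england  <- c(174, 37, 35)

# 0 means identical distributions across areas, 1 means complete separation.
d_scot_eng <- dissimilarity(scotland, england)
d_scot_eng
```

In practice the two vectors would be columns of one of the tables, filtered to a single year.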
require(readr)
## Loading required package: readr
## Warning: package 'readr' was built under R version 3.1.3
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(tidyr)
## Loading required package: tidyr
dta_coo <- read_csv("data/derived/coo.csv")
dta_coo
## Source: local data frame [4,392 x 10]
##
## [EMPTY] datazone year scotland england northern_ireland wales
## 1 1 S01000758 2001 943 174 5 2
## 2 2 S01001423 2001 923 37 5 1
## 3 3 S01001424 2001 1123 35 7 0
## 4 4 S01001425 2001 1400 66 0 1
## 5 5 S01001426 2001 899 40 4 0
## 6 6 S01001427 2001 851 24 4 0
## 7 7 S01001428 2001 959 31 1 1
## 8 8 S01001429 2001 578 16 3 0
## 9 9 S01001430 2001 877 43 1 2
## 10 10 S01001431 2001 962 19 8 0
## .. ... ... ... ... ... ... ...
## Variables not shown: rep_ireland (int), other_eu (int), elsewhere (int)
dta_coo %>% group_by(year) %>% tally
## Source: local data frame [2 x 2]
##
## year n
## 1 2001 2196
## 2 2011 2196
dta_coo %>% select(-1) %>% gather(key="coo", value="count", -datazone, -year) %>%
group_by(year, coo) %>% summarise(count=sum(count))
## Source: local data frame [14 x 3]
## Groups: year
##
## year coo count
## 1 2001 scotland 1782533
## 2 2001 england 79618
## 3 2001 northern_ireland 11064
## 4 2001 wales 3354
## 5 2001 rep_ireland 10865
## 6 2001 other_eu 12218
## 7 2001 elsewhere 45347
## 8 2011 scotland 1583216
## 9 2011 england 77173
## 10 2011 northern_ireland 10225
## 11 2011 wales 3096
## 12 2011 rep_ireland 8319
## 13 2011 other_eu 30997
## 14 2011 elsewhere 75648
dta_eg <- read_csv("data/derived/ethnicity.csv")
dta_eg %>% group_by(year) %>% tally
## Source: local data frame [2 x 2]
##
## year n
## 1 2001 2196
## 2 2011 2196
dta_eg %>% gather(key= "eg", value="count", -datazone, -year) %>% xtabs(count ~ year + eg, data= . )
## eg
## year wt_scot wt_nonscot acb asian mixed other
## 2001 1772730 117872 2826 43615 4476 3480
## 2011 1560354 128296 17580 71020 5968 5456
dta_rel <- read_csv("data/derived/rel.csv")
dta_rel
## Source: local data frame [4,392 x 8]
##
## datazone year cos catholic other_christian none other_religion
## 1 S01000758 2001 624 134 90 276 7
## 2 S01001423 2001 302 364 48 230 7
## 3 S01001424 2001 414 365 55 265 9
## 4 S01001425 2001 525 494 36 355 8
## 5 S01001426 2001 368 296 45 201 14
## 6 S01001427 2001 341 268 30 179 5
## 7 S01001428 2001 403 268 26 177 2
## 8 S01001429 2001 282 143 23 110 3
## 9 S01001430 2001 318 343 28 197 15
## 10 S01001431 2001 425 335 37 168 13
## .. ... ... ... ... ... ... ...
## Variables not shown: not_announced (int)
dta_rel %>% gather(key= "rel", value="count", -datazone, -year) %>% xtabs(count ~ year + rel, data= . )
## rel
## year cos catholic other_christian none other_religion not_announced
## 2001 729342 543572 88379 409128 13317 124651
## 2011 527455 491980 69727 513576 67839 118097
These variables were identified more recently.
dta_qual <- read_csv("data/derived/highest_qual.csv")
dta_qual
## Source: local data frame [4,396 x 7]
##
## datazone year none lvl_1 lvl_2 lvl_3 lvl_4
## 1 S01005669 2001 163 123 52 30 86
## 2 S01005669 2011 129 99 56 47 102
## 3 S01005740 2001 139 132 77 44 170
## 4 S01005740 2011 122 125 83 56 234
## 5 S01004599 2001 184 90 53 34 84
## 6 S01004599 2011 196 174 104 84 165
## 7 S01005843 2001 133 100 70 32 93
## 8 S01005843 2011 127 116 74 60 143
## 9 S01004837 2001 319 156 71 31 45
## 10 S01004837 2011 250 198 80 77 129
## .. ... ... ... ... ... ... ...
dta_qual %>% gather(key="hq", value="count", -datazone, -year) %>% xtabs(count ~ year + hq, data = .)
## hq
## year none lvl_1 lvl_2 lvl_3 lvl_4
## 2001 475472 304377 197507 92691 222539
## 2011 442367 327327 209479 148518 347752
dta_ind <- read_csv("data/derived/industry.csv")
dta_ind
## Source: local data frame [4,396 x 16]
##
## datazone year mining_and_quarrying manufacturing
## 1 S01000758 2001 1 42
## 2 S01001423 2001 0 54
## 3 S01001424 2001 0 48
## 4 S01001425 2001 1 31
## 5 S01001426 2001 3 39
## 6 S01001427 2001 1 36
## 7 S01001428 2001 2 31
## 8 S01001429 2001 1 20
## 9 S01001430 2001 1 25
## 10 S01001431 2001 3 62
## .. ... ... ... ...
## Variables not shown: electricity_gas_water_supply (int), construction
## (int), wholesale_retail_trade_repairs (int), hotels_restaurants (int),
## transport_storage_communications (int), financial_intermediaries (int),
## real_estate_renting_business_activities (int),
## public_admin_defence_social_security (int), education (int),
## health_and_social_work (int), other (int), fish_agg_hunt_forestry (int)
dta_ind %>% gather(key="industry", value="count", -datazone, -year) %>% xtabs(count ~ industry + year, data = .)
## year
## industry 2001 2011
## mining_and_quarrying 2127 2389
## manufacturing 101695 63270
## electricity_gas_water_supply 7689 13281
## construction 55249 65686
## wholesale_retail_trade_repairs 108903 127302
## hotels_restaurants 36990 46463
## transport_storage_communications 56492 71350
## financial_intermediaries 35953 41057
## real_estate_renting_business_activities 84442 50038
## public_admin_defence_social_security 50887 96897
## education 54316 69915
## health_and_social_work 94241 128723
## other 38164 39642
## fish_agg_hunt_forestry 5519 4249
dta_econ <- read_csv("data/derived/economic_activity.csv")
dta_econ
## Source: local data frame [4,396 x 17]
##
## datazone year all_people_16_74 employees_part_time employees_full_time
## 1 S01005669 2001 454 46 185
## 2 S01005669 2011 405 54 150
## 3 S01005740 2001 562 50 245
## 4 S01005740 2011 588 65 231
## 5 S01004599 2001 445 31 160
## 6 S01004599 2011 685 75 268
## 7 S01005843 2001 428 37 193
## 8 S01005843 2011 493 52 210
## 9 S01004837 2001 622 53 226
## 10 S01004837 2011 698 78 312
## .. ... ... ... ... ...
## Variables not shown: self_employed (int), unemployed (int),
## full_time_students (int), retired (int), student (int),
## looking_after_family_home (int), permanently_sick_disabled (int), other
## (int), unemployed_16_24 (int), unemployed_50_over (int),
## unemployed_never_worked (int), unemployed_long_term_unemployed (int)
dta_econ %>% gather(key="econ", value="count", -datazone, -year) %>% xtabs(count ~ econ + year, data = .)
## year
## econ 2001 2011
## all_people_16_74 1292586 1346000
## employees_part_time 127870 169698
## employees_full_time 506129 527201
## self_employed 63882 81275
## unemployed 57797 75792
## full_time_students 40238 54548
## retired 173140 188719
## student 57490 77029
## looking_after_family_home 74628 49932
## permanently_sick_disabled 128566 92269
## other 62846 29537
## unemployed_16_24 16928 22526
## unemployed_50_over 9155 13380
## unemployed_never_worked 6499 11810
## unemployed_long_term_unemployed 19893 30384
dta_sec <- read_csv("data/derived/sec_by_dz.csv")
dta_sec
## Source: local data frame [4,396 x 8]
##
## [EMPTY] datazone year I II III IV X
## 1 1 S01000758 2001 206 265 19 29 197
## 2 2 S01001423 2001 153 234 19 26 138
## 3 3 S01001424 2001 69 269 47 53 305
## 4 4 S01001425 2001 62 227 39 53 303
## 5 5 S01001426 2001 80 222 29 34 157
## 6 6 S01001427 2001 104 281 38 34 142
## 7 7 S01001428 2001 69 259 40 53 242
## 8 8 S01001429 2001 48 142 12 30 194
## 9 9 S01001430 2001 203 206 15 32 212
## 10 10 S01001431 2001 126 350 39 43 136
## .. ... ... ... ... ... ... .. ...
dta_sec %>% select(-1) %>% gather(key="sec", value = "count", -datazone, -year) %>% xtabs(count ~ year + sec, data = .)
## sec
## year I II III IV X
## 2001 166910 404828 71653 89276 468475
## 2011 107357 444293 395829 181644 216877
A number of tables refer to dwellings rather than individuals or households. These are mainly taken from the SNS website.
dta_dwelsize <- read_csv("data/derived/dwellings_by_size.csv")
dta_dwelsize
## Source: local data frame [520,400 x 4]
##
## datazone year num_of_rooms count
## 1 S01000001 2006 1 10
## 2 S01000002 2006 1 0
## 3 S01000003 2006 1 0
## 4 S01000004 2006 1 0
## 5 S01000005 2006 1 0
## 6 S01000006 2006 1 8
## 7 S01000007 2006 1 0
## 8 S01000008 2006 1 0
## 9 S01000009 2006 1 0
## 10 S01000010 2006 1 0
## .. ... ... ... ...
dta_dwelsize %>% xtabs(count ~ num_of_rooms + year, data = .)
## year
## num_of_rooms 2006 2007 2008 2009 2010 2011 2012 2013
## 1 17833 18433 18685 18664 19231 19901 21172 21638
## 2 294549 300883 301083 301275 301326 301410 301936 302018
## 3 706754 723064 727002 730865 733858 736361 739883 741849
## 4 643915 659167 661959 664419 666438 668329 670383 673791
## 5 386144 401899 405449 407842 410329 412425 414283 414947
## 6 160801 171001 175114 177686 180564 183067 185410 188535
## 7 70557 76233 78622 80085 81610 82888 84125 84796
## 8 31688 34093 35170 35797 36610 37253 37891 38617
## 9 13731 14637 15025 15330 15638 15887 16133 16355
## 10 12456 12890 13149 13326 13512 13707 13876 14007
dta_band <- read_csv("data/derived/dwellings_by_band.csv")
dta_band
## Source: local data frame [24,200 x 10]
##
## datazone year A B C D E F G H
## 1 S01000758 2003 5 10 21 33 121 94 113 7
## 2 S01001423 2003 1 30 57 94 88 73 6 0
## 3 S01001424 2003 58 159 262 59 31 1 0 0
## 4 S01001425 2003 29 50 204 103 9 10 3 0
## 5 S01001426 2003 0 2 88 83 91 13 5 0
## 6 S01001427 2003 0 0 3 130 186 7 0 0
## 7 S01001428 2003 0 128 198 11 54 3 0 0
## 8 S01001429 2003 70 114 94 35 27 5 3 0
## 9 S01001430 2003 0 2 43 35 76 107 105 0
## 10 S01001431 2003 0 0 0 97 201 34 2 0
## .. ... ... .. ... ... ... ... ... ... .
dta_band %>% gather(key="band", value="count", -datazone, -year) %>% xtabs(count ~ year + band, data = .)
## band
## year A B C D E F G H
## 2003 223434 197661 132589 93880 85733 40777 26766 2503
## 2004 221129 198234 135587 96333 88716 42991 27630 2620
## 2005 217624 198617 137526 98331 91249 45242 28620 2673
## 2006 214868 198765 139307 100253 93068 47065 29436 2722
## 2007 213218 201133 139328 102105 94806 48613 29961 2749
## 2008 209969 201780 140745 104123 95663 49760 30451 2795
## 2009 207133 202348 142549 105506 96486 50551 30790 2828
## 2010 204677 202655 143860 106820 97318 51244 31070 2886
## 2011 203279 202934 145337 108107 98046 51983 31425 2963
## 2012 203264 202835 147147 108884 98756 52623 31816 3002
## 2013 200679 203048 148119 110412 99134 53226 32310 3021
dta_type <- read_csv("data/derived/dwellings_by_type.csv")
dta_type
## Source: local data frame [208,160 x 4]
##
## datazone year type count
## 1 S01000001 2006 flat 269
## 2 S01000002 2006 flat 0
## 3 S01000003 2006 flat 46
## 4 S01000004 2006 flat 52
## 5 S01000005 2006 flat 255
## 6 S01000006 2006 flat 95
## 7 S01000007 2006 flat 12
## 8 S01000008 2006 flat 88
## 9 S01000009 2006 flat 12
## 10 S01000010 2006 flat 16
## .. ... ... ... ...
dta_type %>% xtabs(count ~ year + type, data = .)
## type
## year detached flat semidetached terraced
## 2006 490978 933084 481828 503681
## 2007 500771 940350 486071 506800
## 2008 509049 945163 489286 509695
## 2009 514480 948153 492320 512411
## 2010 519525 951654 494987 514959
## 2011 524418 954533 497263 517103
## 2012 529359 959024 499347 519579
## 2013 534294 961141 501564 521992
I have data on population by five-year age group and gender at datazone level, based on a combination of the 2001 and 2011 censuses and small area population estimates (SAPE) for non-census years. I have combined these into a single, very large file, and have also used this file to group people into broader categories based on age and gender.
demo_big <- read_csv("data/derived/populations_by_age_year_sex.csv")
demo_big
## Source: local data frame [4,085,140 x 7]
##
## datazone year sex age_range lower_age upper_age count
## 1 S01000001 1996 male 0_0 0 0 2
## 2 S01000002 1996 male 0_0 0 0 0
## 3 S01000003 1996 male 0_0 0 0 13
## 4 S01000004 1996 male 0_0 0 0 0
## 5 S01000005 1996 male 0_0 0 0 10
## 6 S01000006 1996 male 0_0 0 0 1
## 7 S01000007 1996 male 0_0 0 0 4
## 8 S01000008 1996 male 0_0 0 0 6
## 9 S01000009 1996 male 0_0 0 0 5
## 10 S01000010 1996 male 0_0 0 0 6
## .. ... ... ... ... ... ... ...
demo_big %>% xtabs(count ~ year + age_range + sex, data = .)
## , , sex = female
##
## age_range
## year 0_0 0_4 1_4 10_12 10_14 10_15 13_14 15_15 15_19
## 1996 28495 0 123077 94510 0 0 62150 32318 0
## 1997 28992 0 118815 95842 0 0 61049 31265 0
## 1998 28040 0 116433 95610 0 0 62429 30561 0
## 1999 27451 0 113954 94350 0 0 64017 30393 0
## 2000 25835 0 111922 93579 0 0 63629 32125 0
## 2001 0 134514 0 0 157287 0 0 0 156338
## 2002 0 131069 0 0 0 188933 0 0 0
## 2003 0 128948 0 0 0 188062 0 0 0
## 2004 0 128617 0 0 0 186891 0 0 0
## 2005 0 129659 0 0 0 185283 0 0 0
## 2006 0 131068 0 0 0 182827 0 0 0
## 2007 0 134052 0 0 0 179978 0 0 0
## 2008 0 137820 0 0 0 176353 0 0 0
## 2009 0 140964 0 0 0 173332 0 0 0
## 2010 0 143656 0 0 0 169400 0 0 0
## 2011 0 145766 0 0 0 166683 0 0 0
## age_range
## year 16_19 20_24 25_29 30_34 35_39 40_44 45_49 5_9 50_54
## 1996 120822 170987 198148 208113 194993 171653 178707 157866 150416
## 1997 124534 161928 192860 208070 198364 175847 171353 157868 160918
## 1998 126513 156062 185245 206727 201876 180496 168388 156698 166541
## 1999 126653 154856 177069 203848 204986 185035 167774 155695 170435
## 2000 125008 155625 169648 199590 207731 189629 168467 153159 174284
## 2001 0 157271 163191 197420 208336 193734 170544 150108 176989
## 2002 124736 159045 152332 191954 208554 197772 175329 145823 169908
## 2003 126357 160256 147171 185233 207594 201927 179977 143278 167460
## 2004 129045 161118 146876 178091 205836 206066 185256 141242 167362
## 2005 128221 164408 149237 171218 201996 209485 189987 138514 168226
## 2006 127791 168046 153891 163497 199628 210261 194619 136209 170649
## 2007 128046 171506 160211 156046 195444 210810 198548 133626 175257
## 2008 128597 174164 164884 152061 188718 209627 202651 131789 179686
## 2009 128790 176188 168375 152297 180864 206824 206174 131027 184353
## 2010 128754 177415 172523 154953 173843 203033 209099 132146 188853
## 2011 126492 180665 176112 159172 166275 200441 209974 133542 193527
## age_range
## year 55_59 60_64 65_69 70_74 75_79 80_84 85_101 85_89 90_101
## 1996 141327 136725 130752 119202 93274 70234 61401 0 0
## 1997 141149 136071 130969 117576 97171 68085 62292 0 0
## 1998 142735 136893 130651 116942 100500 65328 63411 0 0
## 1999 144492 137680 129767 116571 103461 62795 64133 0 0
## 2000 145221 137906 129230 116918 100346 66141 65021 0 0
## 2001 147164 137082 129107 116864 99466 68634 0 42580 22888
## 2002 158653 136906 128750 117883 98146 72395 0 41440 23367
## 2003 164311 138713 129682 118134 97724 74937 0 39426 23644
## 2004 168876 141004 130619 117861 97855 77304 0 38112 24121
## 2005 172797 141859 130840 117675 98599 75369 0 40752 24566
## 2006 175422 144542 129719 117673 98732 74907 0 43228 24784
## 2007 168594 155044 129905 117522 100080 74272 0 45468 24192
## 2008 165859 160388 131669 118326 100765 74167 0 47133 23638
## 2009 165060 164333 133497 119075 100775 74604 0 48537 23643
## 2010 165873 167917 134380 119364 101222 75654 0 47745 25955
## 2011 168046 170514 137089 118572 101768 76306 0 48100 27556
##
## , , sex = male
##
## age_range
## year 0_0 0_4 1_4 10_12 10_14 10_15 13_14 15_15 15_19
## 1996 30016 0 128495 98326 0 0 64181 33876 0
## 1997 30648 0 124250 99859 0 0 63419 32101 0
## 1998 29525 0 122393 100604 0 0 64741 31551 0
## 1999 28762 0 120144 99644 0 0 66608 31623 0
## 2000 27576 0 117880 98994 0 0 66682 33199 0
## 2001 0 142360 0 0 165583 0 0 0 160935
## 2002 0 137399 0 0 0 198649 0 0 0
## 2003 0 134880 0 0 0 197427 0 0 0
## 2004 0 134505 0 0 0 195527 0 0 0
## 2005 0 135541 0 0 0 193739 0 0 0
## 2006 0 137389 0 0 0 191119 0 0 0
## 2007 0 141148 0 0 0 188333 0 0 0
## 2008 0 145152 0 0 0 184729 0 0 0
## 2009 0 148025 0 0 0 181974 0 0 0
## 2010 0 149864 0 0 0 178816 0 0 0
## 2011 0 151975 0 0 0 175519 0 0 0
## age_range
## year 16_19 20_24 25_29 30_34 35_39 40_44 45_49 5_9 50_54
## 1996 123431 169460 193276 198514 187139 168162 176466 165674 146191
## 1997 127334 160124 186130 197710 190198 171565 168794 165907 157294
## 1998 129691 153381 178086 195554 192441 174603 165810 164004 163547
## 1999 129006 152526 168959 192661 194100 178079 164792 162755 167562
## 2000 127564 153980 160324 187143 195327 181675 164889 160183 171579
## 2001 0 157116 154112 184674 194618 184176 166925 157030 174118
## 2002 129604 160995 146544 178204 194097 187643 170170 153336 166531
## 2003 132112 163465 142612 172497 192863 190336 173054 150645 163565
## 2004 135357 164301 144343 165731 191353 193056 176703 148674 162835
## 2005 135149 167139 149480 159418 187175 195029 180366 146258 162923
## 2006 135165 171019 156422 153686 185147 194867 183306 143221 164736
## 2007 135910 175304 163781 149250 179844 194658 186904 139814 168082
## 2008 135510 179889 169947 147211 174309 193261 189434 137691 170891
## 2009 135397 182709 173773 150237 167415 190907 191390 137018 174089
## 2010 134724 183869 178099 156089 161215 186661 192850 137912 177390
## 2011 131962 186473 182321 162928 155436 184202 192578 139832 180504
## age_range
## year 55_59 60_64 65_69 70_74 75_79 80_84 85_101 85_89 90_101
## 1996 131672 121988 108452 89326 57959 34741 19675 0 0
## 1997 132063 121993 108940 88470 61235 33908 20380 0 0
## 1998 133705 123608 109114 88777 64003 32486 21367 0 0
## 1999 136279 124690 108942 89140 66703 31557 22003 0 0
## 2000 137772 125330 109408 89656 65831 34294 22640 0 0
## 2001 140835 124651 110009 90053 66057 36355 0 16661 6226
## 2002 152814 124921 110695 91259 66273 39473 0 16592 6606
## 2003 159154 126440 112326 92124 66969 41266 0 15955 6876
## 2004 163713 129468 113886 92710 67834 43239 0 15907 7106
## 2005 167535 131064 114503 93679 68789 42992 0 17853 7477
## 2006 169377 135028 113650 94702 69739 43643 0 19562 7629
## 2007 162448 145949 114360 95589 71300 44315 0 20997 7613
## 2008 159342 151994 116127 97242 72582 45182 0 22159 7553
## 2009 158114 155859 118616 98803 73474 46151 0 23425 7912
## 2010 158082 159572 120391 99722 74873 47282 0 23672 9232
## 2011 160001 161473 124444 99208 76231 48539 0 24332 10242
As you can see, the age groupings are not entirely consistent across years. I have produced a version with some consistent groupings, just for 2001 and 2011, here:
demo_grps <- read_csv("data/derived/demographic_groupings.csv")
demo_grps
## Source: local data frame [4,400 x 16]
##
## dz_2001 year f1 f2 f3 f4 f5 f6 f7 m1 m2 m3 m4 m5 m6 m7
## 1 S01000758 2001 75 54 80 132 74 65 20 84 46 79 131 76 64 12
## 2 S01000758 2011 101 47 72 170 131 95 38 102 62 77 157 120 91 30
## 3 S01001423 2001 88 50 133 75 42 40 9 80 37 112 86 33 23 4
## 4 S01001423 2011 72 52 86 105 37 42 16 88 30 90 113 42 24 9
## 5 S01001424 2001 87 65 148 134 58 75 32 104 69 110 100 51 40 12
## 6 S01001424 2011 91 56 108 144 59 70 33 119 79 92 93 42 46 14
## 7 S01001425 2001 93 70 121 103 57 77 15 90 53 108 87 45 49 12
## 8 S01001425 2011 214 85 167 236 87 69 27 188 87 101 216 65 53 10
## 9 S01001426 2001 65 38 90 80 46 45 43 79 35 92 77 39 38 20
## 10 S01001426 2011 58 31 60 89 47 64 36 58 40 52 74 54 46 20
## .. ... ... ... .. ... ... ... .. .. ... .. ... ... ... .. ..
demo_grps %>% gather(key="grp", value="count", -dz_2001, -year) %>% xtabs(count ~ year + grp, data = .)
## grp
## year f1 f2 f3 f4 f5 f6 f7 m1 m2
## 2001 154712 114836 205295 184612 96175 118928 43198 163866 113696
## 2011 153793 104959 180282 207995 108624 115699 48244 160325 110443
## grp
## year m3 m4 m5 m6 m7
## 2001 186314 176185 87259 87198 17872
## 2011 180888 186988 100827 90584 24462
# The code used to produce the groupings:
# f1 = female_0_4 + female_5_9 + female_10_14 + female_10_15,
# f2 = female_15_19 + female_16_19 + female_20_24,
# f3 = female_25_29 + female_30_34 + female_35_39,
# f4 = female_40_44 + female_45_49 + female_50_54,
# f5 = female_55_59 + female_60_64,
# f6 = female_65_69 + female_70_74 + female_75_79,
# f7 = female_80_84 + female_85_89 + female_90_101,
#
# m1 = male_0_4 + male_5_9 + male_10_14 + male_10_15,
# m2 = male_15_19 + male_16_19 + male_20_24,
# m3 = male_25_29 + male_30_34 + male_35_39,
# m4 = male_40_44 + male_45_49 + male_50_54,
# m5 = male_55_59 + male_60_64,
# m6 = male_65_69 + male_70_74 + male_75_79,
# m7 = male_80_84 + male_85_89 + male_90_101
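The groupings in the comment above are sums over wide-format columns. A similar result could be reached from the long-format file by cutting on `lower_age`, as in the sketch below. This is not the code that produced `demographic_groupings.csv`. The toy data, the `breaks` vector and the `grp` column are assumptions made for illustration, chosen so that each age range falls into the same group as in the comment.

```r
# Toy long-format data with the same column names as demo_big (values invented).
toy <- data.frame(
  datazone  = "S01000001",
  year      = 2001,
  sex       = rep(c("female", "male"), each = 4),
  lower_age = rep(c(0, 20, 40, 85), times = 2),
  count     = c(10, 20, 30, 5, 11, 21, 31, 4),
  stringsAsFactors = FALSE
)

# Lower bounds of the seven groups: 0, 15, 25, 40, 55, 65 and 80.
# right = FALSE makes intervals closed on the left, so lower_age 15 and 16
# both fall in group 2, and 10 (from 10_14 or 10_15) falls in group 1.
breaks <- c(0, 15, 25, 40, 55, 65, 80, Inf)
toy$grp <- paste0(
  substr(toy$sex, 1, 1),
  cut(toy$lower_age, breaks = breaks, right = FALSE, labels = FALSE)
)

# Sum counts within each datazone, year and group (f1..f7, m1..m7).
grouped <- aggregate(count ~ datazone + year + grp, data = toy, FUN = sum)
grouped
```

Cutting on `lower_age` avoids listing every age-range column by name, but it relies on no age range straddling a break point, which holds for the ranges listed in the comment.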
I have also extracted data from the Postcode Address File (PAF) on different types of building use. The extract used here records address points, delivery points, and small business points.
The number of delivery points less the number of address points and small business points should indicate the number of large businesses. But in practice (I think) the categories are not mutually exclusive, so using the PAF correctly can be tricky…
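The subtraction described above can be sketched as follows, using the first five 2001 rows of the building use table. The column name `large_bus_est` is hypothetical. The result is only an estimate under the assumption that the three categories do not overlap.

```r
# First five 2001 rows of building_use.csv, copied from the printed table.
bld <- data.frame(
  datazone      = c("S01005669", "S01005740", "S01004599", "S01005843", "S01004837"),
  address       = c(325, 336, 286, 269, 318),
  deliverypoint = c(344, 350, 294, 275, 327),
  smallbus      = c(19, 14, 9, 6, 9),
  stringsAsFactors = FALSE
)

# Delivery points not accounted for by address points or small businesses.
bld$large_bus_est <- bld$deliverypoint - bld$address - bld$smallbus
bld$large_bus_est
# 0 0 -1 0 0
```

The negative value for S01004599 is consistent with the suspicion that the categories are not mutually exclusive, since a count of large businesses cannot be below zero.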
bld_use <- read_csv("data/derived/building_use.csv")
bld_use
## Source: local data frame [4,400 x 5]
##
## datazone year address deliverypoint smallbus
## 1 S01005669 2001 325 344 19
## 2 S01005669 2010 325 344 19
## 3 S01005740 2001 336 350 14
## 4 S01005740 2010 336 350 14
## 5 S01004599 2001 286 294 9
## 6 S01004599 2010 286 294 9
## 7 S01005843 2001 269 275 6
## 8 S01005843 2010 269 275 6
## 9 S01004837 2001 318 327 9
## 10 S01004837 2010 318 327 9
## .. ... ... ... ... ...
bld_use %>% gather(key="type", value="count", -datazone, -year) %>% xtabs(count ~ year + type, data = . )
## type
## year address deliverypoint smallbus
## 2001 862834 893111 30328
## 2010 862834 893111 30328
A very large number of variables have been extracted and rearranged from the SNS. These are available on Dropbox.