\( \newcommand{\bm}[1]{\boldsymbol{#1}} \)

Preface

Aim

The aims of this document are to:

  1. Briefly explain the ABS and non-ABS spatial structures defined by the Australian Statistical Geography Standard (ASGS);
  2. Investigate what ABS and non-ABS boundaries are provided by the absmapsdata R package; and
  3. Demonstrate how to define several non-ASGS structures.

Aim 3 is purposefully developed with the Victorian Department of Health and Human Services in mind. However, the approaches should be useful to a wider audience.

If you find any mistakes or have any questions about the content, please contact me at earl.w.duncan@gmail.com.

List of Acronyms and Abbreviations

ABS \(\hspace{2.0em}\) Australian Bureau of Statistics
ASGS \(\hspace{1.4em}\) Australian Statistical Geography Standard
CRAN \(\hspace{1.2em}\) Comprehensive R Archive Network
DHHS \(\hspace{1.2em}\) Department of Health and Human Services (Victoria)
ERP \(\hspace{2.1em}\) estimated residential population
LGA \(\hspace{1.9em}\) local government area
LPHU \(\hspace{1.3em}\) local public health unit
PHN \(\hspace{2.0em}\) primary health networks
POA \(\hspace{2.0em}\) postal area
SA1 \(\hspace{2.0em}\) statistical area level 1
SA2 \(\hspace{2.0em}\) statistical area level 2
SA3 \(\hspace{2.0em}\) statistical area level 3
SA4 \(\hspace{2.0em}\) statistical area level 4

R Libraries

The following packages are used extensively in the R code throughout this document. These packages must be installed and loaded:

library(sf)             # For handling sd objects
library(tidyverse)
library(gridExtra)

Additional packages will be introduced later as required.

1 Spatial Structures

1.1 ASGS Structures

From the ABS website:

"The Australian Statistical Geography Standard (ASGS) provides a framework of statistical areas used by the Australian Bureau of Statistics (ABS) and other organisations to enable the publication of statistics that are comparable and spatially integrated. ... The ASGS provides users with an integrated set of standard areas that can be used for analysing, visualising and integrating statistics produced by the ABS and other organisations."

The ASGS defines two types of spatial structures:

  • ABS Structures -- statistical areas designed specifically for outputting statistics (i.e. "meet the requirements of specific statistical collections as well as geographic concepts relevant to those statistics such as remoteness and urban/rural definitions"). This includes mesh blocks and statistical areas (SA1, SA2, SA3).
  • Non-ABS Structures -- administrative areas, e.g. local government areas (LGAs) and postal areas (POAs). While these are based on ABS structures like mesh blocks and statistical areas, they are not defined by the ABS. However, the ABS may release statistics for these non-ABS structures, e.g. estimated resident population (ERP) which is provided for LGAs.

For a visual illustration of how these structures relate to each other, visit the ABS webpage on the ASGS.

1.2 Non-ASGS Structures

There are several non-ASGS structures that may be of interest to some working in the DHHS. These are neither ABS nor non-ABS structures.

These include the following:

  • DHHS regions -- a fairly old structure that is centred around schools and early childhood services for administrative purposes, defined by the Department of Health and Human Services (DHHS), Victoria. This structure is only applicable to the state of Victoria, although similar regions may be defined in other states. Refer here for more details.
  • Primary health networks (PHNs) -- boundaries that generally align with local hospital networks or equivalents, or clusters of local hospital networks. Refer here for more details.
  • Local public health units (LPHUs) -- TBA

Here is an rough illustration of how these non-ASGS structures connect to ASGS structures:

2 Acquiring ASGS Spatial Structures

There are several ways to acquire ASGS spatial structures. The ABS and non-ABS spatial structures can be downloaded from the ABS website in the CSV or ESRI shapefile formats. See the ABS ASGS digital boundaries webpage. (After selecting a Volume, click on the Downloads tab to find the specific data files.)

A handy alternative to acquiring these spatial structures is provided by the absmapsdata R package. This package provides (most) ABS and non-ABS spatial structures as simple features (sf) objects.

There is a useful package on GitHub,

This package is located on GitHub, not on the Comprehensive R Archive Network (CRAN). This package can be installed directly from within R using the following code (note that the devtools library must be installed (from CRAN) and loaded first):

This may take several minutes to install.

library(devtools)       # For install_github()
devtools::install_github("wfmackey/absmapsdata")

If you have any difficulties installing devtools or the absmapsdata package, please refer to this troubleshooting manual.

Once installed, this package must be loaded.

library(absmapsdata)

For more information about the absmapsdata package and how to get started, visit the github webpage. The following section should also serve as a guide on how to use the maps loaded with this package.

2.1 Exploring absmapsdata

2.1.1 For States/Territories

For illustration, I will focus on 6 key types of areas: SA2s, SA3s, SA4s, POAs, LGAs, and remoteness areas.

To find a list of the geospatial data files contained within the absmapsdata package, use:

data(package = "absmapsdata")
Data sets in package 'absmapsdata':

ced2018                      Commonwelath Electoral Districts, 2018
dz2011                       Destination Zones, 2011
dz2016                       Destination Zones, 2016
gcc2011                      Greater Capital Cities, 2011
gcc2016                      Greater Capital Cities, 2016
lga2011                      Local Government Areas, 2011
lga2016                      Local Government Areas, 2016
lga2018                      Local Government Areas, 2018
postcode2016                 Postcodes, 2016
ra2011                       Remoteness areas, 2011
ra2016                       Remoteness areas, 2016
regional_ivi2008             Regional IVI, 2008
sa12011                      Statistical Area 1, 2011
sa12016                      Statistical Area 1, 2016
sa22011                      Statistical Area 2, 2011
sa22016                      Statistical Area 2, 2016
sa32011                      Statistical Area 3, 2011
sa32016                      Statistical Area 3, 2016
sa42011                      Statistical Area 4, 2011
sa42016                      Statistical Area 4, 2016
sed2018                      State Electoral Districts, 2018
state2011                    States, 2011
state2016                    States, 2016

Due to the hierarchy of the ABS spatial structures up till the state/territory level, and the corresponding derivations of the non-ABS units, plotting these structures for an entire state/territory is staightforward.

For example, to acquire the 6 spatial structures list above for the state of Victoria, we use the state name and rely on the knowledge that the postcode starts with a "3".

# ABS main structures
dat.sa2 <- sa22016 %>%
    filter(state_name_2016 == "Victoria")
dat.sa3 <- sa32016 %>%
    filter(state_name_2016 == "Victoria")
dat.sa4 <- sa42016 %>%
    filter(state_name_2016 == "Victoria")
# Other ABS and non-ABS structures
dat.poa <- postcode2016 %>%
    filter(substr(postcode_num_2016, 0, 1) == "3")
dat.lga <- lga2018 %>%
    filter(state_name_2016  == "Victoria")
dat.ra <- ra2016 %>%
    filter(state == "Victoria")

Here is a comparison of these ASGS boundaries:

# Base map with desired theme/properties
gg.base <- ggplot() + 
    theme_void()

# Specific maps
map.sa2 <- gg.base +
    geom_sf(data = dat.sa2, aes(geometry = geometry)) + 
    ggtitle("SA2") 
map.sa3 <- gg.base + 
    geom_sf(data = dat.sa3, aes(geometry = geometry)) + 
    ggtitle("SA3") 
map.sa4 <- gg.base + 
    geom_sf(data = dat.sa4, aes(geometry = geometry)) + 
    ggtitle("SA4")

map.ra <- gg.base + 
    geom_sf(data = dat.ra, aes(geometry = geometry)) + 
    ggtitle("Remoteness areas") 
map.poa <- gg.base + 
    geom_sf(data = dat.poa, aes(geometry = geometry)) + 
    ggtitle("Postal areas") 
map.lga <- gg.base + 
    geom_sf(data = dat.lga, aes(geometry = geometry)) + 
    ggtitle("LGAs") 

# Combine plots
grid.arrange(
    map.sa2, map.sa3, map.sa4, map.poa, map.lga, map.ra, 
    nrow = 2
)

2.1.2 For Arbitrary regions

For an arbitrarily defined region, this task can be more tricky since the boundaries of the spatial structures might not be wholly contained within that region.

For illustration, consider the Greater Capital City region of Melbourne (note that this is not synonymous with the Melbourne Metro DHHS region). For the main ABS structures like statistical areas, this is easily accomplished by filtering on the Greater Capital City name (gcc_name_2016).

# ABS structures
dat.sa2 <- sa22016 %>%
    filter(gcc_name_2016 == "Greater Melbourne")
dat.sa3 <- sa32016 %>%
    filter(gcc_name_2016 == "Greater Melbourne")
dat.sa4 <- sa42016 %>%
    filter(gcc_name_2016 == "Greater Melbourne")

For other ABS structures like remoteness areas and non-ABS structures, this is less straightforward. The solution is to construct a concordance for the two spatial structures which quantifies the overlap between each pair of areas contained within.

For postcode areas, I use the concordance provided by the function get_correspondence_absmaps to filter postcode areas so only those that have some overlap with the Greater Capital City region are retained. (Note: the area-year key value pairs required by this function can be deteremined from this ABS webpage. In our case, the closest concordance is CG_POA_2016_SA4_2016, so the key value pairs are POA, 2016, and SA4, 2016)

# Concordance
conc <- get_correspondence_absmaps("POA", 2016, "SA4", 2016) %>%
    filter(sa4_name_2016 %in% unique(dat.sa4$sa4_name_2016))

# Postcode that overlap with Melbourne
dat.poa <- postcode2016 %>%
  filter(postcode_num_2016 %in% conc$poa_code_2016)

For LGAs, a similar technique is used. The best available concordance is CG_SA2_2016_LGA_2018.

# Concordance
conc <- get_correspondence_absmaps("SA2", 2016, "LGA", 2018) %>%
    filter(sa2_maincode_2016 %in% unique(dat.sa2$sa2_main_2016))

# LGAs that overlap with Melbourne
dat.lga <- lga2018 %>%
  filter(lga_code_2018 %in% conc$lga_code_2018)

For remoteness areas, we include major city and inner regional areas. This large region is then restricted by taking the intersection of this region with the Greater Capital City region.

# Melbourne Greater Capital City region boundary
Mel <- gcc2016 %>%
    filter(gcc_name_2016 == "Greater Melbourne")

# Remoteness areas
dat.ra <- ra2016 %>%
    # filter(state == "Victoria" & ra_code <= 21)  %>%
    st_intersection(., Mel)

Here is a comparison of these ASGS boundaries in Melbourne. Note that the remoteness areas has been "clipped" to the Melbourne capital city region, while the POAs and LGAs extend beyond the boundary where overlaps occured (this behaviour couldbe changed by following the process for remoteness areas above).

# Base map with desired theme/properties
gg.base <- ggplot() + 
    theme_void()

# Specific maps
map.sa2 <- gg.base +
    geom_sf(data = dat.sa2, aes(geometry = geometry)) + 
    ggtitle("SA2") 
map.sa3 <- gg.base + 
    geom_sf(data = dat.sa3, aes(geometry = geometry)) + 
    ggtitle("SA3") 
map.sa4 <- gg.base + 
    geom_sf(data = dat.sa4, aes(geometry = geometry)) + 
    ggtitle("SA4")

map.poa <- gg.base + 
    geom_sf(data = dat.poa, aes(geometry = geometry)) + 
    ggtitle("Postal areas") 
map.lga <- gg.base + 
    geom_sf(data = dat.lga, aes(geometry = geometry)) + 
    ggtitle("LGAs") 
map.ra <- gg.base + 
    geom_sf(data = dat.ra, aes(geometry = geometry)) + 
    ggtitle("Remoteness areas") 

# Combine plots
grid.arrange(
    map.sa2, map.sa3, map.sa4, map.poa, map.lga, map.ra,
    nrow = 2
)

3 Defining non-ASGS Strucutres

3.1 Primary Health Networks

Although the absmapsdata package does not include primary health networks (PHNs) boundaries, the ABS does offer correspondence tables between 2017 PHNs and several ABS structures including SA2s, SA3s, and Postcodes.

An interactive map showing PHN boundaries in Australia and the corresponding SA3 boundaries can be found here. Using this map, it seems like the PHNs are constructed from one ore more whole SA3s. However, the correspondence isn't perfect. In fact, the correspondence between PHNs and any of these other building blocks (SA1s, SA2s, postcodes) isn't perfect. (This makes sense since PHNs are not defined in the ASGS.)

We can confirm this by exploring the ratio (degree of correspondence) variable in each PHN correspondence table.

# Concordances to PHNs
conc.phn.poa <- get_correspondence_absmaps("POA", 2016, "PHN", 2017)
conc.phn.ced <- get_correspondence_absmaps("CED", 2016, "PHN", 2017)
conc.phn.sa1 <- get_correspondence_absmaps("SA1", 2016, "PHN", 2017)
conc.phn.sa2 <- get_correspondence_absmaps("SA2", 2016, "PHN", 2017)
conc.phn.sa3 <- get_correspondence_absmaps("SA3", 2016, "PHN", 2017)

# Compute percentage of perfect matches (ratio = 1)
percent_perfect_match <- function(x){
    tab <- table(x$ratio)
    ppm <- tab[length(tab)] / nrow(x)
    return(as.numeric(ppm))
}

c(percent_perfect_match(conc.phn.poa),
    percent_perfect_match(conc.phn.ced),
    percent_perfect_match(conc.phn.sa1),
    percent_perfect_match(conc.phn.sa2),
    percent_perfect_match(conc.phn.sa3)
)
## [1] 0.8803659 0.4485981 0.9990437 0.9648521 0.7827225

We see that the best correspondence as a percentage of all units of a given structure is provided by SA1s.

From the PHN website:

Where possible, boundaries of the PHNs align with Local Hospital Networks (LHNs) or equivalents, or clusters of LHNs. ...In determining boundaries, a number of factors were taken into account, including population size and future projected population growth, LHN alignment, State and Territory borders, patient flows and administrative efficiencies.

So without explicit definitions of the PHN boundaries, using a concordance between SA1s and PHNs to merge the SA1s into PHN areas probably provides the best approximation. E.g.

# Remove ambiguities in concordance (use highest ratio when not 1)
conc <- conc.phn.sa1 %>%
    filter(ratio != 1)
codes <- unique(conc$sa1_maincode_2016)
conc.phn.resolved <- conc.phn.sa1
for(i in length(codes):1){
    # rows with ambiguities
    r <- which(conc.phn.sa1$sa1_maincode_2016 == codes[i])
    # row(s) to drop
    d <- r[-which.max(conc.phn.sa1$ratio[r])]
    if(length(d) > 0){
        conc.phn.resolved <- conc.phn.resolved[-d,]
    }
}

# Join PHN concordance then merge (dissolve) SA1s by PHN
names(conc.phn.resolved)[1] <- names(sa12016)[1]
phn2017 <- left_join(sa12016, conc.phn.resolved) %>%
    group_by(phn_code_2017) %>%
    summarize() # Works like gUnion on SpatialPolygons
## Joining, by = "sa1_main_2016"

Here is a plot of the PHNs:

gg.base + 
    geom_sf(data = phn2017, aes(geometry = geometry, fill = phn_code_2017)) + 
    scale_fill_discrete(guide = FALSE) +
    ggtitle("PHNs") 

Upon inspection, this looks identical to the map shown on the PHN website.

To export this sf to file, use st_write. There may be a better driver, but the MapInfo File format preserves the object properties well (ESRI Shapefile does not, and GPKG fails).

st_write(phn2017, dsn = fp.wd, layer = "PHN Boundaries", driver = "MapInfo File")

where fp.wd is the filepath of the working directory (or any valid directory you like). To read it in again, use st_read:

phn2017.test <- st_read(dsn = fp.wd, layer = "PHN Boundaries")

3.2 DHHS Regions

The DHHS regions are defined as groups of LGAs in the state Victoria. A list of the LGAs that comprise each DHHS region can be found on the Department's webpage. To recreate these boundaries, we first create a list of the DHHS region names and the LGAs comprising them, then convert this to a data.frame.

DHHS.regions <- list(
    "Inner Eastern Melbourne" = c("Boroondara (C)", "Manningham (C)", "Monash (C)", "Whitehorse (C)"),
    "Outer Eastern Melbourne" = c("Knox (C)", "Maroondah (C)", "Yarra Ranges (S)"),
    "Goulburn" = c("Greater Shepparton (C)", "Mitchell (S)", "Moira (S)", "Murrindindi (S)", "Strathbogie (S)"),
    "Ovens Murray" = c("Alpine (S)", "Benalla (RC)", "Indigo (S)", "Mansfield (S)", "Towong (S)", "Wangaratta (RC)", "Wodonga (C)"),
    "North Eastern Melbourne" = c("Banyule (C)", "Darebin (C)", "Nillumbik (S)", "Whittlesea (C)", "Yarra (C)"),
    "Hume Moreland" = c("Hume (C)", "Moreland (C)"),
    "Loddon Campaspe" = c("Campaspe (S)", "Central Goldfields (S)", "Greater Bendigo (C)", "Loddon (S)", "Macedon Ranges (S)", "Mount Alexander (S)"),
    "Mallee" = c("Buloke (S)", "Gannawarra (S)", "Mildura (RC)", "Swan Hill (RC)"),
    "Southern Melbourne" = c("Cardinia (S)", "Casey (C)", "Greater Dandenong (C)") ,
    "Bayside Peninsula" = c("Bayside (C)", "Frankston (C)", "Glen Eira (C)", "Kingston (C) (Vic.)", "Mornington Peninsula (S)", "Port Phillip (C)", "Stonnington (C)"),
    "Inner Gippsland" = c("Bass Coast (S)", "Baw Baw (S)", "Latrobe (C) (Vic.)", "South Gippsland (S)", "Unincorporated Vic"),
    "Outer Gippsland" = c("East Gippsland (S)", "Wellington (S)"),
    "Western Melbourne" = c("Hobsons Bay (C)", "Maribrynong (C)", "Melbourne (C)", "Moonee Valley (C)", "Wyndham (C)"),
    "Brimbank Melton" = c("Brimbank (C)", "Melton (C)"),
    "Central Highlands" = c("Ararat (RC)", "Ballarat (C)", "Golden Plains (S)", "Hepburn (S)", "Moorabool (S)", "Pyrenees (S)"),
    "Barwon" = c("Colac-Otway (S)", "Greater Geelong (C)", "Queenscliffe (B)", "Surf Coast (S)"),
    "Wimmera South West Area" = c("Corangamite (S)", "Glenelg (S)", "Hindmarsh (S)", "Horsham (RC)", "Moyne (S)", "Northern Grampians (S)", "Southern Grampians (S)", "Warrnambool (C)", "West Wimmera (S)", "Yarriambiack (S)")
)
# Convert to data.frame
L <- sapply(DHHS.regions, length)
DHHS.regions <- data.frame(
    DHHS.regions = rep(names(DHHS.regions), L),
    lga_name_2018 = as.vector(unlist(DHHS.regions))
)
rm(L)

To create these boundaries as a spatial sf object, we merge this data.frame with the lga2018 sf data after restricting it to Victoria LGAs. Then we "dissolve" the LGA boundaries within each DHHS region. (This method of dissolving boundaries isn't obvious for sf objects. For SpatialPolygonDataFrames or similar objects, this would normally be accomplished using either rgeos::gUnion or maptools::unionSpatialPolygons. However, the likely counterpart being st_union from the sf package does not achieve this. There is a lengthy discussion about this unexpected behaviour here. The suggested approach is to use the less obvious summarise function in combination with dplyr::group_by.)

dat.dhhs <- lga2018 %>%
    filter(state_name_2016  == "Victoria") %>%
    merge(DHHS.regions, by = "lga_name_2018") %>%
    group_by(DHHS.regions) %>%
    summarise()

Here is a plot of these boundaries for Victoria:

# Base map with desired theme/properties
ggplot() + 
    theme_void() +
    geom_sf(data = dat.dhhs, aes(geometry = geometry)) + 
    ggtitle("DHHS Regions") 

3.3 Local Public Health Units

TBA

4 Practical Applications

4.0.1 Comparing Structures

One practical application is comparing PHNs to other ASGS structures.

Here is an example map which replicates that shown on the PHN website when SA3 boundaries are included.

gg.base + 
    geom_sf(data = sa32016, aes(geometry = geometry), colour = "black") + 
    geom_sf(data = phn2017, aes(geometry = geometry), colour = "red", fill = NA, size = 1) + 
    ggtitle("SA3 (black) and PHNs (red)")

This might be useful as part of a Shiny app, where the underlying spatial structure can be dynamically selected by the user, with added zoom functionality.

4.0.2 Data Overlays

Another application is overlaying user-generated spatial data, e.g. from a model, with the PHNs. These administrative regions may be the most useful from a practical, epidemiological perspective, rather than postcodes, or ABS structures.

For example, in one departmental report entitled How do people move - using mobile phone data to inform control interventions circulated 30 July 2020, mobile phone data was used to estimate population movements beteween different postcodes. These movements were subsequently used to group postcode areas into interconnected communities, which could be an important spatial unit when implementing control interventions, for example, due to COVID-19, rather than other pre-defiend administrative boundaries. However, investigating how such spatial units correspond to PHN boundaries could be interesting.

Below is a map of arbitrarily grouped postcode areas in Metropolitan Melbourne that resembles results similar to that found in the aforementioned report.

Here is the data overlayed with PHNs:

# PHNs in Metro Melbourne
dat.phn <- phn2017[11:13,]

# Plot
gg.base + 
    geom_sf(data = Metro.poa, aes(geometry = geometry, fill = factor(grp)), colour = "black") +
    geom_sf(data = dat.phn, aes(geometry = geometry), colour = "red", fill = NA, size = 1) + 
    ggtitle("SA3 (black) and PHNs (red)") 

Back to top.