Introduction

  • If we could precisely geo-locate all individuals in British Columbia, it would be relatively straightforward to classify them as living in either an urban or a rural setting.

  • Statistics Canada defines urban areas as those with a population of at least 1,000 and a density of 400 or more people per square kilometre; all other areas are rural.

  • But what if we do not have precise location data for individuals? What if all we have is the first three characters of their postal code, the Forward Sortation Area (FSA)?

  • Canada Post is the operational authority that defines and uses FSAs for mail sorting and delivery.

  • FSA boundaries are determined by:

    • Locations of mail processing plants and delivery depots.

    • Major transportation routes (highways, ferries, flight routes to the North).

    • Growth and demand: if a region’s addresses expand rapidly, Canada Post may split or reassign FSAs to keep sorting manageable.

  • Canada Post aims for FSAs to represent roughly comparable units of mail processing workload, not people or addresses.

  • Canada Post categorizes FSAs as either rural (second character is 0) or urban (second character is not 0), but this is an operational categorization that is not intended for analysis.

  • Note that the largest rural FSA is 278 times larger than the smallest rural FSA (plot to right), which makes it difficult to compare across FSAs that are mostly empty.

  • For example, consider two identical towns located in two different FSAs that are otherwise uninhabited.

  • If the size of the FSA “container” differs by a factor of 278, then so do the population densities, even though the “contents” are identical.

  • Thus, we need a way to characterize how urban or rural the inhabited areas actually are.

  • We use night-time light pollution data, specifically the “All Angle Composite Snow Free” variable sourced from NASA (2018).

  • Doing so allows us to calculate:

    • Proportion Lit: the proportion of LANDAREA that is lit at night, which we treat as inhabited.
    • Inhabited Area = Proportion Lit \(\times\) LANDAREA
    • Effective Density = \(\frac{\mbox{Population}}{\mbox{Inhabited Area}}\)
    • Conditional Median = median(All Angle Composite Snow Free, na.rm=TRUE)

  • Going back to our previous example, two FSAs of different sizes that share identical “contents” would have identical values for our measures of urban development:

    • Effective Density (how tightly packed people are in the inhabited area) and
    • Conditional¹ Median² (how bright the inhabited area is at night)
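
The two-town example can be checked numerically. The sketch below is illustrative only: the radiance values, the population of 2,000, the one square kilometre cell size and the function name `light_measures` are all made-up assumptions, and unlit cells are represented by `None` in place of NA.

```python
from statistics import median

def light_measures(radiance, land_area_km2, population):
    """Summarise one FSA from a grid of night-time radiance cells.

    radiance: list of cell values, with None marking unlit cells (the NAs).
    Assumes every cell covers an equal share of the land area.
    """
    lit = [r for r in radiance if r is not None]
    proportion_lit = len(lit) / len(radiance)
    inhabited_area = proportion_lit * land_area_km2
    return {
        "proportion_lit": proportion_lit,
        "inhabited_area": inhabited_area,
        "raw_density": population / land_area_km2,
        "effective_density": population / inhabited_area,
        "conditional_median": median(lit),  # median over lit cells only
    }

# Hypothetical town: four lit cells, identical in both FSAs.
town = [12.0, 30.0, 18.0, 25.0]

# Same town, but the second "container" is 278 times larger.
small_fsa = light_measures(town + [None] * 6, land_area_km2=10, population=2000)
large_fsa = light_measures(town + [None] * 2776, land_area_km2=2780, population=2000)

for name, fsa in [("small", small_fsa), ("large", large_fsa)]:
    print(name, round(fsa["raw_density"], 2),
          round(fsa["effective_density"], 2), fsa["conditional_median"])
```

Under these assumptions the raw densities differ by the same factor as the areas, while Effective Density and Conditional Median agree for the two FSAs.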

Light pollution

Light pollution in southwestern British Columbia: even the most densely populated area of the province is mostly empty.

Conditional Inference Tree

The model:

  • We fit a conditional inference tree (Hothorn, Hornik, and Zeileis 2006) to test whether Canada Post’s urban/rural classification can be replicated using measures of urban development derived from night-time light pollution.

The Model

\[\mbox{Class} \sim \mbox{Proportion Lit} + \mbox{Conditional Median}+ \mbox{Effective Density}+\epsilon\] where

  • Class is the response variable with values Urban or Rural.
  • Explanatory proxy variables that might be related to Class:
    • Proportion Lit: the proportion of the FSA’s area that is lit at night.
    • Conditional Median: the median light intensity of the FSA, ignoring unlit areas.
    • Effective Density: the number of people per square kilometre of lit area in the FSA.
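
The conditional inference framework chooses a split variable by testing each explanatory variable for association with the response and taking the one with the strongest evidence. The sketch below is not the tree algorithm itself. It is a simplified Monte Carlo permutation test on a difference in class means, run on made-up values, to illustrate how one variable can show a clear association with Class while another does not. The function name `permutation_p_value` and all data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(values, labels, n_perm=999, seed=1):
    """Monte Carlo permutation test for a difference in class means."""
    def gap(lbls):
        urban = [v for v, l in zip(values, lbls) if l == "Urban"]
        rural = [v for v, l in zip(values, lbls) if l == "Rural"]
        return abs(mean(urban) - mean(rural))

    rng = random.Random(seed)
    observed = gap(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between values and labels
        if gap(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up data for 12 FSAs: six Rural followed by six Urban.
labels = ["Rural"] * 6 + ["Urban"] * 6
proportion_lit = [0.01, 0.02, 0.03, 0.05, 0.04, 0.02,
                  0.60, 0.75, 0.80, 0.55, 0.90, 0.70]
effective_density = [900, 1500, 1100, 1300, 1000, 1400,
                     1200, 950, 1450, 1050, 1350, 1150]

p_lit = permutation_p_value(proportion_lit, labels)
p_density = permutation_p_value(effective_density, labels)
print("Proportion Lit p-value:", p_lit)
print("Effective Density p-value:", p_density)
```

In this toy data the variable with the smaller p-value would be the candidate for the first split. The real procedure uses different test statistics and adjusts for multiple testing.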

Interpretation:

  • When examining the tree to the right, note that Proportion Lit is the variable most strongly associated with Class (i.e., it is the first split).
  • Conditional Median is used for splits in the lower branches of the tree.
  • Effective Density is not used at any point in the tree.

Takeaways

  • Canada Post’s classification is based mainly on how empty the FSAs are (as measured by Proportion Lit) and ignores how tightly packed people are in the inhabited areas (as measured by Effective Density).

  • This calls into question whether Canada Post’s classification is useful for analysis.

  • Next we create an alternative classification based on our measures of development:

    • Effective Density is an FSA’s population divided by its lit area.
    • Conditional Median is the FSA’s median light intensity, ignoring all unlit areas.
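
One way such an alternative classification could be built is with two-cluster k-means on the two measures. The sketch below is a minimal pure-Python version of Lloyd's algorithm on made-up values. The log transform, the standardisation, the choice of starting centres and the helper names are assumptions for illustration, not details taken from the analysis.

```python
from math import dist, log10
from statistics import mean, pstdev

def standardise(xs):
    """Centre to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def kmeans(points, centres, n_iter=50):
    """Lloyd's algorithm from given starting centres."""
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [min(range(len(centres)), key=lambda k: dist(p, centres[k]))
                  for p in points]
        # Move each centre to the mean of its members.
        new_centres = []
        for k in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == k]
            new_centres.append(
                tuple(mean(c) for c in zip(*members)) if members else centres[k])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Made-up values for six hypothetical FSAs.
effective_density = [150, 220, 300, 2500, 4000, 6000]   # people per lit km^2
conditional_median = [2.0, 3.0, 2.5, 25.0, 40.0, 60.0]  # radiance of lit cells

# Assumption: log-transform, then standardise so both measures weigh equally.
points = list(zip(standardise([log10(v) for v in effective_density]),
                  standardise([log10(v) for v in conditional_median])))

# Assumption: start from the least and most dense FSAs.
labels, centres = kmeans(points, [min(points), max(points)])
print(labels)  # cluster 0 = less developed, cluster 1 = more developed
```

Starting from fixed centres keeps the result reproducible. A production analysis would more likely use several random starts.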

The tree:

K-means clusters

Rural vs Urban:

Map

Development Index

Principal Component Index of Development
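
As a rough illustration of what a principal component index can look like, the sketch below scores made-up FSAs on the first principal component of two standardised measures. Which variables enter the index, and whether they are log-transformed, is not stated here, so the inputs (log Effective Density and log Conditional Median) and the function names are assumptions. With only two standardised variables the leading eigenvector has a closed form, so no linear algebra library is needed.

```python
from math import log10, sqrt
from statistics import mean, pstdev

def standardise(xs):
    """Centre to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def first_pc_scores(x, y):
    """Scores on the first principal component of two standardised variables.

    The correlation matrix [[1, r], [r, 1]] has leading eigenvector
    (1, 1)/sqrt(2) when r > 0 and (1, -1)/sqrt(2) when r < 0,
    with eigenvalue 1 + |r|.
    """
    zx, zy = standardise(x), standardise(y)
    r = mean(a * b for a, b in zip(zx, zy))  # Pearson correlation
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / sqrt(2) for a, b in zip(zx, zy)]
    explained = (1 + abs(r)) / 2  # share of total variance on this component
    return scores, explained

# Made-up values for six hypothetical FSAs.
effective_density = [150, 220, 300, 2500, 4000, 6000]
conditional_median = [2.0, 3.0, 2.5, 25.0, 40.0, 60.0]

scores, explained = first_pc_scores([log10(v) for v in effective_density],
                                    [log10(v) for v in conditional_median])
print([round(s, 2) for s in scores], round(explained, 3))
```

A higher score indicates an FSA that is both more tightly packed and brighter, which gives a continuous alternative to a two-class label.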

Map

Data

References

Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15 (3): 651–74. https://doi.org/10.1198/106186006X133933.
NASA. 2018. “VIIRS/NPP Daily at-Sensor TOA Nighttime Radiance 500m VNP46A4, Version 1.” NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/VIIRS/VNP46A4.001.

  1. Ignoring the unlit (NA) areas.

  2. More robust to outliers (e.g. greenhouses) than the mean.