Geospatial Analysis using ggplot

Geospatial analysis is increasingly used across industries, and maps have become an important visualisation for decision making. This vignette explains how to create maps for spatial analysis using ggplot2. As a worked example, we will plot population data for cities across Australia.

Data Files & Packages

For this example, we will use ESRI shapefiles and population data by city from the Australian Bureau of Statistics.

An ESRI shapefile is made up of several files that share a base name, typically .shp, .shx, .dbf and .prj. Save all of these files to your working directory. We will use the rgdal package to read the shapefile.

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.2-8, (SVN revision 663)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-4
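Before reading the shapefile, it can help to confirm that its component files sit side by side in the working directory. The following is a minimal base R sketch; the helper name check_shapefile_parts() is hypothetical, and the .prj file is treated as optional on the assumption that it only carries the projection.

```r
# Hypothetical helper: report which component files exist for a shapefile.
check_shapefile_parts <- function(shp_path,
                                  extensions = c("shp", "shx", "dbf", "prj")) {
  # Strip the extension, then rebuild one path per expected component.
  base <- tools::file_path_sans_ext(shp_path)
  found <- file.exists(paste0(base, ".", extensions))
  names(found) <- extensions
  found
}

parts <- check_shapefile_parts("aust_cd66states.shp")
print(parts)

# .shp, .shx and .dbf hold the geometry, index and attributes.
if (!all(parts[c("shp", "shx", "dbf")])) {
  message("Some shapefile components are missing from the working directory.")
}
```

If any of the first three entries is FALSE, the read step that follows is likely to fail or return incomplete data.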

Load the shapefile using the readOGR() function:

aus_st <- readOGR("aust_cd66states.shp",layer="aust_cd66states")
## OGR data source with driver: ESRI Shapefile 
## Source: "aust_cd66states.shp", layer: "aust_cd66states"
## with 8 features
## It has 2 fields
## Integer64 fields read as strings:  COUNT
summary(aus_st)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min        max
## x 112.92111 159.105333
## y -43.74037  -9.142319
## Is projected: NA 
## proj4string : [NA]
## Data attributes:
##       STE           COUNT  
##  Min.   :1.00   1089   :1  
##  1st Qu.:2.75   11619  :1  
##  Median :4.50   3152   :1  
##  Mean   :4.50   3481   :1  
##  3rd Qu.:6.25   389    :1  
##  Max.   :8.00   492    :1  
##                 (Other):2
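The rgdal package was retired from CRAN in 2023, so readOGR() may not be installable on a current R setup. One possible substitute, which this vignette does not use, is the sf package: st_read() reads a shapefile and geom_sf() draws it. The sketch below is self-contained and uses made-up data: it writes a unit square to a temporary shapefile and reads it back, standing in for aust_cd66states.shp.

```r
library(sf)
library(ggplot2)

# Made-up unit square standing in for a state boundary (not real data).
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
demo_states <- st_sf(STE = 1L, geometry = st_sfc(st_polygon(list(ring))))

# Write to a temporary shapefile, then read it back as with a real file.
shp_path <- tempfile(fileext = ".shp")
st_write(demo_states, shp_path, quiet = TRUE)
demo_read <- st_read(shp_path, quiet = TRUE)

# geom_sf() uses the geometry column directly, so no x, y or group
# mapping is needed.
p_sf <- ggplot() +
  geom_sf(data = demo_read, fill = "lightblue", colour = "black")
```

With the real data, the equivalent read would presumably be st_read("aust_cd66states.shp"), and the geom_polygon() layers in the rest of this vignette would be replaced by geom_sf().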

Load the following packages to read Excel files and to build plots with ggplot2:

library(readxl)
library(ggplot2)

Read the population data using the read_excel() function:

Population <- read_excel("~/Desktop/R Directory/Population.xlsx")

The population data file contains the population of each city and its coordinates:

summary(Population)
##      City             Population           lat              long      
##  Length:60          Min.   :  22600   Min.   :-42.85   Min.   :114.6  
##  Class :character   1st Qu.:  28775   1st Qu.:-35.53   1st Qu.:143.5  
##  Mode  :character   Median :  46100   Median :-32.73   Median :147.3  
##                     Mean   : 277683   Mean   :-31.15   Mean   :144.4  
##                     3rd Qu.:  97825   3rd Qu.:-26.78   3rd Qu.:151.7  
##                     Max.   :4277200   Max.   :-12.40   Max.   :153.4
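Since the bubbles will be placed by lat and long, a quick sanity check is to confirm that every city falls inside the bounding box reported by summary(aus_st). The sketch below is base R only. The cities data frame is illustrative, not the real Population data, and its third row deliberately has the sign of its latitude flipped.

```r
# Illustrative rows only; in the vignette this would be the Population data.
cities <- data.frame(
  City = c("City A", "City B", "City C"),
  Population = c(4277200, 22600, 46100),
  lat = c(-33.87, -12.40, 33.87),   # third row: latitude sign flipped
  long = c(151.21, 130.84, 151.21)
)

# Bounding box copied from the summary(aus_st) output.
bbox <- list(x = c(112.92111, 159.105333), y = c(-43.74037, -9.142319))

inside <- cities$long >= bbox$x[1] & cities$long <= bbox$x[2] &
  cities$lat >= bbox$y[1] & cities$lat <= bbox$y[2]

# Rows that would be drawn off the map, e.g. swapped or mis-signed values.
outside <- cities[!inside, ]
print(outside)
```

With the objects from this vignette, the same limits should also be available from bbox(aus_st) rather than being typed in by hand.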

Building Geo Maps

First, let's build a simple map from the shapefile. The geom_polygon() function draws each shape from its longitude and latitude vertices, and the group aesthetic keeps the vertices of each polygon together:

ggplot() +
  geom_polygon(data = aus_st,aes(x=long,y=lat,group=group),
               fill = "lightblue", colour = "black") 
## Regions defined for each Polygons
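To illustrate what the group aesthetic does in the call above, here is a small sketch with a hand-made data frame of two squares. The squares data and the plot names are hypothetical and unrelated to the Australian shapefile.

```r
library(ggplot2)

# Two made-up squares; each row is one vertex, listed in drawing order.
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4)
)

# With group mapped, each square is drawn as its own closed polygon.
p_grouped <- ggplot(squares, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "lightblue", colour = "black")

# Without it, all eight vertices are joined into one shape, which adds
# a spurious band connecting the two squares.
p_ungrouped <- ggplot(squares, aes(x = long, y = lat)) +
  geom_polygon(fill = "lightblue", colour = "black")
```

Printing p_grouped and p_ungrouped side by side shows the difference. The same thing would happen to state boundaries and islands if group were dropped from the map.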

Now let's add another layer to the map that shows population size. We do this with the geom_point() function, placing each point at the city's coordinates from the population data. Here shape = 21 draws a filled circle for each city, and size = Population scales each bubble with the city's population. The alpha setting makes the fill semi-transparent so that overlapping bubbles remain visible:

ggplot() +
  geom_polygon(data = aus_st,aes(x=long,y=lat,group=group),
               fill = "lightblue", colour = "black") +
  geom_point(data = Population, aes(long, lat, size = Population), 
             shape = 21, fill = "red",alpha=.5)  +
  scale_size_area(max_size=20)
## Regions defined for each Polygons
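scale_size_area() scales the area of each bubble, rather than its radius, in proportion to the value, with zero mapped to zero size. The base R sketch below approximates that relationship for the minimum, median and maximum populations from the summary above. It is an illustration of the square-root behaviour, not the scale's actual source code.

```r
# Minimum, median and maximum populations from summary(Population).
pop <- c(min = 22600, median = 46100, max = 4277200)
max_size <- 20

# If area is proportional to the value, the point size (a diameter-like
# measure) grows with the square root of the value.
size <- max_size * sqrt(pop / max(pop))
print(round(size, 2))

# The implied area ratio matches the population ratio.
area_ratio <- (size / max_size)^2
print(area_ratio)
```

The smallest city therefore gets a bubble about 7% as wide as the largest, even though its population is only about 0.5% of the largest. This is why a large max_size is needed to keep small cities visible.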

Finally, let's add a colour scale to the bubbles, so that population can be read from colour as well as size. Note that we could instead map a different variable to fill, so that bubble size and colour together show two dimensions of the data:

ggplot() +
  geom_polygon(data = aus_st,aes(x=long,y=lat,group=group),
               fill = "lightblue", colour = "black") +
  geom_point(data = Population, shape = 21,aes(long, lat, size = Population,fill=Population)) +
  scale_size_area(max_size=20)  +
  scale_fill_gradient(low = "yellow", high="red")   
## Regions defined for each Polygons
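The population values are heavily skewed (median 46,100 against a maximum of 4,277,200), so a linear gradient leaves most cities near the yellow end. One option, which is a variation and not part of the plot above, is to map fill to log10(Population). The cities data frame below is illustrative only, with made-up coordinates.

```r
library(ggplot2)

# Illustrative rows only; coordinates are made up.
cities <- data.frame(
  long = c(151.2, 144.9, 130.8),
  lat = c(-33.9, -37.8, -12.4),
  Population = c(4277200, 46100, 22600)
)

# Size still encodes the raw population; fill encodes its base-10 logarithm,
# which spreads small and mid-sized cities across more of the gradient.
p_log <- ggplot() +
  geom_point(data = cities, shape = 21,
             aes(long, lat, size = Population, fill = log10(Population))) +
  scale_size_area(max_size = 20) +
  scale_fill_gradient(low = "yellow", high = "red",
                      name = "log10(Population)")
```

The same geom_point() layer could be added on top of the geom_polygon() layer from the map above. The legend then reads in powers of ten, so a value of 6 corresponds to one million people.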

Conclusion

Using shapefiles and ggplot2, we can easily plot and analyse geospatial data. We can also use shapefiles that outline individual postcodes or suburbs to drill down to a more detailed level.