Geospatial analysis is increasingly used across industries and has become an important form of visualisation for decision making. This vignette explains how to create maps for spatial analysis using ggplot2. As an example, we will analyse population data across different cities in Australia.
For this example, we will use ESRI shapefiles and population data by city from the Australian Bureau of Statistics.
An ESRI shapefile consists of several files that share a base name, typically .shp, .shx, .dbf and .prj. Save all of these files to your working directory. We will use the rgdal package to read the shapefile.
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.2-8, (SVN revision 663)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
## Linking to sp version: 1.2-4
Load the shapefile using the readOGR() function:
aus_st <- readOGR("aust_cd66states.shp",layer="aust_cd66states")
## OGR data source with driver: ESRI Shapefile
## Source: "aust_cd66states.shp", layer: "aust_cd66states"
## with 8 features
## It has 2 fields
## Integer64 fields read as strings: COUNT
summary(aus_st)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 112.92111 159.105333
## y -43.74037 -9.142319
## Is projected: NA
## proj4string : [NA]
## Data attributes:
## STE COUNT
## Min. :1.00 1089 :1
## 1st Qu.:2.75 11619 :1
## Median :4.50 3152 :1
## Mean :4.50 3481 :1
## 3rd Qu.:6.25 389 :1
## Max. :8.00 492 :1
## (Other):2
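The rgdal package was retired from CRAN in 2023, so readOGR() may not be available on a recent R installation. As a rough sketch of an alternative, the sf package reads the same kind of file with st_read(). To keep the example runnable without the Australian data, it reads the North Carolina shapefile that ships with sf; substituting "aust_cd66states.shp" for that path is an assumption about how the code above would translate, not something tested against that file.

```r
library(sf)

# Path to a demo shapefile bundled with sf (stand-in for "aust_cd66states.shp").
shp_path <- system.file("shape/nc.shp", package = "sf")

# st_read() returns an sf data frame: one row per feature plus a geometry column.
nc <- st_read(shp_path, quiet = TRUE)

# Inspect what was read: bounding box, coordinate reference system, attributes.
print(st_bbox(nc))
print(st_crs(nc)$input)
print(names(nc))
```

An sf object like this can be drawn with geom_sf(data = nc) in place of geom_polygon(), without supplying long, lat or group aesthetics.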
Load the following packages to read Excel files and to plot with ggplot2:
library(readxl)
library(ggplot2)
Read the population data using the read_excel() function:
Population <- read_excel("~/Desktop/R Directory/Population.xlsx")
The population data file contains the population of each city and its coordinates:
summary(Population)
## City Population lat long
## Length:60 Min. : 22600 Min. :-42.85 Min. :114.6
## Class :character 1st Qu.: 28775 1st Qu.:-35.53 1st Qu.:143.5
## Mode :character Median : 46100 Median :-32.73 Median :147.3
## Mean : 277683 Mean :-31.15 Mean :144.4
## 3rd Qu.: 97825 3rd Qu.:-26.78 3rd Qu.:151.7
## Max. :4277200 Max. :-12.40 Max. :153.4
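Before plotting, it can help to confirm that every city falls inside the extent of the shapefile, since swapped latitude and longitude columns are a common cause of empty or distorted maps. The sketch below uses the bounding box reported by summary(aus_st) earlier; the three rows in toy_population are made-up placeholders standing in for the real Population data frame.

```r
# Bounding box copied from the summary(aus_st) output.
bbox <- list(long = c(112.92111, 159.105333), lat = c(-43.74037, -9.142319))

# Placeholder rows (hypothetical values) with the same columns as Population.
toy_population <- data.frame(
  City = c("City A", "City B", "City C"),
  Population = c(25000, 100000, 4000000),
  lat = c(-42.0, -33.0, -13.0),
  long = c(147.0, 151.0, 131.0)
)

# TRUE for each row whose coordinates lie within the map extent.
inside <- with(
  toy_population,
  long >= bbox$long[1] & long <= bbox$long[2] &
    lat >= bbox$lat[1] & lat <= bbox$lat[2]
)

# Rows that would be drawn outside the map, if any.
outside_rows <- toy_population[!inside, ]
print(outside_rows)
```

Running the same check on the real Population data frame only requires replacing toy_population with Population.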
First, let's build a simple map from the shapefile. The geom_polygon() function draws the state boundaries from their longitudes and latitudes, and group = group keeps the vertices of each polygon together:
ggplot() +
  geom_polygon(data = aus_st, aes(x = long, y = lat, group = group),
               fill = "lightblue", colour = "black")
## Regions defined for each Polygons
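To see what the group aesthetic does, here is a minimal sketch with two hand-made squares in a plain data frame (the coordinates are illustrative, not taken from the shapefile). Without group, geom_polygon() would join all eight vertices into a single shape. The example also adds coord_quickmap(), which is not used in the original code but approximates a sensible aspect ratio for longitude and latitude.

```r
library(ggplot2)

# Two squares; the group column says which vertices belong to which polygon.
squares <- data.frame(
  long = c(115, 120, 120, 115, 140, 145, 145, 140),
  lat = c(-30, -30, -25, -25, -35, -35, -30, -30),
  group = rep(c("west", "east"), each = 4)
)

p_group <- ggplot() +
  geom_polygon(data = squares, aes(x = long, y = lat, group = group),
               fill = "lightblue", colour = "black") +
  coord_quickmap()

# Number of distinct polygons ggplot2 will draw for this layer.
n_polygons <- length(unique(layer_data(p_group, 1)$group))
print(n_polygons)
```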
Now let's add another layer to the map that shows population size. This is done with the geom_point() function, which places a point at each city's coordinates in the population data. Here shape = 21 draws a filled circle with an outline, and size = Population scales each bubble by the city's population. The alpha argument makes the bubbles semi-transparent so that overlapping bubbles remain visible:
ggplot() +
  geom_polygon(data = aus_st, aes(x = long, y = lat, group = group),
               fill = "lightblue", colour = "black") +
  geom_point(data = Population, aes(long, lat, size = Population),
             shape = 21, fill = "red", alpha = 0.5) +
  scale_size_area(max_size = 20)
## Regions defined for each Polygons
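scale_size_area() makes the area of each bubble, rather than its diameter, proportional to the value, so a city with four times the population gets a bubble with four times the area. A small sketch with placeholder values, assuming the current ggplot2 behaviour of scaling the largest value to max_size:

```r
library(ggplot2)

# Placeholder values: the second city has a quarter of the first city's population.
toy <- data.frame(
  long = c(120, 135, 150),
  lat = c(-30, -25, -35),
  Population = c(1e6, 2.5e5, 0)
)

p_size <- ggplot(toy, aes(long, lat, size = Population)) +
  geom_point(shape = 21, fill = "red", alpha = 0.5) +
  scale_size_area(max_size = 20)

# Point sizes that ggplot2 computed for this layer.
sizes <- layer_data(p_size, 1)$size
print(sizes)
```

Because size relates to diameter, a quarter of the population should come out at half the size of the largest bubble, and a value of zero at size zero.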
Finally, let's add a colour scale to the bubbles, so that population is shown by colour as well as by size. Note that we could also map a different variable to fill instead of population, combining bubble size and colour to analyse more than one dimension at once:
ggplot() +
  geom_polygon(data = aus_st, aes(x = long, y = lat, group = group),
               fill = "lightblue", colour = "black") +
  geom_point(data = Population,
             aes(long, lat, size = Population, fill = Population),
             shape = 21) +
  scale_size_area(max_size = 20) +
  scale_fill_gradient(low = "yellow", high = "red")
## Regions defined for each Polygons
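As a sketch of the multi-variable idea, the example below sizes bubbles by population and colours them by a second column. The Growth column and all values are hypothetical; the Population data in this vignette has no such field.

```r
library(ggplot2)

# Hypothetical cities with a made-up Growth column as the second dimension.
toy_cities <- data.frame(
  long = c(116, 131, 145, 151),
  lat = c(-32, -13, -38, -34),
  Population = c(5e4, 1e5, 2e6, 4e6),
  Growth = c(0.5, 2.1, 1.4, 1.8)
)

p_two <- ggplot(toy_cities, aes(long, lat)) +
  geom_point(aes(size = Population, fill = Growth), shape = 21) +
  scale_size_area(max_size = 20) +
  scale_fill_gradient(low = "yellow", high = "red")

# Computed size and fill colour for each bubble.
built <- layer_data(p_two, 1)
print(built[, c("x", "y", "size", "fill")])
```

Keeping size and fill on separate variables gives two legends, one per dimension, which is usually easier to read than encoding the same variable twice.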
Using shapefiles and ggplot2, we can easily plot and analyse geospatial data. We can also use shapefiles that plot individual postcodes or suburbs to drill down to a more detailed level.