Goal: This project focuses on how to read shapefiles, change projection and datum of data to overlay on shapefile map , ggmap will add a google map layer and animation is added to present increasing outage numbers (this is not the sequence outages occurred).
Background: Many times Government agencies only provide shapefile map(boundary) data to show affected region. I wanted to visualize these boundaries in R, although online documentation of reading shapefiles and plotting data is ambiguous. After going through many approaches offered online, following is one way that I found most helpful.
Process: Let’s get started with the New York five borough boundary shapefiles available here. Shapefile is a zip file consist of different extension files as .shp,.shx,.prj etc. The ambiguous online documentation but make sure to unzip this zip file in your working directory (otherwise you might be not able to download it in R). To read this folder in R, I am using readOGR function from rgdal package. sp package provides classes and methods for spatial data.
library(rgdal)
library(sp)
NYboundry<-readOGR(dsn="./nybb_14d",layer="nybb")#layer is the name of the file without extension(e.g. nybb.prj)
## OGR data source with driver: ESRI Shapefile
## Source: "./nybb_14d", layer: "nybb"
## with 5 features and 4 fields
## Feature type: wkbMultiPolygon with 2 dimensions
Because shapefiles use non-topological data we can only see the boundaries of a given area. Which can be plotted using ggplot2 and geom ploygon.
library(ggplot2)
ggplot()+geom_polygon(data=NYboundry,aes(x=long,y=lat,group=group),fill='salmon',size=.2,color='#555555',alpha=0.3)+ggtitle("NY BOROUGHS\n")+theme_bw()
I am plotting Con Edison New York power outage coordinates data a week after super storm sandy which can be found at Coordinates and Outagenumber Thank you, Charles DiMaggio.[1]
conedcoord<-read.csv("./conEdCoordinates.csv",stringsAsFactors=FALSE)
head(conedcoord,n=3)
## lat long location
## 1 41.01349 -73.84017 Ardsley Village
## 2 40.71077 -74.01646 Battery Park City
## 3 40.63773 -74.02290 Bay Ridge
To overlay data points on shapefile plot we must match the class and align projection and datum. Every shapefile data has specific projection and datum, which can be assigned and aligned to coordinates data such as mine. Please refer following link to understand CRS(Coordinate Reference Systems in R).https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf
So, the first step is to match the class(Spatial Plygons DataFrame) of data points(conedcoord) and map(NYboundry). In this case only Coned data class needs to be change.
coordinates(conedcoord)<-~long+lat
class(conedcoord)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
Second step: Match the projection and datum and plot data on shapefile and ggmap.
library(maptools)
library(ggmap)
proj4string(conedcoord)<-CRS("+proj=longlat +datum=NAD83") # assign a known CRS to spatial data
NYboundry<-spTransform(NYboundry,CRS("+proj=longlat +datum=NAD83"))#transform shape coordinates
conedcoord <-spTransform(conedcoord,CRS("+proj=longlat +datum=NAD83"))
identical(proj4string(conedcoord),proj4string(NYboundry))
## [1] TRUE
conedcoord<-data.frame(conedcoord) #change back the data coordinates to dataframe
Newyork<-get_map(location="New York City",zoom=10)
overlayplot<-ggmap(Newyork)+geom_point(data=conedcoord,aes(x=long, y=lat),pch=21,size=2, col="#000099",bg="#FF6666",position=position_jitter())+geom_polygon(data=NYboundry,aes(x=long,y=lat,group=group),fill='salmon',size=.3,color='#555555',alpha=0.3)+ggtitle("ConEd Hurrican Sandy Power Outage Map\n")+theme(panel.border = element_rect(colour = "gray", fill=NA, size=1))
overlayplot
To see maximum outages in New York, I am turning to ggmap and animating data points on the map. Outagenumber dataset contains number of outages and customers per outage.
library(animation)
conedoutages<-read.csv("conEdOutages.csv",header=TRUE,stringsAsFactors=FALSE)
conEddata<-merge(conedoutages,conedcoord,by="location")#merge both datasets by loation names
outagenumber<-conEddata$outages
animatfunc<-function(x){
df <- subset(conEddata,outages <= x)
finalplot<-ggmap(Newyork)+geom_point(data = df,aes(x=long,y=lat),pch=21,size=2,col="#000099",bg="#FF6666",position=position_jitter())+geom_polygon (data=NYboundry,aes(x=long,y=lat,group=group),fill='salmon',size=.3,color='#555555',alpha=0.3)+ggtitle(paste("ConEd Hurrican Sandy Power Outages: ",x))+theme(panel.border = element_rect(colour = "gray", fill=NA, size=1))
}
time<-sort(outagenumber)
saveHTML({
for( i in time) print (animatfunc(i))
ani.options(interval = 0.4)}, img.name = "numberofoutages",htmlfile="coned.html",verbose = FALSE,ani.height= 600,ani.width= 600,single.opts = "'controls': ['first', 'previous', 'play', 'next', 'last', 'loop', 'speed'], 'delayMin': 0",title = "Number of outages in NewYork - ConEd data"
)
References:
[1]http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/styled/code-11/
[2]http://resources.esri.com/help/9.3/ArcGISDesktop/com/Gp_ToolRef/geoprocessing_tool_reference/geoprocessing_considerations_for_shapefile_output.htm
[3]http://zevross.com/blog/2014/07/16/mapping-in-r-using-the-ggplot2-package/
[4]http://bcb.dfci.harvard.edu/~aedin/courses/R/CDC/maps.html
[5]http://facweb.knowlton.ohio-state.edu/pviton/courses2/crp87105/spatial-data.html