Punam Katariya
April 13, 2015

Goal: This project focuses on how to read shapefiles, change projection and datum of data to overlay on shapefile map , ggmap will add a google map layer and animation is added to present increasing outage numbers (this is not the sequence outages occurred).

Background: Many times Government agencies only provide shapefile map(boundary) data to show affected region. I wanted to visualize these boundaries in R, although online documentation of reading shapefiles and plotting data is ambiguous. After going through many approaches offered online, following is one way that I found most helpful.

Process: Let’s get started with the New York five borough boundary shapefiles available here. Shapefile is a zip file consist of different extension files as .shp,.shx,.prj etc. The ambiguous online documentation but make sure to unzip this zip file in your working directory (otherwise you might be not able to download it in R). To read this folder in R, I am using readOGR function from rgdal package. sp package provides classes and methods for spatial data.

library(rgdal)
library(sp)
NYboundry<-readOGR(dsn="./nybb_14d",layer="nybb")#layer is the name of the file without extension(e.g. nybb.prj)
## OGR data source with driver: ESRI Shapefile 
## Source: "./nybb_14d", layer: "nybb"
## with 5 features and 4 fields
## Feature type: wkbMultiPolygon with 2 dimensions

Because shapefiles use non-topological data we can only see the boundaries of a given area. Which can be plotted using ggplot2 and geom ploygon.

library(ggplot2)
ggplot()+geom_polygon(data=NYboundry,aes(x=long,y=lat,group=group),fill='salmon',size=.2,color='#555555',alpha=0.3)+ggtitle("NY BOROUGHS\n")+theme_bw()

I am plotting Con Edison New York power outage coordinates data a week after super storm sandy which can be found at Coordinates and Outagenumber Thank you, Charles DiMaggio.[1]

conedcoord<-read.csv("./conEdCoordinates.csv",stringsAsFactors=FALSE)
head(conedcoord,n=3)
##        lat      long          location
## 1 41.01349 -73.84017   Ardsley Village
## 2 40.71077 -74.01646 Battery Park City
## 3 40.63773 -74.02290         Bay Ridge

To overlay data points on shapefile plot we must match the class and align projection and datum. Every shapefile data has specific projection and datum, which can be assigned and aligned to coordinates data such as mine. Please refer following link to understand CRS(Coordinate Reference Systems in R).https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf

So, the first step is to match the class(Spatial Plygons DataFrame) of data points(conedcoord) and map(NYboundry). In this case only Coned data class needs to be change.

coordinates(conedcoord)<-~long+lat
class(conedcoord)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Second step: Match the projection and datum and plot data on shapefile and ggmap.

library(maptools)
library(ggmap)

proj4string(conedcoord)<-CRS("+proj=longlat +datum=NAD83") # assign a known CRS to spatial data
NYboundry<-spTransform(NYboundry,CRS("+proj=longlat +datum=NAD83"))#transform shape coordinates
conedcoord <-spTransform(conedcoord,CRS("+proj=longlat +datum=NAD83"))
identical(proj4string(conedcoord),proj4string(NYboundry))
## [1] TRUE
conedcoord<-data.frame(conedcoord) #change back the data coordinates to dataframe
Newyork<-get_map(location="New York City",zoom=10)

overlayplot<-ggmap(Newyork)+geom_point(data=conedcoord,aes(x=long, y=lat),pch=21,size=2, col="#000099",bg="#FF6666",position=position_jitter())+geom_polygon(data=NYboundry,aes(x=long,y=lat,group=group),fill='salmon',size=.3,color='#555555',alpha=0.3)+ggtitle("ConEd Hurrican Sandy Power Outage Map\n")+theme(panel.border = element_rect(colour = "gray", fill=NA, size=1))
overlayplot