Displaying Michigan state park locations: A “getting started with ggmap (and XML)” project

For quite a while I've been wanting to explore the mapping and spatial analysis capabilities of R. Maps are cool and R is cool, so maps + R should be awesome. I've dabbled here and there but thought I'd try to do a few simple projects to start to build some familiarity with this topic.

In this first “Hello World” level project, I decided to show the location and approximate size of all of state parks in the Michigan State Park system. The Michigan DNR hosts data about the state parks in Michigan at http://www.michigandnr.com/parksandtrails/. Detailed data is available for download as a series of “simple” XML files. So, as a byproduct, I'll also show how I dealt with getting data from an XML file into an R data frame.

We need the XML package and plyr will come in handy too.

# install.packages('XML')
library(XML)
library(plyr)

Let's try to explore the Unit.XML file.

xmlunit <- xmlTreeParse(file = "Unit.XML")
xmltop <- xmlRoot(xmlunit)
xmltop[1]
## $dbo.Unit_Export_v
## <dbo.Unit_Export_v UnitId="408" Active="1" UnitName="Baraga State Park" MainPhoneNum="(906) 353-6558" TTYNum="711 (Michigan Relay Center)" Addr1="1300 US-41 South" City="Baraga" Zip="49908-" Lat="46.760815860000001" Long="-88.50035182000002" DrivingDirections="From Baraga go 1/2 mile S. on US-41.  Group rate - $6.00 per person, per night." CountyIdOfMailingAddr="07" EntranceFee="0" PermitRequired="1" ApproxSize="56" UnitMap="baraga_map.pdf" UnitPhoto="baraga.gif" CRSID="BARAGASTATEPARK" ADA="0" UnitType="SPRK" TotalNumofSites="0" EmergencyInformation="" EventsInformation=" &lt;a href=&quot;http://www.mi.gov/dnr/0,1607,7-153-10365_48474_48519---,00.html&quot;&gt;Baraga State Park Events&lt;/a&gt;&lt;br /&gt;
## &lt;a href=&quot;http://www.michigan.gov/dnr/0,1607,7-153-10365_36576---,00.html&quot;&gt;GO--Get Outdoors Calendar of Events&lt;/a&gt;&lt;br /&gt;
## &lt;a href=&quot;http://www.michigandnr.com/Publications/PDFS/RecreationCamping/Baraga_2014_Events.pdf&quot;&gt;Baraga 2014 Events&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;
## 
## &lt;a class=&quot;twitter-timeline&quot; data-dnt=&quot;true&quot; href=&quot;https://twitter.com/BaragaStatePark&quot; data-widget-id=&quot;360739771502448641&quot;&gt;Tweets by @BaragaStatePark&lt;/a&gt; &lt;script&gt;!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?&apos;http&apos;:&apos;https&apos;;if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+&quot;://platform.twitter.com/widgets.js&quot;;fjs.parentNode.insertBefore(js,fjs);}}(document,&quot;script&quot;,&quot;twitter-wjs&quot;);&lt;/script&gt;" GroupRate="0" OCESiteFEE="0" Rustic="0" Modern="1" SemiModern="0" Cabin="0" Equestrian="0" Teepee="1" Yurt="0" Lodge="0" MiniCabin="1" Description="Baraga State Park overlooks scenic Keweenaw Bay of Lake Superior. Camping is available at 115 grassy sites or in the park&apos;s minicabin or tepee. Watching wildlife, fishing, hiking, swimming and boating are among the many activities of this park.  " LastChangeTimestamp="2014-04-30T14:52:31.213" LengthyDescription="&lt;p&gt;Baraga State Park is situated a quarter mile south of Baraga along US-41 and overlooks scenic Keweenaw Bay of Lake Superior. Camping is available at 115 grassy sites or in the park&apos;s minicabin. Watching wildlife, fishing, hiking, swimming and boating are among the many activities of this park. The park now features five pull through campsites and 20 and 30 amp electrical service.&lt;/p&gt;&lt;p&gt; &lt;a href=&quot;http://www.michigandnr.com/publications/pdfs/RecreationCamping/Baraga_State_Park_Slideshow.pdf &quot;&gt;Click here for a slide show of the park&lt;/a&gt;&lt;br /&gt;
## 
## &lt;p style=&quot;text-align:center;&quot;&gt;&lt;a  href=&quot;http://media.state.mi.us/michiganestore/public/ProductDetails.aspx?categoryId=7&amp;productId=311&quot;&gt;Make a Tax Deductible Donation to Baraga State Park&lt;/a&gt;&lt;/p&gt;" DisplayHeaderGIF="baraga.gif" AreaInformation="The Bishop Baraga Shrine, the Sturgeon River Gorge, Mt. Arvon (Michigan&apos;s highest point), Canyon Falls and attractions associated with the Keweenaw Bay Indian Community are only a few of the points of interest in Baraga County. The park is 75 miles from Copper Harbor, 69 miles from the Porcupine Mountains and 73 miles from Marquette. For more information on local events and information please visit the websites below.&lt;br&gt;&lt;a href=&quot;http://www.baragacounty.org/&quot;&gt;Baraga County&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;http://www.laketroutfestival.com/&quot;&gt;Baraga County Lake Trout Festival&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://mi-trale.org/&quot;&gt;www.mi-trale.org&lt;/a&gt;" ORV="1" Boat="0" WalkIn="0"/>
## 
## attr(,"class")
## [1] "XMLNodeList"

So, each part is a single node at the top level, with the individual data elements represented as XML attributes. Not sure why this choice was made instead of using XML elements, but I'm no XML expert. Let's explore the attributes of the first node.

atts <- xmlAttrs(xmltop[[1]])
str(atts, nchar.max = 25)
##  Named chr [1:42] "408" "1" "Baraga State Park" ...
##  - attr(*, "names")= chr [1:42] "UnitId" "Active" "UnitName" "MainPhoneNum" ...

So, atts is a “named character vector”. We can access elements just like we'd do with a dictionary in Python.

atts["UnitId"]
## UnitId 
##  "408"
atts["UnitName"]
##            UnitName 
## "Baraga State Park"

As the name implies, all of the elements are character data. Our goal is to get each node's data into a data.frame so that we have one nice data table containing
state park unit information. To convert a named character vector into a data frame, I followed the lead of this Stack Overflow post and tried the following:

atts_df <- as.data.frame(lapply(atts, type.convert), stringsAsFactors = FALSE)
str(atts_df, nchar.max = 25)
## 'data.frame':    1 obs. of  42 variables:
##  $ UnitId               : int 408
##  $ Active               : int 1
##  $ UnitName             : Factor w/ 1 level "Baraga State Park": 1
##  $ MainPhoneNum         : Factor w/ 1 level "(906) 353-6558": 1
##  $ TTYNum               : Factor w/ 1 level "711 (Michigan Relay Cent"| __truncated__: 1
##  $ Addr1                : Factor w/ 1 level "1300 US-41 South": 1
##  $ City                 : Factor w/ 1 level "Baraga": 1
##  $ Zip                  : Factor w/ 1 level "49908-": 1
##  $ Lat                  : num 46.8
##  $ Long                 : num -88.5
##  $ DrivingDirections    : Factor w/ 1 level "From Baraga go 1/2 mile "| __truncated__: 1
##  $ CountyIdOfMailingAddr: int 7
##  $ EntranceFee          : int 0
##  $ PermitRequired       : int 1
##  $ ApproxSize           : int 56
##  $ UnitMap              : Factor w/ 1 level "baraga_map.pdf": 1
##  $ UnitPhoto            : Factor w/ 1 level "baraga.gif": 1
##  $ CRSID                : Factor w/ 1 level "BARAGASTATEPARK": 1
##  $ ADA                  : int 0
##  $ UnitType             : Factor w/ 1 level "SPRK": 1
##  $ TotalNumofSites      : int 0
##  $ EmergencyInformation : logi NA
##  $ EventsInformation    : Factor w/ 1 level " <a href=\"http://www.mi"| __truncated__: 1
##  $ GroupRate            : int 0
##  $ OCESiteFEE           : int 0
##  $ Rustic               : int 0
##  $ Modern               : int 1
##  $ SemiModern           : int 0
##  $ Cabin                : int 0
##  $ Equestrian           : int 0
##  $ Teepee               : int 1
##  $ Yurt                 : int 0
##  $ Lodge                : int 0
##  $ MiniCabin            : int 1
##  $ Description          : Factor w/ 1 level "Baraga State Park overlo"| __truncated__: 1
##  $ LastChangeTimestamp  : Factor w/ 1 level "2014-04-30T14:52:31.213": 1
##  $ LengthyDescription   : Factor w/ 1 level "<p>Baraga State Park is "| __truncated__: 1
##  $ DisplayHeaderGIF     : Factor w/ 1 level "baraga.gif": 1
##  $ AreaInformation      : Factor w/ 1 level "The Bishop Baraga Shrine"| __truncated__: 1
##  $ ORV                  : int 1
##  $ Boat                 : int 0
##  $ WalkIn               : int 0

We use lapply to apply the type.convert function to each element of our character vector. The stringsAsFactors=FALSE is intended to leave data that can't be interpreted as integer, numeric, logical, or complex, as character data with no conversion to a factor. The problem with this approach is that the stringsAsFactors=FALSE is applied “too late”. The type.convert function converts character elements to factors if it can't convert them to one of the other data types. The fix is to use the as.is option of the type.convert function.

atts_df <- as.data.frame(lapply(atts, type.convert, as.is = TRUE), stringsAsFactors = FALSE)
str(atts_df, nchar.max = 25)
## 'data.frame':    1 obs. of  42 variables:
##  $ UnitId               : int 408
##  $ Active               : int 1
##  $ UnitName             : chr "Baraga State Park"
##  $ MainPhoneNum         : chr "(906) 353-6558"
##  $ TTYNum               : chr "711 (Michigan Relay Cent"| __truncated__
##  $ Addr1                : chr "1300 US-41 South"
##  $ City                 : chr "Baraga"
##  $ Zip                  : chr "49908-"
##  $ Lat                  : num 46.8
##  $ Long                 : num -88.5
##  $ DrivingDirections    : chr "From Baraga go 1/2 mile "| __truncated__
##  $ CountyIdOfMailingAddr: int 7
##  $ EntranceFee          : int 0
##  $ PermitRequired       : int 1
##  $ ApproxSize           : int 56
##  $ UnitMap              : chr "baraga_map.pdf"
##  $ UnitPhoto            : chr "baraga.gif"
##  $ CRSID                : chr "BARAGASTATEPARK"
##  $ ADA                  : int 0
##  $ UnitType             : chr "SPRK"
##  $ TotalNumofSites      : int 0
##  $ EmergencyInformation : logi NA
##  $ EventsInformation    : chr " <a href=\"http://www.mi"| __truncated__
##  $ GroupRate            : int 0
##  $ OCESiteFEE           : int 0
##  $ Rustic               : int 0
##  $ Modern               : int 1
##  $ SemiModern           : int 0
##  $ Cabin                : int 0
##  $ Equestrian           : int 0
##  $ Teepee               : int 1
##  $ Yurt                 : int 0
##  $ Lodge                : int 0
##  $ MiniCabin            : int 1
##  $ Description          : chr "Baraga State Park overlo"| __truncated__
##  $ LastChangeTimestamp  : chr "2014-04-30T14:52:31.213"
##  $ LengthyDescription   : chr "<p>Baraga State Park is "| __truncated__
##  $ DisplayHeaderGIF     : chr "baraga.gif"
##  $ AreaInformation      : chr "The Bishop Baraga Shrine"| __truncated__
##  $ ORV                  : int 1
##  $ Boat                 : int 0
##  $ WalkIn               : int 0

That's what we wanted. Ok, now we just need to do this for the entire XML file. As a first step I created a list of all the named character vectors in xmltop.

unit_atts_charvec_list <- sapply(1:length(xmltop), function(x) xmlAttrs(xmltop[[x]]))

Now, we can use sapply on this list with as.data.frame and lapply like we did above.

unit_atts_df_list <- sapply(unit_atts_charvec_list, function(x) as.data.frame(lapply(x, 
    type.convert, as.is = TRUE), stringsAsFactors = FALSE))

All of the dataframes in this have names that overlap. However, not all dataframes have the same number of columns. The base rbind function wants component data frames to have the same columns (though they don't need to be ordered the same.) Fortunately, the plyr package has a more general version of rbind called rbind.fill that can handle differing number of columns and can take a list of dataframes as input.

unit_df <- rbind.fill(unit_atts_df_list)
nrow(unit_df)
## [1] 263
ncol(unit_df)
## [1] 49

Show parks on state of Michigan map

We'll use ggmap to display the state park locations on a map of Michigan.

library(ggmap)
## Loading required package: ggplot2

Here's a map of Michigan from Google Maps.

michgmap <- get_map(location = "Michigan", zoom = 7)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Michigan&zoom=7&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Michigan&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
gmich <- ggmap(michgmap)
gmich

plot of chunk unnamed-chunk-11

Now show the state park locations with the size of the points being a function of the approximate size (acres).

gmich + geom_point(aes(x = Long, y = Lat, size = ApproxSize), data = unit_df, 
    colour = "#006837") + scale_size_continuous(breaks = c(1000, 2000, 4000, 
    8000, 16000, 32000)) + labs(shape = "Visitor Center", size = "Acres")
## Warning: Removed 31 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-12

Find state parks within some drive time radius

The mapdist function takes a (from,to) pair and can create a data frame containing distance measures.

mapdist("Rochster Hills, MI", unit_df$UnitName[5], output = "simple", mode = "driving")
## Information from URL : http://maps.googleapis.com/maps/api/distancematrix/json?origins=Rochster+Hills+MI&destinations=Brimley+State+Park&mode=driving&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
##                 from                 to      m    km miles seconds minutes
## 1 Rochster Hills, MI Brimley State Park 509727 509.7 316.7   16726   278.8
##   hours
## 1 4.646

Each call to mapdist uses the Google Maps API. So, how best to answer questions like:

Show me all the state parks within a 3hr drive of Rochester Hills, Michigan?

We'll take up that question in the next post.

References