For quite a while I've been wanting to explore the mapping and spatial analysis capabilities of R. Maps are cool and R is cool, so maps + R should be awesome. I've dabbled here and there but thought I'd try to do a few simple projects to start to build some familiarity with this topic.
In this first “Hello World” level project, I decided to show the location and approximate size of all of state parks in the Michigan State Park system. The Michigan DNR hosts data about the state parks in Michigan at http://www.michigandnr.com/parksandtrails/. Detailed data is available for download as a series of “simple” XML files. So, as a byproduct, I'll also show how I dealt with getting data from an XML file into an R data frame.
We need the XML
package and plyr
will come in handy too.
# install.packages('XML')
library(XML)
library(plyr)
Let's try to explore the Unit.XML
file.
xmlunit <- xmlTreeParse(file = "Unit.XML")
xmltop <- xmlRoot(xmlunit)
xmltop[1]
## $dbo.Unit_Export_v
## <dbo.Unit_Export_v UnitId="408" Active="1" UnitName="Baraga State Park" MainPhoneNum="(906) 353-6558" TTYNum="711 (Michigan Relay Center)" Addr1="1300 US-41 South" City="Baraga" Zip="49908-" Lat="46.760815860000001" Long="-88.50035182000002" DrivingDirections="From Baraga go 1/2 mile S. on US-41. Group rate - $6.00 per person, per night." CountyIdOfMailingAddr="07" EntranceFee="0" PermitRequired="1" ApproxSize="56" UnitMap="baraga_map.pdf" UnitPhoto="baraga.gif" CRSID="BARAGASTATEPARK" ADA="0" UnitType="SPRK" TotalNumofSites="0" EmergencyInformation="" EventsInformation=" <a href="http://www.mi.gov/dnr/0,1607,7-153-10365_48474_48519---,00.html">Baraga State Park Events</a><br />
## <a href="http://www.michigan.gov/dnr/0,1607,7-153-10365_36576---,00.html">GO--Get Outdoors Calendar of Events</a><br />
## <a href="http://www.michigandnr.com/Publications/PDFS/RecreationCamping/Baraga_2014_Events.pdf">Baraga 2014 Events</a><br /><br />
##
## <a class="twitter-timeline" data-dnt="true" href="https://twitter.com/BaragaStatePark" data-widget-id="360739771502448641">Tweets by @BaragaStatePark</a> <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>" GroupRate="0" OCESiteFEE="0" Rustic="0" Modern="1" SemiModern="0" Cabin="0" Equestrian="0" Teepee="1" Yurt="0" Lodge="0" MiniCabin="1" Description="Baraga State Park overlooks scenic Keweenaw Bay of Lake Superior. Camping is available at 115 grassy sites or in the park's minicabin or tepee. Watching wildlife, fishing, hiking, swimming and boating are among the many activities of this park. " LastChangeTimestamp="2014-04-30T14:52:31.213" LengthyDescription="<p>Baraga State Park is situated a quarter mile south of Baraga along US-41 and overlooks scenic Keweenaw Bay of Lake Superior. Camping is available at 115 grassy sites or in the park's minicabin. Watching wildlife, fishing, hiking, swimming and boating are among the many activities of this park. The park now features five pull through campsites and 20 and 30 amp electrical service.</p><p> <a href="http://www.michigandnr.com/publications/pdfs/RecreationCamping/Baraga_State_Park_Slideshow.pdf ">Click here for a slide show of the park</a><br />
##
## <p style="text-align:center;"><a href="http://media.state.mi.us/michiganestore/public/ProductDetails.aspx?categoryId=7&productId=311">Make a Tax Deductible Donation to Baraga State Park</a></p>" DisplayHeaderGIF="baraga.gif" AreaInformation="The Bishop Baraga Shrine, the Sturgeon River Gorge, Mt. Arvon (Michigan's highest point), Canyon Falls and attractions associated with the Keweenaw Bay Indian Community are only a few of the points of interest in Baraga County. The park is 75 miles from Copper Harbor, 69 miles from the Porcupine Mountains and 73 miles from Marquette. For more information on local events and information please visit the websites below.<br><a href="http://www.baragacounty.org/">Baraga County</a><br><a href="http://www.laketroutfestival.com/">Baraga County Lake Trout Festival</a><br /><a href="http://mi-trale.org/">www.mi-trale.org</a>" ORV="1" Boat="0" WalkIn="0"/>
##
## attr(,"class")
## [1] "XMLNodeList"
So, each part is a single node at the top level, with the individual data elements represented as XML attributes. Not sure why this choice was made instead of using XML elements, but I'm no XML expert. Let's explore the attributes of the first node.
atts <- xmlAttrs(xmltop[[1]])
str(atts, nchar.max = 25)
## Named chr [1:42] "408" "1" "Baraga State Park" ...
## - attr(*, "names")= chr [1:42] "UnitId" "Active" "UnitName" "MainPhoneNum" ...
So, atts is a “named character vector”. We can access elements just like we'd do with a dictionary in Python.
atts["UnitId"]
## UnitId
## "408"
atts["UnitName"]
## UnitName
## "Baraga State Park"
As the name implies, all of the elements are character data. Our goal is to get each
node's data into a data.frame
so that we have one nice data table containing
state park unit information. To convert a named character vector into a data frame,
I followed the lead of this Stack Overflow post and tried the following:
atts_df <- as.data.frame(lapply(atts, type.convert), stringsAsFactors = FALSE)
str(atts_df, nchar.max = 25)
## 'data.frame': 1 obs. of 42 variables:
## $ UnitId : int 408
## $ Active : int 1
## $ UnitName : Factor w/ 1 level "Baraga State Park": 1
## $ MainPhoneNum : Factor w/ 1 level "(906) 353-6558": 1
## $ TTYNum : Factor w/ 1 level "711 (Michigan Relay Cent"| __truncated__: 1
## $ Addr1 : Factor w/ 1 level "1300 US-41 South": 1
## $ City : Factor w/ 1 level "Baraga": 1
## $ Zip : Factor w/ 1 level "49908-": 1
## $ Lat : num 46.8
## $ Long : num -88.5
## $ DrivingDirections : Factor w/ 1 level "From Baraga go 1/2 mile "| __truncated__: 1
## $ CountyIdOfMailingAddr: int 7
## $ EntranceFee : int 0
## $ PermitRequired : int 1
## $ ApproxSize : int 56
## $ UnitMap : Factor w/ 1 level "baraga_map.pdf": 1
## $ UnitPhoto : Factor w/ 1 level "baraga.gif": 1
## $ CRSID : Factor w/ 1 level "BARAGASTATEPARK": 1
## $ ADA : int 0
## $ UnitType : Factor w/ 1 level "SPRK": 1
## $ TotalNumofSites : int 0
## $ EmergencyInformation : logi NA
## $ EventsInformation : Factor w/ 1 level " <a href=\"http://www.mi"| __truncated__: 1
## $ GroupRate : int 0
## $ OCESiteFEE : int 0
## $ Rustic : int 0
## $ Modern : int 1
## $ SemiModern : int 0
## $ Cabin : int 0
## $ Equestrian : int 0
## $ Teepee : int 1
## $ Yurt : int 0
## $ Lodge : int 0
## $ MiniCabin : int 1
## $ Description : Factor w/ 1 level "Baraga State Park overlo"| __truncated__: 1
## $ LastChangeTimestamp : Factor w/ 1 level "2014-04-30T14:52:31.213": 1
## $ LengthyDescription : Factor w/ 1 level "<p>Baraga State Park is "| __truncated__: 1
## $ DisplayHeaderGIF : Factor w/ 1 level "baraga.gif": 1
## $ AreaInformation : Factor w/ 1 level "The Bishop Baraga Shrine"| __truncated__: 1
## $ ORV : int 1
## $ Boat : int 0
## $ WalkIn : int 0
We use lapply
to apply the type.convert
function to each element of our character vector. The
stringsAsFactors=FALSE
is intended to leave data that can't be interpreted as integer,
numeric, logical, or complex, as character data with no conversion to a factor.
The problem with this approach is that the stringsAsFactors=FALSE
is applied “too late”.
The type.convert
function converts character elements to factors if it can't convert them to one
of the other data types. The fix is to use the as.is
option of the type.convert
function.
atts_df <- as.data.frame(lapply(atts, type.convert, as.is = TRUE), stringsAsFactors = FALSE)
str(atts_df, nchar.max = 25)
## 'data.frame': 1 obs. of 42 variables:
## $ UnitId : int 408
## $ Active : int 1
## $ UnitName : chr "Baraga State Park"
## $ MainPhoneNum : chr "(906) 353-6558"
## $ TTYNum : chr "711 (Michigan Relay Cent"| __truncated__
## $ Addr1 : chr "1300 US-41 South"
## $ City : chr "Baraga"
## $ Zip : chr "49908-"
## $ Lat : num 46.8
## $ Long : num -88.5
## $ DrivingDirections : chr "From Baraga go 1/2 mile "| __truncated__
## $ CountyIdOfMailingAddr: int 7
## $ EntranceFee : int 0
## $ PermitRequired : int 1
## $ ApproxSize : int 56
## $ UnitMap : chr "baraga_map.pdf"
## $ UnitPhoto : chr "baraga.gif"
## $ CRSID : chr "BARAGASTATEPARK"
## $ ADA : int 0
## $ UnitType : chr "SPRK"
## $ TotalNumofSites : int 0
## $ EmergencyInformation : logi NA
## $ EventsInformation : chr " <a href=\"http://www.mi"| __truncated__
## $ GroupRate : int 0
## $ OCESiteFEE : int 0
## $ Rustic : int 0
## $ Modern : int 1
## $ SemiModern : int 0
## $ Cabin : int 0
## $ Equestrian : int 0
## $ Teepee : int 1
## $ Yurt : int 0
## $ Lodge : int 0
## $ MiniCabin : int 1
## $ Description : chr "Baraga State Park overlo"| __truncated__
## $ LastChangeTimestamp : chr "2014-04-30T14:52:31.213"
## $ LengthyDescription : chr "<p>Baraga State Park is "| __truncated__
## $ DisplayHeaderGIF : chr "baraga.gif"
## $ AreaInformation : chr "The Bishop Baraga Shrine"| __truncated__
## $ ORV : int 1
## $ Boat : int 0
## $ WalkIn : int 0
That's what we wanted. Ok, now we just need to do this for the entire XML file. As a first step I created a list of all the named character vectors in xmltop.
unit_atts_charvec_list <- sapply(1:length(xmltop), function(x) xmlAttrs(xmltop[[x]]))
Now, we can use sapply
on this list with as.data.frame
and lapply
like we did above.
unit_atts_df_list <- sapply(unit_atts_charvec_list, function(x) as.data.frame(lapply(x,
type.convert, as.is = TRUE), stringsAsFactors = FALSE))
All of the dataframes in this have names that overlap. However, not all
dataframes have the same number of columns. The base rbind
function wants component data frames
to have the same columns (though they don't need to be ordered the same.) Fortunately, the plyr
package has a more general version of rbind
called rbind.fill
that can handle differing
number of columns and can take a list of dataframes as input.
unit_df <- rbind.fill(unit_atts_df_list)
nrow(unit_df)
## [1] 263
ncol(unit_df)
## [1] 49
We'll use ggmap
to display the state park locations on a map of Michigan.
library(ggmap)
## Loading required package: ggplot2
Here's a map of Michigan from Google Maps.
michgmap <- get_map(location = "Michigan", zoom = 7)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Michigan&zoom=7&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Michigan&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
gmich <- ggmap(michgmap)
gmich
Now show the state park locations with the size of the points being a function of the approximate size (acres).
gmich + geom_point(aes(x = Long, y = Lat, size = ApproxSize), data = unit_df,
colour = "#006837") + scale_size_continuous(breaks = c(1000, 2000, 4000,
8000, 16000, 32000)) + labs(shape = "Visitor Center", size = "Acres")
## Warning: Removed 31 rows containing missing values (geom_point).
The mapdist
function takes a (from,to) pair and can create a data frame containing distance
measures.
mapdist("Rochster Hills, MI", unit_df$UnitName[5], output = "simple", mode = "driving")
## Information from URL : http://maps.googleapis.com/maps/api/distancematrix/json?origins=Rochster+Hills+MI&destinations=Brimley+State+Park&mode=driving&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## from to m km miles seconds minutes
## 1 Rochster Hills, MI Brimley State Park 509727 509.7 316.7 16726 278.8
## hours
## 1 4.646
Each call to mapdist
uses the Google Maps API. So, how best to answer questions like:
Show me all the state parks within a 3hr drive of Rochester Hills, Michigan?
We'll take up that question in the next post.