From Seth's Rpubs: http://rpubs.com/Seth/Exp_Point_Pattern
This exercise involves the exploratory analysis of point patterns. The goal is to learn how to visualize and do some preliminary analysis on sets of locations (points). Many of the skills learned here will be essential for hypothesis tests involving spatial point patterns. We will use a number of libraries today including:
setwd("Desktop/")
## Error: cannot change working directory
# install.packages('spatstat')
library(spatstat) # this library is for the analysis and manipulation of spatial points
## Loading required package: mgcv
## This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.
## Loading required package: deldir
## deldir 0.0-21
## spatstat 1.31-1 Type 'help(spatstat)' for an overview of spatstat
## 'latest.news()' for news on latest version 'licence.polygons()' for
## licence information on polygon calculations
library(sp)
## Warning: package 'sp' was built under R version 2.15.3
library(rgdal)
## rgdal: version: 0.8-5, (SVN revision 449) Geospatial Data Abstraction
## Library extensions to R successfully loaded Loaded GDAL runtime: GDAL
## 1.9.0, released 2011/12/29 Path to GDAL shared files:
## /Library/Frameworks/R.framework/Versions/2.15/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480] Path to
## PROJ.4 shared files:
## /Library/Frameworks/R.framework/Versions/2.15/Resources/library/rgdal/proj
library(maptools)
## Loading required package: foreign
## Warning: package 'foreign' was built under R version 2.15.3
## Loading required package: grid
## Loading required package: lattice
## Checking rgeos availability: TRUE
# install.packages('ggmap')
library(ggmap) # This is a very new package called ggmap that allows us to send a query to online mapping services like Google Maps. It returns a georeferenced image of the specified type and scale. We can then plot on top of that image.
## Loading required package: ggplot2
# install.packages('mapproj')
library(mapproj)
## Warning: package 'mapproj' was built under R version 2.15.3
## Loading required package: maps
id <- read.csv("/Users/caitlin/Dropbox/CAITLINS DOCUMENTS/CU Boulder/Courses/GEOG 5023 Quant methods - Spielman/Labs and Exams/Lab 4/Deaths only.csv")
summary(id)
## Report.key Date.and.time
## 00023432-98D1-7C41-A16F6C4DC998257A : 1 13/06/2006 00:00: 53
## 0004720F-CDC0-4AB7-A02E-9B2DD86EFDE7: 1 19/07/2006 00:00: 50
## 0004A198-CDFA-6C7A-316C5E0B1B99315A : 1 26/07/2006 00:00: 48
## 000560C8-BC1A-7033-558315CE07936A6A : 1 29/06/2006 00:00: 45
## 0007F13F-423D-4561-5458204F2B748859 : 1 16/07/2006 00:00: 44
## 00084758-0A44-4D3F-971C-7675E2A227A4: 1 23/07/2006 00:00: 44
## (Other) :52029 (Other) :51751
## YEAR Type Category
## 2006 :11913 Criminal Event :24353 Murder :24044
## 2007 :10432 Enemy Action :11678 IED Explosion : 9101
## 2008 : 3429 Explosive Hazard: 9876 Direct Fire : 7982
## 2005 : 2741 Friendly Action : 4631 Attack : 2343
## 2004 : 1925 Non-Combat Event: 1107 Small Unit Actions: 1419
## 2009 : 1208 Friendly Fire : 184 Indirect Fire : 1349
## (Other):20387 (Other) : 206 (Other) : 5797
## Title
## UNK : 1521
## : 1341
## (CRIMINAL EVENT) MURDER RPT BY NOT PROVIDED IVO (ROUTE UNKNOWN): 1 CIV KIA : 304
## (CRIMINAL EVENT) MURDER RPT BAGHDAD OPS/NCC : 1 CIV KIA : 255
## (CRIMINAL EVENT) MURDER RPT BAGHDAD POLICE OPS/NCC : 1 CIV KIA : 197
## (CRIMINAL EVENT) MURDER RPT BY NOT PROVIDED IVO BAGHDAD (ZONE 37) (ROUTE UNKNOWN): 1 CIV KIA: 171
## (Other) :48246
## Region Attack.on Coalition.forces.wounded
## MND-BAGHDAD:23782 ENEMY :45967 Min. : 0.00
## MND-N :15944 FRIEND : 4815 1st Qu.: 0.00
## MND-C : 4678 NEUTRAL: 1253 Median : 0.00
## MNF-W : 4472 Mean : 0.15
## MND-SE : 2668 3rd Qu.: 0.00
## MND-S : 378 Max. :118.00
## (Other) : 113
## Coalition.forces.killed Iraq.forces.wounded Iraq.forces.killed
## Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.000 Median : 0.00 Median : 0.00
## Mean : 0.072 Mean : 0.34 Mean : 0.29
## 3rd Qu.: 0.000 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :31.000 Max. :97.00 Max. :48.00
##
## Civilian.wia Civilian.kia Enemy.wia Enemy.kia
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.0 Median : 1.0 Median : 0.00 Median : 0.00
## Mean : 1.2 Mean : 1.3 Mean : 0.07 Mean : 0.46
## 3rd Qu.: 0.0 3rd Qu.: 1.0 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :641.0 Max. :436.0 Max. :194.00 Max. :262.00
##
## Enemy.detained Total.deaths Latitude Longitude
## Min. : 0.0 Min. : 1.0 Min. :29.1 Min. :38.9
## 1st Qu.: 0.0 1st Qu.: 1.0 1st Qu.:33.3 1st Qu.:44.1
## Median : 0.0 Median : 1.0 Median :33.4 Median :44.4
## Mean : 0.3 Mean : 2.1 Mean :33.7 Mean :44.3
## 3rd Qu.: 0.0 3rd Qu.: 2.0 3rd Qu.:33.9 3rd Qu.:44.5
## Max. :560.0 Max. :443.0 Max. :37.9 Max. :48.8
##
names(id)
## [1] "Report.key" "Date.and.time"
## [3] "YEAR" "Type"
## [5] "Category" "Title"
## [7] "Region" "Attack.on"
## [9] "Coalition.forces.wounded" "Coalition.forces.killed"
## [11] "Iraq.forces.wounded" "Iraq.forces.killed"
## [13] "Civilian.wia" "Civilian.kia"
## [15] "Enemy.wia" "Enemy.kia"
## [17] "Enemy.detained" "Total.deaths"
## [19] "Latitude" "Longitude"
summary(id$Type)
## criminal event Criminal Event CRIMINAL EVENT
## 1 24353 3
## Enemy Action Explosive Hazard EXPLOSIVE HAZARD
## 11678 9876 1
## Friendly Action Friendly Fire Non-Combat Event
## 4631 184 1107
## Other Suspicious Incident Threat Report
## 146 32 23
The data is a little sloppy; let's recode the types so that “CRIMINAL EVENT” and “Criminal Event” are not separate categories. To do this we'll use the function sub(), which substitutes one text string for another in the specified object:
id$Type <- sub(id$Type, pattern = "CRIMINAL EVENT", replacement = "Criminal Event")
id$Type <- sub(id$Type, pattern = "criminal event", replacement = "Criminal Event")
id$Type <- sub(id$Type, pattern = "EXPLOSIVE HAZARD", replacement = "Explosive Hazard")
sub() returns a character vector, so in order to be sure the old categories are removed from the list of possible event types we have to rebuild the factor:
id$Type <- factor(id$Type)
levels(id$Type)
## [1] "Criminal Event" "Enemy Action" "Explosive Hazard"
## [4] "Friendly Action" "Friendly Fire" "Non-Combat Event"
## [7] "Other" "Suspicious Incident" "Threat Report"
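If you wanted to do this in one pass instead of one sub() call per bad level, a minimal alternative is sketched below (normalize_type is just an illustrative name, and it assumes the duplicate levels differ only in capitalization):
normalize_type <- function(x) {
    x <- tolower(as.character(x))  # collapse all case differences
    x <- gsub("\\b(\\w)", "\\U\\1", x, perl = TRUE)  # Title Case each word
    factor(x)  # rebuild the factor so the old levels are dropped
}
# id$Type <- normalize_type(id$Type)  # would replace the three sub() calls above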
The spatstat library is for the analysis and manipulation of spatial points. In a spatstat point file the attributes of each location are stored in “marks”. Each point on the map can have zero, one, or many associated marks. A file with too many marks is cumbersome, so select a subset of attributes to serve as potential marks:
id.marks.simple <- id[, 4] # if you type summary(id) you'll see that the 4th column of the dataset contains info on type of event. This code is assigning those categories to a new object, which we'll use later.
id.marks.full <- id[, c(4, 7, 9:18)]
If you looked carefully at summary(id) you noticed that latitude and longitude are attributes in the table; we'll use these to map out the events:
id.spp <- SpatialPointsDataFrame(coords = id[, 19:20], proj4string = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"),
data = id.marks.full)
The above code creates a SpatialPointsDataFrame, which is a type of map object that stores points; if you're a GIS person it's roughly equivalent to a point shapefile. I'm guessing at the projection of the original data. One thing to watch: the coords argument expects the x coordinate (longitude) first, and in this table column 19 is Latitude and column 20 is Longitude, so id[, c("Longitude", "Latitude")] may be the order you actually want.
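A quick way to sanity-check the new object (a sketch; these are standard sp functions):
summary(id.spp)  # bounding box, CRS string, and a summary of the marks
bbox(id.spp)  # x should be longitude (roughly 39-49 E) and y latitude (roughly 29-38 N); if swapped, flip the coords order above
plot(id.spp, pch = ".")  # bare-bones dot map of the raw points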
##Exploratory Maps
To make maps we'll use a very new package called ggmap that allows us to send a query to online mapping services like Google Maps. It returns a georeferenced image of the specified type and scale. We can then plot on top of that image. The code below uses a package called ggplot2, which we'll look at more carefully later in the semester. The first step is to get the base map.
map <- get_map(location = "Iraq", maptype = "hybrid", zoom = 6)
ggmap(map, extent = "panel")
I think it's cool how ggmap was able to translate the word “Iraq” into a map. Now we can overlay the WikiLeaks Iraq war data:
ggmap(map, extent = "panel") + geom_point(data = id, aes(x = Longitude, y = Latitude),
    alpha = 0.1, color = "red") + ggtitle("WikiLeaks Iraq Data \n Incidents involving Death(s)")
# Caitlin: but I think the title here is wrong! this is not actually
# incidents involving deaths (that's the next map). this is just all the
# events in the dataset...
Alter the map so that the size of the dots is linked to the total number of deaths in each event (note the use of size inside of aes()):
ggmap(map, extent = "panel") + geom_point(data = id, aes(x = Longitude, y = Latitude,
    size = Total.deaths), alpha = 0.3, color = "red") + ggtitle("WikiLeaks Iraq Data \n Incidents involving Death(s)")
Map a subset of events, for example, those involving detentions:
ggmap(map, extent = "panel") + geom_point(data = id[id$Enemy.detained > 0, ],
aes(x = Longitude, y = Latitude, size = Enemy.detained), alpha = 0.5, color = "green") +
    ggtitle("WikiLeaks Iraq Data \n Incidents involving Detention(s)")
Again, we'll discuss the use of ggplot2 more later in the semester. Quickly: the code above plots one layer, the Google map, and then overlays a set of points; the data for the points is the data.frame id. We then map specific columns in id to visual variables, such as size and position, using aes(). I'm a big fan of ggplot2's functionality for visualization (even though it kinda sucks for mapping).
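As one more illustration of the layering idea (a sketch using the same map and id objects), the Type column can be mapped to color instead of mapping Total.deaths to size:
ggmap(map, extent = "panel") + geom_point(data = id, aes(x = Longitude, y = Latitude,
    color = Type), alpha = 0.2) + ggtitle("WikiLeaks Iraq Data \n Events by Type")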
##Exploratory spatial analysis using spatstat package
Changing gears a bit, we'll begin using the package spatstat to do some analysis. spatstat is an R library designed in Australia by Adrian Baddeley. The logic of spatstat is a little odd from a geographic perspective, but once you understand how it works it is a really useful and robust library.
Many of the analytic tools in spatstat are based on distance. Working with decimal degrees is a pain; they're difficult to interpret as distance units because 1 degree covers a different distance on different parts of the globe, and knowing how many deaths occurred per decimal degree isn't that useful. Re-project the data to a coordinate system that uses meters instead of degrees of latitude/longitude. My knowledge of Iraqi coordinate systems is slim; I've found one that seems to work, but I'm not 100% sure it's correct…
## Convert to a different coordinate system, so that coordinates are in
## meters, not degrees
id.spp.rp <- spTransform(id.spp, CRS("+init=epsg:3839"))
Now, we'll save the new coordinates:
id.spp.rp$Easting <- coordinates(id.spp.rp)[, 1]
id.spp.rp$Northing <- coordinates(id.spp.rp)[, 2]
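A quick check (sketch) that the reprojection worked: the new coordinates should be large numbers of meters, not tens of degrees:
summary(id.spp.rp$Easting)  # values in meters should be orders of magnitude larger than degrees
summary(id.spp.rp$Northing)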
I find it difficult to look at the entire country, because violence rates vary so much. I'm going to limit my analysis to Baghdad:
## Remove all events outside of Baghdad and convert back to a table
id.df.rp.bag <- id.spp.rp@data[id.spp.rp@data$Region == "MND-BAGHDAD", ]
Notice my use of @data: I have pulled out a subset of the map and saved it as a non-spatial data.frame. The spatial information is preserved in the columns Easting and Northing. We can still make a “map” of the points:
plot(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
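As an aside (a sketch; id.spp.rp.bag is just an illustrative name), you could keep the object spatial by subsetting the SpatialPointsDataFrame directly instead of pulling out @data:
id.spp.rp.bag <- id.spp.rp[id.spp.rp$Region == "MND-BAGHDAD", ]  # still a SpatialPointsDataFrame
plot(id.spp.rp.bag, pch = ".")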
There is an eastern outlier and a few northern outliers; let's remove them:
id.df.rp.bag <- id.df.rp.bag[id.df.rp.bag$Northing < 4960000 & id.df.rp.bag$Easting >
9960000, ]
plot(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
On much of the map the points are still very sparse, let's zoom into the region with the dense concentration of events:
id.df.rp.bag <- id.df.rp.bag[(id.df.rp.bag$Northing < 4950000 & id.df.rp.bag$Northing > 4940000) &
    (id.df.rp.bag$Easting > 1e+07 & id.df.rp.bag$Easting < 10010000), ]
These are the points we'll use in our analysis.
##About spatstat
Today we will use three types of spatstat objects (a toy example of each follows the list):
ppp: is a set of points located in space. Points can have attributes called “marks” or the ppp object can be “unmarked” in which case the object stores only locations. Marks are used to differentiate types of points or to store a variable that describes some continuous variable for each point. For example, if the points were trees the marks might record “species” and “dbh”.
owin: is a “window” which defines the spatial extent of the ppp object. All ppp objects must have an owin.
im: a pixel image, like a raster layer in a GIS. Often derived from the ppp object.
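Here is a toy example of all three object types, using made-up coordinates (a sketch; the object names are just illustrative):
xs <- runif(50, min = 0, max = 100)
ys <- runif(50, min = 0, max = 100)
toy.win <- owin(xrange = c(0, 100), yrange = c(0, 100))  # an owin: the window
toy.ppp <- ppp(x = xs, y = ys, window = toy.win,
    marks = factor(sample(c("oak", "pine"), 50, replace = TRUE)))  # a marked ppp
toy.im <- density(toy.ppp, sigma = 10)  # an im: a pixel image derived from the ppp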
The first step in making a ppp for spatial analysis is to define the window. Define the window using convexhull.xy() which traces the outermost points of a scatter plot. Give the function the x and y coordinates of the points, and it returns an owin object – which is necessary for making a point pattern object…
id.chull <- convexhull.xy(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
It is possible (and sometimes necessary) to manually draw a window, which can be done with the following code. It is important to draw the window counter-clockwise:
plot(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
bdry <- locator() #this expects you to start clicking on the plot. You have to click in a counter-clockwise direction to make the shape. hit finish when you're done and R will automatically complete the shape. Useful tool for getting oddly-shaped, hand-drawn shapes in R.
id.win <- owin(poly = bdry)
Use this window to create the ppp object (ignore the warning):
id.ppp <- ppp(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing, window = id.chull,
marks = id.df.rp.bag[, c(1, 3:12)])
## Warning: data contain duplicated points
Assign names to units:
unitname(id.ppp) <- c("meter", "meters")
If you want to change the window to one that you've drawn manually:
# To change the window of an existing ppp object
id.ppp <- id.ppp[id.win]
Now visualize the ppp object:
p1 <- plot.ppp(id.ppp)
legend(max(coords(id.ppp))[1] + 1000, mean(coords(id.ppp))[2], pch = p1, legend = names(p1),
cex = 0.5)
## Error: 'legend' is of length 0
Notice that by default the first “mark” is used to label the points. This mark was the “Type” column in the original data; it describes the type of incident. The data are so dense that the resulting graph is impossible to read. There are alternative visualizations:
## Plot the density of points using a gaussian (normal) kernel with a
## 1000m standard deviation. Gaussian is the default, but further down we
## create a 'rectangular' kernel, which shows how to select an alternative
## kernel shape
plot(density(id.ppp, sigma = 1000))
This density plot places a kernel over the center of each pixel and counts up the number of points per unit area. The points further away from the center of each pixel count less toward the density of the pixel than points that are close to the center. This is the standard kernel density estimator that is used in most GIS programs.
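The bandwidth sigma controls how smooth the estimated surface is: a small sigma shows local detail, a large sigma gives a smoother map. A quick sketch, reusing id.ppp:
par(mfrow = c(1, 3))  # three panels side by side
plot(density(id.ppp, sigma = 250), main = "sigma = 250 m")
plot(density(id.ppp, sigma = 1000), main = "sigma = 1000 m")
plot(density(id.ppp, sigma = 2500), main = "sigma = 2500 m")
par(mfrow = c(1, 1))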
The density command creates a pixel image or im object. These objects allow us to do cool stuff when combined with points, as we will see later…
We can make a contour map of the density:
# png('testplot.png', width = 820, height = 820)
contour(density(id.ppp, 1000), axes = F)
## NULL
# dev.off()
# top and bottom lines include code from Dan for automatically exporting
# the image to the working directory
A 3D “perspective” plot of the data:
layout(matrix(c(1, 1, 1, 2, 3, 4), 2, 3, byrow = TRUE))
persp(density(id.ppp, 1000))
## Error: invalid 'z' limits
Rotate the 3D plot by adjusting theta:
persp(density(id.ppp, 1000), theta = 150, phi = 15) #rotate by adjusting theta
## Error: invalid 'z' limits
persp(density(id.ppp, 1000), theta = 120, phi = 15)
## Error: invalid 'z' limits
persp(density(id.ppp, 1000), theta = 100, phi = 15)
## Error: invalid 'z' limits
# dev.off() # clears the layout - Caitlin: don't run this, it's in Seth's code
# but messed up the plots in the next lines for me...
Alternatively, we can visualize each type of event. (Caitlin: not getting plots as expected here…)
plot(split(id.ppp))
plot(density(split(id.ppp)))
contour(density(split(id.ppp)), axes = F)
plot(id.ppp, which.marks = "Total.deaths", maxsize = 3000, main = "Total Deaths")
# It takes some fiddling with maxsize to get a decent plot, by default all
# the circles are tiny.
###Exploratory analysis of points
There are a lot of potential ways to visualize points in R; we've just scratched the surface. One really cool thing about spatstat is that it's possible to use pixel images (like the density maps) in the analysis of points. Let's make a weighted density map that reflects the total number of deaths at each location in our window. We'll use this map to do some preliminary analysis of the points. To start with, let's simplify the ppp object:
id.ppp.simp <- unmark(id.ppp) #remove all data from the points
deaths <- marks(id.ppp)["Total.deaths"] # pull out the number of deaths at each point
id.ppp.simp.death <- id.ppp.simp %mark% deaths #make a point pattern (ppp) with a single variable
The density function has a lot of options; for example, we can weight each point by the total deaths using the “weights” option, we can specify the type of kernel (using kernel), and the size of the pixels (using eps):
id.death.im <- density(id.ppp.simp.death, sigma = 1000, weights = marks(id.ppp.simp.death),
kernel = "rectangular", eps = 100) #eps sets pixel size in map units
## Warning: pixel size parameter eps > size of window
## Error: Only 1 column of pixels - cannot determine pixel width
plot(id.death.im, main = "Weighted Density Map of \nWikiLeaks Deaths in Baghdad")
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'plot': Error: object 'id.death.im' not found
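The “object 'id.death.im' not found” errors here and below all trace back to the failed density() call above: the eps warning suggests the window of id.ppp is not in the units we expect (check it with summary(id.ppp)). As a workaround (a sketch), the pixel grid can instead be set with dimyx, the number of pixels in each direction, which does not depend on the window's units:
id.death.im <- density(id.ppp.simp.death, sigma = 1000, weights = marks(id.ppp.simp.death),
    kernel = "rectangular", dimyx = 256)  # 256 x 256 pixels regardless of window units
plot(id.death.im, main = "Weighted Density Map of \nWikiLeaks Deaths in Baghdad")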
I don't really like the default color scheme for the density maps but changing the colors is a pain. To change the colors we first have to define a function called a color ramp. The color ramp will return a specified number of colors between the two user selected colors. To see a list of possible colors type colors().
# We can change the colors by making a custom color ramp
grRd <- colorRampPalette(c("grey60", "red"))
The result of the above code is a function that returns the specified number of colors on a color ramp between grey and red. For example, grRd(7) returns seven colors with grey on the low end and red on the high end. To use this function we have to make a colourmap that assigns each color to a range of values.
co <- colourmap(grRd(5), range = c(3.37e-06, 0.00036)) #range from summary(id.death.im)
plot(id.death.im, main = "Weighted Density Map of \nWikiLeaks Deaths in Baghdad",
col = co)
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'plot': Error: object 'id.death.im' not found
It is possible to do map-algebra-like operations with pixel images (like the density map). It's also possible to export the images to a table, such that each pixel becomes a cell in the table.
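For example (a sketch, assuming id.death.im was created successfully above; log.death.im and death.tab are just illustrative names), eval.im() does pixel-by-pixel arithmetic and as.data.frame() turns the image into a table:
log.death.im <- eval.im(log(id.death.im + 1e-09))  # pixel-wise transformation (map algebra)
death.tab <- as.data.frame(id.death.im)  # columns x, y, value: one row per pixel
head(death.tab)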
The death density image can be used to divide the study region into areas of high, medium, and low “death density”. The function tess is used to create a special type of spatstat object called a tessellation. Tessellations can be used to group points.
death.map.breaks <- quantile(id.death.im, probs = (0:3)/3) #define density cut points
## Error: object 'id.death.im' not found
death.map.cuts <- cut(id.death.im, breaks = death.map.breaks, labels = c("Low DD",
"Medium DD", "High DD")) #cut the image
## Error: object 'id.death.im' not found
death.map.quartile <- tess(image = death.map.cuts)
## Error: object 'death.map.cuts' not found
plot(death.map.quartile)
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'plot': Error: object 'death.map.quartile' not found
We can also use the tessellation to count the number of events in each category on the map:
qCount.im <- quadratcount(X = id.ppp, tess = death.map.quartile) #ignore warning?
## Error: object 'death.map.quartile' not found
plot(qCount.im)
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'plot': Error: object 'qCount.im' not found
We could also divide the window up into an n by n grid and count the number of events in each grid cell:
qCount <- quadratcount(X = id.ppp, nx = 4, ny = 4) #returns the count of points in each quadrant
plot(qCount)
We can also use the density map to look at how the intensity of events (number of points per unit area) relates to the density of death image:
plot(rhohat(unmark(id.ppp), id.death.im))
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'plot': Error in rhohat.ppp(unmark(id.ppp), id.death.im) : object
## 'id.death.im' not found Calls: rhohat -> rhohat.ppp
We'll discuss rhohat more in lecture. This plot basically shows the density of points as a function of the values on the weighted death density map. The yellow (High DD) regions of the map not only contain more points, the points there also occur at a higher density than in the parts of the death density image that have lower numbers of deaths.
We could also split the data frame up into a series of point patterns, one for each type of event:
id.ppp.types <- split(id.ppp, which.marks = "Type")
id.ppp.types <- unmark(id.ppp.types)
The object id.ppp.types now holds a bunch of unmarked point patterns; each type of event is in its own ppp object. For example, we could look at only the criminal events:
id.ppp.types$"Criminal Event"
## planar point pattern: 0 points
## window: rectangle = [0, 1] x [0, 1] units
There are so many criminal events that it's difficult to make heads or tails of the map. Let's try to filter the significant clusters (high density areas) from the background rate (noise). We'll do this using an approach developed by Byers and Raftery (1998) called Nearest Neighbor Cleaning. We'll discuss the theory in class; the basic idea is to estimate the parameters of two point-generating processes (one for noise and one for signal). Once the parameters of each process are estimated, points are assigned to one of the two processes.
nnc <- nnclean(X = id.ppp.types$"Criminal Event", k = 20)
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf
## Iteration 1 logLik = 0 p = NaN
## Iteration 2 logLik = 0 p = NaN
## Estimated parameters:
## p [cluster] = NaN
## lambda [cluster] = NaN
## lambda [noise] = NaN
plot(split(nnc))
The map above shows intense clusters of criminal activity against the background “noise.” In the next exercise we'll do hypothesis tests to understand the processes that generated the observed point patterns.
#Caitlin's notes from lab:
ppp = planar point pattern (point locations in the plane, often modeled as realizations of a Poisson point process).
#Seth's comments on attaching marks:
To give multiple attributes to a point, you need to use a data.frame (a table), and it needs to have the same number of rows as the number of points you have. You have to explicitly give an NA or null to any points that don't have an associated “mark”.
R assumes that the order matches: the first point in the ppp object corresponds to the first row in the table, and so on. It also makes sense to keep the coordinates of the points in the data frame.
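A minimal sketch of that idea, with made-up coordinates and marks:
pts.x <- c(1, 2, 3)
pts.y <- c(3, 1, 2)
m <- data.frame(Type = c("a", "b", NA), Total.deaths = c(2, NA, 1))  # one row per point, NA where unknown
toy.marked <- ppp(x = pts.x, y = pts.y, window = owin(c(0, 4), c(0, 4)), marks = m)  # marks attached by row order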
When doing the lab, Seth anticipates stumbling blocks in getting the data frame to work.