This exercise involves the exploratory analysis of point patterns.
load libraries
library(spatstat)
## Warning: package 'spatstat' was built under R version 2.15.3
## Loading required package: mgcv
## This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.
## Loading required package: deldir
## deldir 0.0-21
## spatstat 1.31-1 Type 'help(spatstat)' for an overview of spatstat
## 'latest.news()' for news on latest version 'licence.polygons()' for
## licence information on polygon calculations
library(sp)
library(rgdal)
## Warning: package 'rgdal' was built under R version 2.15.3
## rgdal: version: 0.8-5, (SVN revision 449) Geospatial Data Abstraction
## Library extensions to R successfully loaded Loaded GDAL runtime: GDAL
## 1.9.2, released 2012/10/08 Path to GDAL shared files: C:/Program
## Files/R/R-2.15.2/library/rgdal/gdal GDAL does not use iconv for recoding
## strings. Loaded PROJ.4 runtime: Rel. 4.7.1, 23 September 2009,
## [PJ_VERSION: 470] Path to PROJ.4 shared files: C:/Program
## Files/R/R-2.15.2/library/rgdal/proj
library(maptools)
## Warning: package 'maptools' was built under R version 2.15.3
## Loading required package: foreign
## Loading required package: grid
## Loading required package: lattice
## Checking rgeos availability: FALSE Note: when rgeos is not available,
## polygon geometry computations in maptools depend on gpclib, which has a
## restricted licence. It is disabled by default; to enable gpclib, type
## gpclibPermit()
library(ggmap)
## Warning: package 'ggmap' was built under R version 2.15.3
## Loading required package: ggplot2
library(mapproj)
## Warning: package 'mapproj' was built under R version 2.15.3
## Loading required package: maps
## Warning: package 'maps' was built under R version 2.15.3
read cvs file
id <- read.csv(file.choose())
summary(id)
## Report.key Date.and.time
## 00023432-98D1-7C41-A16F6C4DC998257A : 1 13/06/2006 00:00: 53
## 0004720F-CDC0-4AB7-A02E-9B2DD86EFDE7: 1 19/07/2006 00:00: 50
## 0004A198-CDFA-6C7A-316C5E0B1B99315A : 1 26/07/2006 00:00: 48
## 000560C8-BC1A-7033-558315CE07936A6A : 1 29/06/2006 00:00: 45
## 0007F13F-423D-4561-5458204F2B748859 : 1 16/07/2006 00:00: 44
## 00084758-0A44-4D3F-971C-7675E2A227A4: 1 23/07/2006 00:00: 44
## (Other) :52029 (Other) :51751
## YEAR Type Category
## 2006 :11913 Criminal Event :24353 Murder :24044
## 2007 :10432 Enemy Action :11678 IED Explosion : 9101
## 2008 : 3429 Explosive Hazard: 9876 Direct Fire : 7982
## 2005 : 2741 Friendly Action : 4631 Attack : 2343
## 2004 : 1925 Non-Combat Event: 1107 Small Unit Actions: 1419
## 2009 : 1208 Friendly Fire : 184 Indirect Fire : 1349
## (Other):20387 (Other) : 206 (Other) : 5797
## Title
## UNK : 1521
## : 1341
## (CRIMINAL EVENT) MURDER RPT BY NOT PROVIDED IVO (ROUTE UNKNOWN): 1 CIV KIA : 304
## (CRIMINAL EVENT) MURDER RPT BAGHDAD OPS/NCC : 1 CIV KIA : 255
## (CRIMINAL EVENT) MURDER RPT BAGHDAD POLICE OPS/NCC : 1 CIV KIA : 197
## (CRIMINAL EVENT) MURDER RPT BY NOT PROVIDED IVO BAGHDAD (ZONE 37) (ROUTE UNKNOWN): 1 CIV KIA: 171
## (Other) :48246
## Region Attack.on Coalition.forces.wounded
## MND-BAGHDAD:23782 ENEMY :45967 Min. : 0.00
## MND-N :15944 FRIEND : 4815 1st Qu.: 0.00
## MND-C : 4678 NEUTRAL: 1253 Median : 0.00
## MNF-W : 4472 Mean : 0.15
## MND-SE : 2668 3rd Qu.: 0.00
## MND-S : 378 Max. :118.00
## (Other) : 113
## Coalition.forces.killed Iraq.forces.wounded Iraq.forces.killed
## Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.000 Median : 0.00 Median : 0.00
## Mean : 0.072 Mean : 0.34 Mean : 0.29
## 3rd Qu.: 0.000 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :31.000 Max. :97.00 Max. :48.00
##
## Civilian.wia Civilian.kia Enemy.wia Enemy.kia
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.0 Median : 1.0 Median : 0.00 Median : 0.00
## Mean : 1.2 Mean : 1.3 Mean : 0.07 Mean : 0.46
## 3rd Qu.: 0.0 3rd Qu.: 1.0 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :641.0 Max. :436.0 Max. :194.00 Max. :262.00
##
## Enemy.detained Total.deaths Latitude Longitude
## Min. : 0.0 Min. : 1.0 Min. :29.1 Min. :38.9
## 1st Qu.: 0.0 1st Qu.: 1.0 1st Qu.:33.3 1st Qu.:44.1
## Median : 0.0 Median : 1.0 Median :33.4 Median :44.4
## Mean : 0.3 Mean : 2.1 Mean :33.7 Mean :44.3
## 3rd Qu.: 0.0 3rd Qu.: 2.0 3rd Qu.:33.9 3rd Qu.:44.5
## Max. :560.0 Max. :443.0 Max. :37.9 Max. :48.8
##
recode the types so that “CRIMINAL EVENT” “Criminal Event” are not separate categories. Use the function sub() which substitutes one text string for another text string in the specified object
id$Type <- sub(id$Type, pattern = "CRIMINAL EVENT", replacement = "Criminal Event")
id$Type <- sub(id$Type, pattern = "criminal event", replacement = "Criminal Event")
id$Type <- sub(id$Type, pattern = "EXPLOSIVE HAZARD", replacement = "Explosive Hazard")
check to see if the old categories are removed from the list of possible event types we will run:
id$Type <- factor(id$Type)
levels(id$Type)
## [1] "Criminal Event" "Enemy Action" "Explosive Hazard"
## [4] "Friendly Action" "Friendly Fire" "Non-Combat Event"
## [7] "Other" "Suspicious Incident" "Threat Report"
The spatstat library is for the analysis and manipulation of spatial points. In a spatstat point file the attributes of each location are stored in “marks”. Each point on the map can have zero, one, or many associated marks. select a subset of attributes to serve as potential marks
id.marks.simple <- id[, 4]
id.marks.full <- id[, c(4, 7, 9:18)]
Use latitude and longitude to map out events and create a SpatialPointsDataFrame which is a type of map that stores point objects:
id.spp <- SpatialPointsDataFrame(coords = id[, 19:20], proj4string = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"),
data = id.marks.full)
Using ggmaps, obtained a georeferenced image of the specified type and scale
map <- get_map(location = "Iraq", maptype = "hybrid", zoom = 6) #get map
ggmap(map, extent = "panel")
Overlay the WikiLeaks Iraq war data
ggmap(map, extent = "panel") + geom_point(data = id, aes(x = id$Longitude, y = id$Latitude),
alpha = 0.1, color = "red") + ggtitle("WikiLeaks Iraq Data \n Incidents involving Death(s)")
Alter the map so the size of the dots are linked to the total number of deaths in each event
ggmap(map, extent = "panel") + geom_point(data = id, aes(x = id$Longitude, y = id$Latitude,
size = Total.deaths), alpha = 0.3, color = "red") + ggtitle("WikiLeaks Iraq Data \n Incidents involving Death(s)")
Map a subset of events (here, those involving detentions). The following code plots 1 layer which is the Google map and then overlays a set of points, the data for the points is the data.frame id. We then map specific columns in id to visual variables, such as size and position using aes().
ggmap(map, extent = "panel") + geom_point(data = id[id$Enemy.detained > 0, ],
aes(x = Longitude, y = Latitude, size = Enemy.detained), alpha = 0.5, color = "green") +
ggtitle("WikiLeaks Iraq Data \n Incidents involving Dententions(s)")
Many of the analytic tools in spatstat are based on distance. . Re-project the data to a coordinate system that uses meters instead of degrees of latitude/longitude.
## Convert to a different coordinate system. So that coordinates are in
## meters not degrees
id.spp.rp <- spTransform(id.spp, CRS("+init=epsg:3839"))
Save new coordinates
id.spp.rp$Easting <- coordinates(id.spp.rp)[, 1]
id.spp.rp$Northing <- coordinates(id.spp.rp)[, 2]
Limit analysis to Baghdad:
## Remove all events outside of Baghdad and convert back to a table
id.df.rp.bag <- id.spp.rp@data[id.spp.rp@data$Region == "MND-BAGHDAD", ]
We can still make a “map” of the points that are saved in a non-spatial data.frame.
plot(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
Remove eastern outlier and a few northern outliers
id.df.rp.bag <- id.df.rp.bag[id.df.rp.bag$Northing < 4960000 & id.df.rp.bag$Easting >
9960000, ]
plot(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
Zoom into the region with the dense concentration of events and use these points in analysis:
id.df.rp.bag <- id.df.rp.bag[(id.df.rp.bag$Northing < 4950000 & id.df.rp.bag$Northing >
4940000) & (id.df.rp.bag$Easting > 1e+07 & id.df.rp.bag$Easting < 10010000),
]
We will use three types of spatstat() objects:
1.ppp: is a set of points located in space. Points can have attributes called “marks” or the ppp object can be “unmarked” in which case the object stores only locations. Marks are used to differentiate types of points or to store a variable that describes some continuous variable for each point. For example, if the points were trees the marks might record “species” and “dbh”.
2.owin: is a “window” which defines the spatial extent of the ppp object. All ppp objects must have an owin.
3.im: a pixel image, like a raster layer in a GIS. Often derived from the ppp object.
The first step in making a ppp for spatial analysis is to define the window. Define the window using convexhull.xy() which traces the outermost points of a scatter plot. Give the function the x and y coordinates of the points, and it returns an owin object - which is necessary for making a point pattern object.
id.chull <- convexhull.xy(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
plot(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing)
bdry <- locator() #will need to click locations in a counter clockwise direction to draw window, then hit 'escape' button
id.win <- owin(poly = bdry)
Use the selected window to create the ppp object (ignore the warning)
id.ppp <- ppp(x = id.df.rp.bag$Easting, y = id.df.rp.bag$Northing, window = id.chull,
marks = id.df.rp.bag[, c(1, 3:12)])
## Warning: data contain duplicated points
unitname(id.ppp) <- c("meter", "meters") ##Assign names to units
To change the window to one that you've drawn manually:
# To change the window of an existing ppp object
id.ppp <- id.ppp[id.win]
## Error: object 'id.win' not found
Visualize the ppp object:
p1 <- plot.ppp(id.ppp)
legend(max(coords(id.ppp))[1] + 1000, mean(coords(id.ppp))[2], pch = p1, legend = names(p1),
cex = 0.5)
## Warning: mean(<data.frame>) is deprecated. Use colMeans() or sapply(*,
## mean) instead.
Another visualization:
## Plot the density of points using a gaussian (normal) kernel with a
## 1000m standard deviation
plot(density(id.ppp, sigma = 1000))
make a contour map of the density:
contour(density(id.ppp, 1000), axes = F)
## NULL
A 3D, 'perspective' plot, of the data:
layout(matrix(c(1, 1, 1, 2, 3, 4), 2, 3, byrow = TRUE))
persp(density(id.ppp, 1000))
## [,1] [,2] [,3] [,4]
## [1,] 2.016e-04 0.000e+00 0.000e+00 0.000e+00
## [2,] 0.000e+00 5.219e-05 -1.948e-04 1.948e-04
## [3,] 0.000e+00 1.866e+04 5.000e+03 -5.000e+03
## [4,] -2.018e+03 -2.588e+02 9.602e+02 -9.592e+02
persp(density(id.ppp, 1000), theta = 150, phi = 15) #rotate by adjusting theta
## [,1] [,2] [,3] [,4]
## [1,] -1.746e-04 -2.610e-05 9.739e-05 -9.739e-05
## [2,] 1.008e-04 -4.520e-05 1.687e-04 -1.687e-04
## [3,] -5.915e-13 1.866e+04 5.000e+03 -5.000e+03
## [4,] 1.249e+03 4.839e+02 -1.811e+03 1.812e+03
persp(density(id.ppp, 1000), theta = 120, phi = 15) #rotate by adjusting theta
## [,1] [,2] [,3] [,4]
## [1,] -1.008e-04 -4.520e-05 1.687e-04 -1.687e-04
## [2,] 1.746e-04 -2.610e-05 9.739e-05 -9.739e-05
## [3,] -1.024e-12 1.866e+04 5.000e+03 -5.000e+03
## [4,] 1.452e+02 5.805e+02 -2.172e+03 2.173e+03
persp(density(id.ppp, 1000), theta = 100, phi = 15) #rotate by adjusting theta
## [,1] [,2] [,3] [,4]
## [1,] -3.502e-05 -5.140e-05 1.918e-04 -1.918e-04
## [2,] 1.986e-04 -9.063e-06 3.382e-05 -3.382e-05
## [3,] -1.165e-12 1.866e+04 5.000e+03 -5.000e+03
## [4,] -6.317e+02 5.583e+02 -2.089e+03 2.090e+03
dev.off() #clearlayout
## null device
## 1
Visualize each type of event:
plot(split(id.ppp))
plot(density(split(id.ppp)))
contour(density(split(id.ppp)), axes = F)
By default only the “Type” mark is plotted. Map different marks as follows:
plot(id.ppp, which.marks = "Total.deaths", maxsize = 3000, main = "Total Deaths")
## 0 50 100 150 200
## 0.0 828.7 1657.5 2486.2 3314.9
Make a weighted density map that reflects the total number of deaths at each location in our window. We'll use this map to do some preliminary analysis of the points. To start simplify the ppp object:
id.ppp.simp <- unmark(id.ppp) #remove all data from the points
deaths <- marks(id.ppp)["Total.deaths"] #create a list of # of deaths
id.ppp.simp.death <- id.ppp.simp %mark% deaths #make a point pattern (ppp) with a single variable
Specify the type of kernel (using kernel), and the size of the pixels (using eps):
id.death.im <- density(id.ppp.simp.death, sigma = 1000, weights = marks(id.ppp.simp.death),
kernel = "rectangular", eps = 100) #eps sets pixel size in map units
plot(id.death.im, main = "Weighted Density Map of \nWikiLeaks Deaths in Baghdad")
To change the colors we first have to define a function called a color ramp. The color ramp will return a specified number of colors between the two user selected colors. To see a list of possible colors type colors().
# We can change the colors by making a custom color ramp
grRd <- colorRampPalette(c("grey60", "red"))
grRd(7) returns seven colors with grey on the low end and red on the high end. To use this function we have to make a colourmap that assigns each of the seven colors to a range of values.
co <- colourmap(grRd(5), range = c(3.37e-06, 0.00036)) #range from summary(id.death.im)
plot(id.death.im, main = "Weighted Density Map of \nWikiLeaks Deaths in Baghdad",
col = co)
It is possible to do map algebra like operations with pixel images (like the density map). Its also possible to export the images to a table, such that the each pixel becomes a cell in the table.
The death density image can be used to divide the study region into areas of high, medium, and low “death density”. The function tess is used to create a special type of spatstat object called a tessellation. Tessellations can be used to group points.
death.map.breaks <- quantile(id.death.im, probs = (0:3)/3) #define density cut points
death.map.cuts <- cut(id.death.im, breaks = death.map.breaks, labels = c("Low DD",
"Medium DD", "High DD")) #cut the image
death.map.quartile <- tess(image = death.map.cuts)
plot(death.map.quartile)
Use the tessellation to count the number of events in each category on the map:
qCount.im <- quadratcount(X = id.ppp, tess = death.map.quartile)
## Warning: Tessellation does not contain all the points of X
plot(qCount.im)
Divide the window up into a n by n grid and count the number of events in each grid cell:
qCount <- quadratcount(X = id.ppp, nx = 4, ny = 4) #returns the count of points in each quadrant
plot(qCount)
Use the density map to look at how the intensity of events (number of points per unit area) relates to the density of death image. This plot shows the density of points as a function of the values on the weighted death density map. The yellow (High DD) regions on the map contain not only more points but the points are higher density than parts of the death density of image that have lower numbers of deaths.
plot(rhohat(unmark(id.ppp), id.death.im))
Split the data frame up into a series of point patterns, one for each type of event:
id.ppp.types <- split(id.ppp, which.marks = "Type")
id.ppp.types <- unmark(id.ppp.types)
The object id.ppp.types now holds a bunch of unmarked point patterns, each type of event is in its own ppp object. For example, we could look at only the criminal events:
id.ppp.types$"Criminal Event"
## planar point pattern: 1816 points
## window: polygonal boundary
## enclosing rectangle: [10000045, 10009992] x [4940001, 4949997] meters
Filter the significant clusters (high density areas) from the background rate (noise). We'll do this using an approach developed by Byers and Raferty (1998) called Nearest Neighbor Cleaning. We'll discuss the theory in class, the basic idea is to estimate the parameters of two point generating processes (one for noise and one for signal). Once the parameters of each process are estimated points are assigned to one of the two processes.
nnc <- nnclean(X = id.ppp.types$"Criminal Event", k = 20)
## Iteration 1 logLik = -361092.732623139 p = 0.6355
## Iteration 2 logLik = -359146.366272647 p = 0.5596
## Iteration 3 logLik = -358203.24296497 p = 0.4912
## Iteration 4 logLik = -358005.479109628 p = 0.4498
## Estimated parameters:
## p [cluster] = 0.44985
## lambda [cluster] = 7.2878e-05
## lambda [noise] = 1.25e-05
plot(split(nnc))
The map above shows intense clusters of criminal activity against the background “noise.”