
The purpose of this Journal is to document the tools and functions covered in Geography 5023: Quantitative Methods. To do this, all of the labs, homework, and in-class exercises are posted and discussed here. Use the table of contents below to find the assignment of interest; each entry lists the functions covered in that exercise along with a very brief description. Each exercise heading is hyperlinked to an external document that explains the functions in more depth and shows a practical, worked example of each. The external documents for the in-class exercises also include a brief outline of the lecture material covered.
Here is my custom-made, unofficial expectations rubric that lists the in-class exercises. I take no responsibility for its accuracy or completeness. Last updated Apr 22, 2013.
4/23/13 Update: Note that this assignment is now extra credit instead of 10% of the course grade.
Final Project
1. Descriptive Geostatistics
Labs (note: the external documents were crudely converted to Rmd files; see the lab write-ups for details.)
1. Linear Regression
2. Logistic Regression
3. Spatial Autocorrelation
4. Point Pattern Analysis
Homework
1. Calculate Root Mean Square Standard Deviation
2. Hypothesis Testing
In-Class Exercises
1. Introduction to R
2. Review of Key Concepts
3. Intro to Correlation and Regression
4. Logistic Regression
5. Logistic and Poisson Regression
6. Multiple Regression
7. Autocorrelation (time series)
8. Intro to Spatial Autocorrelation (in Geoda; not shown here)
9a. Spatial Analysis and Regression 1 Exercise 1
9b. Spatial Analysis and Regression 1 Exercise 2
10. Spatial Regression 2
11. Point Pattern Analysis Part 1: Visualization
12. Point Pattern Analysis Part 2: Hypothesis Testing
Misc. In-Class Exercises
1. Swain v. Alabama
2. Maps and Multiple Regression
3. Using GGPLOT2 for visualization
# I'm looking at comparing variograms. Check out the link for more
# information.
General functions
read.csv() #reads a csv file as a table
names() #gets the column names of a table
dataset[rows, cols] #indexing syntax for subsetting a table by rows and columns
partialData <- dataset[logicalCondition, ] #uses a logical condition to keep only the rows of interest
par(mfrow = c(2, 2)) #Creates a tiled layout to display multiple plots at the same time. This configuration shows 4 plots.
I(x^2) #treats x^2 literally (as-is) inside a formula. Useful for adding quadratic terms to a regression.
Testing for normality
hist() #creates a histogram of the data
qqnorm() #creates a qq plot
qqline() #plots the line of normality on a qq plot
shapiro.test() #uses the Shapiro-Wilk test for normality; null hypothesis = the data are normally distributed
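As a quick illustration (simulated data; the variable x is just a placeholder of mine), the full set of checks might look like this:
x <- rnorm(100, mean = 50, sd = 10) #made-up data for illustration only
par(mfrow = c(1, 2)) #show the histogram and Q-Q plot side by side
hist(x) #rough shape of the distribution
qqnorm(x); qqline(x) #points should hug the line if the data are normal
shapiro.test(x) #a large p-value means no evidence against normality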
Comparing datasets
# Normally we could use a t-test. But the data has to be normally
# distributed. In this lab the data wasn't normally distributed, so we
# must use nonparametric tests.
wilcox.test(x, y, paired = T) #Compares two datasets that do not follow normal distributions. The null hypothesis is that there is no difference between the two datasets.
Creating and assessing linear regression models
lm(D.V. ~ I.V., dataset) #creates a linear regression model.
plot(lm) #plots residual diagnostics (residuals vs. fitted, Q-Q plot, etc.). This lets us check whether the residuals are roughly normally distributed; if not, the model is suspect.
summary(lm) #gives p-values to assess model parameters. It also gives us an R^2 fit statistic
anova(lm1, lm2) #Compares two regression models using an F-test, where the null hypothesis is that there is no difference between the models.
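Here is a minimal worked sketch using R's built-in mtcars data (my own example, not from the lab):
fit1 <- lm(mpg ~ wt, data = mtcars) #one-predictor model
fit2 <- lm(mpg ~ wt + hp, data = mtcars) #adds a second predictor
summary(fit2) #coefficients, p-values, and R^2
par(mfrow = c(2, 2)); plot(fit2) #residual diagnostic plots
anova(fit1, fit2) #F-test: does adding hp improve the model?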
Creating dummy variables
dataset$newField <- ifelse(dataset$categoricalField == XXXX, 1, 0) #creates a dummy variable manually using the ifelse() function. XXXX is a value within the categorical field; rows with that value become 1 and all others become 0.
factor(dataset$categoricalField) #automatically creates dummy variables using the factor() function.
# When doing a regression model using factors, one must always be dropped.
# It is easy to do this when creating dummy variables manually by simply
# not including a variable, but an extra step is required if the factor()
# function is done:
relevel(factor(dataset$categoricalField), "XXXX") #relevel() sets the reference level, which becomes the omitted dummy. Specify which category to drop by placing its value inside the quotes.
# Plotting categorical data is sometimes a useful visualization:
plot(factor(dataset$categoricalField1) ~ factor(dataset$categoricalField2)) #creates a spineplot (mosaic-style plot) of the two categorical fields. See the lab for the actual plot.
Creating a logistic model
glm(D.V. ~ relevel(factor(I.V.), "OmittedDummy"), family = binomial, data = dataset) #creates a generalized linear model - logistic regression. Releveling the factor sets which category is the omitted (reference) dummy.
# The raw coefficients from summary(glm) are on the log-odds scale and are hard to
# interpret directly, so we compute the odds ratios instead:
exp(cbind(OR = coef(glm), confint(glm)))
# Comparing 2 logistic models:
anova(glm_full, glm_reduced, test = "LRT") #Calculates a Likelihood Ratio Test to compare the full and reduced logistic models.
anova(glm_full, glm_reduced, test = "Chisq") #Calculates a Chi-Squared Test to compare the full and reduced logistic models.
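A small sketch of the same workflow on the built-in mtcars data (predicting the binary am variable; my example, not the lab's):
full <- glm(am ~ wt + hp, family = binomial, data = mtcars) #full logistic model
reduced <- glm(am ~ wt, family = binomial, data = mtcars) #reduced model
exp(cbind(OR = coef(full), confint(full))) #odds ratios with confidence intervals
anova(full, reduced, test = "LRT") #likelihood ratio test comparing the two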
Assessing Multicollinearity
vif(glm) #Calculates the variance inflation factor. See the in-class exercises for more explanation about this.
mean(vif(glm)) #The average VIF. Values substantially greater than 1 indicate some degree of multicollinearity.
Note: This lab takes a very long time to run. Beware if you are interested in rerunning it.
Create a weights matrix
# Contiguity weights matrices:
poly2nb(dataset) #queen's contiguity
poly2nb(dataset, queen = F) # rook's contiguity
# Fixed distance based weights matrix:
coords <- coordinates(dataset) #gets the centroids of polygons. This is needed to assess distance
FIPS <- row.names(as(dataset, "data.frame")) #Creates a unique identifier for each point
dnearneigh(coords, d1 = 0, d2 = FixedDistance, row.names = FIPS) #Creates a distance-based weights matrix
# Create a k-nearest neighbor weights matrix
library(RANN) #RANN provides fast nearest-neighbour searching, used by the k-nearest-neighbour step below
coords <- coordinates(dataset) #gets the centroids of polygons. This is needed to assess the nearest centroids
FIPS <- row.names(as(dataset, "data.frame")) #Creates a unique identifier for each point
knn2nb(knearneigh(coords, k = NumberOfNeighbors), row.names = FIPS) #Creates k-nearest neighbors using the nearest k neighbors specified.
# Row Standardized weights Matrix
nb2listw(WEIGHTS, style = "W", zero.policy = T) #row standardized weights matrix. The zero.policy=T allows for islands to be in the dataset.
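Putting the pieces together (a sketch only: tracts stands for any SpatialPolygonsDataFrame you have loaded, and the object names are my own placeholders):
library(spdep)
nb_queen <- poly2nb(tracts) #queen contiguity neighbours
nb_rook <- poly2nb(tracts, queen = F) #rook contiguity neighbours
w_queen <- nb2listw(nb_queen, style = "W", zero.policy = T) #row-standardized weights
cents <- coordinates(tracts) #polygon centroids
nb_knn <- knn2nb(knearneigh(cents, k = 4)) #4 nearest neighbours for each polygon
w_knn <- nb2listw(nb_knn, style = "W") #row-standardized k-nearest-neighbour weights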
Assessing Autocorrelation - Converting an OLS to SAR
moran.test(dataset$attribute, listw = WEIGHTS, zero.policy = T) #Calculates the Moran's I of a variable using a weights matrix.
ols <- lm(D.V. ~ I.V., data = dataset) #Creates a general linear model (OLS)
shapiro.test(ols$residuals) #Are the residuals normally distributed? OLS assumes they are.
lm.morantest(ols, nbqw, zero.policy = T) #tests for autocorrelation in the OLS residuals. The null hypothesis is that there is no autocorrelation (the Moran's I value is 0).
sar <- lagsarlm(D.V. ~ I.V., data = dataset, nbqw, zero.policy = T) #Creates a spatial lag model. This helps to remove autocorrelation in the OLS.
summary(sar) # The Lagrange Multiplier test tells if there is any residual autocorrelation left in the SAR model. If there is, other steps need to be taken to remove it.
bptest.sarlm(sar) #Breusch-Pagan test for heteroscedasticity. The null hypothesis is that the residuals are homoscedastic, which is what we want. Heteroscedasticity also hints at autocorrelation, or at least that the residuals depend on X (they shouldn't...)
anova(sar, ols) #Does a Likelihood Ratio Test to see if incorporating autocorrelation into the SAR model improves the model fit.
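Stringing the steps together (a sketch: tracts and w_queen are the hypothetical objects from the weights sketch above, and crime/income are made-up variable names):
ols <- lm(crime ~ income, data = tracts@data) #ordinary least squares model
lm.morantest(ols, w_queen, zero.policy = T) #is there autocorrelation in the OLS residuals?
sar <- lagsarlm(crime ~ income, data = tracts@data, listw = w_queen, zero.policy = T) #spatial lag model
summary(sar) #rho, coefficients, and residual autocorrelation tests
anova(sar, ols) #likelihood ratio test (spatial model listed first)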
Mapping and Understanding Spillover Effects
edit <- dataset #copy the data so we don't corrupt it.
edit@data[edit@data$attribute == "LogicalTest", "AttributeToBeEdited"] <- "EditedValue" #edit the data
# Step 1: Predict original and edited data based on the SAR model...
orig_predict <- as.data.frame(predict(sar)) #predict a variable using the original dataset
edit_predict <- as.data.frame(predict(sar, newdata = edit, listw = nbqw, zero.policy = T)) #predict the same variable after editing one of the I.V. values.
Effects <- edit_predict$fit - orig_predict$fit #calculate the change in the predicted values
edit$EDIT_ATTR <- Effects #append it to the edited dataset so we can map it.
# Step 2: Make a map...
library(RColorBrewer)
library(classInt)
colorPal <- brewer.pal(5, "PuOr")
cats <- classIntervals(edit$EDIT_ATTR, n = 4, style = "jenks") #Create class breaks. Note that n is one less than the number of colors selected!
colors <- findColours(cats, colorPal) #Tie the class breaks with the colors
plot(edit, col = colors) #This actually creates the map!
legend("bottomright", legend = round(cats$brks, 2), fill = colorPal, bty = "n")
mtext("Map Title", side = 3, line = 1)
Making a ppp object and doing simple visualizations:
convexhull.xy(x = points_df$Easting, y = points_df$Northing) #This is a convex hull of the AOI and is used as the owin window
ppp(x = points_df$Easting, y = points_df$Northing, window = convexhull, marks = points_marks) #creates the ppp object. points_marks is a data frame of attributes for the points.
plot(ppp) #plots the points, symbolized by their marks (event types)
plot(density(ppp, sigma = 500)) #kernel density map
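A toy version with made-up coordinates (everything here is simulated; only the function calls mirror the lab):
library(spatstat)
set.seed(1) #reproducible toy example
east <- runif(200, 0, 1000) #made-up easting coordinates
north <- runif(200, 0, 1000) #made-up northing coordinates
win <- convexhull.xy(x = east, y = north) #window from the convex hull of the points
pts <- ppp(x = east, y = north, window = win) #point pattern object
plot(pts) #dot map of the events
plot(density(pts, sigma = 100)) #kernel density surface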
Comparing data to CSR using hypothesis testing
quadrat.test(deaths_ppp, nx = 5, ny = 5) #quadrat test. The null hypothesis is that the data were generated by a CSR process
plot(envelope(ppp, Kest, nsim = 99, correction = "border")) #K-function. Note that the PPP cannot have marks set.
Examine the distribution of points
plot(envelope(ppp, Kcross, i = "event type 1", j = "event type 2", nsim = 99,
correction = "border")) #Cross K. The ppp must have a marks object that ONLY contains the attribute that distinguishes the event type. This is easier than the alltypes() function because the user doesn't have to compute cross k for every event type.
ktype1 <- Kest(ppp[marks(ppp)$Type == "event type 1"], correction = "best")
ktype2 <- Kest(ppp[marks(ppp)$Type == "event type 2"], correction = "best")
KdiffCEEA <- ktype1 - ktype2 # Difference in K Functions. This examines the similarity in spatial patterns of two event types.
Create a point process model
# Prepare data for input into a ppm:
eventType1_df <- points_df[points_df$Type == "event type 1", ]
eventType1_ppp <- ppp(x = eventType1_df$Easting, y = eventType1_df$Northing,
window = convexHull)
eventType2_df <- points_df[points_df$Type == "event type 2", ]
eventType2_ppp <- ppp(x = eventType2_df$Easting, y = eventType2_df$Northing,
window = convexHull)
et2_den <- density(eventType2_ppp) #Density Surface used to predict event type 1
# Create the PPM object
ppm(eventType1_ppp, ~et2_den, covariates = list(et2_den = et2_den)) #point process model
ppm_sim <- simulate(ppm) #Simulates a point pattern from the fitted model (an inhomogeneous Poisson process with varying intensity). Not needed to assess the model, but a nice visual.
envelope(ppm, Kest, nsim = 99, correction = "border", verbose = F) #K-function for the fitted model. If the model is good, the residual pattern should look like CSR (i.e. we were able to explain the distribution of the data). More usefully, this tests whether the process really is CSR once the covariate is accounted for (e.g. trees on high vs. low slopes).
Part of homework 1 was to create a function to calculate RMS Standard Deviation. Using RMarkdown, we are able to code LaTeX Equations such as: \[ s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2} \]
Here is how the above formula was coded:
$$ s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2} $$
Creating a function:
functionName <- function(requiredData) {
  # ...the body does stuff and can be multiple lines...
  return(result)
}
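For example, a minimal version of the homework function (the name rmsd and the test vector are my own) that implements the RMS standard deviation formula shown above:
rmsd <- function(x) {
  N <- length(x)
  sqrt(sum((x - mean(x))^2) / N) #divide-by-N (population) standard deviation
}
rmsd(c(2, 4, 4, 4, 5, 5, 7, 9)) #returns 2 for this small test vector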
Creating graphs:
plot(Y ~ X, data = dataset)
# OR
plot(dataset$Y ~ dataset$X) #This is probably the better way to do it...
Graphs can also be customized on the screen:
myLayout <- layout(mat = matrix(c(1, 2), 1, 2), respect = TRUE)
plot(DatasetA)
plot(DatasetB)
# The result will be the two plots tiled side-by-side. This can be useful
# when looking at summaries of multiple plots or comparing plots.
# However, the sizes are greatly reduced.
Common functions:
grep() #searches for a pattern within a character vector (not limited to tables)
summary() #returns some descriptive statistics for a list or table
is.na() #tests whether values are NA (missing); returns TRUE/FALSE
ifelse() #vectorized if-else statement: ifelse(test, valueIfTrue, valueIfFalse)
Working with Tables:
dim() #returns the number of rows and columns in a table
as.factor(table$heading) #converts a column to a factor (categorical variable)
list_subselection <- list_original[rows, cols] #slices through a table
# e.g. myNewList <- oldList[oldList$colA < 20, ] This returns all the
# records in the table where column A is less than 20.
head(table) #will return all of the column headings and first few lines of a table.
Hypothesis Testing
# State the null hypothesis (the two groups have the same mean)
t.test(x, y) #calculates a t-test comparing group X and group Y
# Afterwards, state the p-value and whether you fail to reject (p > 0.05) or
# reject (p < 0.05) the null hypothesis
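For example, with two simulated groups (made-up data):
set.seed(1)
groupX <- rnorm(30, mean = 10) #simulated sample for group X
groupY <- rnorm(30, mean = 12) #simulated sample for group Y, shifted upward
t.test(groupX, groupY) #small p-value -> reject the null of equal means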
Understanding Correlation and Correlation Testing
# State null hypothesis (the correlation is 0)
cor.test(x, y, method) #Runs a correlation test on the two variables
# Afterwards, state the p-value and whether you fail to reject (p > 0.05) or
# reject (p < 0.05) the null hypothesis
# Possible Methods:
cor.test(x, y, "pearson") #DEFAULT
cor.test(x, y, "kendall")
cor.test(x, y, "spearman")
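For example, using R's built-in mtcars data (my example):
cor.test(mtcars$wt, mtcars$mpg, method = "pearson") #vehicle weight and fuel economy are strongly negatively correlated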
Working with external datasets
read.csv() #Import csv file
names() #gets the column names of a table
length() #returns the number of elements in an object (for a data frame, the number of columns)
Plotting Data
hist() #creates a histogram of a field
boxplot() #creates a boxplot for a given field
boxplot(x ~ y) #creates multiple box plots: one box plot of x for each level of y
plot(x, y) #can create a scatter plot
# Extra formatting can also be applied:
plot(x, y, main = "Main Title", xlab = "x-axis label", ylab = "y-axis label")
# Not shown, but useful:
par(mfrow = c(2, 2)) #Will divide the plots into a 2x2 frame, showing 4 plots at once.
General Functions
summary() #returns summary information of an object.
Descriptive Statistics
mean() #gets the mean
# The worked example shows how to do this manually, instead of using an
# out-of-the box function
sd() #gets the standard deviation
# homework 1 calculates this manually.
Assessing Normality
# Statistical Methods:
shapiro.test() #null hypothesis is that the data are normally distributed
ks.test() #Kolmogorov-Smirnov test against a reference distribution, e.g. ks.test(x, "pnorm", mean(x), sd(x)); couldn't get it to work in the example
# Graphical Methods:
qqnorm() #Quantile-Quantile plot. The points must fall along a straight line for the data to be normally distributed
qqline() #plots the line for normality to the QQ Plot
# The following have been done before, but we can make more edits to the
# graphical display, like color and number of bins.
hist() #Histograms
# We can add more detail to the histogram:
hist(x, col = "red", breaks = seq(20000, 110000, 5000))
boxplot() #boxplots
Creating and testing linear models
lm(D.V. ~ I.V.) #Creates a general linear model
summary(lmObject) #Returns a t-test for each model parameter and the R^2 (coefficient of determination). For a univariate model, the t-test's null hypothesis is that the coefficient is 0 (nothing explained by the model).
anova(lmObject) #Uses an F-test to identify whether the model is significant. The null hypothesis is that the variance explained by the model is due to random chance.
Data Transformations
# When the data is not homoscedastic, we have the option to transform the
# data. Tukey's Ladder of Power gives us some idea of how to transform,
# but it takes a while to process...
ladder(D.V. ~ I.V.) #Tukey's Ladder of Powers Graphic.
# This is how we can transform the data:
lm(log(D.V.) ~ I(I.V.^0.5)) #natural log of the D.V. and square root of the I.V. (the I() wrapper is needed so ^ is treated arithmetically inside the formula). This makes interpreting model outputs somewhat tricky though!
Intro to Multiple Regression
lm(D.V. ~ I.V.1 + I.V.2) #a 2 variable multiple regression
# We can use R to help us find the 'best' multivariate model
models <- regsubsets(D.V. ~ I.V.1 + I.V.2 + ..., data = dataset) #Finds the best x-variable multivariate models; regsubsets() comes from the leaps library
plot(models) #we can display this graphically to see how the variables change as we increase the number of variables
Logistic model creation
glm(D.V. ~ I.V.1 + I.V.2 + ..., family = binomial) #creates a generalized linear model. The family determines which link function to use; binomial gives the logit (logistic) link
# Because the outcome of a logistic model is either a 1 or a 0, it can be
# difficult to see the plotted results. To compensate, we can add noise
# to the plot:
plot(jitter(D.V.) ~ I.V.) #jitter staggers the plotted data, adding noise.
# We can also see how the likelihood of a 1 occurring changes with the
# I.V. :
plot(I.V., fitted(glm))
Interpreting glm model outputs: Using the link function
# Odds Ratio
exp(confint(glm)) #exponentiates the confidence intervals of the model parameters onto the odds-ratio scale. Values greater than 1 mean the outcome becomes more likely as the variable increases; values less than 1 mean it becomes less likely
anova(glm, test = "LRT") #ANOVA's Likelihood Ratio Test. Null hypothesis is that the model parameters do not change the likelihood of the model returning a 1. Can also be used to compare full and reduced models.
General to-dos
# parse through data using ifelse()
ifelse(logicStatement, TrueValue, FalseValue) #ifelse statement to select data. The false value could also be a nested ifelse statement
Using data stored in R (example data)
library(faraway) #install the library that contains the data
data(datasetName) #use the data function to load the dataset of interest (e.g. gala). To see which datasets a library includes, run data(package = "faraway").
Create a Poisson Model
glm(D.V. ~ I.V., family = poisson) #same setup as a logistic model, but with a different family; the poisson family uses a log link function
# Checking for over dispersion:
plot(log(fitted(glm)), log((dataset$D.V. - fitted(glm))^2)) #plot the data against its residuals
abline(0, 1)
hist(predict(glm)) #Are the fitted values following a Poisson distribution?
glm(D.V. ~ I.V., family = quasipoisson) #the quasipoisson family accounts for overdispersion, generally returning more reliable standard errors
# We can also fit a negative binomial model:
glm.nb(D.V. ~ I.V., data = dataset) #glm.nb() comes from the MASS library and takes a formula, not a fitted glm
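A sketch of the whole sequence using the gala data from the faraway library (my own choice of predictors, not necessarily the ones used in class):
library(faraway)
data(gala)
pois <- glm(Species ~ Area + Elevation, family = poisson, data = gala) #Poisson model
summary(pois)$deviance / summary(pois)$df.residual #values well above 1 suggest overdispersion
qp <- glm(Species ~ Area + Elevation, family = quasipoisson, data = gala) #quasipoisson alternative
library(MASS)
nb <- glm.nb(Species ~ Area + Elevation, data = gala) #negative binomial alternative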
General know how
fitted(glm) #extracts fitted values of a function, the Poisson model here.
# Question: How does this differ from predict(glm)? Answer: predict(glm) defaults to the
# link scale, while fitted(glm) is on the response scale; predict(glm, type = "response") matches fitted(glm).
Reading in data
readShapePoly() #imports a shapefile (maptools library)
read.dbf() #imports the attribute table as a non-spatial table (foreign library)
More stuff with RegSubsets
regModels <- regsubsets(D.V. ~ I.V., nbest = XXXX, nvmax = XXXX) #nbest tells R how many of the best models to keep for each model size; nvmax sets the largest model size (e.g. 20 builds the best 1-variable model up to the best 20-variable model)
# Accessing attributes of regsubsets
summary(regModels)$adjr2 #extracts the adjusted R^2 values for the models created by regsubsets() (these statistics live on the summary object)
summary(regModels)$bic #extracts the BIC (similar to AIC) - lower values are better
# We can also represent these statistics graphically. Where the plots
# level out indicate that adding more variables won't add much more detail
# to the predictive power of the model:
plot(1:nvmaxAsNumber, summary(regModels)$adjr2) #Plots the calculated adjusted R^2 for each model size
# easily finding out the variables used in a given model calculated by
# regsubsets. Previously, we graphed the output, which can be difficult
# to read. We can also show the parameters for a model in a tabular
# format:
summary(regModels)$outmat[xxxx, ] #where xxxx is the row for the n-variable model
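A short sketch using mtcars (my own variables) showing where these statistics live:
library(leaps)
regModels <- regsubsets(mpg ~ wt + hp + disp + qsec, data = mtcars, nbest = 1, nvmax = 4) #best 1- to 4-variable models
summary(regModels)$adjr2 #adjusted R^2 for each model size
summary(regModels)$bic #BIC for each model size
plot(regModels, scale = "adjr2") #which variables enter each model
summary(regModels)$outmat #tabular view of the selected variables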
Multiple Regression Diagnostics
# Testing for multicolinearity
library(car)
vif(lm) #Variance inflation factor. Values around 10 are a red flag that there is multicollinearity between one or more variables
mean(vif(lm)) #returns the mean VIF - values substantially greater than 1 indicate some degree of multicollinearity
# Plot residuals against fitted values
plot(predict(lm), lm$residuals)
# other ways previously covered include:
qqnorm() #qq plot
shapiro.test() #Shapiro Wilks test for normality
Testing for autocorrelation
dwtest(lm) #Durbin-Watson test (lmtest library). Null hypothesis is that the autocorrelation in the residuals is 0 (random; not present)
acf(lm$residuals) #plots the autocorrelation of the residuals at each lag. Remember: residuals should be randomly and normally distributed around 0.
acfObject[lag] #returns the autocorrelation (rho) for the given lag
Create a time series object to help cope with autocorrelation
ts(D.V., start = min(dataset$TimeAttribute), frequency = Periodicity) #time series function
plot(decompose(ts)) #produces 4 graphs that break the observed data into a trend line, periodicity effects, and random noise
Time Series Diagnostics
summary(acf()) #returns descriptive stats on the autocorrelation observed in the time series
acfObject[lag] #returns the autocorrelation (rho) for the given lag
cochrane.orcutt(lm)$rho #Cochrane-Orcutt procedure (orcutt library); returns the estimated autocorrelation (rho) used to correct the model
Working with factors and dummy variables
cycle(ts) #creates a periodicity factor for a time series
time(ts) #returns the time index (sampling times) of each observation in the series
factor(cycle(ts)) #breaks a categorical attribute into dummy variables
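For example, with the built-in AirPassengers monthly series (my example):
ap <- AirPassengers #built-in monthly time series (already a ts object)
plot(decompose(ap)) #observed series split into trend, seasonal, and random parts
acf(ap) #autocorrelation at increasing lags
cycle(ap) #the month-of-year (periodicity) factor for each observation
time(ap) #the time index of each observation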
We did stuff in Geoda. It was fun.
Loading Data
# Data can be exported from R as a workspace file with the .rda extension.
load("blabla.rda") #imports the rda file
Weights Matrices
# Create a weights matrix
poly2nb() #Default is queen contiguity
poly2nb(x, queen = FALSE) #rooks contiguity
nb2listw() #row standardized weights matrix
# We can also do distance-based weights and k-nearest neighbors weights...
Moran's I
# Global Moran's I:
mi <- moran.mc(dataset$attribute, listw = nb2listw(poly2nb(dataset)), nsim = 999) #simulates Moran's I values under random permutations; we can compare this distribution to our observed value
hist(mi$res, breaks = 50) #If the observed Moran's statistic is extreme relative to the simulated values, it is likely not caused by random chance
moran.plot() #We can also create a Moran's Scatterplot to see the spread of autocorrelation. See the linked example for more detail about this and how to create one manually to add versatility to the scatterplot.
# LISA
localmoran(dataset$attribute, weights) #Calculates the Local Indicator of Spatial Autocorrelation.
# See the worked example of how to render the LISA map
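Pulling these together (a sketch: tracts, w_queen, and the crime attribute are hypothetical placeholders of mine):
library(spdep)
mi <- moran.mc(tracts$crime, listw = w_queen, nsim = 999, zero.policy = T) #permutation test for global Moran's I
hist(mi$res, breaks = 50); abline(v = mi$statistic, col = "red") #observed I against the simulated distribution
moran.plot(tracts$crime, w_queen, zero.policy = T) #Moran scatterplot
lisa <- localmoran(tracts$crime, w_queen, zero.policy = T) #local Moran's I for each polygon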
Other ways to visualize autocorrelation
# Create a correlogram of moran's I
sp.correlogram(weights, dataset$attribute, order = No.Lags, method = "I") #Creates a correlogram with a standard error bar for a specified number of lags.
# Create a variogram for moran's I
plot(variogram(attribute ~ 1, data = dataset, cloud = F)) #plots a variogram of the attribute (gstat library; dataset here is a spatial data frame)
Mapping in R
# Check out the worked example here to see how it works. Also, check out
# the Maps and Multiple Regression 'Other In-Class Exercise' for much more
# detail.
Checking for Autocorrelation
# Autocorrelation in the residuals can bias regression outputs. After a
# regression, we should make sure there is NO autocorrelation in the
# residuals. If there is, an ordinary OLS (lm()) model is not
# appropriate.
# moran.mc(lm$resid, weights, nsim=999)#The null hypothesis is that the
# autocorrelation observed in the data is due to random chance. Rejecting
# the null hypothesis tells us that our model is biased due to
# autocorrelation. THIS IS WRONG!! NEVER DO THIS!
# Do this instead...
lm.morantest(lm, weights) #Gives a Moran's I statistic for the regression residuals without a random simulation (unlike above). The difference is that lm.morantest() adjusts the expected value and variance of Moran's I for the fact that the residuals come from a fitted model, which is why it is the appropriate test here. Returns a p-value which determines if the Moran's statistic is significant.
Creating a spatial autoregressive model
lagsarlm(D.V. ~ I.V., data = dataset, weights) #this creates a spatial lag model.
# We can compare the spatial model to the non-spatial model:
anova(lagsarlm, lm) #it is EXTREMELY important that the spatial model is put first!! We can then compare the AIC to identify the better model.
Creating a spatial autoregressive model
lagsarlm(D.V. ~ I.V., data = dataset, listw = weights) #creates a spatial lag model (same formula as the OLS)
errorsarlm(D.V. ~ I.V., data = dataset, listw = weights) #creates a spatial error model
summary(sarlm) #Gives a TON of information, including the usual coefficient tests plus a test of whether incorporating autocorrelation into the regression significantly improves the model. Several fit statistics are also provided, but anova() should be used to compare models.
anova(sarlm, lm) #must be ordered this way! Compares the OLS and SAR models using the Likelihood Ratio Test
Understanding lag effects
impacts(sarlm, weights) #allows us to see how the effects of the interconnections defined by the weights matrix affect direct and indirect neighbors. Beware: This takes a long time to run.
# Otherwise, we can use the predict() function on original and altered
# data. We can then plot a map of the predicted values and how they have
# changed by altering a single value.
General know how…
dataset_new <- dataset #copy the existing data frame to a new object. Now, we can make edits without corrupting the original data.
as.data.frame(predict(lm)) #creates a data frame for predicted values. Now we can visualize the results.
abs() #calculates the absolute value.
General know how…
sub(pattern = "old string", replacement = "new string", x = dataset$Attribute) #this is a tool to help organize table data. It replaces one string with another (sub() replaces the first match in each value; gsub() replaces all matches)
Creating a map using ggmap (google)
SpatialPointsDataFrame(coords, data, proj4string) #Creates a spatial points data frame. The proj4string (projection) is optional, but needed if you want to overlay the points on other spatial data.
get_map(location = "United States", maptype = "roadmap", zoom = zoomLevel) #gets a tile from Google Maps. Pretty cool!
ggmap(get_map(), extent = "panel") #plots the map
ggmap(get_map()) + geom_point(data = dataset, aes(x = Longitude, y = Latitude),
color = "red") #puts points on the ggmap
geom_point(...., size = attribute) #Can also scale the point sizes
Using spatstat to do basic data visualization
# Note: spatstat depends on having a point pattern (ppp) and window (owin) defined
# from a spatial data frame. Please see the worked example for implementing these.
plot.ppp(ppp) #Creates a dot map of events. These can also be categorized based on event type.
plot(density(ppp, sigma = DistValue)) #Creates a density image using a kernel. By default, the kernel is a gaussian. The spread is defined by the sigma value.
contour(density(ppp, sigma), axes = F) #Creates a contour map based on the density image
persp(density(ppp, 1000), theta = angle) #Creates a pretty cool 3-D rendering of the density map. theta sets the rotation angle of the view.
plot(ppp, which.marks = "variable", maxsize = radius) #Creates a proportional circle map. The max size parameter will likely need to be played with to create a readable map.
plot(split(ppp))
plot(density(split(ppp)))
Editing legends for images (density map)
colors() #This returns all the named colors available in R. There are a lot...
ramp <- colorRampPalette(c("Color1", "Color2")) #creates a function that interpolates between these two colors
colourmap(ramp(NumberOfClasses), range = c(min, max)) #maps the generated colors onto the range of values they will represent
# Classifying data:
quantile(ppp, probs = (0:3)/3) #This will create 3 classes using the quantile classification
cut(ppp, breaks = quantile(), labels = c("Label1", "Label2", "Label3")) #We can also give sensible labels to the value ranges.
quadratcount(X = ppp, tess = quantileMap)
plot(quadratcount()) #plots the count of points falling in each tile/category over the image
Some point pattern exploration/visualization methods…
plot(rhohat(unmark(deaths.ppp), deaths.death.im)) #plots rhohat: estimates how the intensity of the points varies as a function of the covariate (here a density image)
nnclean(X = ppp.types$Category, k = numberOfPts) #Will map the clusters that include k neighbors
plot(split(nnclean()))
Testing CSR
rpoispp(lambda, win) #Creates a random Poisson point pattern (CSR)
quadrat.test(points, nx, ny) #Chi-squared quadrat test. The null hypothesis is that the points were generated by a CSR process.
Distance Tests
Gest(points) #G-Function (Nearest Neighbors)
envelope(points, Kest, nsim = 99, correction = "border") #K-function (spatial dependence)
ppm(points, ~covariate1 + covariate2, covariates = list(covariate1 = data1, covariate2 = data2)) #Creates an inhomogeneous point process model. Like a regression for point intensity
simulate(ppm(), 1) #Can simulate random points using the covariate intensity image. We can then test for CSR assuming some other spatial factor is in the way (A good way to remove an underlying trend surface from an analysis)
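A quick self-contained check on simulated CSR points (everything here is simulated; only the calls mirror the exercise):
library(spatstat)
pts <- rpoispp(lambda = 100) #CSR points in the unit square
quadrat.test(pts, nx = 5, ny = 5) #should NOT reject CSR for these points
plot(envelope(pts, Kest, nsim = 99, correction = "border")) #observed K should stay inside the envelope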
Look at multiple events
plot(split(points)) #Like plotting the factor of some variable
alltypes(data, "Kcross", envelope = T) #Calculates the K-Function for all event types against all other event types (This can take a while to run because it creates so many K-functions)
alltypes(data, "Kdot", envelope = T) #Creates a Kdot plot: the K-function from points of one type to points of all types (i.e. how a given event type clusters around everything else).
localK(data, verbose = F) #A local K-Function. This plots the K-function for all points in a dataset. It takes a very long time to run (i.e. there are 1000 points in a plot, 1000 k-functions are generated.)
plot(lK, iso1606 ~ r)
This was an in-class discussion on day 1 to get us thinking about statistical methods.
Using colorBrewer to make a map in R
display.brewer.all() #shows all color palettes in the library
# Get Colors...
pal <- brewer.pal(XXXX, "ColorPalette") #Extracts XXXX number of color patches from the given color palette provided by the ColorBrewer library
# Get class range breaks...
cat <- classIntervals(dataset$attribute, n = xxxx, style = "") #classifies the attribute into n number of classes.
# These are the options for style: 'fixed', 'sd', 'equal', 'pretty',
# 'quantile', 'kmeans', 'hclust', 'bclust', 'fisher', or 'jenks'
findColours(cat, pal) #links the color brewer palette with the defined class breaks.
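Put together, a choropleth sketch (tracts and its crime attribute are hypothetical placeholders of mine):
library(RColorBrewer)
library(classInt)
pal <- brewer.pal(5, "YlOrRd") #five sequential colors
cats <- classIntervals(tracts$crime, n = 5, style = "jenks") #five natural-breaks classes
cols <- findColours(cats, pal) #one color per polygon
plot(tracts, col = cols) #the choropleth map
legend("bottomright", legend = round(cats$brks, 2), fill = pal, bty = "n") #class-break legend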
General ggplot creation and editing
# We did a lot of stuff using the ggplot2 library. Please see the linked
# examples for more detail.
ggplot(data, aes(x, y)) #Creates a ggplot object
ggplot + geom_point() #adds a point layer and draws the plot; the data and aesthetics come from the ggplot() call
ggplot + stat_binhex() #bins the scatterplot into hexagons (an aggregated view, useful with many overlapping points)
ggplot(data, aes(x, y, color = factorAttribute)) #This will plot the ggplot and color the factored points differently
ggplot + facet_grid(. ~ factorAttribute) #Creates multiple plots; one for each factor variable
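For example, on the built-in mtcars data (my example):
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) #aesthetics: weight vs. mpg, colored by cylinder count
p + geom_point() #colored scatterplot
p + geom_point() + facet_grid(. ~ cyl) #one panel per cylinder group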
Styling a ggplot offers a lot of customization options.
library(ggthemes) #GGThemes permit various styling options for a ggplot. The user can also create his/her own. See the linked page for how to do this. Using ?themes is also useful in understanding what can be customized.
Maps can also be made using ggplot, but can take a long time to render
fortify(shapefile, region = "UniqueID") #flattens the spatial object into a data frame of polygon vertices
merge(fortify(shapefile, region = "UniqueID"), shapefile@data, by.x = "id", by.y = "UniqueID") #This creates the data frame that will be used to plot the map
# See the example for combining this to create a ggplot map. The
# structure is somewhat long. The neat thing is that the themes (above)
# can also be applied to the maps.
