This first part is submitted as homework for Western Washington Univeristy’s ESCI597 Time Series and Spatial Analysis course, lead by Dr. Andy Bunn. Please see his RPubs Page Here
# install.packages("sp")
# install.packages("rgeos")
# install.packages("maptools")
# install.packages("rgdal")
# install.packages("raster")
# install.packages("gstat")
# install.packages("usdm")
library(sp)
library(rgeos)
## rgeos version: 0.3-8, (SVN revision 460)
## GEOS runtime version: 3.3.3-CAPI-1.7.4
## Polygon checking: TRUE
library(maptools)
## Checking rgeos availability: TRUE
library(rgdal)
## rgdal: version: 0.9-2, (SVN revision 526)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.9.2, released 2012/10/08
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/rgdal/proj
library(raster)
library(gstat)
Let’s take a look at the classis Meuse floodplain metals concentration dataset.
require(sp)
data(meuse)
head(meuse)
## x y cadmium copper lead zinc elev dist om ffreq soil
## 1 181072 333611 11.7 85 299 1022 7.909 0.00135803 13.6 1 1
## 2 181025 333558 8.6 81 277 1141 6.983 0.01222430 14.0 1 1
## 3 181165 333537 6.5 68 199 640 7.800 0.10302900 13.0 1 1
## 4 181298 333484 2.6 81 116 257 7.655 0.19009400 8.0 1 2
## 5 181307 333330 2.8 48 117 269 7.480 0.27709000 8.7 1 2
## 6 181390 333260 3.0 61 137 281 7.791 0.36406700 7.8 1 2
## lime landuse dist.m
## 1 1 Ah 50
## 2 1 Ah 30
## 3 1 Ah 150
## 4 0 Ga 270
## 5 0 Ah 380
## 6 0 Ga 470
class(meuse)
## [1] "data.frame"
coordinates(meuse) <- ~x+y # this comes from the {sp} package
proj4string(meuse) <- CRS("+init=epsg:28992")
class(meuse)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
bubble(meuse, xcol="x",ycol="y",zcol="cadmium", maxsize = 2.5, main = "Cadmium concentrations (ppm)")
This just set up the data.
next we use the variogram, which is “the simplest measure of spatial autocorrelation.”
cadVarCloud <- variogram(cadmium~1, meuse, cloud = TRUE)
plot(cadVarCloud,pch=20,cex=1.5,col="black",
ylab=expression("Semivariance ("*gamma*")"),
xlab="Distance (m)",main = "Cadmium concentrations (ppm)")
You can use this to find where points are no longer spatially autocorrelated. This could be a good baseline distance for point sampling if you wish to avoid autocorrelation. On the flipside, you can look at the distances that are spatially autocorrelated and investigate the reltionships at play there.
Perhaps we could do this with a RasterLayer
I will now perform a variogram on my own raster data.
I have developed here a snow departure timing map by scanning through the MOD10A2 MODIS Data Product to perform a change detection analysis. Every time that a pixel changes from snow to no snow that date (in this case, week of year) is recorded as a value in this map. Colors now represent the week of year that snow left the landscape for 2013.
library(raster)
WUS2013.path<-'/Users/bgatts/Desktop/WWU/ESCI597/ESCI_597_Snowmelt/SnowMeltTiffs/wus2013zero.tif'
WUS2013.raster=raster(WUS2013.path)
plot(WUS2013.raster)
plot(WUS2013.raster)
The color scheme here is a bit funky as this is a wok in progress. Zero values recieved no snow in 2013. Colors then represent 10+the number of weeks into the year that snow left that particular pixel. For example, pixels flanking many of the mountain ranges are dark green and have a value of 30, meaning that they melted off 20 weeks into the year (late May). I began counting here at 10 as to visually separate the snow from non-snow areas, as well as to separate them further in “snow melt timing” space in the variogram.
Let’s just take a close up look at the Central Sierra Nevada
CRLAclip<-'/Users/bgatts/Desktop/WWU/ESCI597/ESCI_597_Snowmelt/SnowMeltTiffs/2013CRLA.tif'
CRLA2013=raster(CRLAclip)
CRLA2013.values <- getValues(CRLA2013)
my.xlim <- c(10,40)
minElev <-min(CRLA2013.values)
maxElev <-max(CRLA2013.values)
plot(CRLA2013)
h<-hist(CRLA2013.values, breaks=seq(minElev,maxElev,length = 40),
col=rev(terrain.colors(40)), xlab="Melt Week of Year (weeks)",
main="",
xlim=my.xlim,
#ylim=c(0,0.001599),
freq=FALSE)
xfit<-seq(minElev,maxElev,length=100)
yfit<-dnorm(xfit,mean=mean(CRLA2013.values),sd=sd(CRLA2013.values))
lines(xfit, yfit, col="black", lwd=2)
boxplot(CRLA2013.values, horizontal=TRUE, outline=TRUE, axes=FALSE,at=0.0001,
ylim=my.xlim, col = "white", add = TRUE, boxwex=0.02)
Here is a discussion of Variogram {usdm}: Empirical variogram for raster data http://www.inside-r.org/packages/cran/usdm/docs/Variogram
#install.packages("usdm")
library("usdm")
Usage Variogram(x, lag, cutoff, cells, size=100) Arguments x a raster object (RasterLayer) lag the lag size (width of subsequent distance intervals) into which cell pairs are grouped for semivariance estimates. If missing, the cell size (raster resolution) is assigned. cutoff spatial separation distance up to which cell pairs are included in semivariance estimates; as a default, the length of the diagonal of the box spanning the data is divided by three. cells numeric (optional). A vector of cell numbers in the Raster object. This forces the function to only consider these cells (and their neighbours) to compute the variogram. size positive integer specifying the number of cells to be drawn from raster object. If the number of cells in the raster object is large, a sample with the specified size is drawn to make the computation more efficient.
RasterStack: interesting (http://www.inside-r.org/packages/cran/raster/docs/stack)
Well let’s look at the variogram:
CRLA2013.var=Variogram(CRLA2013, size=50)
plot(CRLA2013.var)
As one may expect, as physical distance increases, semivariance increases as well. Semvariance here resembles a mountain gradient, and I believe this is an artifact of the linear gradients at play along these mountains. It is hard to see, but it appears that the “nugget” is small and contained mostly within the very first bump (<0.1). Remember this x-axis is in decimal degrees.
Let’s just take a close up look at the Central Sierra Nevada
sierraclip<-'/Users/bgatts/Desktop/WWU/ESCI597/ESCI_597_Snowmelt/SnowMeltTiffs/sierrapos.tif'
sierra2013=raster(sierraclip)
sierra2013.values <- getValues(sierra2013)
# sierra2013.allvalues <- getValues(sierra2013)
sierra2013.sdvalues=sierra2013.values[sierra2013.values >0]
my.xlim <- c(10,34)
minElev <-min(sierra2013.values)
maxElev <-max(sierra2013.values)
plot(sierra2013)
h<-hist(sierra2013.values, breaks=seq(minElev,maxElev,length = 34),
col=rev(terrain.colors(34)), xlab="Melt Week of Year (weeks)",
main="",
xlim=my.xlim,
#ylim=c(0,0.001599),
freq=FALSE)
xfit<-seq(10,maxElev,length=100)
yfit<-dnorm(xfit,mean=mean(22),sd=sd(sierra2013.sdvalues))
lines(xfit, yfit, col="black", lwd=2)
boxplot(sierra2013.values, horizontal=TRUE, outline=TRUE, axes=FALSE,at=0.0001,
ylim=my.xlim, col = "white", add = TRUE, boxwex=0.02)
Well let’s look at the variogram:
sierra2013.var=Variogram(sierra2013, size=20)
plot(sierra2013.var)
plot(sierra2013.var, xlim=c(0,.5))
As one may expect, as physical distance increases, semivariance increases as well. Semvariance here resembles a mountain gradient, and I believe this is an artifact of the linear gradients at play along these mountains. It is hard to see, but it appears that the “nugget” is small and contained mostly within the very first bump (<0.05). Remember this x-axis is in decimal degrees.
CRLA2013.values <- getValues(CRLA2013)
my.xlim <- c(10,40)
minElev <-min(CRLA2013.values)
maxElev <-max(CRLA2013.values)
h<-hist(CRLA2013.values, breaks=seq(minElev,maxElev,length = 40),
col=rev(terrain.colors(40)), xlab="Melt Week of Year (weeks)",
main="",
xlim=my.xlim,
#ylim=c(0,0.001599),
freq=FALSE)
xfit<-seq(minElev,maxElev,length=100)
yfit<-dnorm(xfit,mean=mean(CRLA2013.values),sd=sd(CRLA2013.values))
lines(xfit, yfit, col="black", lwd=2)
boxplot(CRLA2013.values, horizontal=TRUE, outline=TRUE, axes=FALSE,at=0.0001,
ylim=my.xlim, col = "white", add = TRUE, boxwex=0.02)
I then randomly sampled 50 points from the snow loss timing data and tested for normality using the Shapiro-Wilk test:
# Subset this sample to remove areas of no snow
CRLA2013.snowvalues=CRLA2013.values[CRLA2013.values >0]
CRLA2013.shapiro=seq(1:10)
for (i in 1:10) {
CRLA2013.sample50=sample(CRLA2013.snowvalues, size=50, replace = FALSE, prob = NULL)
CRLA2013.shapiro[i]=shapiro.test(CRLA2013.sample50)$p.value
#hist(CRLA2013.sample50)
}
barplot(CRLA2013.shapiro, ylab="p-values")
abline(h=0.05)