A quick analysis to estimate the level of error in elevations derived from the Google Elevation API. The documentation for the API suggests that relatively low resolution elevation data is used: “With the Elevation API, you can develop hiking and biking applications, mobile positioning applications, or low resolution surveying applications.” (source: https://developers.google.com/maps/documentation/elevation/)
Documentation of the actual data source behind the API's elevations is sparse; a brief search turned up nothing conclusive.
The following analysis draws a random sample of 2,000 elevation points from 10, 30, and 90 meter resolution DEMs and compares them to elevations returned by Google's API.
*NOTE: this was a quickly constructed experiment to quantify the differences in elevation; no guarantee implied. The code is likely a little sloppy and the results will vary from place to place. Any errors are my own dumb fault; your mileage may vary.
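For reference, the Elevation API itself is a plain JSON web service. Below is a minimal sketch of a direct call (not part of the original analysis; it assumes the jsonlite package and an API key stored in apiKey, which current versions of the API require):
library(jsonlite) # fromJSON() parses the JSON response
# one point near Ohiopyle; the API reports elevation in meters
url <- paste0("https://maps.googleapis.com/maps/api/elevation/json",
              "?locations=39.87,-79.49&key=", apiKey)
resp <- fromJSON(url)
resp$results$elevation # elevation in meters above sea level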
# environment
sessionInfo()
## R version 3.0.3 (2014-03-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.6
##
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.5 formatR_1.0 stringr_0.6.2 tools_3.0.3
# required packages
library('rgbif')   # elevation(): queries the Google Elevation API
library('raster')  # raster(), projectRaster(), sampleRandom(), extract(), getData()
library('rgdal')   # projection/datum support used by raster
library('ggplot2') # plots of the error distributions
Data: The elevation data covers a 7.5 by 7.5 minute area around Ohiopyle, Pennsylvania; the same bounds as the USGS quadrangle of that name.
10 meter DEM:
location: http://www.pasda.psu.edu/data/dem24k_10m/ohiopyle_pa
metadata: http://www.pasda.psu.edu/uci/FullMetadataDisplay.aspx?file=dem10meter.xml
30 meter DEM:
location: http://www.pasda.psu.edu/data/dem24k/ohiopyle_pa
metadata: http://www.pasda.psu.edu/uci/FullMetadataDisplay.aspx?file=dem30meter.xml
SRTM 90 meter DEM:
location: downloaded using the getData() function of the raster package
metadata: http://www2.jpl.nasa.gov/srtm/
After downloading the DEMs from the URLs above, the data are projected into an appropriate lat/long coordinate system and plotted.
newCRS <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs " # lat/long projection
dem30 <- raster("C:/TEMP/elevR/ohiopyle_pa_30m/ohiopyle_pa.dem")
dem10 <- raster("C:/TEMP/elevR/ohiopyle_pa_10m/ohiopyle_pa.dem")
dem30P <- projectRaster(dem30, crs = newCRS)
dem10P <- projectRaster(dem10, crs = newCRS)
demSRTM <- getData('SRTM', lon=-79, lat=40, path = "C:/TEMP/elevR/SRTM") # download the SRTM 90m tile covering the study area
demSRTMc <- crop(demSRTM, extent(dem30P)) # crop SRTM to the study area extent
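As a quick sanity check (not part of the original output), the cell sizes of the three rasters can be compared after reprojection; they should roughly correspond to 1/3, 1, and 3 arc-seconds:
res(dem10P)   # ~10 m source (~1/3 arc-second cells)
res(dem30P)   # ~30 m source (~1 arc-second cells)
res(demSRTMc) # 90 m SRTM (3 arc-second cells)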
Study area elevation range: 335.2 to 893 meters AMSL (the DEMs and the Google API both report elevation in meters).
# a little analysis for better plotting
slope <- terrain(dem10P, opt = 'slope')
aspect <- terrain(dem10P, opt = 'aspect')
hill <- hillShade(slope, aspect, angle = 40, direction = 270)
Plot of study area elevation and hillshade:
plot(hill,col=grey(0:100/100),legend=FALSE)
plot(dem10P, col=rainbow(25,alpha=0.35), add=TRUE)
Because of limits on the API (accessed via the rgbif package), I had to download a number of small samples and compile them into one larger sample. I did this with the loop below and saved the result as a *.csv; the CSV is loaded in the next part of the R Markdown to continue the analysis. The elevation values are pulled from the Google API with the elevation() function of the rgbif package.
coordsN <- NULL
googleSmpleN <- NULL
bounds <- extent(dem30P) # study area bounds (constant across iterations)
for(i in 1:200){
  dem30Smple <- data.frame(sampleRandom(dem30P, size = 10, na.rm = TRUE, ext = bounds, xy = TRUE)) # random sample
  colnames(dem30Smple) <- c("longitude", "latitude", "elevation")
  coords <- cbind(dem30Smple$longitude, dem30Smple$latitude)
  googleSmple <- elevation(latitude = dem30Smple$latitude, longitude = dem30Smple$longitude)
  print(head(googleSmple)) # sample of Google API elevation data
  coordsN <- rbind(coordsN, coords)
  googleSmpleN <- rbind(googleSmpleN, googleSmple)
}
write.csv(coordsN, "C:/TEMP/elevR/coords.csv")
write.csv(googleSmpleN, "C:/TEMP/elevR/googleSmpleN.csv")
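If the API starts rejecting requests partway through a run like this, a short pause between batches is one simple workaround (a hypothetical tweak, not in the loop above):
# hypothetical addition inside the loop body, after the elevation() call:
Sys.sleep(1) # wait ~1 second between batches to stay under per-second request limits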
Below, the saved coordinates are used to extract samples at the exact same lat/long locations from each of the three DEMs:
coords <- read.csv("C:/TEMP/elevR/coords.csv")[,2:3]
dem30Smple <- data.frame(extract(dem30P, coords)) # extract coords sample
dem30Smple <- cbind(coords, dem30Smple)
colnames(dem30Smple) <- c("longitude", "latitude", "elevation")
dem10Smple <- data.frame(extract(dem10P, coords)) # extract coords sample
dem10Smple <- cbind(coords, dem10Smple)
colnames(dem10Smple) <- c("longitude", "latitude", "elevation")
srtmSmple <- data.frame(extract(demSRTMc, coords)) # extract coords sample
srtmSmple <- cbind(coords, srtmSmple)
colnames(srtmSmple) <- c("longitude", "latitude", "elevation")
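The three extractions above repeat the same pattern; the following is an equivalent, more compact sketch (not in the original) that loops over the rasters with lapply(). The names demList and demSamples are mine:
demList <- list(dem10Smple = dem10P, dem30Smple = dem30P, srtmSmple = demSRTMc)
demSamples <- lapply(demList, function(r){
  smpl <- data.frame(coords, extract(r, coords)) # elevations at the sampled coords
  colnames(smpl) <- c("longitude", "latitude", "elevation")
  smpl
})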
Load the Google API results for the same lat/long coords:
googleSmple <- read.csv("C:/TEMP/elevR/googleSmpleN.csv")[,2:4]
str(googleSmple) # sample of Google API elevation data from coords
## 'data.frame': 2000 obs. of 3 variables:
## $ latitude : num 39.8 39.8 39.8 39.9 39.9 ...
## $ longitude: num -79.5 -79.4 -79.4 -79.4 -79.5 ...
## $ elevation: num 716 634 603 709 398 ...
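As an added consistency check (not in the original), it is worth confirming that the coordinates echoed back by the API line up row-for-row with the sampled coordinates, since the error calculations below assume they do:
all.equal(googleSmple$longitude, coords[, 1], tolerance = 1e-6) # should be TRUE
all.equal(googleSmple$latitude, coords[, 2], tolerance = 1e-6)  # should be TRUE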
Exploring the results…
samples <- c("dem10Smple", "dem30Smple", "srtmSmple")
results <- matrix(ncol = 3, nrow = length(samples))
colnames(results) <- c("MAE", "Error_SD", "MAX")
rownames(results) <- samples
errors <- data.frame()
sampleList <- list(dem10Smple, dem30Smple, srtmSmple)
for(i in seq_along(samples)){
  data <- sampleList[[i]]
  error <- abs(data$elevation - googleSmple$elevation) # absolute error
  error <- error[!is.na(error)] # drop points with no DEM value
  data_source <- rep(samples[i], length(error))
  errorTmp <- data.frame(error, data_source, stringsAsFactors = FALSE)
  errors <- rbind(errors, errorTmp)
  mae <- round(mean(error), 3) # mean absolute error
  maxEr <- round(max(error), 3) # max error
  stdDev <- round(sd(error), 3) # standard deviation of error
  results[i, 1] <- mae
  results[i, 2] <- stdDev
  results[i, 3] <- maxEr
}
print(results)
## MAE Error_SD MAX
## dem10Smple 2.498 2.009 16.06
## dem30Smple 6.045 4.323 66.20
## srtmSmple 10.728 11.175 80.61
Boxplot of the absolute sample errors (in meters) by data source:
ggplot(errors, aes(x = data_source, y = error, group = data_source)) +
geom_boxplot()
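A faceted histogram gives another view of the same errors data frame (a small added sketch; the 5 meter binwidth is my choice):
ggplot(errors, aes(x = error)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ data_source, ncol = 1)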
The Google data is better than I had expected. Some info on the web suggested it was all SRTM 90 m, mainly because SRTM has worldwide coverage at a resolution that is easy to store and retrieve. These results (for this small study area) show that the Google API data is closer to the higher resolution 1/3 arc-second (~10 meter) DEM; that is encouraging. Initially I tweeted that the Google API data had a 54 foot MAE compared to a LiDAR derived DEM. I believe that was an overestimate due to a projection error with the LiDAR DEM (wholly my fault). I did not reuse the LiDAR data (1 meter horizontal resolution) here because I have not fixed the projection error and it would take approximately 9 tiles (~450 MB total) to get the same coverage as the 7.5-minute 10 and 30 m DEMs. I will compare the LiDAR data when time allows.
It is likely that Google, being Google, pulls the best locally available data source and uses it in their API. I am sure the brains there have ways of dealing with the added data overhead of higher resolution sources. In the end, I would use the Google API data when elevation is not a critical component, but I would run a quick check like this first to estimate the error.
If you see any big holes in this analysis, please contact mr.ecos (at) gmail.com