Comparing Google API elevation to known-resolution Digital Elevation Models (DEMs)

A quick analysis to try to identify the level of error in the elevations derived from the Google Elevation API. The API documentation suggests that relatively low-resolution elevation data is used: “With the Elevation API, you can develop hiking and biking applications, mobile positioning applications, or low resolution surveying applications.” Source: https://developers.google.com/maps/documentation/elevation/
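For reference, a single request to the underlying web service looks roughly like the sketch below. This is a hedged example using the httr and jsonlite packages (not otherwise used in this analysis); the JSON endpoint and locations parameter are described in the documentation linked above, and the API returns elevations in meters.

# a minimal sketch of one raw Elevation API request; rgbif's elevation()
# wraps this service for us (an API key may be required depending on usage)
library(httr)
library(jsonlite)
resp <- GET("https://maps.googleapis.com/maps/api/elevation/json",
            query = list(locations = "39.868,-79.494")) # lat,long near Ohiopyle
res <- fromJSON(content(resp, as = "text"))
res$results$elevation # elevation in meters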

A brief search turned up only sparse and inconclusive documentation of the actual data source behind the API's elevations.

The following analysis draws a random sample of 2,000 elevation points from 10, 30, and 90 meter resolution DEMs to compare against Google's API.

*NOTE: this was a quickly constructed experiment to quantify the differences in elevation; no guarantee implied. The code is likely a little sloppy and the results will vary from place to place. Any errors are my own dumb fault; your mileage may vary.

# environment
sessionInfo()
## R version 3.0.3 (2014-03-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.6
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.5 formatR_1.0    stringr_0.6.2  tools_3.0.3
# required packages
library('rgbif')
library('raster')
library('rgdal')
library('ggplot2')
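If any of these packages are not already installed, they can be pulled from CRAN first:

# one-time setup: install the required packages from CRAN
install.packages(c('rgbif', 'raster', 'rgdal', 'ggplot2'))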

Data: The elevation data are from a 7.5 by 7.5 minute area around Ohiopyle, Pennsylvania; the same bounds as the USGS quad of that name.

10 meter DEM:
location: http://www.pasda.psu.edu/data/dem24k_10m/ohiopyle_pa
metadata: http://www.pasda.psu.edu/uci/FullMetadataDisplay.aspx?file=dem10meter.xml

30 meter DEM:
location: http://www.pasda.psu.edu/data/dem24k/ohiopyle_pa
metadata: http://www.pasda.psu.edu/uci/FullMetadataDisplay.aspx?file=dem30meter.xml

SRTM 90 meter DEM:
location: downloaded using the getData() function of the raster package
metadata: http://www2.jpl.nasa.gov/srtm/
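The PASDA DEMs can also be fetched and unpacked from within R. A sketch is below; note the archive file name is a guess on my part, so check the directory listing at the locations above for the actual name.

# sketch: fetch and unzip the 10m PASDA DEM; the file name 'ohiopyle_pa.zip'
# is assumed -- verify it against the directory listing above
download.file("http://www.pasda.psu.edu/data/dem24k_10m/ohiopyle_pa.zip",
              destfile = "C:/TEMP/elevR/ohiopyle_pa_10m.zip", mode = "wb")
unzip("C:/TEMP/elevR/ohiopyle_pa_10m.zip", exdir = "C:/TEMP/elevR/ohiopyle_pa_10m")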

After downloading the DEMs from the URLs above, the data are projected into an appropriate lat/long projection and plotted.

newCRS <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs " # lat/long projection
dem30 <- raster("C:/TEMP/elevR/ohiopyle_pa_30m/ohiopyle_pa.dem")
dem10 <- raster("C:/TEMP/elevR/ohiopyle_pa_10m/ohiopyle_pa.dem")
dem30P <- projectRaster(dem30, crs = newCRS) # reproject to lat/long
dem10P <- projectRaster(dem10, crs = newCRS)
demSRTM <- getData('SRTM', lon=-79, lat=40, path = "C:/TEMP/elevR/SRTM") # download SRTM tile
demSRTMc <- crop(demSRTM, extent(dem30P)) # crop SRTM to the study area

study area elevation range: 335.2 to 893 feet AMSL
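That range can be checked directly against the projected raster, e.g. for the 10m DEM:

# quick check of the study area elevation range from the 10m DEM
c(minValue(dem10P), maxValue(dem10P))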

# a little analysis for better plotting
slope <- terrain(dem10P, opt = 'slope')
aspect <- terrain(dem10P, opt = 'aspect')
hill <- hillShade(slope, aspect, angle = 40, direction = 270)

Plot of study area elevation and hillshade:

plot(hill,col=grey(0:100/100),legend=FALSE)
plot(dem10P, col=rainbow(25,alpha=0.35), add=TRUE)

[Figure: study area hillshade with the 10m DEM elevation overlaid]

Because of limitations on the API (accessed via the rgbif package), I had to download a number of small samples and compile them into a larger sample. I did this using the loop below and saved the result as a *.csv. The CSV is loaded in the next part of the Rmarkdown to continue the analysis. The elevation measures are pulled from the Google API using the elevation() function of the rgbif package.

coordsN <- NULL
googleSmpleN <- NULL
bounds <- extent(dem30P)
for(i in 1:200){
  # random sample of 10 points from the 30m DEM
  dem30Smple <- data.frame(sampleRandom(dem30P, size = 10, na.rm = TRUE, ext = bounds, xy = TRUE))
  colnames(dem30Smple) <- c("longitude", "latitude", "elevation")
  coords <- cbind(dem30Smple$longitude, dem30Smple$latitude)

  # Google API elevations at the same points
  googleSmple <- elevation(latitude = dem30Smple$latitude, longitude = dem30Smple$longitude)
  print(head(googleSmple)) # sample of Google API elevation data

  coordsN <- rbind(coordsN, coords)
  googleSmpleN <- rbind(googleSmpleN, googleSmple)
}

write.csv(coordsN, "C:/TEMP/elevR/coords.csv") # write.csv adds a row-name column;
write.csv(googleSmpleN, "C:/TEMP/elevR/googleSmpleN.csv") # it is dropped on re-read below

Below, the saved coordinates are used to extract samples at the exact same lat/long locations from each of the three DEMs.

coords <- read.csv("C:/TEMP/elevR/coords.csv")[,2:3]
dem30Smple <- data.frame(extract(dem30P, coords)) # extract coords sample
dem30Smple <- cbind(coords, dem30Smple)
colnames(dem30Smple) <- c("longitude", "latitude", "elevation")
dem10Smple <- data.frame(extract(dem10P, coords)) # extract coords sample
dem10Smple <- cbind(coords, dem10Smple)
colnames(dem10Smple) <- c("longitude", "latitude", "elevation")
srtmSmple <- data.frame(extract(demSRTMc, coords)) # extract coords sample
srtmSmple <- cbind(coords, srtmSmple)
colnames(srtmSmple) <- c("longitude", "latitude", "elevation")

Load the Google API results for the same lat/long coords

googleSmple <- read.csv("C:/TEMP/elevR/googleSmpleN.csv")[,2:4]
str(googleSmple) # sample of Google API elevation data from coords
## 'data.frame':    2000 obs. of  3 variables:
##  $ latitude : num  39.8 39.8 39.8 39.9 39.9 ...
##  $ longitude: num  -79.5 -79.4 -79.4 -79.4 -79.5 ...
##  $ elevation: num  716 634 603 709 398 ...
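Before comparing, a quick sanity check that the Google results line up row-for-row with the saved coordinates:

# sanity check: Google results should align row-for-row with coords
stopifnot(nrow(googleSmple) == nrow(coords))
max(abs(googleSmple$longitude - coords[, 1]),
    abs(googleSmple$latitude - coords[, 2])) # should be ~0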

Exploring the results…

samples <- c("dem10Smple", "dem30Smple", "srtmSmple")
results <- matrix(ncol = 3, nrow = length(samples))
colnames(results) <- c("MAE", "Error_SD", "MAX") 
rownames(results) <- samples
errors <- data.frame()
sampleList <- list(dem10Smple, dem30Smple, srtmSmple)
for(i in seq_along(samples)){
  data <- sampleList[[i]]
  error <- abs(data$elevation - googleSmple$elevation) # absolute error
  error <- error[!is.na(error)]
  data_source <- rep(samples[i], length(error))
  errorTmp <- data.frame(error, data_source, stringsAsFactors=FALSE)
  errors <- rbind(errors, errorTmp)
  mae <- round(mean(error), 3) # mean absolute error
  maxEr <- round(max(error),3) # max error
  stdDev <- round(sd(error),3) # standard deviation of error
  results[i,1] <- mae
  results[i,2] <- stdDev
  results[i,3] <- maxEr
}

Results of analysis (measurements in feet)

print(results)
##               MAE Error_SD   MAX
## dem10Smple  2.498    2.009 16.06
## dem30Smple  6.045    4.323 66.20
## srtmSmple  10.728   11.175 80.61
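As an aside, the same per-source summary (up to rounding) can be computed in one pass from the pooled errors data frame:

# equivalent one-pass summary from the pooled errors data frame
t(sapply(split(errors$error, errors$data_source),
         function(x) c(MAE = mean(x), Error_SD = sd(x), MAX = max(x))))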

Boxplots of the sample errors by data source (in feet):

ggplot(errors, aes(x = data_source, y = error, group = data_source)) +
  geom_boxplot()

[Figure: boxplots of absolute error for each data source]
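A faceted histogram of the same errors gives another view of the spread (bin width chosen arbitrarily here):

# alternative view: histograms of absolute error, one panel per data source
ggplot(errors, aes(x = error)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ data_source, ncol = 1)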

Conclusions

The Google data is better than I had expected. Some info on the web suggested it was all SRTM 90m, mainly because that dataset has worldwide coverage and is at a resolution that makes storage and retrieval easy to deal with. These results (for this small study area) show that the Google API data is closer to the higher resolution 1/3 arc-second (~10 meter) DEM; that is encouraging. Initially I tweeted that the Google API data had a 54 foot MAE compared to a LiDAR-derived DEM. I believe that was an overestimate due to a projection error with the LiDAR DEM (wholly my fault). I did not reuse the LiDAR data (1 meter horizontal resolution) here because I have not fixed the projection error and it would take approximately 9 tiles (~450 Mb total) to get the same coverage as the 7.5-minute 10 and 30m DEMs. I will compare the LiDAR data when time allows.

It is likely that Google, being Google, pulls the best locally available data source and uses it in their API. I am sure the brains there have ways of dealing with the elevated data overhead of higher resolution data. In the end, I would use the Google API data when elevation is not a critical component, but I would run a quick check like this first to estimate the error.
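For that kind of quick check, the comparison above boils down to a few lines. A sketch of a hypothetical helper wrapping the same steps (and assuming, as above, that the DEM and the API report elevations in comparable units):

# sketch: a hypothetical helper wrapping the comparison above
elevCheck <- function(dem, n = 10){ # n = 10 stays within the API batch limits hit above
  smple <- data.frame(sampleRandom(dem, size = n, na.rm = TRUE, xy = TRUE))
  colnames(smple) <- c("longitude", "latitude", "elevation")
  google <- elevation(latitude = smple$latitude, longitude = smple$longitude)
  error <- abs(smple$elevation - google$elevation)
  c(MAE = mean(error, na.rm = TRUE), MAX = max(error, na.rm = TRUE))
}
elevCheck(dem10P) # e.g., against the 10m DEM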

If you see any big holes in this analysis, please contact mr.ecos (at) gmail.com