Modeling surfaces is an excellent way to explore and understand data. A surface is a visual representation of your data in which the values between known points are calculated. In the geographic information sciences you will typically make two kinds of surfaces: kernel density maps and surface interpolations. Because the results of these processes look very similar, it is important to know when and why to run each type of analysis.
A kernel density surface uses point data to represent magnitude per unit area, or density, based on the points available. Any values associated with the points are not considered in the analysis. As an example, kernel density maps are commonly used to understand where most crimes occur in a city.
For this section we will be exploring weather station data contained in the Global Historical Climate Network Daily (GHCND) from the Applied Climate Information System (ACIS), http://www.rcc-acis.org/index.html. This dataset contains data from weather stations all over the world, summarized for the years 1981 to 2010. It holds each station's monthly minimum, maximum, and average temperatures in both Celsius and Fahrenheit, as well as precipitation values and elevation.
In this exercise we will be looking at the stations on the African continent. First let's load our libraries and data and take a look at them. I am hosting the data on GitHub; if you do a lot of collaboration, I recommend hosting your commonly used files there as well.
library(sf) #st functions
library(ggplot2) #mapping
library(tidyverse) #cleaning
library(dplyr) #select, filter, etc.
library(rnaturalearth) #shapefiles of all countries
library(gstat) #for IDW
library(rgdal) #loads sp, which provides coordinates() and gridded()
#load data
worldtemps <- read.csv("https://wwiskes.github.io/datadump/worldtemps.csv")
head(worldtemps)[3:8]
## Lon Lat Elev..m. Station.Name Jan.Avg..Temp.C Feb.Avg..Temp.C
## 1 25.333 55.517 34.0 SHARJAH INTER. AIRP 17.844761 19.175028
## 2 25.255 55.364 10.4 DUBAI INTL 19.095698 20.366612
## 3 24.433 54.651 26.8 ABU DHABI INTL 18.491373 20.052231
## 4 24.262 55.609 264.9 AL AIN INTL 18.219758 20.817374
## 5 35.317 69.017 3366.0 NORTH-SALANG -8.863681 -9.975253
## 6 34.210 62.228 977.2 HERAT 3.687432 8.547313
This data set has a lot of columns and rows, so to look at it here I used head() to limit the number of rows and subsetted the columns with [3:8]. You can see we have longitude and latitude columns, station names, elevations, and average monthly temperatures. I suggest you explore this dataset further and think about the different ways these data could be used.
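If you want a quick feel for the data before mapping it, a few optional base R commands will do the job; this is just a sketch and nothing here is required for the rest of the chapter.
#optional exploration of the worldtemps data frame
dim(worldtemps) #how many stations (rows) and variables (columns)
names(worldtemps) #every column name, including all the monthly statistics
summary(worldtemps$Elev..m.) #spread of station elevations in meters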
In earlier lessons we have learned to do basic displays of data in ggplot. This lesson assumes you are fairly comfortable with that library, so I won't be explaining its basic functions. Let's take a look at where there are stations within the African continent. I recommend the rnaturalearth library for sourcing polygons of the world; we have already loaded that package, so let's take a look at our point data over our polygons. First, though, we have to turn our station data into a spatial object, then separate the longitude and latitude back out into their own columns, since creating an sf object drops them (and we'll need those columns later).
#cleaning up our station data
tempsSF <- st_as_sf(worldtemps, coords = c("Lat", "Lon"), crs = 4326)
tempsSF["longitude"] <- unlist(map(tempsSF$geometry,1))
tempsSF["latitude"] <- unlist(map(tempsSF$geometry,2))
#ggplot our data
ggplot(tempsSF)+
  geom_sf(colour = "#1F85DE") +
  geom_sf(data = ne_countries(returnclass = "sf"), alpha = 0, colour = "#000000")
But we're only interested in Africa, so we're going to assign the African continent to a variable and make a ggplot whose limits are restricted to that area.
#assigning the RNaturalEarth polygon to an object
africa <- ne_countries(continent = "Africa", returnclass = "sf")
#plot
ggplot(tempsSF)+
  scale_x_continuous(limits = c(-25, 55)) +
  scale_y_continuous(limits = c(-40, 40)) +
  geom_sf(colour = "#1F85DE") +
  geom_sf(data = africa, alpha = 0, colour = "#000000")
Alright, lookin' good. Now we're ready to start making some surfaces! First we're going to use the ggplot2 function stat_density2d_filled() to create a two-dimensional heat map.
ggplot(tempsSF)+
  scale_x_continuous(limits = c(-25, 55)) +
  scale_y_continuous(limits = c(-40, 40)) +
  stat_density2d_filled(aes(x=longitude, y=latitude, fill = ..density..), geom = "raster", contour = FALSE)+
  scale_fill_distiller(palette = "Spectral", direction = -1)+
  theme(
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    legend.position='none'
  ) +
  geom_sf(data = africa, col = "#000000", alpha=0)
Ok, so what's going on here? stat_density2d_filled() takes the point data and derives a kernel density surface of the point density. I have chosen to output this as a raster, but polygons are an option as well (a sketch of that variation follows below). The surface is then colored with scale_fill_distiller() using the Spectral palette.
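If you prefer contour polygons to a raster, a minimal variation might look like the following; this is just a sketch reusing the same tempsSF and africa objects, and it draws banded contour levels rather than a continuous raster.
ggplot(tempsSF)+
  scale_x_continuous(limits = c(-25, 55)) +
  scale_y_continuous(limits = c(-40, 40)) +
  stat_density2d(aes(x=longitude, y=latitude, fill = ..level..), geom = "polygon")+
  scale_fill_distiller(palette = "Spectral", direction = -1)+
  geom_sf(data = africa, col = "#000000", alpha = 0)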
But what is it showing? This map represents the density of the point data; it does not show any temperatures from the stations, only where the stations are located. That may be useful for determining where the largest gaps are in our coverage area. For example, Somalia has quite low coverage, as do the DRC, Angola, and northern Chad. If complete weather station coverage of Africa were the goal, this map would help guide the effort. However, if the map were misinterpreted, a reader might think the countries listed above had the lowest temperatures. Planners and analysts have to be very careful about what they are attempting to say with their data, since there are many ways to represent the data available.
The other surface we will consider in this chapter is a surface interpolation. This analysis does consider the values of the point data: it estimates the unknown values between known values and derives a surface from them. Weather maps are used every day to communicate weather conditions and temperatures. Recorded values may not be available at every location, and surface interpolation is a common tool for predicting those missing values from the known ones.
This is more complex than creating a kernel density surface, where you can simply feed your point data into the function. Unlike stat_density2d, which handles the gridding for us, the interpolation workflow requires us to create a blank raster and a few other objects before running the tool. We will be using the gstat library and its idw() function. IDW stands for inverse distance weighted, and it is the simplest surface interpolation to run. There are many others available in R, but they are outside the scope of this chapter.
First we need to tell R which columns of worldtemps hold the coordinates. The coordinates() function (provided by the sp package, which loads alongside rgdal) takes a formula of the form ~x+y and turns worldtemps into a spatial object.
coordinates(worldtemps) = ~x+y
Next we create a blank raster to use as a template for the IDW interpolation. expand.grid() takes the range of longitude and latitude values and grids them in steps of 1; the "from" and "to" arguments set the longitude and latitude range of the window in which our interpolation will take place. For finer or coarser resolution, change the pixel size in "by=1", but be aware that a finer resolution will not make the interpolation more accurate: the predictions are still limited by the spacing of the stations, so smaller cells mostly add computation. At this point the grid is just a set of x and y values, so we use coordinates() again to knit them into coordinates, and finally we create a spatial grid object with gridded(grd) <- TRUE.
grd <- expand.grid(x=seq(from=-25, to=55, by=1), y=seq(from=-40, to=40, by=1))
coordinates(grd) <- ~ x+y
gridded(grd) <- TRUE
Ok, the legwork is now done and we can finally get to our interpolation! The idw() function from the gstat library creates an inverse distance weighted surface interpolation of our point data. I'm interested in the July average temperature in Fahrenheit, but you can choose any month you want. So here we pull that variable into our formula and supply the locations (the point data) as well. The grd variable we just made is used as our template raster, and we set the idp power. This last argument is probably the most interesting. There is no hard-and-fast rule about which power will work best for your data; generally a value between 1 and 5 is a good range. The power controls how strongly the interpolation weights nearby points relative to distant ones: a higher number produces a more tightly fitted surface and is often more accurate, but can introduce artifacts if set too high. Our dataset here is fairly dense, so a power of 4 works well. Powers above 5 will still run, but the result won't look much different from a power of 5. Generally speaking you will probably have to try a few values when applying this method to a new dataset (a quick sketch for comparing powers follows the code below). Lastly, for ggplot to read the output of idw(), it has to be converted to a data frame.
output <- idw(formula= worldtemps$Jul.Avg..Temp.F ~ 1, locations=worldtemps, newdata=grd, idp=4)
IDW = as.data.frame(output)
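If you want to see how sensitive the surface is to the power, one optional approach is to re-run idw() with a few idp values and compare the resulting predictions; this is just a sketch, and the printed summaries are only a rough way to eyeball the differences.
#compare a few idp powers (optional)
for (p in c(1, 2, 4)) {
  out_p <- idw(formula = worldtemps$Jul.Avg..Temp.F ~ 1, locations = worldtemps, newdata = grd, idp = p)
  print(summary(as.data.frame(out_p)$var1.pred)) #lower powers smooth more, higher powers hug the nearest stations
}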
Now let's plot it. At this point you should be familiar enough with ggplot to understand what is happening here, but to briefly cover it: we display our IDW with geom_raster(). The variable var1.pred is how the idw() function names its predictions; you can open your IDW data frame with View() to see how it is organized. scale_fill_distiller() is optional and just makes things look nicer, as are the theme options. Finally, we also display our Africa polygons with geom_sf().
ggplot(IDW)+
  geom_raster(aes(x=x, y=y, fill=var1.pred))+
  scale_fill_distiller(palette = "Spectral", direction = -1)+
  theme(
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    legend.position='none'
  ) +
  geom_sf(data = africa, col = "black", alpha=0)
Now that is a nice map that really shows you where high and low temperatures are most likely. Both of these methods, the kernel density surface and the IDW, have their place in the geosciences, but their applications are quite different. Hopefully this chapter has helped illuminate when and why to use these very different (but similar looking) types of analysis.
Alright! Let's put this knowledge to use! Find a different region you want to run a surface interpolation on. Filter the rnaturalearth data to show only that country or region. Rewrite the grid raster to re-center the IDW on that region. You may need to use a different IDW power as well. Also, choose a different month in the worldtemps data to interpolate. A possible starting point is sketched below.
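The continent, grid bounds, power, and month below are placeholders; swap in whatever region and month you choose, and note that the column name for the new month is assumed to follow the same pattern as Jul.Avg..Temp.F.
#placeholder region: South America, with a grid roughly covering it
samerica <- ne_countries(continent = "South America", returnclass = "sf")
grd2 <- expand.grid(x=seq(from=-85, to=-30, by=1), y=seq(from=-60, to=15, by=1))
coordinates(grd2) <- ~ x+y
gridded(grd2) <- TRUE
#assumed column name; check names(worldtemps) for the month you want
output2 <- idw(formula= worldtemps$Jan.Avg..Temp.F ~ 1, locations=worldtemps, newdata=grd2, idp=3)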