In this tutorial we will open some geospatial data that is stored in a netCDF file. We will select the variable of interest, and we will export the data to a GeoTIFF so that we can continue the analysis in R or other geospatial software.
First, we need some data to play with. As an example, we will use some data from the Harmonized World Soil Database. This data is available from the ORNL DAAC at the following link: https://doi.org/10.3334/ORNLDAAC/1247
This dataset provides selected soil parameters from the Harmonized World Soil Database (HWSD) v1.2, including additional calculated parameters such as area weighted soil organic carbon (kg C per m2), as high resolution NetCDF files.
The data citation is: Wieder, W.R., J. Boehnert, G.B. Bonan, and M. Langseth. 2014. Regridded Harmonized World Soil Database v1.2. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1247
Specifically, we will use the file “T_SILT.nc4” from this dataset. This file provides gridded estimates of the fraction of the topcoil that is silt, as a percentage by weight.
Go download the file. You will need to log in using your NASA Earthdata login. Remember to place the file into the same working directory where your R script is located.
Reading data stored in netCDF files is quite straightforward in R, provided that you load the appropriate packages. You may need to install these packages if you have not used them before.
library(ncdf4) # package for netcdf manipulation
library(raster) # package for raster manipulation
library(rgdal) # package for geospatial analysis
library(ggplot2) # package for plotting
Use nc_open to read the data into a data structure I called nc_data. Print the metadata about this file to a text file.
nc_data <- nc_open('T_SILT.nc4')
# Save the print(nc) dump to a text file
{
sink('T_SILT.txt')
print(nc_data)
sink()
}
From this output we see that there is one variable: T_SILT that has two dimensions, lon and lat.
There are 10 global attributes which provide metadata information about the file.
We need to capture these data into R. The following code reads the latitudes and longitudes of each “T_SILT” observation and saves them in memory.
lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat", verbose = F)
head(lon) # look at the first few entries in the longitude vector
## [1] -179.975 -179.925 -179.875 -179.825 -179.775 -179.725
Read in the data from the T_SILT variable and verify the dimensions of the array. There should be 7200 lons, and 3600 lats
ndvi.array <- ncvar_get(nc_data, "T_SILT") # store the data in a 2-dimensional array
dim(ndvi.array)
## [1] 7200 3600
Other pertinent information: Lets’s see what fill value was used for missing data.
fillvalue <- ncatt_get(nc_data, "T_SILT", "_FillValue")
fillvalue
## $hasatt
## [1] TRUE
##
## $value
## [1] -1
The fill value is -1.
All done reading in the data. We can close the netCDF file.
nc_close(nc_data)
So, now we have the entire array of T_SILT values for 7200 x 3600 grid cells in R. What can we do with it?
First, a little housekeeping. Let’s replace all those pesky fill values with the R-standard ‘NA’.
ndvi.array[ndvi.array == fillvalue$value] <- NA
To make it easier to work with, let’s save this data in a raster. Note that we provide the coordinate reference system “CRS” in the standard well-known text format. For this data set, it is the common WGS84 system.
r <- raster(t(ndvi.array), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs+ towgs84=0,0,0"))
We will need to transpose and flip to orient the data correctly. The best way to figure this out is through trial and error, but remember that most netCDF files record spatial data from the bottom left corner.
r <- flip(r, direction='y')
Finally, we can plot the raster to take a look at the percentage of silt in the topsoil (T_SILT values).
plot(r)
If this looks good, then let’s save it to a GeoTIFF file.
writeRaster(r, "T_SILT.tif", "GTiff", overwrite=TRUE)
Maybe you want to get the silt percentage only at your study location, such as the region surrounding the Toolik Lake Field Station in Alaska (Latitude: 68.6275, Longitude: -149.5975).
Extract data at the Toolik Lake study location from the raster using the ‘extract()’ function.
toolik_lon <- -149.5975
toolik_lat <- 68.6275
toolik_silt <- extract(r, SpatialPoints(cbind(toolik_lon,toolik_lat)), method='simple')
toolik_silt
##
## 18
The value of T_SILT at this location is 18%. The steps above could be repeated with a for() loop to get data at many point locations. You can also find online many resources for how to extract data for a buffer area around a point location, or for a shapefile.