In this tutorial we will open some geospatial data that is stored in a netCDF file. We will select the variable and time range of interest, and we will export the data to a GeoTIFF so that we can continue the analysis in R or other geospatial software.
First, we need some data to play with. As an example, we will use some data on trends in vegetation greenness in the Arctic. This data is available from the ORNL DAAC at the following link: https://doi.org/10.3334/ORNLDAAC/1275
This dataset provides normalized difference vegetation index (NDVI) data for the Arctic growing season, derived primarily from data collected by Advanced Very High Resolution Radiometer (AVHRR) sensors onboard several NOAA satellites over the years 1982 through 2012. The NDVI data, which indicate vegetation activity, were averaged over the Arctic growing season (June, July, and August) of each year. The data are circumpolar in coverage at 8-km resolution and limited to latitudes above 20 degrees N.
The data citation is: Guay, K.C., P.S.A. Beck, and S.J. Goetz. 2015. Long-Term Arctic Growing Season NDVI Trends from GIMMS 3g, 1982-2012. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1275
Specifically, we will use the file “gimms3g_ndvi_1982-2012.nc4” from this dataset. Download it now; you will need to log in with your NASA Earthdata account.
Remember to place the file into the same working directory where your R script is located.
Reading data stored in netCDF files is quite straightforward in R, provided that you load the appropriate packages. You may need to install these packages if you have not used them before.
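If you have not installed these packages before, a one-time install along these lines should work (availability of older packages such as rgdal on CRAN may vary):

install.packages(c("ncdf4", "raster", "rgdal", "ggplot2")) # one-time installation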
library(ncdf4) # package for netcdf manipulation
library(raster) # package for raster manipulation
library(rgdal) # package for geospatial analysis
library(ggplot2) # package for plotting
Use nc_open() to read the data into a data structure; here I call it nc_data. Then print the metadata about this file to a text file so you can refer to it as you work with the data.
nc_data <- nc_open('gimms3g_ndvi_1982-2012.nc4')
# Save the print(nc_data) dump to a text file
{
sink('gimms3g_ndvi_1982-2012_metadata.txt')
print(nc_data)
sink()
}
From this output we see that there are two variables: time_bnds, which contains the start and end date of each observation, and NDVI, which is the variable of interest. NDVI has three dimensions: [lon,lat,time].
There are four dimensions in the file: lat, lon, time, and “nv”, which is used to record the beginning and end of the time range. We can ignore “nv” and focus on the three dimensions that are used to organize the NDVI data.
There are 10 global attributes which provide metadata information about the file.
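If you would rather inspect those global attributes directly in R instead of the text dump, passing varid = 0 to ncatt_get() returns them all as a named list:

ncatt_get(nc_data, 0) # all global (file-level) attributes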
Next, we need to read the data stored in the lat, lon, and time dimensions. The following code reads the latitudes, longitudes, and time of each NDVI observation and saves them in memory.
lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat", verbose = F)
t <- ncvar_get(nc_data, "time")
head(lon) # look at the first few entries in the longitude vector
## [1] -179.9583 -179.8750 -179.7917 -179.7083 -179.6250 -179.5417
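The values in t are stored in whatever time units the file uses (often something like days or years since a reference date). The exact units string for this file is in the metadata dump, or you can query it directly:

ncatt_get(nc_data, "time", "units") # how the time values should be interpreted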
Read in the data from the NDVI variable and verify the dimensions of the array. There should be 4320 longitudes, 840 latitudes, and 31 time steps.
ndvi.array <- ncvar_get(nc_data, "NDVI") # store the data in a 3-dimensional array
dim(ndvi.array)
## [1] 4320 840 31
There is other pertinent information about the NDVI variable worth checking. Let's see what fill value was used for missing data.
fillvalue <- ncatt_get(nc_data, "NDVI", "_FillValue")
fillvalue
## $hasatt
## [1] TRUE
##
## $value
## [1] -9999
The fill value is -9999.
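If you want to see the NDVI variable's other attributes (units, long name, and so on) without digging through the metadata text file, calling ncatt_get() with no attribute name returns them all:

ncatt_get(nc_data, "NDVI") # all attributes attached to the NDVI variable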
All done reading in the data. We can close the netCDF file.
nc_close(nc_data)
So, now we have the entire array of NDVI values for 4320 x 840 grid cells over each of 31 years in R. What can we do with it?
First, a little housekeeping. Let’s replace all those pesky fill values with the R-standard ‘NA’.
ndvi.array[ndvi.array == fillvalue$value] <- NA
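As a quick sanity check, you can look at the range of the remaining values; NDVI should fall roughly between -1 and 1:

range(ndvi.array, na.rm = TRUE) # min and max NDVI after removing fill values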
Let’s get one year of the NDVI data and plot it.
Time is the third dimension of the “ndvi.array”. The first time slice represents the growing season of 1982.
ndvi.slice <- ndvi.array[, , 1]
Just to make sure everything is working correctly, we can take a look at the dimensions of this time slice. The dimensions should be 4320 longitudes by 840 latitudes.
dim(ndvi.slice)
## [1] 4320 840
Ok, everything checks out, so we can go ahead and save this data in a raster. Note that we provide the coordinate reference system (CRS) as a PROJ.4 string. For this dataset, it is the common WGS84 system.
r <- raster(t(ndvi.slice), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
We transposed the data with t() when building the raster, and we also need to flip it to orient it correctly. The best way to figure this out is through trial and error, but remember that most netCDF files record spatial data starting from the bottom-left corner.
r <- flip(r, direction='y')
Finally, we can plot the raster to take a look at the NDVI in 1982. Remember this data is cut off below 20 degrees North.
plot(r)
If this looks good, then let’s save it to a GeoTIFF file.
writeRaster(r, "GIMMS3g_1982.tif", "GTiff", overwrite=TRUE)
Maybe you want to get a timeseries of NDVI at a study location, such as the Toolik Lake Field Station in Alaska (Latitude: 68.6275, Longitude: -149.5975).
First, we will need to convert the entire 3d array of data to a raster brick. Note, this step may take several minutes.
r_brick <- brick(ndvi.array, xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
# note that you may have to play around with the transpose (the t() function) and flip() before the data are oriented correctly. In this example, the netcdf file recorded latitude on the X and longitude on the Y, so both a transpose and a flip in the y direction were required.
r_brick <- flip(t(r_brick), direction='y')
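Before extracting point data, it is worth confirming that the brick is now oriented with longitude on the x axis and latitude on the y axis:

extent(r_brick) # x should span roughly -180 to 180, y roughly 20 to 90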
Extract timeseries of data at the Toolik Lake study location from the raster brick using the ‘extract()’ function.
toolik_lon <- -149.5975
toolik_lat <- 68.6275
toolik_series <- extract(r_brick, SpatialPoints(cbind(toolik_lon,toolik_lat)), method='simple')
This timeseries is in a simple vector indexed only by the raster layer ID, so let’s put it in an easier-to-use dataframe form and then plot the timeseries.
toolik_df <- data.frame(year= seq(from=1982, to=2012, by=1), NDVI=t(toolik_series))
ggplot(data=toolik_df, aes(x=year, y=NDVI, group=1)) +
geom_line() + # make this a line plot
ggtitle("Growing season NDVI at Toolik Lake Station") + # Set title
theme_bw() # use the black and white theme
Wow! The Toolik Lake site really is getting greener.
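If you want to put a rough number on that trend, a simple linear fit of NDVI against year (a quick check, not the rigorous trend analysis the dataset authors performed) looks like this:

fit <- lm(NDVI ~ year, data = toolik_df)
summary(fit) # the 'year' coefficient estimates the average NDVI change per year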
The authors of this dataset identified long-term trends in NDVI over the 31 year period from 1982 to 2012. Their paper in Global Change Biology found some interesting patterns:
Guay, K. C., Beck, P. S. A., Berner, L. T., Goetz, S. J., Baccini, A. and Buermann, W. (2014), Vegetation productivity patterns at high northern latitudes: a multi-sensor satellite data assessment. Glob Change Biol, 20: 3147-3158. https://doi.org/10.1111/gcb.12647
Let’s look at the difference in NDVI between 1982 and 2012.
The ‘ndvi.slice’ array has the data from 1982. Let’s get the data from 2012, the 31st time slice.
ndvi.slice.2012 <- ndvi.array[, , 31]
Now take the difference between them.
ndvi.diff <- ndvi.slice.2012 - ndvi.slice
Save the difference map as a raster.
r_diff <- raster(t(ndvi.diff), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
Re-orient the data, just as we did before.
r_diff <- flip(r_diff, direction='y')
And plot.
plot(r_diff)
The areas that were greener in 2012 compared to 1982 are represented by positive numbers (green in this color scheme). Note that this is not a proper timeseries analysis (like the one the authors did in their paper); it only shows the difference between two particular years.
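If you want to carry this difference map into other geospatial software, as mentioned at the start, you can export it to GeoTIFF just as we did for the 1982 layer (the output file name here is only a suggestion):

writeRaster(r_diff, "GIMMS3g_NDVI_diff_1982-2012.tif", "GTiff", overwrite=TRUE)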