In this tutorial we will open some geospatial data that is stored in a netCDF file. We will select the variable and time range of interest, and we will export the data to a GeoTIFF so that we can continue the analysis in R or other geospatial software.
First, we need some data to play with. As an example, we will use some data on trends in vegetation greenness in the Arctic. This data is available from the ORNL DAAC at the following link: https://doi.org/10.3334/ORNLDAAC/1275
This dataset provides normalized difference vegetation index (NDVI) data for the Arctic growing season, derived primarily from data collected by Advanced Very High Resolution Radiometer (AVHRR) sensors onboard several NOAA satellites over the years 1982 through 2012. The NDVI data, which indicate vegetation activity, were averaged over the Arctic growing season (June, July, and August) of each year. The data are circumpolar in coverage at 8-km resolution and limited to latitudes above 20 degrees N.
The data citation is: Guay, K.C., P.S.A. Beck, and S.J. Goetz. 2015. Long-Term Arctic Growing Season NDVI Trends from GIMMS 3g, 1982-2012. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1275
Specifically, we will use the file “gimms3g_ndvi_1982-2012.nc4” from this dataset. Download it now; you will need to log in with your NASA Earthdata account.
Remember to place the file into the same working directory where your R script is located.
Reading data stored in netCDF files is quite straightforward in R, provided that you load the appropriate packages. You may need to install these packages if you have not used them before.
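If you have not installed these packages before, a one-time install along these lines should work (availability of older packages such as rgdal on CRAN may vary):

install.packages(c("ncdf4", "raster", "rgdal", "ggplot2")) # one-time installation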
library(ncdf4) # package for netcdf manipulation
library(raster) # package for raster manipulation
library(rgdal) # package for geospatial analysis
library(ggplot2) # package for plotting
Use nc_open() to read the data into a data structure; here I call it nc_data. Then print the metadata about this file to a text file so you can refer to it as you work with the data.
nc_data <- nc_open('gimms3g_ndvi_1982-2012.nc4')
# Save the print(nc_data) dump to a text file
{
sink('gimms3g_ndvi_1982-2012_metadata.txt')
print(nc_data)
sink()
}
From this output we see that there are two variables: time_bnds, which contains the start and end date of each observation, and NDVI, which is the variable of interest. NDVI has three dimensions: [lon,lat,time].
There are four dimensions in the file: lat, lon, time, and “nv”, which is used to record the beginning and end of the time range. We can ignore “nv” and focus on the three dimensions that are used to organize the NDVI data.
There are 10 global attributes which provide metadata information about the file.
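If you would rather inspect those global attributes directly in R instead of the text dump, passing varid = 0 to ncatt_get() returns them all as a named list:

ncatt_get(nc_data, 0) # all global (file-level) attributes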
Next, we need to read the data stored in the lat, lon, and time dimensions. The following code reads the latitudes, longitudes, and time of each NDVI observation and saves them in memory.
lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat", verbose = F)
t <- ncvar_get(nc_data, "time")
head(lon) # look at the first few entries in the longitude vector
## [1] -179.9583 -179.8750 -179.7917 -179.7083 -179.6250 -179.5417
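The values in t are stored in whatever time units the file uses (often something like days or years since a reference date). The exact units string for this file is in the metadata dump, or you can query it directly:

ncatt_get(nc_data, "time", "units") # how the time values should be interpreted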
Read in the data from the NDVI variable and verify the dimensions of the array. There should be 4320 longitudes, 840 latitudes, and 31 time steps.
ndvi.array <- ncvar_get(nc_data, "NDVI") # store the data in a 3-dimensional array
dim(ndvi.array)
## [1] 4320 840 31
There is other pertinent information about the NDVI variable worth checking. Let's see what fill value was used for missing data.
fillvalue <- ncatt_get(nc_data, "NDVI", "_FillValue")
fillvalue
## $hasatt
## [1] TRUE
##
## $value
## [1] -9999
The fill value is -9999.
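If you want to see the NDVI variable's other attributes (units, long name, and so on) without digging through the metadata text file, calling ncatt_get() with no attribute name returns them all:

ncatt_get(nc_data, "NDVI") # all attributes attached to the NDVI variable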
All done reading in the data. We can close the netCDF file.
nc_close(nc_data)
So, now we have the entire array of NDVI values for 4320 x 840 grid cells over each of 31 years in R. What can we do with it?
First, a little housekeeping. Let’s replace all those pesky fill values with the R-standard ‘NA’.
ndvi.array[ndvi.array == fillvalue$value] <- NA
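As a quick sanity check, you can look at the range of the remaining values; NDVI should fall roughly between -1 and 1:

range(ndvi.array, na.rm = TRUE) # min and max NDVI after removing fill values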
Let’s get one year of the NDVI data and plot it.
Time is the third dimension of the “ndvi.array”. The first time slice represents the growing season of 1982.
ndvi.slice <- ndvi.array[, , 1]
Just to make sure everything is working correctly, we can take a look at the dimensions of this time slice. The dimensions should be 4320 longitudes by 840 latitudes.
dim(ndvi.slice)
## [1] 4320 840
Ok, everything checks out, so we can go ahead and save this data in a raster. Note that we provide the coordinate reference system (CRS) as a PROJ.4 string. For this dataset, it is the common WGS84 system.
r <- raster(t(ndvi.slice), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
We transposed the data with t() when building the raster, and we also need to flip it to orient it correctly. The best way to figure this out is through trial and error, but remember that most netCDF files record spatial data starting from the bottom-left corner.
r <- flip(r, direction='y')
Finally, we can plot the raster to take a look at the NDVI in 1982. Remember this data is cut off below 20 degrees North.
plot(r)
If this looks good, then let’s save it to a GeoTIFF file.
writeRaster(r, "GIMMS3g_1982.tif", "GTiff", overwrite=TRUE)
Maybe you want to get a timeseries of NDVI at a study location, such as the Toolik Lake Field Station in Alaska (Latitude: 68.6275, Longitude: -149.5975).
First, we will need to convert the entire 3d array of data to a raster brick. Note, this step may take several minutes.
r_brick <- brick(ndvi.array, xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
# note that you may have to play around with the transpose (the t() function) and flip() before the data are oriented correctly. In this example, the netcdf file recorded latitude on the X and longitude on the Y, so both a transpose and a flip in the y direction were required.
r_brick <- flip(t(r_brick), direction='y')
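Before extracting point data, it is worth confirming that the brick is now oriented with longitude on the x axis and latitude on the y axis:

extent(r_brick) # x should span roughly -180 to 180, y roughly 20 to 90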
Extract timeseries of data at the Toolik Lake study location from the raster brick using the ‘extract()’ function.
toolik_lon <- -149.5975
toolik_lat <- 68.6275
toolik_series <- extract(r_brick, SpatialPoints(cbind(toolik_lon,toolik_lat)), method='simple')
This timeseries is in a simple vector indexed only by the raster layer ID, so let’s put it in an easier-to-use dataframe form and then plot the timeseries.
toolik_df <- data.frame(year= seq(from=1982, to=2012, by=1), NDVI=t(toolik_series))
ggplot(data=toolik_df, aes(x=year, y=NDVI, group=1)) +
geom_line() + # make this a line plot
ggtitle("Growing season NDVI at Toolik Lake Station") + # Set title
theme_bw() # use the black and white theme
Wow! The Toolik Lake site really is getting greener.
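If you want to put a rough number on that trend, a simple linear fit of NDVI against year (a quick check, not the rigorous trend analysis the dataset authors performed) looks like this:

fit <- lm(NDVI ~ year, data = toolik_df)
summary(fit) # the 'year' coefficient estimates the average NDVI change per year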
The authors of this dataset identified long-term trends in NDVI over the 31 year period from 1982 to 2012. Their paper in Global Change Biology found some interesting patterns:
Guay, K. C., Beck, P. S. A., Berner, L. T., Goetz, S. J., Baccini, A. and Buermann, W. (2014), Vegetation productivity patterns at high northern latitudes: a multi-sensor satellite data assessment. Glob Change Biol, 20: 3147-3158. https://doi.org/10.1111/gcb.12647
Let’s look at the difference in NDVI between 1982 and 2012.
The ‘ndvi.slice’ array has the data from 1982. Let’s get the data from 2012, the 31st time slice.
ndvi.slice.2012 <- ndvi.array[, , 31]
Now take the difference between them.
ndvi.diff <- ndvi.slice.2012 - ndvi.slice
Save the difference map as a raster.
r_diff <- raster(t(ndvi.diff), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
Re-orient the data, just as we did before.
r_diff <- flip(r_diff, direction='y')
And plot.
plot(r_diff)
The areas that were greener in 2012 compared to 1982 are represented by positive numbers (green in this color scheme). Note that this is not a proper timeseries analysis (like the one the authors did in their paper); it only shows the difference between two particular years.
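If you want to carry this difference map into other geospatial software, as mentioned at the start, you can export it to GeoTIFF just as we did for the 1982 layer (the output file name here is only a suggestion):

writeRaster(r_diff, "GIMMS3g_NDVI_diff_1982-2012.tif", "GTiff", overwrite=TRUE)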