#introduction

In this tutorial, we will explore NASA’s global net primary productivity (NPP) data set.

I have downloaded the data for 2015 and saved them in a folder ‘data/npp’ as ‘csv’ files. Each file has 3,600 rows and 1,800 columns, containing the NPP for each grid cell (0.1 x 0.1 degrees resolution) for that month. The value ‘99999’ represents cells with no data (ocean, desert, etc).

We need to load the frequently used packages below. Other packages will be loaded on the fly with ‘::’

library(data.table) # Data wrangling
library(ggplot2) # Plotting
library(magrittr) # The pipe
library(RColorBrewer) # Color palettes
library(doFuture) # Parallel computation
library(magick) # Combine images into GIF
library(knitr)
library(metR)
library(rmarkdown)

Our workhorse for data analysis is the ‘data.table’ package. It is super fast, a necessary condition for working with large global data sets.

#visualizing the data

Exploring data for one month

First, we will read the data, convert it to long format and remove the non-data cells.

npp <- fread(
  #read the csv file
  'data/nasa-npp/npp-2015-01.csv',
  #It doesn't have a header, so we manually assign the longitude for each columnn
  col.names = as.character(seq(-179.95,179.95,0.1)))
# Assign latitude for each row
npp[, lat := seq(89.95,-89.95,-0.1)] 
#convert to long format
npp %<>% melt(
  id.vars = 'lat',
  variable.name = 'long',
  variable.factor = FALSE,
  value.name = 'npp')
  #Remove non-data cells
npp <- npp[npp != 99999]
  #we had to use character for column names, now we convert back to numeric
npp[, long := as.numeric(long)]

Let’s take a glance at the data set.

head(npp)

##      lat    long   npp
## 1: 67.75 -179.95 -0.03
## 2: 67.65 -179.95 -0.03
## 3: 67.55 -179.95 -0.03
## 4: 67.45 -179.95 -0.03
## 5: 67.35 -179.95 -0.03
## 6: 67.25 -179.95 -0.03

tail(npp)

##       lat   long  npp
## 1: -16.15 179.95 3.43
## 2: -16.45 179.95 3.72
## 3: -16.55 179.95 3.81
## 4: -16.85 179.95 4.14
## 5: -16.95 179.95 5.35
## 6: -17.05 179.95 3.75

nppRange <- range(npp$npp) #what does npp$npp mean inside range?
nppRange

## [1] -1.0  6.5

We will reproduce the NASA maps but with one change. I don’t like their color palettes; the areas with negative NPP do not stand out, and the break between brown and green is not at zero. So we’ll use a modified palette.

#our palette is based on the Brown to Blue-Green (BrBG) palette in Brewer
brn <- brewer.pal(3,'BrBG')[1]
mid <- brewer.pal(3, 'BrBG')[2]
grn <- brewer.pal(9, 'Greens')[9]

#plot
ggplot(npp) +
  geom_tile(aes(long, lat, fill=npp), width =.1, height=0.1, color = NA)+
  coord_quickmap(expand = FALSE, xlim=c(-180, 180), ylim = c(-90, 90))+
  scale_fill_gradientn(
    name = 'Net primary productivity [gC/m\u00b2/day]',
    colours = c(brn, mid, grn),
    values = scales::rescale(c(nppRange[1],0,nppRange[2])),
    breaks = c(-1,0,6.5),
    guide = guide_colorbar(
      title.position = 'top',
      title.hjust = 1,
      limits = nppRange,
      ticks = FALSE,
      barwidth = 15))+
  
  labs(title = 'January 2015') +
  cowplot::theme_map(font_size = 10)+
  theme(
    legend.position = 'bottom',
    legend.justification = 'left',
    panel.background = element_rect('grey'))

We can zoom into a particular region.

DT <- npp[long %between% c(-140, -50) & lat %between% c(10,60)]
ggplot(DT) +
  geom_tile(aes(long, lat, fill = npp)) +
  scale_fill_gradientn(
    name = 'Net primary productivity [gC/m\u00b2/day]',
    colours = c(brn, mid, grn),
    values = scales::rescale(c(min(npp$npp), 0, max(npp$npp)), c(0,1),
range(DT$npp)),
 guide = guide_colorbar(
  title.position= 'top',
  title.hjust=1,
  limits = nppRange,
  barwidth = 15)) +
  coord_quickmap(expand = FALSE) +
  labs(title = 'January 2015') +
  cowplot::theme_map(font_size = 15)+
  theme(
    legend.position = 'bottom',
    legend.justification = 'right',
    panel.background = element_rect('grey'))

Animating over time

Now we read all 12 months and pack them into a long data set.

months <- 1:12
npp2015 <- 
      lapply(months, function(m) { #loop over the months
        mchar <- if (m<10) paste0('0', m) else m
        file <- paste0('data/nasa-npp/npp-2015-', mchar, '.csv')
        DT <- fread(file, col.names = as.character(seq(-179.95, 179.95, 0.1)))
        DT[, lat := seq(89.95, -89.95, -0.1)]
        DT %<>% melt(
          id.vars = 'lat',
          variable.name = 'long',
          variable.factor = FALSE,
          value.name = 'npp')
        DT[npp != 99999]
      
    }) %>% 
      rbindlist(idcol = 'month') # Merge into one long data.table
npp2015[, long := as.numeric(long)]

head(npp2015, 20)

##     month   lat    long   npp
##  1:     1 67.75 -179.95 -0.03
##  2:     1 67.65 -179.95 -0.03
##  3:     1 67.55 -179.95 -0.03
##  4:     1 67.45 -179.95 -0.03
##  5:     1 67.35 -179.95 -0.03
##  6:     1 67.25 -179.95 -0.03
##  7:     1 67.15 -179.95 -0.03
##  8:     1 67.05 -179.95 -0.03
##  9:     1 66.95 -179.95 -0.03
## 10:     1 66.85 -179.95 -0.03
## 11:     1 66.75 -179.95 -0.03
## 12:     1 66.65 -179.95 -0.03
## 13:     1 66.55 -179.95 -0.03
## 14:     1 66.45 -179.95 -0.03
## 15:     1 66.35 -179.95 -0.03
## 16:     1 66.25 -179.95 -0.03
## 17:     1 66.15 -179.95 -0.03
## 18:     1 66.05 -179.95 -0.03
## 19:     1 65.95 -179.95 -0.03
## 20:     1 65.85 -179.95 -0.03

tail(npp2015, 20)

##     month    lat   long   npp
##  1:    12  62.85 179.85 -0.03
##  2:    12  62.75 179.85 -0.03
##  3:    12  62.65 179.85 -0.03
##  4:    12  52.05 179.85  0.00
##  5:    12  51.95 179.85  0.00
##  6:    12 -16.15 179.85  2.72
##  7:    12 -16.25 179.85  3.22
##  8:    12 -16.45 179.85  3.75
##  9:    12 -16.55 179.85  4.64
## 10:    12 -16.65 179.85  3.93
## 11:    12 -16.75 179.85  4.61
## 12:    12 -16.95 179.85  3.13
## 13:    12 -18.55 179.85  3.16
## 14:    12 -18.65 179.85  2.93
## 15:    12 -16.15 179.95  2.96
## 16:    12 -16.45 179.95  2.87
## 17:    12 -16.55 179.95  3.22
## 18:    12 -16.85 179.95  3.75
## 19:    12 -16.95 179.95  4.46
## 20:    12 -17.05 179.95  3.31

Next, we create an animation. For small data sets, we would normally use the gganimate package, but our data set is too big. We will have to generate the frames individually (one frame per month), and use another tool to merge them into a GIF or a video.

Rendering the images takes a long time, so we’ll run them in parallel to speed things up. First, we set up the parallel backend.

registerDoFuture()
plan(multiprocess)

Now, we can generate the images.

months <- 1:12
nppRange <- range(npp$npp)
# options(future.globals.maxSize = Inf)
pl <- foreach(m = months) %dopar% {
  p <- ggplot(npp2015[month == m]) +
    geom_tile(aes(long, lat, fill = npp), width = 0.1, height = 0.1) +
    scale_fill_gradientn(
      name = 'Net primary productivity [gC/m\u00b2/day]',
      colours = c(brn, mid, grn),
      values = scales::rescale(c(nppRange[1], 0, nppRange[2])),
      breaks = c(-1,0,6.5),
      guide = guide_colorbar(
        title.position = 'top',
        title.hjust = 1,
        limits = nppRange,
        ticks = FALSE,
        barwidth = 15)) +
    coord_quickmap(expand = FALSE, xlim = c(-180,180), ylim = c(-90,90)) +
    labs(title = paste(month.name[m], 2015)) +
    cowplot::theme_map(font_size = 20) +
    theme(
      legend.position = 'bottom',
      legend.justification = 'right',
      panel.background = element_rect('black'))
  mchar <- if (m <= 9) paste0('0', m) else m
  ggsave(
    paste0('figures/2015-', mchar, '.png'),
    p,
    width = 8, height = 5, unit = 'in'
  )
}

Once the images are rendered, we can merge them to a GIF file.

list.files('figures', '*.png', full.names = TRUE) %>%  # get a list of image files
  image_read() %>%                    # reads each image file
  image_join() %>%                    # joins image
  image_animate(fps = 2) %>%          # animates, can opt for number of loops
  image_write('figures/npp-2015.gif') # combine

Relationship to temperature

We will use CRU TS v4.04. Specifically, we’ll use the global monthly land surface temperature data, which are provided in a netCDF file. This format is a little trickier to read than CSV file. Before reading, we need to look at the file structure.

metR::GlanceNetCDF('data/cru-temp/cru_ts4.04.2011.2019.tmp.dat.nc')

## ----- Variables ----- 
## tmp:
##     near-surface temperature in degrees Celsius
##     Dimensions: lon by lat by time
## stn:
##     stn
##     Dimensions: lon by lat by time
## 
## 
## ----- Dimensions ----- 
##   lon: 720 values from -179.75 to 179.75 degrees_east
##   lat: 360 values from -89.75 to 89.75 degrees_north
##   time: 108 values from 2011-01-16 to 2019-12-16

So temperature is stored in the variable named tmp, indexed by lon, lat, and time. We can read the file now.

cruT <- metR::ReadNetCDF(
file = 'data/cru-temp/cru_ts4.04.2011.2019.tmp.dat.nc',
vars = 'tmp',
subset = list(time = c('2015-01-01', '2015-12-31')) # Read data for 2015 only
)

Let’s look at the data structure that we just read.

head(cruT[!is.na(tmp)], 10)

##           time    lat    lon      tmp
##  1: 2015-01-16 -55.75 -68.25 7.700000
##  2: 2015-01-16 -55.75 -67.75 7.800000
##  3: 2015-01-16 -55.75 -67.25 7.800000
##  4: 2015-01-16 -55.25 -70.75 7.100000
##  5: 2015-01-16 -55.25 -70.25 7.300000
##  6: 2015-01-16 -55.25 -69.75 7.200000
##  7: 2015-01-16 -55.25 -69.25 7.400000
##  8: 2015-01-16 -55.25 -68.75 7.100000
##  9: 2015-01-16 -55.25 -68.25 7.800000
## 10: 2015-01-16 -55.25 -67.75 8.400001

tail(cruT[!is.na(tmp)], 10)

##           time   lat    lon   tmp
##  1: 2015-12-16 83.75 -33.25 -32.9
##  2: 2015-12-16 83.75 -32.75 -32.6
##  3: 2015-12-16 83.75 -32.25 -32.1
##  4: 2015-12-16 83.75 -31.75 -31.9
##  5: 2015-12-16 83.75 -31.25 -32.2
##  6: 2015-12-16 83.75 -30.75 -31.0
##  7: 2015-12-16 83.75 -30.25 -30.9
##  8: 2015-12-16 83.75 -29.75 -30.7
##  9: 2015-12-16 83.75 -29.25 -30.2
## 10: 2015-12-16 83.75 -28.75 -29.8

For this example, I will do a very simple comparison between hemispheric average NPP and hemispheric average temperature for the year 2015. Let’s calculate the hemispheric average temperature.

cruTmean <- cruT[, .(tmp = mean(tmp, na.rm = TRUE)),
                 by = .(hemi = fifelse(lat < 0,
                                       'Southern Hemisphere',
                                       'Northern Hemisphere'),
                        month = month(time))]

cruTmean[order(hemi, month)]

##                    hemi month          tmp
##  1: Northern Hemisphere     1 -6.319620655
##  2: Northern Hemisphere     2 -4.973303200
##  3: Northern Hemisphere     3 -0.386430372
##  4: Northern Hemisphere     4  5.550515588
##  5: Northern Hemisphere     5 11.754243604
##  6: Northern Hemisphere     6 16.874073399
##  7: Northern Hemisphere     7 18.839402440
##  8: Northern Hemisphere     8 17.506045641
##  9: Northern Hemisphere     9 13.449340337
## 10: Northern Hemisphere    10  7.171283770
## 11: Northern Hemisphere    11  0.004829245
## 12: Northern Hemisphere    12 -4.192614612
## 13: Southern Hemisphere     1 24.748854426
## 14: Southern Hemisphere     2 24.767316530
## 15: Southern Hemisphere     3 23.885845824
## 16: Southern Hemisphere     4 22.014871575
## 17: Southern Hemisphere     5 20.098193684
## 18: Southern Hemisphere     6 18.571855472
## 19: Southern Hemisphere     7 17.939328406
## 20: Southern Hemisphere     8 19.476424354
## 21: Southern Hemisphere     9 21.103934640
## 22: Southern Hemisphere    10 23.017357567
## 23: Southern Hemisphere    11 24.032967890
## 24: Southern Hemisphere    12 24.900679724
##                    hemi month          tmp

Similarly, we can calculate the hemispheric average NPP.

npp2015mean <- npp2015[, .(npp = mean(npp)), 
                       by = .(month, 
                              hemi = fifelse(lat < 0, 
                                             'Southern Hemisphere', 
                                             'Northern Hemisphere'))]

npp2015mean[order(hemi, month)]

##     month                hemi       npp
##  1:     1 Northern Hemisphere 0.5124864
##  2:     2 Northern Hemisphere 0.4561966
##  3:     3 Northern Hemisphere 0.5630542
##  4:     4 Northern Hemisphere 0.8864052
##  5:     5 Northern Hemisphere 1.6491595
##  6:     6 Northern Hemisphere 2.6311189
##  7:     7 Northern Hemisphere 2.5203965
##  8:     8 Northern Hemisphere 1.9975864
##  9:     9 Northern Hemisphere 1.2374127
## 10:    10 Northern Hemisphere 0.7111843
## 11:    11 Northern Hemisphere 0.5379686
## 12:    12 Northern Hemisphere 0.5421211
## 13:     1 Southern Hemisphere 2.1594275
## 14:     2 Southern Hemisphere 2.1474459
## 15:     3 Southern Hemisphere 2.2599240
## 16:     4 Southern Hemisphere 2.2561804
## 17:     5 Southern Hemisphere 2.2686584
## 18:     6 Southern Hemisphere 2.2204884
## 19:     7 Southern Hemisphere 2.1670531
## 20:     8 Southern Hemisphere 1.9231178
## 21:     9 Southern Hemisphere 1.7138871
## 22:    10 Southern Hemisphere 1.5433307
## 23:    11 Southern Hemisphere 1.7929973
## 24:    12 Southern Hemisphere 1.6904601
##     month                hemi       npp

ggplot() +
  geom_line(aes(month, npp, colour = 'NPP'), npp2015mean, size = 1) +
  geom_line(aes(month, tmp / 10, colour = 'T'), cruTmean, size = 1) +
  scale_x_continuous(
    name = NULL,
    expand = c(0, 0),
    breaks = 1:12,
    labels = month.abb) +
  scale_y_continuous(
    name = 'NPP [gC/m\u00b2/day]',
    limits = c(-1,3),
    expand = c(0,0.5),
    sec.axis = sec_axis(
      name = 'T [\u00b0C]',
      trans = ~. * 10)) +
  scale_colour_manual(values = c('steelblue', 'darkorange')) +
  facet_wrap(vars(hemi)) +
  theme(
    legend.position = 'none',
    panel.background = element_blank(),
    panel.spacing.x = unit(2, 'lines'),
    panel.grid = element_line('gray90'),
    panel.grid.minor.x = element_blank(),
    panel.border = element_rect(NA, 'gray90', 0.1),
    strip.background = element_blank(),
    strip.text = element_text(face = 'bold', size = 11),
    axis.line.x = element_line('black'),
    axis.text.y.left = element_text(colour = 'steelblue'),
    axis.line.y.left = element_line('steelblue'),
    axis.ticks.y.left = element_line('steelblue'),
    axis.title.y.left = element_text(
      colour = 'steelblue',
    margin = margin(r = 1, unit = 'lines')),
    axis.line.y.right = element_line('darkorange'),
    axis.ticks.y.right = element_line('darkorange'),
    axis.text.y.right = element_text(colour = 'darkorange'),
    axis.title.y.right = element_text(
      colour = 'darkorange', 
      margin = margin(l = 1, unit = 'lines')),
      
    )

# Discussion

The hemispheric analysis shows us some very general patterns, but already we see some differences between the hemispheres. Here are some suggestions for you to explore.

Why is the Southern Hemisphere different? Regarding NPP, my guess is this comes down to the dispersal of tree-bearing land across the southern and northern hemisphere. 68% of the total land mass on the planet is in the Northern hemisphere including all of North America, Europe, the vast majority of Asia, and 2/3rds of Africa. Only 1/3rd of Africa, most of South America, Australia, and Antarctica lie below the Equator. Of the 32% of land that lives beneath the equator, 1/3rd of it is in Antarctica where no trees grow. In addition, most of the tree-bearing land in the Southern Hemisphere lies near the Equator, (Brazil & Southern Africa, which have warm weather year round. This means that trees are engaging in photosynthesis at a near constant rate year round, creating a stable and positive average NPP. Conversely, in the Northern hemisphere the majority of the tree-bearing land is further away the Equator and experiences seasonal variation in photosynthesis rates.
What else controls NPP? Can you do a similar analysis with e.g. precipitation? Yes, see below.

Relationship to precipitation

metR::GlanceNetCDF('data/cru-pre/cru_ts4.04.2011.2019.pre.dat.nc')

## ----- Variables ----- 
## pre:
##     precipitation in mm/month
##     Dimensions: lon by lat by time
## stn:
##     stn
##     Dimensions: lon by lat by time
## 
## 
## ----- Dimensions ----- 
##   lon: 720 values from -179.75 to 179.75 degrees_east
##   lat: 360 values from -89.75 to 89.75 degrees_north
##   time: 108 values from 2011-01-16 to 2019-12-16

So precipitation is stored in the variable named pre, indexed by lon, lat, and time. We can read the file now.

cruP <- metR::ReadNetCDF(
file = 'data/cru-pre/cru_ts4.04.2011.2019.pre.dat.nc',
vars = 'pre',
subset = list(time = c('2015-01-01', '2015-12-31')) # Read data for 2015 only
)

Let’s look at the data structure that we just read.

head(cruP[!is.na(pre)], 20)

##           time    lat    lon  pre
##  1: 2015-01-16 -55.75 -68.25 44.1
##  2: 2015-01-16 -55.75 -67.75 41.0
##  3: 2015-01-16 -55.75 -67.25 41.8
##  4: 2015-01-16 -55.25 -70.75 62.7
##  5: 2015-01-16 -55.25 -70.25 53.2
##  6: 2015-01-16 -55.25 -69.75 37.6
##  7: 2015-01-16 -55.25 -69.25 37.6
##  8: 2015-01-16 -55.25 -68.75 35.1
##  9: 2015-01-16 -55.25 -68.25 33.3
## 10: 2015-01-16 -55.25 -67.75 32.9
## 11: 2015-01-16 -55.25 -67.25 34.6
## 12: 2015-01-16 -55.25 -66.75 35.3
## 13: 2015-01-16 -54.75 -71.75 86.3
## 14: 2015-01-16 -54.75 -71.25 37.1
## 15: 2015-01-16 -54.75 -70.75 35.1
## 16: 2015-01-16 -54.75 -70.25 33.8
## 17: 2015-01-16 -54.75 -69.75 31.9
## 18: 2015-01-16 -54.75 -69.25 29.6
## 19: 2015-01-16 -54.75 -68.75 27.5
## 20: 2015-01-16 -54.75 -68.25 24.6

tail(cruP[!is.na(pre)], 20
     )

##           time   lat    lon pre
##  1: 2015-12-16 83.25 -24.75   0
##  2: 2015-12-16 83.75 -37.75   0
##  3: 2015-12-16 83.75 -37.25   0
##  4: 2015-12-16 83.75 -36.75   0
##  5: 2015-12-16 83.75 -36.25   0
##  6: 2015-12-16 83.75 -35.75   0
##  7: 2015-12-16 83.75 -35.25   0
##  8: 2015-12-16 83.75 -34.75   0
##  9: 2015-12-16 83.75 -34.25   0
## 10: 2015-12-16 83.75 -33.75   0
## 11: 2015-12-16 83.75 -33.25   0
## 12: 2015-12-16 83.75 -32.75   0
## 13: 2015-12-16 83.75 -32.25   0
## 14: 2015-12-16 83.75 -31.75   0
## 15: 2015-12-16 83.75 -31.25   0
## 16: 2015-12-16 83.75 -30.75   0
## 17: 2015-12-16 83.75 -30.25   0
## 18: 2015-12-16 83.75 -29.75   0
## 19: 2015-12-16 83.75 -29.25   0
## 20: 2015-12-16 83.75 -28.75   0

cruP[pre == 0]

##              time    lat    lon pre
##     1: 2015-01-16 -39.25 -73.25   0
##     2: 2015-01-16 -39.25 -72.75   0
##     3: 2015-01-16 -39.25 -72.25   0
##     4: 2015-01-16 -39.25 -71.75   0
##     5: 2015-01-16 -39.25 -71.25   0
##    ---                             
## 42353: 2015-12-16  83.75 -30.75   0
## 42354: 2015-12-16  83.75 -30.25   0
## 42355: 2015-12-16  83.75 -29.75   0
## 42356: 2015-12-16  83.75 -29.25   0
## 42357: 2015-12-16  83.75 -28.75   0

cruPmean <- cruP[, .(pre = mean(pre, na.rm = TRUE)), 
                 by = .(hemi = fifelse(lat < 0, 
                                       'Southern Hemisphere', 
                                       'Northern Hemisphere'),
                        month = month(time))]

cruPmean[order(hemi, month)]

##                    hemi month       pre
##  1: Northern Hemisphere     1  33.88864
##  2: Northern Hemisphere     2  29.49807
##  3: Northern Hemisphere     3  35.44227
##  4: Northern Hemisphere     4  41.09119
##  5: Northern Hemisphere     5  51.27454
##  6: Northern Hemisphere     6  67.58117
##  7: Northern Hemisphere     7  76.40879
##  8: Northern Hemisphere     8  79.67092
##  9: Northern Hemisphere     9  64.63240
## 10: Northern Hemisphere    10  54.40435
## 11: Northern Hemisphere    11  46.95746
## 12: Northern Hemisphere    12  38.15645
## 13: Southern Hemisphere     1 148.39037
## 14: Southern Hemisphere     2 135.20839
## 15: Southern Hemisphere     3 141.78250
## 16: Southern Hemisphere     4 111.37134
## 17: Southern Hemisphere     5  82.15246
## 18: Southern Hemisphere     6  53.03238
## 19: Southern Hemisphere     7  49.14856
## 20: Southern Hemisphere     8  45.00326
## 21: Southern Hemisphere     9  44.04685
## 22: Southern Hemisphere    10  65.50596
## 23: Southern Hemisphere    11  98.10447
## 24: Southern Hemisphere    12 124.60706
##                    hemi month       pre

Similarly, we can calculate the hemispheric average NPP.

npp2015mean <- npp2015[, .(npp = mean(npp)), 
                       by = .(month, 
                              hemi = fifelse(lat < 0, 
                                             'Southern Hemisphere', 
                                             'Northern Hemisphere'))]

npp2015mean[order(hemi, month)]

##     month                hemi       npp
##  1:     1 Northern Hemisphere 0.5124864
##  2:     2 Northern Hemisphere 0.4561966
##  3:     3 Northern Hemisphere 0.5630542
##  4:     4 Northern Hemisphere 0.8864052
##  5:     5 Northern Hemisphere 1.6491595
##  6:     6 Northern Hemisphere 2.6311189
##  7:     7 Northern Hemisphere 2.5203965
##  8:     8 Northern Hemisphere 1.9975864
##  9:     9 Northern Hemisphere 1.2374127
## 10:    10 Northern Hemisphere 0.7111843
## 11:    11 Northern Hemisphere 0.5379686
## 12:    12 Northern Hemisphere 0.5421211
## 13:     1 Southern Hemisphere 2.1594275
## 14:     2 Southern Hemisphere 2.1474459
## 15:     3 Southern Hemisphere 2.2599240
## 16:     4 Southern Hemisphere 2.2561804
## 17:     5 Southern Hemisphere 2.2686584
## 18:     6 Southern Hemisphere 2.2204884
## 19:     7 Southern Hemisphere 2.1670531
## 20:     8 Southern Hemisphere 1.9231178
## 21:     9 Southern Hemisphere 1.7138871
## 22:    10 Southern Hemisphere 1.5433307
## 23:    11 Southern Hemisphere 1.7929973
## 24:    12 Southern Hemisphere 1.6904601
##     month                hemi       npp

ggplot() +
  geom_line(aes(month, npp, colour = 'NPP'), npp2015mean, size = 1) +
  geom_line(aes(month, pre / 100 + 1, colour = 'P'), cruPmean, size = 1) +
  scale_x_continuous(
    name = NULL,
    expand = c(0, 0),
    breaks = 1:12,
    labels = month.abb) +
  scale_y_continuous(
    name = 'NPP [gC/m\u00b2/day]',
    limits = c(-1, 3),
    expand = c(0, 0.5),
    sec.axis = sec_axis(
      name = 'Pre mm/month',
      trans = ~(. -1) * 100)) +
  scale_colour_manual(values = c('steelblue', 'darkorange')) +
  facet_wrap(vars(hemi)) +
  theme(
    legend.position = 'none',
    panel.background = element_blank(),
    panel.spacing.x = unit(2, 'lines'),
    panel.grid = element_line('gray90'),
    panel.grid.minor.x = element_blank(),
    panel.border = element_rect(NA, 'gray90', 0.1),
    strip.background = element_blank(),
    strip.text = element_text(face = 'bold', size = 11),
    axis.line.x = element_line('black'),
    axis.text.y.left = element_text(colour = 'steelblue'),
    axis.line.y.left = element_line('steelblue'),
    axis.ticks.y.left = element_line('steelblue'),
    axis.title.y.left = element_text(
      colour = 'steelblue', 
      margin = margin(r = 1, unit = 'lines')),
    axis.line.y.right = element_line('darkorange'),
    axis.ticks.y.right = element_line('darkorange'),
    axis.text.y.right = element_text(colour = 'darkorange'),
    axis.title.y.right = element_text(
      colour = 'darkorange', 
      margin = margin(l = 1, unit = 'lines')),
  )

I am only showing one year of data. What if we expand the analysis to many more years? Did separately. Had to remove in order to incorporate precipitation data. Later I learned that I could have put in a filter :(. Next time I’ll know :)
What about influences of climate phenomena such as ENSO? Ideally, you would normalize for ENSO, but I imagine this is very difficult to separate ENSO forcings from man-made forcings from random forcings.
If we break down the analysis to the regional level, what would you expect? The more you zoom in on one area, the greater variation we should expect between that area and other areas nearby, and over time. For example, California has gone through a series of droughts in the South, but now also in the North. But the North America overall is probably near average rainfall.
What do you think about the areas of negative production?

Perhaps this has to do with land-use? In Brazil and Africa, the negative areas lie near large jungles / forests. Could it be that on the outskirts the land has been converted to pastureland/savannah or some other carbon producing use? The US and Australia have negative NPP periods as well which perhaps correspond to drought and or wildfire activity. Finally, India has a significant negative NPP during the summer, drought?

This tutorial only shows some very basic ideas to get you started. I hope it inspired you. Feel free to explore the suggested questions above, or follow your own curiosity.

Net Primary Productivity

Tutorial for SUSC95020

Benjamin Hogan mirroring Hung Nguyen

Exploring data for one month

Animating over time

Relationship to temperature

Relationship to precipitation