Introduction

Grass, forbs, and Shrubs. Rangelands cover more than 54% of the world’s surface, performing key roles in carbon storage, and providing 70% of the world’s livestock feed source. (Brown & Thorpe, 2008). Rangelands are among the most threatened and forsaken ecosystems in the USA, with less than 10% of climate goals stirring towards their protection and over 2.1 million acres lost just in 2019 (ILRI, 2021). Among rangelands, the American Southwest has unique challenges of augmenting wildfire occurrence and increasing aridity due to climate change (Seager et al., 2007). Efficacy in the management, among other conservation tactics should be used to preserve the desert rangelands of the southwest USA. Among these efforts is Grass-Cast. This tool can predict rangelands’ biomass using 30 years of satellite data to prevent overgrazing and plan actions for subsequent seasons. Assessment of the accuracy of the prediction is focused on the Great plains in the USA (Hartman et al., 2020), but further information for the efficacy in the southwest is still needed.

The following project uses data available in Grass-Cast Explorer. I will compare weekly data of predicted ANPP (above-ground net primary productivity) and NDVI (Normalized difference vegetation index) from each forecast during the growing season and compare it to the observable ANPP. Then, we will choose an appropriate statistical model to assess the predictability of the tool, with satellite data and on-ground data. We hope that this project will support the team’s work behind Grass Cast by giving them a fresh perspective on the tool forecast capabilities for the southwest. We will also hope that ranchers and government agencies find in Grass Cast a mature tool that will help them plan the use and conservation of rangelands across the southwest.

Forecast

The forecast analysis of Grass-Cast is based on the paper by Hartman et al, and is strongly correlated with the the weather forecast and history of NOAA(National Oceanic and Atmospheric Administration), mostly the precipitation (Figure 1). This input is add in to the DayCent model, and ecosystem-level biochemical model that simulates various ecological process, the main value extracted from this model is evapotranspiration (AET). Later, the model takes information of NDVI (Normalized Difference Vegetation Index) from the MODIS sensor in the Terra satellite, NDVI measures the greenness or the the state of the vegetation, where higher levels of NDVI means healthier vegetation. Finally, three ANPP (Above ground net primary production) scenarios are computed (average, above average and bellow average) measuring the biomass in pounds per acre.

Figure1

Data

The forecast data is from the Grass-Cast platform can be accessed online (National Drought Mitigation Center, 2022).

Weather data is from NOAA climate data online platform, taking precipitation data from New Mexico and Arizona from November 2019 to December 2021 in millimeters (mm) (NOAA, 2022).

Methodology

For the southwest, there are two growing seasons Spring (April-May) and Summer(August-September), the spring is mostly correlated with the winter rains (November-March) and the summer is correlated with the monsoon season (June-September), (Smith et al., 2019).

In this experiment we are trying to understand how well the forecast correlates to the final observation of Grass Cast and also see the effects of rainfall in the final forecast. To accomplish this objective I will take coefficient of determination (R2) between the percent difference in ANPP from each forecast during the growing season compared to the percent difference in ANPP for the final forecast. The result will be a line graft of a correlation from 8 forecast of 2020 and the 12 forecast of 2021 in the three different scenarios to the final observation of the year.

Beside this, I will interpolate rainfall data from NOAA from each growing season to explain the possible reasons for the data to change correlations during time.

Results

Correlation

All year

#upload the data
GC <- read.csv(file = "data/az_nm_2000_2020.csv", head = TRUE, sep=",")
gfg_data <- read.csv(file = "data/grass_cast_20-21.csv", head = TRUE, sep=",")

#Filters Year and select columns for ANPP and slips the forecast in lists
ANPP_FORECAST <- subset(gfg_data, select=c("gridID","Year","Forecast","NPP_predict_below",
                                           "NPP_predict_avg","NPP_predict_above","Order")) 

ANPP_FORECAST_2020 <- subset(ANPP_FORECAST, Year == '2020')
ANPP_FORECAST_2020 <- split(ANPP_FORECAST_2020,ANPP_FORECAST_2020$Forecast)

ANPP_FORECAST_2021 <- subset(ANPP_FORECAST, Year == '2021') 
ANPP_FORECAST_2021 <- split(ANPP_FORECAST_2021,ANPP_FORECAST_2021$Forecast)

#this functions takes the latest data and compares to the every other forecast date and see how well is the forecast predicting 
fc <- function(n, x, y) {
  
  a <- cor(y[[n]]$NPP_predict_above, y[[x]]$NPP_predict_above)
  b <- cor(y[[n]]$NPP_predict_avg, y[[x]]$NPP_predict_avg)
  c <- cor(y[[n]]$NPP_predict_below, y[[x]]$NPP_predict_below)

  ANPP.r <- matrix(c(a,b,c),ncol = 3,byrow = TRUE)
  colnames(ANPP.r) <- c('Abv','Avg','Blw')
  ANPP.r <- as.data.frame(ANPP.r) 
  #changes row names to date
  row.names(ANPP.r) <- as.Date(names(y[x]), format = "%m/%d/%Y")  
  
  return(ANPP.r)
}


#fucntion to correlate the data
CorrDF <- function(a,b,data){

#Creates list for loop and loops
##For loop to have all the rsquer in the list
r.list <- list()
for(i in a:b) {
  r.list[[i]] <- fc(b, i, data)
  
}
#Changes from list to DF
Corr <- bind_rows(r.list)

#add column with the row names
Corr$Forecast <- row.names(Corr)

#pivot longer and change column names
Corr <- Corr %>% pivot_longer(cols = c(Abv,Avg,Blw), names_to = "Time",values_to = "R_squre")

#change colnames 
colnames(Corr) <- c('Date','Level','Rsqure') 

#Format dates
Corr$Date <- as.Date(Corr$Date, format="%Y-%m-%d")

return(Corr)

}

All_cor_2020s <- CorrDF(1,8,ANPP_FORECAST_2020)
All_cor_2021s <- CorrDF(1,10,ANPP_FORECAST_2021)

#function to greate graphs 
ggpl <- function(x){
  
    a <- ggplot(data = x, aes(x=Date)) +
    
      geom_line(aes(y=Rsqure, group=Level, colour =Level), size=0.3) +
    
      geom_point(aes(y=Rsqure, group=Level, colour =Level), size=2) +
      
      theme_bw() +
    
      scale_x_date(date_breaks = "2 week", date_labels ="%b/%d") +
    
      scale_color_manual(values = c("33339","#CC6600","firebrick")) +
      
      labs (x = "Date of Forecast", y= "R squer of linear model") +
      
      theme(
      axis.text.x = element_text(angle = 0, vjust =0, hjust=0),
      axis.title.x =element_blank(),
      axis.title.y =element_text(face = "italic",size = 8),)
    
    return(a)
}

#Creates the line graphs and ads annotations
Cor_2020 <- ggpl(All_cor_2020s) +
  ylim(0,1) +
  labs (x = "Date of Forecast", y= "R squered of linear model", 
    title = "Dryest in history: 2020 forecast correlation")+
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Cor_2021 <- ggpl(All_cor_2021s) +
  ylim(0,1) +
  labs (x = "Date of Forecast", y= "R squered of linear model", 
    title = "The big moonson: 2021 forecast correlations")+
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

#### Seasonal

##Picks the dates to make the correlation 
All_cor_2020s_Spring <- CorrDF(1,4,ANPP_FORECAST_2020)
All_cor_2021s_Spring <- CorrDF(1,4,ANPP_FORECAST_2021)
All_cor_2020s_Summer <- CorrDF(5,8,ANPP_FORECAST_2020)
All_cor_2021s_Summer <- CorrDF(5,10,ANPP_FORECAST_2021)


#Creates the line graph
Cor_2020_Spring <- ggpl(All_cor_2020s_Spring) +
  ylim(0.2,1) +
  labs (x = "", y= "R squered", 
    title = "Spring 2020")+
  theme_minimal() +
  theme(axis.text.x = element_text(vjust = 0.5, hjust=1), legend.position="none") 

Cor_2021_Spring <- ggpl(All_cor_2021s_Spring) + 
  ylim(0.6,1) +
  labs (x = "", y= "R squered", 
    title = "Spring 2021")+
  theme_minimal() +
  theme(axis.text.x = element_text(vjust = 0.5, hjust=1),legend.position="none")


Cor_2020_Summer <- ggpl(All_cor_2020s_Summer) +
  ylim(0.2,1) +
  labs (x = "", y= "", 
    title = "Summer 2020")+
  theme_minimal() +
  theme(axis.text.x = element_text(vjust = 0.5, hjust=1))

Cor_2021_Summer <- ggpl(All_cor_2021s_Summer) +
  ylim(0.6,1) +
  labs (x = "", y= "", 
    title = "Summer 2021")+
  theme_minimal() +
  theme(axis.text.x = element_text(vjust = 0.5, hjust=1))
ggarrange(Cor_2020_Spring, Cor_2020_Summer, Cor_2021_Spring, Cor_2021_Summer,
                                 ncol =2, nrow = 2)

Forecast Maps

## Call features from state and county
states <- map_data("state")
county <- map_data("county")

#Creates Data frames and filters states 
state_df <- states %>%  filter(str_detect(region, c("arizona","new mexico")))
county_df <- county %>%  filter(str_detect(region, c("arizona","new mexico")))

#changes state maps to simple forms to make it eay
states_sf <- sf::st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))
county_sf <- sf::st_as_sf(maps::map("county", plot = FALSE, fill = TRUE)) 

#select only AZ and NM Partial matching in the row ID
south_west_s1 <- states_sf %>% filter(str_detect(ID, c("arizona")))
south_west_s2 <- states_sf %>% filter(str_detect(ID, c("new mexico")))
south_west_s <- rbind(south_west_s1, south_west_s2)

#couties for each state
south_west_nm <- county_sf %>% filter(str_detect(ID,"new mexico"))
south_west_az <- county_sf %>% filter(str_detect(ID,"arizona"))
south_west_c <- rbind(south_west_nm, south_west_az)

######
###Create Centroids of counties to use for labels

coordinates(county_df) <- c("long", "lat")

# Get centroids
ctrs <- lapply(unique(county_df$group), 
               function(x) gCentroid(SpatialPoints(county_df[county_df$group==x,])))
ctrsout <- setNames( ctrs , unique(county_df$group ) )

##Create data frame 
centroid_df <- do.call(rbind, lapply(ctrsout, data.frame, stringsAsFactors=FALSE))

#finds and  adds unique values
uniq_logical <- !duplicated(county_df$group)
centroid_df$county <- county_df$subregion[uniq_logical]

#change names of collums
names(centroid_df) <- c("lon", "lat", 'county')
centroid_sf <-  st_as_sf(centroid_df, coords = c("lon", "lat")) %>% 
  st_set_crs(st_crs(south_west_s))

#########
#Prep data for maps

#We use 2020 year to get the lat and long
ANPP_GEOM <- subset(GC, year == '2020', select=c("gridID","lon","lat"))

#Filters Year and select columns for ANPP and drops NA values
ANPP_ALL <- subset(gfg_data, select=c("gridID", "Forecast","NPP_predict_below","NPP_predict_avg","NPP_predict_above","meanANPPgrid")) %>% 
  drop_na()

#Slips the forecast file in to lists
ANPP_ALLFORECAST <- split(ANPP_ALL, ANPP_ALL$Forecast)

##Choose Date for Forecast to draw 
#in every line drops NA values, transform df to sf object and changes the coordinates system to mach others

Geoms <- function(x){
  ANPP_date <- merge(ANPP_GEOM, ANPP_ALLFORECAST[[x]], by = "gridID", all = TRUE) %>% drop_na() %>% st_as_sf(coords = c("lon", "lat")) %>% st_set_crs(st_crs(south_west_s))
  
  return(ANPP_date)
}


ANPP_5152020 <- Geoms(2) #First forecast from 2020 
ANPP_9012020 <- Geoms(18) #Last Observation from 2020
ANPP_4142021 <- Geoms(1) #First forecast from 2021 
ANPP_8242021 <- Geoms(15) #Last Observation from 2021 


######
#DraMaps
ANNP_maps <- function(x,y,z){
  
 a <-  ggplot() +
  geom_sf() +
  geom_sf(data = x, mapping = aes(col=y)) +
  geom_sf(data = south_west_s, fill = NA, color=alpha("#003366",0.4), size=2.0) + 
  geom_sf(data = south_west_c, fill= NA, color=alpha("#3399FF",0.4), size=1.0) +
  geom_label_repel(data=centroid_df, aes(lon, lat,label = county,group = county),
                   size=2,nudge_x =0, nudge_y =0) + 
    scale_colour_viridis_c(aesthetics = "col", option = "cividis", direction = -1, limits=c(0, 800)) +
   theme_void() +
    theme(
      plot.title=element_text(family= "Courier",face="bold", size=18, colour = "#CC6600"),
      plot.subtitle = element_text(face="italic", size=12),
      ) +
   labs(title=paste("ANPP forecast for",str_split(unique(x$Forecast),"/", simplify = TRUE)[3]), 
         subtitle = paste0(unique(x$Forecast)," ",z), col="ANNP \n (pounds/acre) \n")
   
 
 return(a)
 
}

#Draws maps for 2020
ANPP_515202_plotAbo <- ANNP_maps(ANPP_5152020,ANPP_5152020$NPP_predict_above,
                                 "Above   Average") 

ANPP_515202_plotAvg <- ANNP_maps(ANPP_5152020,ANPP_5152020$NPP_predict_avg,"Average") 

ANPP_515202_plotBel <- ANNP_maps(ANPP_5152020,ANPP_5152020$NPP_predict_below,
                                 "Below Average") 

ANPP_9012020_plotfinal <- ANNP_maps(ANPP_9012020,ANPP_9012020$NPP_predict_avg,
                                    "Final Observation")

######
#Draws maps for 2021
ANPP_4142021_plotAbo <- ANNP_maps(ANPP_4142021,ANPP_4142021$NPP_predict_above,
                                  "Above Average") 

ANPP_4142021_plotAvg <- ANNP_maps(ANPP_4142021,ANPP_4142021$NPP_predict_avg,
                                  "Average") 

ANPP_4142021_plotBel <- ANNP_maps(ANPP_4142021,ANPP_4142021$NPP_predict_below, 
                                  "Below Average") 

ANPP_8242021_plotfinal <- ANNP_maps(ANPP_8242021,ANPP_8242021$NPP_predict_avg,
                                    "Final Observation") 

In the first observations of 2020, we see good estimations for the last season in all the scenarios but, finally the last observation turns out to be quite yellow. This explains the low correlation we see at the begging of correlation graph.

The opposite happend in 2021, the estimations where bad in the three scenarios, while the last one was pretty good. The grey points show missing data. Even more, the correlation for this year where good but they fluctuated more than in 2021. Because of this, I had the idea of showing the rains and breakdown what happens and why the first observation where so different from the first forecast, specially for the 2020 data.

### Anomalies Map

##Calculate Anomalies

pct_diff <- function(x){

  x$zscore_NPP_below = (x$NPP_predict_below - x$meanANPPgrid)/sd(x$NPP_predict_below)
  
  x$zscore_NPP_avg = (x$NPP_predict_avg - x$meanANPPgrid)/sd(x$NPP_predict_avg)
    
  x$zscore_NPP_above = (x$NPP_predict_above -x$meanANPPgrid)/sd(x$NPP_predict_above)
  
  return(x)
}

#Add Z scores and Differences
ANPP_5152020 <- pct_diff(ANPP_5152020)
ANPP_9012020 <- pct_diff(ANPP_9012020)
ANPP_4142021 <- pct_diff(ANPP_4142021)
ANPP_8242021 <- pct_diff(ANPP_8242021)

#DraMaps
Anomaly_maps <- function(x,y,z){
  
 a <-  ggplot() +
  geom_sf() +
  geom_sf(data = x, mapping = aes(col=y)) +
  geom_sf(data = south_west_s, fill = NA, color=alpha("#003366",0.4), size=2.0) + 
  geom_sf(data = south_west_c, fill= NA, color=alpha("#3399FF",0.4), size=1.0) +
  geom_label_repel(data=centroid_df, aes(lon, lat,label = county,group = county),
                   size=2,nudge_x =0, nudge_y =0) + 
    scale_colour_viridis_c(aesthetics = "col", option ="inferno", direction = -1, limits=c(-3, 3)) +
   theme_void() +
    theme(
      plot.title=element_text(family= "Courier",face="bold", size=18, colour = "#3399FF"),
      plot.subtitle = element_text(face="italic", size=12),
      ) +
   labs(title=paste("ANPP anomaly for",str_split(unique(x$Forecast),"/", simplify = TRUE)[3]), 
         subtitle = paste0(unique(x$Forecast)," ",z), col="Z score")
   
 
 return(a)
 
}

ANPP_515202_anomallyAbo <- Anomaly_maps(ANPP_5152020,ANPP_5152020$zscore_NPP_above,
                                 "Above Average") 
ANPP_515202_anomallyAvg <- Anomaly_maps(ANPP_5152020,ANPP_5152020$zscore_NPP_avg,
                                 "Average") 
ANPP_515202_anomallybel <- Anomaly_maps(ANPP_5152020,ANPP_5152020$zscore_NPP_below,
                                 "Bellow Average") 
ANPP_9012020_anomallfinal <- Anomaly_maps(ANPP_9012020,ANPP_9012020$zscore_NPP_below,
                                    "Final Observation")

ANPP_4142021_anomallyAbo <- Anomaly_maps(ANPP_4142021,ANPP_4142021$zscore_NPP_above,
                                 "Above Average") 
ANPP_4142021_anomallyAvg <- Anomaly_maps(ANPP_4142021,ANPP_4142021$zscore_NPP_avg,
                                 "Average") 
ANPP_4142021_anomallybel <- Anomaly_maps(ANPP_4142021,ANPP_4142021$zscore_NPP_below,
                                 "Bellow Average") 
ANPP_8242021_anomallfinal <- Anomaly_maps(ANPP_8242021,ANPP_8242021$zscore_NPP_below,
                                    "Final Observation")

Interpolating rain maps

#Call the data
rain_sum <- read.csv(file = "data/rain_sum.csv", head = TRUE, sep=",")

#there are some packed that only work with this CRS
TA <- CRS("+proj=utm +zone=12 +datum=WGS84")


#Change sf to sp and transfor the coordintate 
south_west_s_sp <- sf::as_Spatial(south_west_s) %>% sp::spTransform(TA)

#filter the data - November 2019 to March 2020
rain_4spring_2019 <- rain_sum %>% filter(year == 2019, month %in% c(11,12))
rain_4spring_2020 <- rain_sum %>% filter(year == 2020, month %in% c(1,2,3))
rain_4spring2020 <- rbind(rain_4spring_2020,rain_4spring_2019) %>%  sf::st_as_sf(coords = c("longitude", "latitude")) %>% st_set_crs(st_crs(south_west_s)) %>%  sf::as_Spatial() %>% sp::spTransform(TA)

#filter the data - November 2020 to March 2021
rain_4spring_2020s <- rain_sum %>% filter(year == 2020, month %in% c(11,12))
rain_4spring_2021 <- rain_sum %>% filter(year == 2021, month %in% c(1,2,3))
rain_4spring2021 <- rbind(rain_4spring_2020s,rain_4spring_2021) %>% sf::st_as_sf(coords = c("longitude", "latitude")) %>% st_set_crs(st_crs(south_west_s)) %>% sf::as_Spatial() %>% sp::spTransform(TA)

#filter the data - June to September 2020
rain_4summer2020 <- rain_sum %>% filter(year == 2020, month %in% c(6,7,8,9)) %>% sf::st_as_sf(coords = c("longitude", "latitude")) %>% st_set_crs(st_crs(south_west_s)) %>%  sf::as_Spatial() %>% sp::spTransform(TA)

#filter the data - June to September 2020
rain_4summer2021 <- rain_sum %>% filter(year == 2021, month %in% c(6,7,8,9)) %>% sf::st_as_sf(coords = c("longitude", "latitude")) %>% st_set_crs(st_crs(south_west_s)) %>%  sf::as_Spatial() %>% sp::spTransform(TA)

Inter <- function(x,y,my){
    #interpolation1
ra <- voronoi(x) #data to be interpolated
sw <- aggregate(y) #boundries
ra_sw <- intersect(ra, sw) #SpatialPolygons
##Raster the result
r <- raster(y, res=10000)
raster_ra_sw <- rasterize(ra_sw, r, field='prcp')

#this is the formula for the interpolation
gs <- gstat(formula=prcp~1, data = x, locations=sw, nmax=30 ,set=list(idp = 0))
nn <- interpolate(r, gs)
nnmsk <- mask(nn, ra_sw)

#For drawing in GGPLOt
nnmsk_pts <- rasterToPoints(nnmsk, spatial = TRUE) 

# Then to a 'conventional' dataframe
nnmsk_df  <- data.frame(nnmsk_pts)

#Change it to a St data
nnmsk_sf <- st_as_sf(nnmsk_df, coords = c("x", "y")) 

ggplot() +
  geom_sf(data = nnmsk_sf , aes(col=var1.pred)) +
  ggtitle(paste("Precipitation from",my)) +
  theme_void() +
  scale_color_viridis_c(aesthetics = "col", option="D", direction = -1, 
                       name = "Rainfall \n (mm)", limits=c(0, 180))
}

Once we see the rain, we can have an idea of why the correlations where low at the beginning of 2020 but later they become more stable than 2021. The winter precipitation for 2020 was good, and the models for the 05/15 where very promising.

## [inverse distance weighted interpolation]

Once the precipitation forecast where included in beginning of June and they showed very low precipitation for the monsoon season, the forecast for that day showed high correlation since, that is because the precipitation for the summer of 2020 end it up with very low rainfall across the southwest and this scenario was stable during all the forecast season after.

FALSE [inverse distance weighted interpolation]

The forecast for 2021 showed low ANNP at the begging of the season, because the winter rains that feed the spring growth where low. The correlation stared greater than 2020 probably because the final observation was not as different.

## [inverse distance weighted interpolation]

Still we saw a more variable correlation for this season, probably because the monsoon of 2021 was the greatest in history and the model has hard time to adjust to extremes. By the end of the summer, the rains where above normal and the correlation took a long time to adjust to the observation. The above scenario keep the higher correlation across all forecast in 2021.

## [inverse distance weighted interpolation]

Conclusions

There is a lot to understand about Forecast and Productivity for dry lands, as in every model, the extremes are hard to predict, the 2020 and 2021 seasons are excellent study cases to understand how well this models work unexpected situations. The rains for winter are easier to predict than the monsoon season (Smith et al., 2019), and the model gets more accurate as the monsoon season forecast improves or shows below normal precipitation from the start.

References

Brown, J. R., & Thorpe, J. (2008). Climate Change and Rangelands: Responding Rationally to Uncertainty. 4.

Hartman, M. D., Parton, W. J., Derner, J. D., Schulte, D. K., Smith, W. K., Peck, D. E., Day, K. A., Del Grosso, S. J., Lutz, S., Fuchs, B. A., Chen, M., & Gao, W. (2020). Seasonal grassland productivity forecast for the U.S. Great Plains using Grass‐Cast. Ecosphere, 11(11). https://doi.org/10.1002/ecs2.3280

ILRI. (2021). Rangelands ATLAS. International Livestock Research Institute. https://www.rangelandsdata.org/atlas/sites/default/files/2021-05/Rangelands%20Atlas.pdf

National Drought Mitigation Center. 2022. [Data Set] available online: https://grasscast.unl.edu/Archive.aspx

NOAA. (2021). GHCN (Global Historical Climatology Network) [Data Set] available online: https://www.ncdc.noaa.gov/cdo-web/

Seager, R., Ting, M., Held, I., Kushnir, Y., Lu, J., Vecchi, G., Huang, H.-P., Harnik, N., Leetmaa, A., Lau, N.-C., Li, C., Velez, J., & Naik, N. (2007). Model Projections of an Imminent Transition to a More Arid Climate in Southwestern North America. Science, 316(5828), 1181–1184. https://doi.org/10.1126/science.1139601

Smith, W. K., Dannenberg, M. P., & Yang, D. (2019). Remote sensing of dryland ecosystem structure and function_ Progress, challenges, and opportunities. Remote Sensing of Environment, 23.