By: Adam Rivers

July 2019


Research challenge

Frost damage costs American farmers more than any other weather hazard. In annual crops early frost can kill seedlings and late frost can damage fruits and grains before harvest. The best information available on frost-free dates comes from the 30-year climatological average frost-free dates published by the National Oceanographic and Atmospheric admiration and updated every decade. The current NOAA frost-free data represent a historical average centered on the year 1995. Unfortunately, the global temperature has risen 0.59°C since 1995. There is also some evidence of increased variability in frost dates, which is critically important for mitigating risk in planting decisions. Work has been done modeling frost free-dates in the 50-100 year time horizon. There is a critical need for accurate predictions of the timing and probability of frost for the next planting season.

Objective

The Postdoc will develop a predictive model of frost-free dates and probabilities for future growing seasons. The data will be disseminated to the public through an interactive web application.

National program and objective:

NP 301: Plant Genetic Resources, Genomics and Genetics Improvement; Components

Approach

Data for the model will come from the National Oceanographic and Atmospheric Administration (NOAA). Specifically, US Daily Climate Normal Data a will be supplemented with data from the Local Climatological Dataset and homogeneity corrected using software developed by NOAA for the US Historical Climatology Network. The postdoc will develop Bayesian generalized mixed models with geospatial smoothing to create robust estimates of the frost-free dates and variability.


Preliminary analysis

A quick pilot analysis was done for this proposal to estimate the magnitude of change in frost free dates. NOAA Daily Climate Normal data set from 1951-2010 was used to analyze the frost-free dates at 5,134 stations where frost occurred every year. This data differs from the more accurate data set used for calculating the NOAA climate normals. That data set applied homogeneity correction to account for systemic changes (urbanization, station moves etc.).The frost free date climate normals are not simple means but are calculated by a bootstrapping approach that samples across years and before and after the observed frost free date. This is a state-of-the-art method but it is designed to calculate the mean and variance of the frost free dates assuming the data are not trended. Key findings from the preliminary data are:

Preliminary findings

  1. There is a strong regional difference in the timing of frost free dates, with the east showing earlier spring dates and later fall dates (see maps below).
  2. The average station is predicted to have frost free dates 3 days earlier in the spring and and 6 days later the fall of 2020 than it did on average over the period 1981-2020 with local station changes as large as 2-3 weeks.
  3. Increased variability in frost free dates has been was hypothesized, but that was not a common occurrence in most regions.
  4. Aggregated national trends mask these changes because a shortened western growing season offsets a lengthened eastern growing season.

Next steps

  1. A more sophisticated model incorporating bootstrapping and estimator shrinkage should be done to improve the accuracy of the model at the station level
  2. The underling data should be homogeneity corrected.
  3. Data from 2010 to 2019 should be added.
  4. An empirical model testing framework that evaluates model error should be developed.
  5. A web application for understanding accessing and understanding the data needs to be designed.

Source data

The source data set used in the analysis is the Daily Filled Adjusted Tmin data spanning 1951-2010. Data were downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/supplemental/source-datasets/daily-filled-adjusted-tmin.dat.gz. Data were initially parsed in Python to create the spring and fall Tmin data containing just the first and last dates of each year where the temperature fell below 0ºC every year.

# Setup
library(tidyverse)
library(reshape2)
setwd("~/Documents/frost_free/")
dates <- seq(1981,2010)

Import spring data

spring <- read.table("spring.csv", sep=",", header=T)
rownames(spring)<-spring$X
spring<- spring[,-1]
filteredspring <- spring %>% drop_na() 

Import fall data

fall <- read.table("fall.csv", sep=",", header=T)
rownames(fall)<-fall$X
fall<- fall[,-1]

filteredfall <- fall %>% drop_na() 

Looking at trend patterns of individual stations

General trends do not show a lot but it is possible that these trends are larger at specific sites. The next step is to look at trends across individual stations. we fit a GLM for each station time series and look at the distribution of slopes and predicted frost free dates based on the models.

Looking for regional differences

allstations <-read.table("allstations-clean.txt", header=F, sep="\t")
springloc <- merge(spring2020gap, allstations, by.x="row.names", by.y="V1")
library(ggmap)
library(maps)
library(mapdata)
bc_bbox <- make_bbox(lat = V2, lon = V3, data = springloc)
bc_big <- get_map(location = bc_bbox, source = "google", maptype = "terrain-background")
maptype = "terrain-background" is only available with source = "stamen".
resetting to source = "stamen"...
ggmap(bc_big) + 
  geom_point(data = springloc, mapping = aes(x = V3, y = V2, color = -pdate), size=0.9) + scale_colour_gradient2(trans="pseudo_log") +
  ggtitle("Predicted change in spring frost free date in 2020 versus mean from 1981-2010")

allstations <-read.table("allstations-clean.txt", header=F, sep="\t")
fallloc <- merge(fall2020gap, allstations, by.x="row.names", by.y="V1")
library(ggmap)
library(maps)
library(mapdata)
bc_bbox <- make_bbox(lat = V2, lon = V3, data = fallloc)
bc_big <- get_map(location = bc_bbox, source = "google", maptype = "terrain-background",color="bw")
maptype = "terrain-background" is only available with source = "stamen".
resetting to source = "stamen"...
ggmap(bc_big) + 
  geom_point(data = fallloc, mapping = aes(x = V3, y = V2, color = pdate), size=0.9) + scale_colour_gradient2(trans="pseudo_log") +
  ggtitle("Predicted change in fall frost free date in 2020 versus mean from 1981-2010")

Conclusions about regional data

There do seem to be strong regional trends in the Changes in frost free dates, with strong warming in the Southeast.

Changes in variance

The variability of Frost dates is also an important factor to mitigating the risk of frost damage to crops. To analyze that we take the de-trended data (residuals) from the GLM at each site and apply a test for equality of variances for the periods 1981-1995 and 1996-2010.


spring_res <-  apply(x, 2, function(x.col) residuals(glm(x.col ~ dates, na.action=na.omit)))
springvr <- (apply(spring_res, 2, function(spring_res.col) (var.test(spring_res.col[1:15],spring_res.col[16:30], alternative="greater"))$estimate ))
ggplot(data.frame(springvr), aes(x=log(springvr,2))) + geom_histogram(bins=60) + 
geom_vline(aes(xintercept=mean(log(springvr, 2))), color="blue", linetype="dashed", size=1) +
xlab("Variance ratio") + ylab("Count") + ggtitle("Variance log 2 ratio for de-trended data from  1981-1995 and 1996-2010")

springpval <- (apply(spring_res, 2, function(spring_res.col) (var.test(spring_res.col[1:15],spring_res.col[16:30], alternative="two.sided"))$p.value))
spv.df<-data.frame(springpval)
spv.df$stations <- rownames(spv.df)
sig.stations <-spv.df[spv.df$springpval<0.05,]
# Most stations had sigifigantly greater variation in the later period
springvar <- merge(data.frame(springvr), allstations, by.x="row.names", by.y="V1")
springvarandp <- merge(springvar,sig.stations, by.x="Row.names", by.y="stations")
ggmap(bc_big) + 
  geom_point(data = springvarandp, mapping = aes(x = V3, y = V2, color = log(springvr,2)), size=2) + scale_colour_gradient2() +
  ggtitle("Stations with statistically significant changes in log variance ratios between 1981-1995 and 1996-2010")

ggmap(bc_big) + 
  geom_point(data = springvar, mapping = aes(x = V3, y = V2, color = log(springvr,2)), size=0.9) + scale_colour_gradient2()+
  ggtitle("Log variance ratios between 1981-1995 and 1996-2010")

Conclusions on variance

There does not appear to be a strong shift in variance when comparing de-trended data from 1981-1995 and 1996-2010. There are some regional patterns in the minor shift observed. Potential artifacts driving (such as poorer fit for one period) need to be investigated before any stronger conclusion can be made.

General conclusions

There may be regionally important shifts in the timing of frost free dates by up to several weeks. These preliminary methods need to be confirmed using bootstrapping methods, to minimize the stochastic nature of “last event” data like frost free days. Developing models for local stations rather than large scale trends will require methods like bootstrapping and shrinkage to reduce variability. Finally, models developed need to be evaluated for their error in cross-fold test /train validation experiments and the data needs to be made available in a user-freindly web application.

---
title: "Exploring the need for a predictive frost-free date model and web application to improve planting decisions"
output: html_notebook
---
### By: Adam Rivers
### July 2019

*** 

# Research challenge

Frost damage costs American farmers more than any other weather hazard. In annual crops early frost can kill seedlings and late frost can damage fruits and grains before harvest. The best information available on frost-free dates comes from the 30-year climatological average frost-free dates published by the National Oceanographic and Atmospheric admiration and updated every decade. The current NOAA frost-free data represent a historical average centered on the year 1995. Unfortunately, the global temperature has risen 0.59°C since 1995. There is also some evidence of increased variability in frost dates, which is critically important for mitigating risk in planting decisions.  Work has been done modeling frost free-dates in the 50-100 year time horizon. There is a critical need for accurate predictions of the timing and probability of frost for the next planting season. 

# Objective

The Postdoc will develop a predictive model of frost-free dates and probabilities for future growing seasons. The data will be disseminated to the public through an interactive web application.

# National program and objective: 

NP 301: Plant Genetic Resources, Genomics and Genetics Improvement; Components

# Approach

Data for the model will come from the National Oceanographic and Atmospheric Administration (NOAA). Specifically, US Daily Climate Normal Data a will be supplemented with data from the Local Climatological Dataset and homogeneity corrected using software developed by NOAA for the US Historical Climatology Network.  The postdoc will develop Bayesian generalized mixed models with geospatial smoothing to create robust estimates of the frost-free dates and variability. 

*** 

# Preliminary analysis

A quick pilot analysis was done for this proposal to estimate the magnitude of change in frost free dates. NOAA Daily Climate Normal data set from 1951-2010 was used to analyze the frost-free dates at 5,134 stations where frost occurred every year. This data differs from the more accurate data set used for calculating the NOAA climate normals.  That data set applied homogeneity correction to account for systemic changes (urbanization, station moves etc.).The frost free date climate normals are not simple means but are calculated by a bootstrapping approach that samples across years and before and after the observed frost free date. This is a state-of-the-art method but it is designed to calculate the mean and variance of the frost free dates assuming the data are not trended. Key findings from the preliminary data are:

## Preliminary findings
1. **There is a strong regional difference in the timing of frost free dates, with the east showing earlier spring dates and later fall dates (see maps below).**
2. The average station is predicted to have frost free dates 3 days earlier in the spring and and 6 days later the fall of 2020 than it did on average over the period 1981-2020 with local station changes as large as 2-3 weeks.
3. Increased variability in frost free dates has been was hypothesized, but that was not a common occurrence in most regions.
4. Aggregated national trends mask these changes because a shortened western growing season offsets a lengthened eastern growing season.

# Next steps
1. A more sophisticated model incorporating bootstrapping and estimator shrinkage should be done to improve the accuracy of the model at the station level
2. The underling data should be homogeneity corrected.
3. Data from 2010 to 2019 should be added.
4. An empirical model testing framework that evaluates model error should be developed.
4. A web application for understanding accessing and understanding the data needs to be designed.

# Source data 
The source data set used in the analysis is the Daily Filled Adjusted Tmin data spanning 1951-2010. Data were downloaded from `ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/supplemental/source-datasets/daily-filled-adjusted-tmin.dat.gz`. Data were initially parsed in Python to create the spring and fall Tmin data containing just the first and last dates of each year where the temperature fell below 0ºC every year.

```{R}
# Setup
library(tidyverse)
library(reshape2)
setwd("~/Documents/frost_free/")
dates <- seq(1981,2010)
```
## Import spring data
```{R}
spring <- read.table("spring.csv", sep=",", header=T)
rownames(spring)<-spring$X
spring<- spring[,-1]
filteredspring <- spring %>% drop_na() 
```

## Import fall data
```{R}
fall <- read.table("fall.csv", sep=",", header=T)
rownames(fall)<-fall$X
fall<- fall[,-1]

filteredfall <- fall %>% drop_na() 
```


# Look at national spring and fall trends

```{R}
# Convert DF to matrix
x <- t(as.matrix(filteredspring))
#Calculate per station mean date
xmean<- apply(x, 2, mean)
# Subrtact mean from date
xnorm <- x - xmean

xlabeled <- data.frame(year=seq(1951,2010),xnorm)
xlong <- melt(xlabeled, id.vars="year")
# Plot centered dates
ggplot(xlong, aes(x=as.factor(year), y=value)) +
  geom_boxplot() +
  stat_summary(fun.y=mean, geom="point", shape=23, size=2) +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Year") + ylab(" Days away from mean day") + ggtitle("Annual deviations from station mean in spring frost free date 1951-2010")
```


```{R}
# Convert DF to matrix
x <- t(as.matrix(filteredfall))
#Calculate per station mean date
xmean<- apply(x, 2, mean)
# Subrtact mean from date
xnorm <- x - xmean

xlabeled <- data.frame(year=seq(1951,2010),xnorm)
xlong <- melt(xlabeled, id.vars="year")
# Plot centered dates
ggplot(xlong, aes(x=as.factor(year), y=value)) +
  geom_boxplot() +
  stat_summary(fun.y=mean, geom="point", shape=23, size=2) +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Year") + ylab("Days away from mean") + ggtitle("Annual deviations from station mean in fall frost free date 1951-2010")
```

### Conclusions about national trends
The aggregated station data do not show trends relative to the large inter-annual fluctuations in frost date.

***

# Looking at trend patterns of individual stations

General trends do not show a lot but it is possible that these trends are larger at specific sites. The next step is to look at trends across individual stations. we fit a GLM  for each station time series and look at the distribution of slopes and predicted frost free dates based on the models.

## Spring trends 1981-2010
```{R}
model2020function <- function(x.col) {
  mod1 <- glm(x.col ~ dates, na.action=na.omit)
  est1 <- predict(mod1, newdata=data.frame(dates=c(2020)))
  return(mean(x.col)- est1)
}

dates <- seq(1981,2010)
x <- t(as.matrix(filteredspring[,31:60]))
springcoefs <-  t(apply(x, 2, function(x.col) glm(x.col ~ dates, na.action=na.omit)$coef))
sdf <-as.data.frame(springcoefs)
```

```{R}
ggplot(sdf, aes(x=dates)) + geom_histogram(bins=60) + 
geom_vline(aes(xintercept=mean(dates)), color="blue", linetype="dashed", size=1) +
xlab("Slope") + ylab("Count") + ggtitle("Slopes of station fits to spring frost free date 1981-2010")
```

```{R}
spring2020gap <-  data.frame(pdate= apply(x, 2, model2020function))
mean(spring2020gap$pdate)
```
```{r}
ggplot(spring2020gap, aes(x=pdate)) + geom_histogram(bins=60) + 
geom_vline(aes(xintercept=mean(pdate)), color="blue", linetype="dashed", size=1) +
xlab("Days") + ylab("Count") + ggtitle("Predicted offset of 2020 spring frost free date from fit of frost free dates, 1981-2010")

```

### Fall trends 1981-2010

```{R}
dates <- seq(1981,2010)
x <- t(as.matrix(filteredfall[,31:60]))
fallcoefs <-  t(apply(x, 2, function(x.col) glm(x.col ~ dates, na.action=na.omit)$coef))
fdf <-as.data.frame(fallcoefs)
```

```{R}
ggplot(fdf, aes(x=dates)) + geom_histogram(bins=60) + 
geom_vline(aes(xintercept=mean(dates)), color="blue", linetype="dashed", size=1) + 
  xlab("Slope") + ylab("Count") + ggtitle("Slopes of station fits to Spring frost free date 1981-2010")
```

Both Spring and Fall frost free dates show a warming trend where the frost free date is coming sooner in the spring and later in the fall. This trend can be put into the number of days shifted by fitting the model.


```{R}
fall2020gap <-  data.frame(pdate= apply(x, 2, model2020function))
```
```{r}
ggplot(fall2020gap, aes(x=pdate)) + geom_histogram(bins=60) + 
geom_vline(aes(xintercept=mean(pdate)), color="blue", linetype="dashed", size=1) +
xlab("Days") + ylab("Count") + ggtitle("Predicted offset of 2020 fall frost free date from fit of frost free dates, 1981-2010")

mean(fall2020gap$pdate)
```

### Conclusions of per station trend data

Both Spring and Fall frost free dates show a warming trend. The Spring frost free date is coming 3 days sooner and the fall frost free date is coming 6 days later. This trend can be put into the number of days shifted by fitting the model.

There does appear to be an overall shift 


# Looking for regional differences

```{r}
allstations <-read.table("allstations-clean.txt", header=F, sep="\t")
springloc <- merge(spring2020gap, allstations, by.x="row.names", by.y="V1")
```

```{r}
library(ggmap)
library(maps)
library(mapdata)
bc_bbox <- make_bbox(lat = V2, lon = V3, data = springloc)
bc_big <- get_map(location = bc_bbox, source = "google", maptype = "terrain-background")
ggmap(bc_big) + 
  geom_point(data = springloc, mapping = aes(x = V3, y = V2, color = -pdate), size=0.9) + scale_colour_gradient2(trans="pseudo_log") +
  ggtitle("Predicted change in spring frost free date in 2020 versus mean from 1981-2010")

```


```{r}
allstations <-read.table("allstations-clean.txt", header=F, sep="\t")
fallloc <- merge(fall2020gap, allstations, by.x="row.names", by.y="V1")
```

```{r}
library(ggmap)
library(maps)
library(mapdata)
bc_bbox <- make_bbox(lat = V2, lon = V3, data = fallloc)
bc_big <- get_map(location = bc_bbox, source = "google", maptype = "terrain-background",color="bw")
ggmap(bc_big) + 
  geom_point(data = fallloc, mapping = aes(x = V3, y = V2, color = pdate), size=0.9) + scale_colour_gradient2(trans="pseudo_log") +
  ggtitle("Predicted change in fall frost free date in 2020 versus mean from 1981-2010")
```
### Conclusions about regional data
There do seem to be strong regional trends in the Changes in frost free dates, with strong warming in the Southeast.

# Changes in variance

The variability of Frost dates is also an important factor to mitigating the risk of frost damage to crops. To analyze that we take the de-trended data (residuals) from the GLM at each site and apply a test for equality of variances for the periods 1981-1995 and 1996-2010.
```{r}

spring_res <-  apply(x, 2, function(x.col) residuals(glm(x.col ~ dates, na.action=na.omit)))
springvr <- (apply(spring_res, 2, function(spring_res.col) (var.test(spring_res.col[1:15],spring_res.col[16:30], alternative="greater"))$estimate ))

```

```{r}
ggplot(data.frame(springvr), aes(x=log(springvr,2))) + geom_histogram(bins=60) + 
geom_vline(aes(xintercept=mean(log(springvr, 2))), color="blue", linetype="dashed", size=1) +
xlab("Variance ratio") + ylab("Count") + ggtitle("Variance log 2 ratio for de-trended data from  1981-1995 and 1996-2010")
```

```{r}
springpval <- (apply(spring_res, 2, function(spring_res.col) (var.test(spring_res.col[1:15],spring_res.col[16:30], alternative="two.sided"))$p.value))
spv.df<-data.frame(springpval)
spv.df$stations <- rownames(spv.df)
sig.stations <-spv.df[spv.df$springpval<0.05,]
# Most stations had sigifigantly greater variation in the later period
```

```{r}
springvar <- merge(data.frame(springvr), allstations, by.x="row.names", by.y="V1")
springvarandp <- merge(springvar,sig.stations, by.x="Row.names", by.y="stations")
ggmap(bc_big) + 
  geom_point(data = springvarandp, mapping = aes(x = V3, y = V2, color = log(springvr,2)), size=2) + scale_colour_gradient2() +
  ggtitle("Stations with statistically significant changes in log variance ratios between 1981-1995 and 1996-2010")
```

```{r}
ggmap(bc_big) + 
  geom_point(data = springvar, mapping = aes(x = V3, y = V2, color = log(springvr,2)), size=0.9) + scale_colour_gradient2()+
  ggtitle("Log variance ratios between 1981-1995 and 1996-2010")
```


## Conclusions on variance 

There does not appear to be a strong shift in variance when comparing de-trended data from 1981-1995 and 1996-2010. There are some regional patterns in the minor shift observed. Potential artifacts driving (such as poorer fit for one period) need to be investigated before any stronger conclusion can be made. 

# General conclusions

There may be regionally important shifts in the timing of frost free dates by up to several weeks. These preliminary methods need to be confirmed using bootstrapping methods, to minimize the stochastic nature of "last event" data like frost free days. Developing models for local stations rather than large scale trends will require methods like bootstrapping and shrinkage to reduce variability. Finally, models developed need to be evaluated for their error in cross-fold test /train validation experiments and the data needs to be made available in a user-freindly web application.
