Lauren Steely, @MadreDeZanjas
March 2017 (updated August 2017)
Epistemic status: Not pretending this is a rigorous scientific study. This was more about seeing if I could get any insight out of relatively simple data analysis using easily available data. And also to develop a workflow for future analyses using Rmarkdown, RPubs, and plot.ly.
In early 2017, Udall and Overpeck’s recent paper on “hot drought” in the Colorado River basin got quite a bit of press. Their message, paraphrased, amounts to “It’s the temperature, stupid.” U&O divide droughts into “precipitation-dominated droughts” and “temperature-dominated droughts”. Precip-dominated drought is our classical conception of drought – a period of well-below-average precipitation. But U&O also identify a more recent style of drought where runoff is low, yet precip is only a little below average. The driver in these is high temperatures. Overpeck says that in the most recent drought, Colorado River flows decreased more than predicted by precipitation alone because temperature played an outsized role.
How might this apply to California? Climate scientists predict that warmer temperatures will have two effects on precipitation in California:

- More precipitation will fall as rain rather than snow.
- The snowpack that does accumulate will melt earlier in the spring.

The combined result of these effects is that reservoirs will see more inflow occurring earlier in the year. This, as we saw at Oroville this year, can create problems.
But can we see this effect in the historical data? What, historically, has been the effect of temperature on the timing of runoff into reservoirs?
I began by downloading reservoir inflow data from DWR and importing it into R for analysis. The data go back as far as 1996, giving us about 20 years of flows to work with. Plotting the time series and zooming in on any particular year, we can see the general pattern of inflow: winter storms bring sharp spikes of runoff from December to April, then melting snowpack produces a broad pulse of flow that sustains the reservoir until mid-summer (click and drag horizontally to zoom in on a year):
[Interactive chart: daily inflow to Lake Oroville]
The huge peak is from the 1997 El Niño, one of the most powerful in recorded history. Stacking the time series by water year makes the variations in timing evident. Large storms occur sporadically before April 1 (dashed red line), after which a broad pulse of meltwater fills the reservoir throughout the spring and summer.
[Chart: daily inflow time series stacked by water year]
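For reference, here's a minimal sketch of the import and plotting step. I'm assuming the scraped DWR data have already been tidied into a local CSV with `date` and `inflow_cfs` columns; the file name and column names are placeholders of mine, not DWR's.

```r
library(tidyverse)
library(lubridate)

# Assumed local file; in practice the data had to be scraped from DWR's site (see below)
inflow <- read_csv("oroville_inflow.csv") %>%
  mutate(date = as_date(date))

# Daily inflow time series; the 1997 El Niño shows up as the largest spike
ggplot(inflow, aes(date, inflow_cfs)) +
  geom_line(color = "steelblue") +
  labs(x = NULL, y = "Inflow (cfs)", title = "Daily inflow to Lake Oroville")
```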
How should we quantify the timing of runoff? One way is to find the ‘centroid’ of the inflow into the reservoir. For reservoirs that dam rivers, we can imagine the inflow curve for each water year as a skewed bell curve representing the river’s base flow, onto which are superimposed transient peaks from the winter storms. We wish to find the date at which 50% of the inflow for that water year has occurred. One way to do this is to turn the inflow curves into cumulative inflow curves, rescale them all to 0–1, and then find the date that corresponds to 0.5, the midpoint of the cumulative curve.
In the charts that follow, I used the water year, which starts on October 1, rather than the calendar year. Precipitation in California is highly seasonal, with most precip falling between November and April. It makes sense to start counting inflows at the beginning of the wet season rather than in the middle.
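Here's a sketch of that bookkeeping, continuing from the `inflow` data frame above: assign each day to a water year, then accumulate and rescale the inflows within each year.

```r
# Water year N runs from Oct 1 of calendar year N-1 through Sep 30 of year N
inflow_wy <- inflow %>%
  mutate(wy = if_else(month(date) >= 10, year(date) + 1, year(date))) %>%
  arrange(date) %>%
  group_by(wy) %>%
  mutate(cum_inflow = cumsum(inflow_cfs),               # cfs-days, proportional to volume
         cum_frac   = cum_inflow / max(cum_inflow)) %>% # rescaled to 0-1 within each year
  ungroup()
```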
[Chart: cumulative inflow by water year]
There’s quite a lot of variability in the total inflow from year to year. Lake Oroville has a capacity of 3.5 MAF, but during the stormy winter months DWR limits it to 2.8 MAF to allow space for flood control. That restriction is loosened after April 1, allowing snowmelt to top off the reservoir. In wet years, Oroville receives much more inflow than its 3.5 MAF capacity and has to spill into the Feather River. In dry years such as the 2011-16 drought, it receives much less.
Normalizing the curves to [0-1] gives:
[Chart: normalized cumulative inflow curves by water year]
It’s now fairly simple to find the date where the cumulative flow reaches 0.5 for each water year. Since we’re interested in the effect of temperature on this date, I downloaded some temperature data from the RAWS Quincy Road station, which lies on a tributary of the North Fork of the Feather River, squarely in the middle of the Lake Oroville watershed. Ideally we’d want to find some additional data to get an average from different points around the Oroville watershed.
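As a sketch, pulling that 50% date out of the normalized curves built above amounts to taking, within each water year, the first day on which `cum_frac` reaches 0.5:

```r
fifty_pct <- inflow_wy %>%
  group_by(wy) %>%
  summarise(date_50 = min(date[cum_frac >= 0.5]))  # first day at or past the halfway point
```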
With the average daily air temperature data, I computed the mean temperature for each water year. To zero in on the effect of temperature on precipitation phase (rain vs. snow) and snowpack melting, I calculated the mean temp for the water year using just the eight months of November through June, when most precip and melting is occurring. For lack of a better term, I’ll call this the “runoff generating period of the water year”.
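The temperature aggregation can be sketched the same way. I'm assuming a data frame `temps` of daily mean temperatures from the Quincy Road RAWS station, with placeholder columns `date` and `temp_f`:

```r
temp_wy <- temps %>%
  mutate(wy = if_else(month(date) >= 10, year(date) + 1, year(date))) %>%
  filter(month(date) %in% c(11, 12, 1:6)) %>%   # the Nov-June "runoff generating period"
  group_by(wy) %>%
  summarise(wy_avgtemp = mean(temp_f, na.rm = TRUE))
```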
[Chart: mean November–June temperature by water year]
Mean Nov–June temperatures varied from 40.7 to 46.3 °F.
Finally, we’re ready to plot the date of 50% reservoir inflow against the mean temperature for the water year. The prediction is that warmer temperatures will, ceteris paribus, lead to earlier inflows. Here’s the scatter plot (ignore the 2017 in the y-axis label):
[Chart: date of 50% inflow vs. mean Nov–June temperature]
The date of 50% inflow for Oroville has varied from Jan 8 to Apr 28, but most years it occurs sometime in March. In the northern Sierra, peak snowpack occurs in late March on average, and DWR considers April 1 to be the end of major precipitation and the beginning of snowpack melting. A couple of years, 1997 and 2013, are outliers. It seems one or two very large storms can have a large effect on the timing.
For every 1 °F increase in mean temperature during the runoff-generating part of the water year (Nov–June), the date of 50% inflow shifts roughly 8–15 days earlier (p = 0.002, R² = 0.37):
```
## 
## Call:
## lm(formula = julian ~ wy_avgtemp, data = fif)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.805  -8.651   3.184  14.395  30.505 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   604.63     148.40   4.074 0.000647 ***
## wy_avgtemp    -11.76       3.27  -3.598 0.001919 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.94 on 19 degrees of freedom
## Multiple R-squared:  0.4052, Adjusted R-squared:  0.3739 
## F-statistic: 12.94 on 1 and 19 DF,  p-value: 0.001919
```
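The formula in that summary is the whole model: an ordinary least-squares regression of the 50%-inflow date on mean Nov–June temperature. Here's a sketch of the fit and the scatter plot, joining the pieces built earlier; note that encoding the date as a calendar day of year (`julian`) is my guess at how the original did it.

```r
fif <- fifty_pct %>%
  left_join(temp_wy, by = "wy") %>%
  mutate(julian = yday(date_50))   # day of calendar year of the 50% date (assumption)

fit <- lm(julian ~ wy_avgtemp, data = fif)
summary(fit)

# Scatter of 50% inflow date vs. mean Nov-June temperature, with the fitted line
ggplot(fif, aes(wy_avgtemp, julian)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Mean Nov-June temperature (°F)",
       y = "Day of year at 50% cumulative inflow")
```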
The effect is even visible, if weakly, when we arrange the inflow curves by temperature:
[Chart: inflow curves arranged by mean Nov–June temperature]
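The original figure is a stacked (ggjoy-style) arrangement; as a plainer stand-in, the same ordering can be sketched with facets, sorting water years by their Nov–June mean temperature:

```r
inflow_wy %>%
  left_join(temp_wy, by = "wy") %>%
  mutate(dowy = as.integer(date - make_date(wy - 1, 10, 1)) + 1,  # day of water year
         wy   = fct_reorder(factor(wy), wy_avgtemp)) %>%          # order panels cool -> warm
  ggplot(aes(dowy, inflow_cfs)) +
  geom_line() +
  facet_wrap(~ wy, ncol = 1, strip.position = "right") +
  labs(x = "Day of water year", y = "Inflow (cfs)")
```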
The snowpack serves as an enormous ‘free’ reservoir for California, storing nearly as much water (5–20 MAF) as all our artificial reservoirs combined and releasing it steadily throughout the spring and summer months when demand is highest. The Northern Sierra snowpack builds throughout the winter, reaching a peak in late March. On April 1, DWR measures the depth of the snowpack and estimates how much water will be available through the State Water Project for that year. If the snowpack peaks earlier due to less snow and faster melting, there may be less water available later in the summer when demand is highest. In this analysis, I did not consider other factors that may affect the rate of snowmelt, such as forest thinning due to wildfire or dust settling out of the atmosphere onto the snow surface, where it reduces albedo and heats the snow.
This exercise suggests a deeper question that I hadn’t thought to ask before: Are some reservoirs more temperature-sensitive than others? We could imagine that reservoirs situated closer to their source waters would show a stronger temperature dependence than lower-elevation reservoirs, or that reservoirs fed more by snowpack than by rain would also be more sensitive. This reminds me of the work that Dr. Naomi Tague and some of my former Bren colleagues are doing on snow- vs. rain-dominated watersheds. Temperature sensitivity of reservoirs could have some interesting policy implications – e.g. which reservoirs do we prioritize for infrastructure improvements or re-operation?
It would be very nice if we had a longer historical record to analyze. DWR may have old reservoir data sitting around in some form, and some enterprising analyst could probably work with them to get it. But more importantly, the data that are online are difficult to access and time-consuming to process. There’s no way to download a raw CSV file, so instead I had to scrape the web page. The CIMIS weather data, by contrast, are easy to access through their API; I even wrote an R tool to do it in a single line of code.
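For what it's worth, the scraping step itself can be sketched with rvest; the URL below is a stand-in, not the actual DWR query page:

```r
library(rvest)

# Placeholder URL: the real DWR reservoir query page is not reproduced here
page <- read_html("https://example.com/dwr-reservoir-inflow-query")

# Grab the first HTML table on the page and treat it as the raw inflow record
inflow_raw <- page %>%
  html_table() %>%
  .[[1]]
```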
With all the discussion recently about the need for better water data, one concrete thing agencies like DWR could do is to develop APIs for all of their public-facing databases. That would allow civic-minded developers, scientists, and open data evangelists to develop tools and better front-ends for that data. Transit is a good example of this. By urging cities to standardize their public transit information and expose it through an API, Google was able to provide a much smoother user experience through Google Maps than what the cities had been providing.
In terms of the science, there are other response variables we could analyze. Instead of reservoir inflows, we could look at Sierra streamflows or snowpack SWE (snow water equivalent). But reservoirs are nice because they aggregate multiple effects (rain vs. snow dominance, timing of snowmelt) and drain large areas of alpine watershed. We could also look at other predictors such as solar radiation, or at least refine our temperature data to get a more representative estimate of temperature than the crude annual mean used here. There may also be a nonlinear response of runoff timing to temperature that this model doesn’t capture. If you have ideas for how to improve this analysis, please let me know.
Made with love using R, RStudio, RMarkdown, the tidyverse, plot.ly, and ggjoy. R code for this analysis can be found on GitHub here.