# library() stops with an error if a package is missing; require() only returns FALSE
library(tidyverse)
library(ggplot2)
library(ggrepel)
library(ggridges)
library(patchwork)
library(zoo)
theme_set(theme_minimal())
The dataset I chose is a 650-year record of grape harvest dates (GHD) from 27 regions in Western Europe. I found it while searching for climate datasets on Data is Plural. It comes from Daux et al. (2012), who analyse its patterns to assess the impact of climate change. I had never seen such a “historical” dataset before and find it very interesting: a record this long should be very valuable. Also, the average global temperature has increased steadily over the past hundred years, and I wonder whether a related trend shows up in the GHD series.
The data consist of two parts:
- The main data frame: 650 years x 27 regions. Each row is a year and each column is a region. Each value is the regional mean number of days between the grape harvest date and September 1st.
- The location data frame: a sheet providing the longitude and latitude of the 27 regions.
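To make the encoding of the main data frame concrete, here is a minimal sketch that turns an offset back into a calendar date. The helper name ghd_to_date is hypothetical, and the sketch assumes that an offset of 0 means September 1st and that negative values mean a harvest before that day. The two offsets used are the 2005 values for Alsace and Bordeaux shown in the data preview.

```r
# Convert a GHD offset (days relative to September 1st) into a calendar date.
# `ghd_to_date` is a hypothetical helper, not part of the original analysis.
# Assumption: offset 0 corresponds to September 1st of the given year.
ghd_to_date <- function(year, offset) {
  # Regional means can be fractional, so round to the nearest whole day.
  as.Date(sprintf("%04d-09-01", as.integer(year))) + round(offset)
}

ghd_to_date(2005, 29)    # Alsace, 2005: late September
ghd_to_date(2005, -4.9)  # Bordeaux, 2005: late August
```

Rounding is a design choice for readability only; the plots in this report work with the raw offsets.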
The raw data contain some unnecessary metadata and headers, so I removed them manually and uploaded the two parts as two CSV files to Box.
ghd = read_csv("https://uwmadison.box.com/shared/static/phoz9eco2dpfk5inpquipibwsm7i00em.csv", show_col_types = FALSE)
tail(ghd, 3)
## # A tibble: 3 x 28
## year Alsace Auvergne `Auxerre-Avalon` `Beaujolais and Maco~ Bordeaux Burgundy
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2005 29 NA NA NA -4.9 13.5
## 2 2006 NA NA NA NA 16.1 16
## 3 2007 NA NA NA NA NA NA
## # ... with 21 more variables: Champagne 1 <dbl>, Champagne 2 <dbl>,
## # Gaillac- South-West <dbl>, Germany <dbl>, High Loire Valley <dbl>,
## # Ile de France <dbl>, Jura <dbl>, Languedoc <dbl>, Low Loire Valley <dbl>,
## # Luxembourg <dbl>, Maritime alps <dbl>, Northern Italy <dbl>,
## # Northern Lorraine <dbl>, Northern Rhone valley <dbl>, Savoie <dbl>,
## # Spain <dbl>, Southern Lorraine <dbl>, Southern Rhone valley <dbl>,
## # Switzerland (Leman Lake) <dbl>, Various South-East <dbl>, ...
location = read_csv("https://uwmadison.box.com/shared/static/y1ln5tqxar0ex4finhsn2d3gc3o7x031.csv", show_col_types = FALSE)
head(location, 3)
## # A tibble: 3 x 3
## Location Latitude Longitude
## <chr> <dbl> <dbl>
## 1 Alsace 48.2 7.28
## 2 Auvergne 45.6 3.17
## 3 Auxerre-Avalon 47.8 3.57
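I use zoo::rollapply to compute a 20-year moving average and pivot_longer to tidy the data set.
# Order regions from north to south so that facets and axes follow latitude
loc_order = location %>%
  drop_na() %>%
  arrange(desc(Latitude)) %>%
  pull(Location)
# 20-year moving average per region, reshaped to long format;
# regions without coordinates are not in loc_order and are dropped
ghd_roll = ghd %>%
  mutate(across(!year, rollapply, width=20, FUN=mean, fill=NA)) %>%
  pivot_longer(!year, names_to="Location", values_to = "Date") %>%
  mutate(Location = factor(Location, loc_order)) %>%
  drop_na(Location)
# Raw (unsmoothed) data in the same long format
ghd_pivot = pivot_longer(ghd, !year, names_to="Location", values_to = "Date") %>%
  mutate(Location = factor(Location, loc_order)) %>%
  drop_na(Location)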
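Because the GHD table contains many missing years, it is worth seeing how a rolling mean behaves around NA values. The following sketch uses a made-up five-point vector (toy numbers, not GHD data) and a window of 3 instead of 20. The FUN arguments are written as anonymous functions purely for illustration; the variable names strict and lenient are assumptions of this example.

```r
library(zoo)

# Toy series with one missing value (made-up numbers, not GHD data).
x <- c(1, NA, 3, 4, 5)

# Centred 3-point mean, padded with NA at both ends (fill = NA).
# Any window that contains an NA yields NA.
strict <- rollapply(x, width = 3, FUN = function(w) mean(w), fill = NA)

# Same window, but missing values inside a window are ignored.
lenient <- rollapply(x, width = 3, FUN = function(w) mean(w, na.rm = TRUE),
                     fill = NA)

strict   # only the window (3, 4, 5) is complete
lenient  # windows with a gap still produce a value
```

With the strict version, a single missing year blanks out every window that touches it, which is why the smoothed curves can have gaps where the records are sparse. Whether ignoring missing values is appropriate depends on how sparse the records are, so this is only an option to consider.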
The first plot shows the time series. The dates fluctuate strongly from year to year, so a scatter plot is preferred over a line plot for the raw data. The moving average is drawn in red together with the raw points in grey.
ghd_pivot %>%
  ggplot(aes(year, Date)) +
  geom_line(data = ghd_roll, colour="red", size=2) +
  geom_point(colour="grey", alpha=0.3) +
  facet_wrap("Location", ncol = 4)
## Plot 2: Box plot of harvest dates by region
To better understand the differences between regions, I made a box plot of all the regions, sorted by latitude.
ghd_pivot %>%
  # filter(year > 1800) %>%
  ggplot() +
  geom_boxplot(aes(Location, Date)) +
  coord_flip()
I did not expect the dates to fluctuate so drastically. In fact, many records before 1800 look like random noise. However, after applying the moving average, many regions show a general descending trend. This means that harvest dates have gradually become earlier over the last century. This may be because a warmer climate lets the grapes ripen earlier, or there may be another cause. Given the limited records and the other variables involved, we cannot determine the exact reason.
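One way to check a descending trend more formally would be to fit a straight line of Date against year and look at the sign of the slope. The sketch below does this on synthetic data only: the toy series and its drift of 0.1 day per year are made up for illustration and are not taken from the GHD records. The same lm call could, under that assumption, be applied per region to ghd_pivot after filtering to recent years.

```r
# Synthetic example only: a made-up series that drifts earlier by
# 0.1 day per year, with random year-to-year noise added.
set.seed(1)
toy <- data.frame(year = 1900:2000)
toy$Date <- 30 - 0.1 * (toy$year - 1900) + rnorm(nrow(toy), sd = 5)

# Ordinary least squares fit of harvest offset against year.
fit <- lm(Date ~ year, data = toy)

# Slope in days per year; a negative value means earlier harvests over time.
slope <- unname(coef(fit)["year"])
slope
```

A linear fit is a deliberately simple choice here. It says nothing about the cause of a trend, only about its direction and rough size.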
Another finding is that latitude is not strongly related to the harvest date either. Although there is a slight trend toward later GHD in high-latitude regions, regions around the Mediterranean Sea have even later GHD.
Daux, Valérie, et al. “An open-access database of grape harvest dates for climate research: data description and quality assessment.” Climate of the Past 8.5 (2012): 1403-1418.