Detecting Drought of NRW

Studyproject for the Westfaelische Wilhelms-University (Germany)

Required libraries

library(rmarkdown)
library(sp)
library(sf)
library(R.utils)
library(raster)
library(rgdal)
library(stringr)
library(ggplot2)

Introduction

In this project we want to determine the effect of the drought 2018 in Nordrhine-Westphalia (NRW), Germany. We will focus on the fitness, using the NDVI (Normalised Difference Vegetation Index). Our reference data will be from 2012.

Getting the data

At first we need satellite data of NRW. Therefore we use the earth explorer by USGS (https://earthexplorer.usgs.gov/), because it is freely available. Data from 2012 and 2018 are available by Landsat 7 and we need at least band 3 (red) and band 4 (near infrared (NIR)) to calculate the NDVI. Here we will also download band 1 (blue) und band 2 (green). For NRW we need about 7 tiles, then we can put them together and crop them to the size of the state. For detecting the drought the data of end of august/ end of summer will be used.
2012 tiles (path/row): 195/24, 195/25, 196/24, 196/25, 197/23, 197/24, 197/25
2018 tiles (path/row): 196/23, 196/24, 196/25, 197/23, 197/24, 197/25, 198/24

A shapefile of NRW is available here (also free): https://webcache.googleusercontent.com/search?q=cache:00e4m2OiTskJ:https://www.bezreg-koeln.nrw.de/brk_internet/geobasis/topographie_sonderkarten/verwaltungsgrenzen/index.html+&cd=1&hl=de&ct=clnk&gl=de
The shapefile has a different coordinate reference system (crs), so we have to change it first.

#nrw shapefile
nrw <- read_sf("data/dvg1bld_nw.shp")
nrw <- st_transform(nrw, crs="+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs   
                    +ellps=WGS84 +towgs84=0,0,0")  #crs of the satelite data

#untar tiles
dir.create("tiles")
files <- list.files(pattern = "tar.gz$")
for (i in seq(files)) {
untar(files, exdir = "./tiles")  
}

#put together tiles and crop to nrw for each band
#run loop for each year and save respectively

#2012
x <- list.files(path = "./2012", pattern = ".tif$") #getting all .tif files
x <- substr(x,0, nchar(x)-5)   #delete the last 5 letters for the loop

#2018
x <- list.files(path = "./2018", pattern = ".tif$")
x <- substr(x,0, nchar(x)-5)

#preloop for transforming crs of some tiles with different crs
tile <- "name_of_tile_with_different_crs"
z <- 1:4  #each band
rest <- list()
for (i in seq(z)) {
  rest[[i]] <- stack(paste0(main,tile,z[[i]],".tif")) 
  rest[[i]] <- projectRaster(rest[[i]], crs= "+proj=utm +zone=32 +datum=WGS84  
                             +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")
  rest[[i]] <- crop(rest[[i]], nrw)
  rest[[i]] <- mask(rest[[i]], nrw)
}

#loop for downloading the bands 
y <- 1:4
ger <- list()
ger_fin <- list()

for (i in seq(y)) {
  for (j in seq(x)) {
    ger[[j]] <- stack(paste0("data/",x[j],y[i],".tif"))
    ger[[j]] <- crop(ger[[j]], nrw)
    ger[[j]] <- mask(ger[[j]], nrw)
    print(ger[[j]])
  }
  ger2 <-mosaic(ger[[1]],ger[[2]],ger[[3]],ger[[4]],ger[[5]],rest1[[i]],  
                fun=max, rm.na=T)
  print(ger2)
  ger_fin[[i]] <- ger2
}

#save loop result for each year
ger12 <- stack(ger_fin) #2012
ger18 <- stack(ger_fin) #2018

plot(ger18[[3]], main="NRW red band august 2018")

The data of the NRW forest is provided by the German “Umweltbundesamt” as Corine data. The data is available as vector oder raster at the Copernicus website of the EU: (https://land.copernicus.eu/pan-european/corine-land-cover/clc2018/fetch-land-file?hash=25415838bad86fe986648fb7545d1643401650b7). Here we are using the raster data due to the size.

#corine data
corine12 <- raster("data/CLC2018_CLC2012_V2018_20.tif")
corine18 <- raster("data/CLC2018_CLC2018_V2018_20.tif")
corine <- list(corine12, corine18)
nrw <- st_transform(nrw, crs="+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000   
                  +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 ")

corine_cut <- list()

for (i in seq(corine)) {
  corine_cut[[i]] <- crop(corine[[i]], nrw)
  corine_cut[[i]] <- mask(corine_cut[[i]], nrw)
}

plot(corine_cut[[1]], breaks = c(100, 200,300,400,500,600), col=terrain.colors(5),  
     main="NRW CORINE 2012")

Processing raster data

Now we can calculate the NDVI with the satellite data: (NIR-red)/(NIR+red)
There are clouds and shades on the satellite picture, but this will be exclude by the index. Clouds have a very low NDVI (Roerink et al., 2000 [DOI:10.1080/014311600209814]) and for a high density vegetation such as forest, the NDVI should be at least 0.3 (Sahebjalal and Dashtekian, 2013 [DOI:10.5897/AJAR11.1825]). The threshold of 0.3 for vegetation is also used in other studies (Gandhi et al., 2015 [DOI:10.1016/j.procs.2015.07.415]). Shade will influence the intensity of red as well as NIR, so the difference stays the same.

ndvi12 <- (ger12$layer.4-ger12$layer.3)/(ger12$layer.4+ger12$layer.3)
ndvi18 <- (ger18$layer.4-ger18$layer.3)/(ger18$layer.4+ger18$layer.3)

plot(ndvi12, main="NDVI 2018")

The corine layer includes all land uses, so the forest data has to be selected. LABEL3 will show in the legend which CLC_CODE is for forest.

liste <- read.csv("data/clc2000legend.csv")

liste[str_detect(liste$LABEL3, "forest"),] #all rows with the word "forest"

##    GRID_CODE LEVEL1 LEVEL2 LEVEL3 CLC_CODE                        LABEL1
## 22        22      2      4      4      244            Agricultural areas
## 23        23      3      1      1      311 Forest and semi natural areas
## 24        24      3      1      2      312 Forest and semi natural areas
## 25        25      3      1      3      313 Forest and semi natural areas
##                              LABEL2              LABEL3         RGB
## 22 Heterogeneous agricultural areas Agro-forestry areas 242-204-166
## 23                          Forests Broad-leaved forest 128-255-000
## 24                          Forests   Coniferous forest 000-166-000
## 25                          Forests        Mixed forest 077-255-000

Then we can use the CLC_CODE`s to create a seperate layer for forest and crop it again.

wald12 <- corine_cut[[1]] %in% c(311:312, 244)
wald12 <- mask(wald12, nrw)

wald18 <- corine_cut[[2]] %in% c(311:312, 244)
wald18 <- mask(wald18, nrw)

plot(wald12, main="Forest in NRW 2012")

For comparing the fitness of the forest between 2012 and 2018 it is better to use areas where we have forest in both years. Therefore we change the values of our layers for exlcuding non forest cells. Every cell,which is not TRUE , is NA. Then we can mask one layer.

unique(values(wald12)) #getting every cell value once

## [1]    NA FALSE  TRUE

wald12[wald12!= TRUE] <- NA
wald18[wald18!= TRUE] <- NA

wald <- mask(wald12, wald18)

The forest layer has a different resolution and crs than the NDVI data. To calculate with both, we have to equal them. With projectRaster() the first layer will be transformed to the resolution and crs of the second layer.

ndvi12 <- projectRaster(ndvi12, wald)
ndvi18 <- projectRaster(ndvi18, wald)

For combining the forest with the NDVI data, the NDVI layer can be masked by the forest layer. NDVI values with <0.3 will be exclude as well, because there is probably no vegetation or the cell is to much influenced by clouds.

ndvi12 <- mask(ndvi12,wald)
ndvi12[ndvi12<0.3] <- NA   #excluding all cells with no vegetation data


ndvi18 <- mask(ndvi18,wald)
ndvi18[ndvi18<0.3] <- NA

ndvi12

## class      : RasterLayer 
## dimensions : 2404, 2526, 6072504  (nrow, ncol, ncell)
## resolution : 100, 100  (x, y)
## extent     : 4031300, 4283900, 3029600, 3270000  (xmin, xmax, ymin, ymax)
## crs        : +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : 0.3000002, 1  (min, max)

plot(ndvi12, main="NDVI of forest vegetation 2012")
plot(nrw$geometry, add=T)

Data Analysing

So now we can compare 2012 and 2018. The higher the NDVI the fitter the vegetation or a higher cover of vegetation. We can assume that the cover has not changed between this 6 years, so if we subtract 2018 from 2012 and the value is negative, the vegetation was less fit.

#creating data table with mean and standard error of 2012 and 2018
a <- round(cellStats(ndvi18, stat='mean', na.rm=TRUE), 4)  #4 numbers after 0
b <- round(cellStats(ndvi18, stat='sd', na.rm=TRUE),4)
c <- round(cellStats(ndvi12, stat='mean', na.rm=TRUE),4)
d <- round(cellStats(ndvi12, stat='sd', na.rm=TRUE),4)

stat <- cbind("stat"=c("mean", "sd"), "2012"=c(c,d), "2018"=c(a,b))
stat

##      stat   2012     2018    
## [1,] "mean" "0.4099" "0.3992"
## [2,] "sd"   "0.0749" "0.0815"

veg <- ndvi18- ndvi12
veg <- mask(veg, nrw)

Now we know that the mean of 2018 is lower than 2012, but to prove that the drought had a significant effect we have to do some tests.

var.test(values(ndvi12), values(ndvi18)) #test homogen or not

## 
##  F test to compare two variances
## 
## data:  values(ndvi12) and values(ndvi18)
## F = 0.84569, num df = 119040, denom df = 107090, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.8358753 0.8556211
## sample estimates:
## ratio of variances 
##          0.8456921

hist(ndvi12) #test normal distributed or not

The data is not homogen, because the p-value of the var.test() was <0.05 and the histogram shows that our data is not normally distributed. So we need an non-parametric test like Kuskal-Wallis test.

kruskal.test(values(ndvi12), values(ndvi18))

## 
##  Kruskal-Wallis rank sum test
## 
## data:  values(ndvi12) and values(ndvi18)
## Kruskal-Wallis chi-squared = 68088, df = 68101, p-value = 0.5139

The p-value is >0.05, that means the values of 2018 is not signigicant lower than 2012 and the the forest is maybe less fit, but not significant, after the drought of 2018.

Even if 2018 is not significantly dryer than 2012, it is possible that there are spatial trends. For more tests we need to transform the raster file to a data table, including x and y coordinates.

table <- as.data.frame(veg, xy=T, na.rm=T) #without NA cells
head(table)

##              x       y      layer
## 193825 4216150 3262350 0.19267358
## 193826 4216250 3262350 0.14751797
## 196351 4216150 3262250 0.15323382
## 196792 4260250 3262250 0.07499711
## 204360 4259250 3261950 0.01996919
## 206889 4259550 3261850 0.07995936

length(table$x)

## [1] 68274

The length of the table is 68274, for each pixel one value. The resolution, the size of each pixel, is 100 x 100 m (displayed above in the layer information). That implies the forest area in NRW, we are investigating, has the size of 6,827,400 m\(^{2}\).
With the data table we can create a generalized linear model (glm) and if the Pr-value is <0.05 the trend is significant.

M1 <- glm(layer~x, data=table) #longitude
M1 <- coefficients(M1) #getting coeff. for trend line
summary(glm(layer~x, data=table))

## 
## Call:
## glm(formula = layer ~ x, data = table)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.67108  -0.03038   0.00311   0.03010   0.72268  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.420e+00  2.274e-02  -62.47   <2e-16 ***
## x            3.397e-07  5.508e-09   61.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.006668145)
## 
##     Null deviance: 480.61  on 68273  degrees of freedom
## Residual deviance: 455.25  on 68272  degrees of freedom
## AIC: -148324
## 
## Number of Fisher Scoring iterations: 2

M2 <- glm(layer~y, data=table) #latitude
M2 <- coefficients(M2)
summary(glm(layer~y, data=table))

## 
## Call:
## glm(formula = layer ~ y, data = table)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.68366  -0.03471   0.00933   0.03396   0.74355  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.185e+00  1.874e-02  -63.24   <2e-16 ***
## y            3.727e-07  5.984e-09   62.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.006661265)
## 
##     Null deviance: 480.61  on 68273  degrees of freedom
## Residual deviance: 454.78  on 68272  degrees of freedom
## AIC: -148394
## 
## Number of Fisher Scoring iterations: 2

Both trends are significant. That means the forests got dryer in the west than east and dryer in the south than in the north between 2012 and 2018.

Results

The forests of NRW are not influenced significantly due to the drought of 2018, but it is possible that the damage will be significant after the summer of 2019, because this year was also very dry. Furthermore it seems that the forest in the south west was more vurnable to the heat as in the noth east of NRW. This trend should be analysed more (considering topography, type of forest etc.) for further information and possible prevention for coming periods of hot weather.

plot(veg, breaks=c(-0.5,-0.25,0,0.25,0.5),main="NRW Forest NDVI change between 2012 and 2018", 
     col=c("red", "orange", "dodgerblue", "darkblue"))
plot(nrw$geometry, add=T)

ggplot(data=table, aes(x=x, y=layer))+
  stat_summary(fun.y="mean", geom = "bar", position = "dodge") +
  xlab("Longitude") + 
  ylab("NDVI change")+
  ylim(-0.2,0.4)+
  ggtitle("NRW Forest NDVI change between 2012 and 2018 depending on Longitude" )+
  geom_abline(intercept = M1[1], slope = M1[2])

ggplot(data=table, aes(x=y, y=layer))+
  stat_summary(fun.y="mean", geom = "bar", position = "dodge") +                      
  xlab("Latitude") + 
  ylab("NDVI change")+
  ylim(-0.2,0.2)+
  geom_abline(intercept = M2[1], slope = M2[2])+
  ggtitle("NRW Forest NDVI change between 2012 and 2018 depending on Latitude")+
  coord_flip()

Build with 3.5.1