GEOG2021_UNUK Procedures

Preparing R for Use

Start R or R Studio and enter the following to see what packages are loaded in your R session. An R package is a set of libraries and dependencies which you can get from package repositories. Users install one or more packages from the internet and, once it has been downloaded and installed to your local computer, you must then import it as a library into R.

(.packages())

Unless you have downloaded and installed packages previously, chances are that you won’t have access to many of the libraries needed to perform the analyses we will examine today. The following code can be used to download and install each of the required packages in one simple step.

install.packages(c("data.table", "broom", "dplyr", "zoo", "sandwich", "MASS", "quadprog", "tseries", "strucchange", "fracdiff", "forecast", "iterators", "codetools", "foreach", "bfast", "raster", "sf"), repos='http://cran.us.r-project.org')

Alternatively, you can download and install packages one at at time by following this example for the data.table package:

install.packages("data.table", repos='http://cran.us.r-project.org')

Now that the required packages have been installed, you must load them in the current R session to access the capabilities, code, and documentation that each contains. You can load multiple libraries at once by creating a list of the desired libraries, then load them all at once with “lapply”.

my_packages <- c("data.table", "broom", "dplyr", "zoo", "sandwich", "MASS", "quadprog", "tseries", "strucchange", "fracdiff", "forecast", "iterators", "codetools", "foreach", "bfast", "raster", "sf")

lapply(my_packages, library, character.only = TRUE)

Extract Multivalues to Points

Next, extract each vegetation index (VI) value for the preferred landuse/landcover type to their respective centroids. This process collects the time series of VI values for each pixel in a format that we can input into the BFAST algorithm.

First, make sure the following libraries are loaded into your current R session:

library(raster)
library(sf)

Next, set the working directory to the folder that contains your satellite images to create a list of of the available images, then use the list to create a raster stack.

setwd("D:/Research/Projects/BFAST_2001_2025/MOD13Q1/NDVI")
f <- list.files(pattern = "tif$")
ras <- lapply(f, raster)
rstack <- stack(ras)

Finally, read in a shapefile containing the centroids to analyze, extract each of the underlying cell values from the raster stack, and write the output to a CSV file.

setwd("D:/Research/Projects/BFAST_2001_2025/BFAST/tiles")
tilepts <- st_read("Centroids_462.shp")  #
start_time <- Sys.time()
ext <- extract(rstack, tilepts)
write.table(ext, "D:/Research/Projects/BFAST_2001_2025/BFAST/extract/Centroids_462.csv", sep=",", col.names=FALSE)

Importing data into R, and its analysis, can be a memory-intensive operation. To alleviate potential memory problems that might slow down the system, I recommend clearing any objects in memory before moving on to the next step.

rm(list=ls())

Check for NA Values in the Time Series

The BFAST algorithm does not tolerate any missing values in a time series. Despite using a 16-day maximum value composite to calculate the VI, occasionally NA values can still be recorded for a given date (e.g., clouds).

If needed, load the two libraries required for this process:

library(raster)
library(sf)

For the CSV file you just created, let’s check to see if it contains any NA values. Set the working director to the folder containing the CSV file and modify the file variable to be the CSV file name.

setwd("D:/Research/Projects/BFAST_2001_2025/BFAST/extract")
file = "Centroids_463.csv"
x <- read.csv(file = file,header=FALSE, sep=",", na.strings=c(NA, "NA", ""))
na_count <- sapply(x, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)
sum_na1 <- sum(na_count)
data <- t(na.approx(t(x)))
na_count2 <- sapply(data, function(y) sum(length(which(is.na(y)))))
na_count2 <- data.frame(na_count2)
sum_na2 <- sum(na_count2)
print(sum_na1)
print(sum_na2)

If the last number produced by print(sum_na2) is 0, then there are no NA values. You can skip the next code block and move on to the BFAST Analysis section.

However, if the last number is greater than 0, run the following code to replace any NA values with the mean of valid values surrounding the NA.

if(sum_na2 > 0){
data <- t(na.locf(t(data), fromLast = FALSE))
na_count3 <- sapply(data, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count3)
sum_na3 <- sum(na_count3)
print(sum_na3)
}

If you needed to run the code block above to address NA values, then you will also need to write a new output CSV file with the modified VI value. Note that the recommended output file name for the centroids indicates that NA values have been “filled”.

write.table(data, file = paste(file_path_sans_ext(file), "_filled.csv"), row.names = FALSE, col.names = FALSE, sep = ",")

Once again, we’ll remove any objects in R to clear memory.

rm(list=ls())

BFAST Analysis

Now that we have a CSV file containing complete time series VI values for a set of image pixels, we can finally use BFAST to perform a temporal decomposition. BFAST takes the time series of values and extracts the underlying seasonal, trend, and “noise” components from the signal. You can learn more about the application of BFAST by reading this paper by Hutchinson et al., 2015.

As we have done at the beginning of each processing step, make sure that the following libraries are loaded in your current R session:

library(zoo)
library(sandwich)
library(MASS)
library(quadprog)
library(tseries)
library(strucchange)
library(fracdiff)
library(forecast)
library(iterators)
library(codetools)
library(foreach)
library(bfast)

Next, read in the data file - the CSV file you extracted (and likely filled) earlier. Be sure to modify the code below to reflect the path where the data file can be found, as well as the filename and datafile variables to specify the correct name of your data file.

setwd("D:/Research/Projects/BFAST_2001_2025/BFAST/extract/")
filename <- "Centroids_462"
datafile <- "Centroids_462 _fill.csv"  #Note the space before _fill.csv
inidata<-read.table(datafile, header=FALSE, sep = ",", dec = ".")
mdata<-as.matrix(inidata)
tpdata<-mdata
vmax<-dim(mdata)
vmax[1]
vmax[2]

THe following code implements BFAST. Note that this processing step can take a long time as it iterates across each row of your datafile (i.e., a pixel with all VI values in the time series). If you’d rather note see the VI and BFAST output plots, you can comment out the two “plot” lines in the code below.

Note in the code below that it specifies an annual time series beginning in 2001 with 23 images in each year.

for(count in 1:vmax[1]){
poly_id<-tpdata[count,1]  
ndvi<-tpdata[count,2:vmax[2]] 
plot(ndvi)
tsdata<-ts(ndvi,frequency=23,start=c(2001,1))
dim(tsdata)<-NULL
rdist<-23/length(tsdata)
fits<-bfast(tsdata,h=rdist,season="harmonic",max.iter=1)
plot(fits)
fits2<-fits$Time
ts_trend_break_time<-t(fits2[1])
fits3<-fits$Magnitude
ts_trend_break_magnitude<-t(fits3[1])
fits4<-fits$output
fits4a<-fits4[[1]]$Vt.bp
fits4adata<-as.matrix(fits4a)
fits4amax<-dim(fits4adata)
ts_trend_nbbreak<-t(fits4amax[1])
results1<-ts_trend_break_time
aLine<-t(c(poly_id,results1))
write.table(aLine, file=paste0(filename, "_trend_breaks_time.txt"), append=TRUE,quote=FALSE,sep=",", eol="\n",na="NA", dec=".",row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))
results2<-ts_trend_break_magnitude
aLine<-t(c(poly_id,results2))
write.table(aLine,file=paste0(filename, "_trend_breaks_magnitude.txt"), append=TRUE,quote=FALSE, sep=",",eol="\n", na="NA", dec=".",row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))
results3<-ts_trend_nbbreak
aLine<-t(c(poly_id,results3))
write.table(aLine,file=paste0(filename, "_trend_nbbreaks.txt"),append=TRUE,quote=FALSE,sep=",", eol="\n",na="NA", de=".", ,row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))
fits4b<-fits4[[1]]$Tt
results4<-fits4b
aLine<-t(c(poly_id,results4))
write.table(aLine,file=paste0(filename, "_trend_bfast.txt"),append=TRUE,quote=FALSE, sep=",",eol="\n",na="NA",dec=".", row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))
fits4c<-fits4[[1]]$Wt.bp
fits4cdata<-as.matrix(fits4c)
fits4cmax<-dim(fits4cdata)
ts_season_nbbreak<-t(fits4cmax[1])
results5<-ts_season_nbbreak
aLine<-t(c(poly_id,results5))
write.table(aLine,file=paste0(filename, "_season_nbbreaks.txt"),append=TRUE,quote=FALSE, sep=",",eol="\n", na="NA", de=".", row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))
ts_season_breaks_time<-t(fits4cdata)
results6<- ts_season_breaks_time
aLine<- t(c(poly_id,results6))
write.table(aLine,file=paste0(filename, "_season_breaks_time.txt"),append=TRUE,quote=FALSE,sep=",", eol="\n",na="NA", de=".",row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))
fits4d<-fits4[[1]]$St
results7<-fits4d
aLine<-t(c(poly_id,results7))
write.table(aLine,file=paste0(filename, "_season_bfast.txt"),append=TRUE,quote=FALSE,sep=",",eol="\n", na="NA",dec=".", row.names=FALSE,col.names=FALSE,qmethod=c("escape","double"))}

Notice that the code snippet above produces seven total output files (e.g., “_trend_bfast.txt) where the centroid file name is appended to the beginning of the output file name. It is these files that contain the BFAST results.

As before , also remove active objects to clear memory.

rm(list=ls())

Aggregate BFAST Results

As a final processing step in R, we will combine each of the seven output files produced before into a single file. In addition, a Student’s t test will be applied to determine the direction and statistical significance of any detected trends.

Make sure the following libraries are loaded in your current R session:

library(data.table)
library(broom)
library(dplyr)

In the code below, be sure to edit the setwd, tile, and outfolder variables to your situtation.

setwd("D:/Research/Projects/BFAST_2001_2025/BFAST/extract")
tile = "Centroids_462"
file1 = "_trend_nbbreaks.txt"
file2 = "_trend_breaks_time.txt"
file3 = "_trend_breaks_magnitude.txt"
file4 = "_trend_bfast.txt"
outfolder = "D:/Research/Projects/BFAST_2001_2025/BFAST/results/"
outfile = "_results.csv"

#---------BFAST Number of Breaks---------
file = paste0(tile, file1)
dat <- read.csv(file = file, header=FALSE, sep=",")
setnames(dat, old=c("V1","V2"), new=c("ID","NBR"))

#---------BFAST Time of Largest Break---------
file = paste0(tile, file2)
x <- read.csv(file=file, header=FALSE, sep=",")
setnames(x, old=c("V1", "V2"), new=c("ID", "Time"))
x$Season <- ifelse(x$Time == "NA", "NA", trunc(x$Time/23))
x$Period <- ifelse(x$Time == "NA", "NA", (x$Time - x$Season * 23))
dat <- cbind(dat, x$Season)
dat <- cbind(dat, x$Period)
setnames(dat, old=c("x$Season", "x$Period"), new=c("SEASON", "PERIOD"))

#---------Correct Number of Breaks for NA Times---------
dat$PERIOD[is.na(dat$PERIOD)] <- 0
dat$SEASON[is.na(dat$SEASON)] <- 0
dat$temp <- ifelse(dat$SEASON == 0, 1, 0)
dat$NBR <- dat$NBR - dat$temp
dat <- select(dat, -temp)

#---------BFAST Magnitude of Largest Break---------
file = paste0(tile, file3)
x <- read.csv(file=file, header=FALSE, sep=",")
setnames(x, old=c("V1", "V2"), new=c("ID", "MAG"))
dat <- cbind(dat, x$MAG)
setnames(dat, old="x$MAG", new="MAG")

#---------BFAST Trend---------
file = paste0(tile, file4)
x <- read.csv(file = file, header=FALSE, sep=",")
x <- x[,2:254]
ydata <- t(x)
xdata <- seq(1,253, 1)
fit <- lm(ydata ~ xdata)
remove(x) #to save memory
df <- tidy(fit)
xdata <- df[ which(df$term == "xdata"), ]
remove(df) #to save memory
SLOPE <- xdata$estimate
PVALUE <- xdata$p.value
dat <- cbind(dat, SLOPE)
dat <- cbind(dat, PVALUE)
SIG <- ifelse(dat$PVALUE < 0.05, 1, 0)
dat <- cbind(dat, SIG)
SIGN <- ifelse(sign(dat$SLOPE) == 1, "Positive", "Negative")
dat <- cbind(dat, SIGN)
TREND <- with(dat, ifelse(SIG == 0, "Stable", ifelse(SIGN == "Positive", "Positive", "Negative")))
dat <- cbind(dat, TREND)
dat <- select(dat, -SIG)
dat <- select(dat, -SIGN)
rm(list="fit", "PVALUE", "SIG", "SIGN", "TREND", "xdata", "ydata", "SLOPE")

#---------Write Output Spreadsheet---------
write.table(dat, paste0(outfolder, tile, outfile), sep=",", col.names=NA, row.names=TRUE)
end_time <- Sys.time()
end_time - start_time

The file produced contains, in a single page, the complete results from BFAST. Because each row represents a pixel from the centroid input, this file can be joined back to the centroids to create rasters for each BFAST outcome.

For the last time, clear your memory! You can then exit R (or R Studio) and start ArcGIS Pro.

rm(list=ls())