data.tabledata.tableff package.data.tabledata.tabledata.table - Modify Datedata.table - Group By - sumdata.table - Modify type for a group of variablesdata.table - Group By Multiple Columns - sumdata.table - Sub assign by referencedata.table - setkeyleaflet for Rleaflet for Rleaflet for R - Adding Circlesleaflet for R - Plot Mean Annual Rainfall Magnitudeleaflet for R - Plot Number of Daily Rainfall Eventsleaflet for R - Cluster Geospatial Data of SwedenThe present document works as draft user manual that aims to help understand/ apply most of the examples shared during the lectures. In the following sections there is available information that will help you re-produce efficiently the results that will be followed in the presentations.
In the first lecture, we intend to work with manipulating environmental datasets using data.table. We begin with simple operations to more complex ones. There will also take place an introduction to parallel computing with R in real case studies.
In lecture 2, we will use all the information obtained from lecture 1 and use it in order to walk through an introduction to spatial visualization using a set of very interesting libraries.
The main goal of the sessions that you will attend is to work on real case studies. For this reason, you will be given a set of environmental datasets (mostly precipitation records) available from the European Climate Assessment webpage. For ease of time, all the educational dataset can be downloaded directly from the following path
Now that you have downloaded the folder with the presentation datasets, you may create a new folder, extract the downloaded files into the new folder and open a new r session. You may set the working diretory to the local file where you placed the downloaded data. That was it! You are ready to follow the presentation steps.
In Lecture 1, we make a short introduction to large - data management using the very well-known library data.table. All participants should take the following installation steps in order to re-produce the examples of Lecture 1.
data.tabledata.table.data.tableIn this section we present a fast and user friendly way to import delimited files in R using fread.
# Load library
library(data.table)
# Define Path of the available data
dataPath <- "D:/ECA Data/"
# Read dataset with default settings
data <- fread(paste0(dataPath,"3SwedishStationsData.csv"))
# Read dataset after specifying the existence of headers
data <- fread(paste0(dataPath,"3SwedishStationsData.csv"), header=TRUE)
# Read dataset and define separator
data <- fread(paste0(dataPath,"3SwedishStationsData.csv"), header=TRUE, sep=",")
# Read specific number of rows of the dataset (-1 means all rows)
data <- fread(paste0(dataPath,"3SwedishStationsData.csv"), header=TRUE, sep=",",
nrows=10)
# Read but select specific rows (select date and records from station 1 )
data <- fread(paste0(dataPath,"3SwedishStationsData.csv"), header=TRUE,
sep=",", nrows=10, select=c("DATE","RR_STATION_1"))
# If you want to have a look at the structure of the data
str(data)
When loading an external dataset in R, it is advised to identify all the available variables of the dataset as.character and later to modify them accordingly. This will prevent R from identifying the variables type automatically itself.
# Load library
library(data.table)
# Define path of the data
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load data
dataset <- read.table(paste0(dataPath,"RR_STAID000001.txt") ,skip=20, sep=",",header=T,
stringsAsFactors = FALSE, colClasses = "character")
# Make dataset as data.table
dataset <- data.table(dataset)
str(dataset)
ff package.The ff package is able to load dataset on disk and use fast functions for access.
library(ff)
dataPath <- "D:/ECA Data/"
data <- read.csv.ffdf(file=paste0(dataPath,"totalData.csv"), header=T, VERBOSE=T)
You may also find a very interesting article on how to import datasets in R through www.r-bloggers.com.
data.tableA copy() may be required when doing dt_names = names(DT). Due to R’s copy-on-modify, dt_names still points to the same location in memory as names(DT). Therefore modifying DT by reference now, say by adding a new column, dt_names will also get updated. To avoid this, one has to explicitly copy: dt_names <- copy(names(DT)).
data1 <- copy(data)
data.tableSubset is a very important operation as it is used often when it comes to perform a data manipulation procedure. For example, it often to exclude missing variables using multiple criteria.
# Load library
library(data.table)
# Read csv data
"D:/ECA Data/"
dataset<-fread(paste0(dataPath,"3SwedishStationsData.csv"),colClasses = "character")
# Apply data.table operation
dataset <- dataset[, lapply(.SD, as.numeric), .SDcols=c("RR_STATION_1","RR_STATION_2","RR_STATION_3"),
by=c("DATE")]
# Subset by year
dataset[year(DATE)==2008]
# Subset by month
dataset[month(DATE)==1]
# Subset dataset where non-missing values exist
dataset[!is.na(RR_STATION_1) & RR_STATION_2 & RR_STATION_3]
# Subset and return specific columns
dataset[!is.na(RR_STATION_1) & RR_STATION_2 & RR_STATION_3, RR_STATION_1]
data.table - Modify Date# Load library
library(data.table)
# Define path of the data
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load data
dataset <- read.table(paste0(dataPath,"RR_STAID000001.txt") ,skip=20, sep=",",header=T,
stringsAsFactors = FALSE, colClasses = "character")
# Make dataset as data.table
dataset <- data.table(dataset)
# Modify date format
dataset[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
data.table - Group By - sumdataset[,RR:= as.numeric(RR)]
dataset[RR != -9999 ,sum(RR), by=month(DATE)]
data.table - Modify type for a group of variables# Load library
library(data.table)
# Read csv data
dataset<-fread(paste0(dataPath,"3SwedishStationsData.csv"),colClasses = "character")
# Apply data.table operation
dataset <- dataset[, lapply(.SD, as.numeric), .SDcols=c("RR_STATION_1","RR_STATION_2","RR_STATION_3"),
by=c("DATE")]
str(dataset)
data.table - Group By Multiple Columns - sumCalculate the monthly precipitation excluding missing records:
# Load library
library(data.table)
# Read csv data
dataset<-fread(paste0(dataPath,"3SwedishStationsData.csv"),colClasses = "character")
# Apply data.table operation
dataset <- dataset[, lapply(.SD, as.numeric), .SDcols=c("RR_STATION_1","RR_STATION_2","RR_STATION_3"),
by=c("DATE")]
monthlyMean <- dataset[RR_STATION_1 != -9999 | RR_STATION_2 != -9999 | RR_STATION_3 != -9999,
lapply(.SD, sum, na.rm=T), by=month(DATE)]
head(monthlyMean)
# Create a data.table of 100 columns and 10 rows
testDataset <- data.table(matrix(rnorm(1000, mean=10, sd=1), ncol=100))
# Create a gourping factor variable
testDataset[,GroupingFactor:= sample(x = c("A","B","C"),nrow(testDataset), replace = T)]
# Find the sum of each column by grouping factor
testDataset[, lapply(.SD, sum, na.rm=T), by=GroupingFactor]
If you want to make this operation to a specific set of columns:
# Sum of the first 10 columns by grouping factor
testDataset[, lapply(.SD, sum, na.rm=T), .SDcols=paste0("V",1:10), by=GroupingFactor]
Can you calculate the monthly standard deviation of raindall also?
data.table - Sub assign by referenceAn advanced sub-assign operation would be assigning missing values to station_2 and station_3 in cases where station_1 has missing records too.
# Load library
library(data.table)
# Read csv data
dataset<-fread(paste0(dataPath,"3SwedishStationsData.csv"),colClasses = "character")
# Apply data.table operation
dataset <- dataset[, lapply(.SD, as.numeric), .SDcols=c("RR_STATION_1","RR_STATION_2","RR_STATION_3"),
by=c("DATE")]
dataset[is.na(RR_STATION_1), c("RR_STATION_2","RR_STATION_3"):=NA]
data.table - setkeyThe setkey operation is used to sort a data.table. For example:
# Sorts data.table by date
setkey(dataset,DATE)
You may also sort the dataset according to multiple columns. The setkey operation is used to in order to perform data.table joins. It is also easy to learn by example:
example(data.table)
In this section the following packages must be installed:
install.packages("foreach",dependencies = TRUE)) # Install foreach packageinstall.packages("doParallel",dependencies = TRUE)) # Install doParallel packageExample: Create a function that produces a matrix with elements coming from the Normal distribution and calculate the sum of all elements
sumMatrix <- function(n) { return(sum(matrix(rnorm(n)))) }
Create a for loop that will execute this computation for 1 up to 50 matrices with the respective dimension:
x <- list()
system.time({
for (i in 1:50){
x[[i]] <- sumMatrix(i)
}
result <- unlist(x)
})
Question: Can we reproduce this example with lapply?
What if we had to calculate sum of all elements of 50000 matrices?
system.time(results <-unlist(lapply(1:50000, sumMatrix)))
A parallel version of the for loop operation:
library(foreach)
library(doParallel)
cores <- 3 # Define number of cores
cl<-makeCluster(cores) # Create cores
registerDoParallel(cl) # Register cores
system.time(res <- foreach(i=1:50000) %dopar% sumMatrix(i)) # Parallelize for loop
stopCluster(cl) # Stop cluster
We will try to optimize the computation time of the daily rainfall trend for all the available stations of a country. We begin with the sequential approach:
system.time({
library(data.table)
library(stringr)
library(leaflet)
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
stations <- tempMapping[CN=="SE"]
stations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
stations <- stations[,STAID]
# Load each one meteorological station
stationList <- list()
for (idx in stations) {
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
setkey(stationData, DATE)
# Calculate Mean Annual Rainfall Amount
stationData[,DailyTrend:=1:nrow(stationData)]
results <- summary(lm(formula = RR~DailyTrend, data = stationData))
stationList[[idx]] <- c(idx,results$coefficients[2])
}
totalTrends<-do.call(rbind,stationList)
totalTrends <- data.table(totalTrends)
setnames(totalTrends, c("STAID","Slope"))
totalTrends[,Slope:=as.numeric(Slope)]
})
Elapsed computation time 184 seconds. The following code implements the parallelized approach:
library(foreach)
library(doParallel)
library(data.table)
library(stringr)
cores <- 3 # Define number of cores
cl<-makeCluster(cores) # Create cores
registerDoParallel(cl) # Register cores
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
stations <- tempMapping[CN=="SE"]
stations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
stations <- stations[,STAID]
dailyTrendFunction <- function(idx) {
library(data.table)
library(stringr)
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
setkey(stationData, DATE)
# Calculate Mean Annual Rainfall Amount
stationData[,DailyTrend:=1:nrow(stationData)]
results <- summary(lm(formula = RR~DailyTrend, data = stationData))
stationList <- c(idx,results$coefficients[2])
return(stationList)
}
# Perform parallelization algorithm
system.time(res <- foreach(idx=stations) %dopar% dailyTrendFunction(idx)) # Parallelize for loop
# Stop cluster
stopCluster(cl)
Elapsed computation time 92 seconds.
First, we begin with the sequential calculation of dailly rainfall correlation between all the available greek stations (26 stations).
library(data.table)
library(stringr)
library(corrplot)
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
stations <- tempMapping[CN=="GR"]
stations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
stations <- stations[,STAID]
# Load each one meteorological station
stationList <- list()
for (idx in stations) {
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
# Import station data into a list
stationList[[idx]] <- stationData
}
# Combine all datasets into one
totalData<-do.call(rbind,stationList)
# Make dataset data.table
totalData <- data.table(totalData)
# Convert rainfall series to numeric
totalData[,RR:=as.numeric(RR)]
# save results
saveRDS(totalData,paste0(dataPath,"totalData.RDS"))
# Manipulate dataset in the appropriate form
dcastDataset<-dcast.data.table(totalData,formula = "DATE ~ STAID", value.var="RR")
# Compute correlation
correlValues <- data.table(cor(dcastDataset[,2:25,with=F], method = "pearson", use="pairwise.complete.obs"))
corrplot.mixed(cor(correlValues))
However, in Greece there are only 26 meteorological stations that measure daily rainfall precipitation. The elapsed computation time took approaximately 3.45 seconds. If we were to apply the same approach to all the available stations of Sweden would the code presented just above work efficiently? In a local laptop it took more than 2 hours and the task was later killed.
We proceed step by step to the parallel computing of the correlation of all the available meterorological stations in Finland:
library(data.table)
library(stringr)
library(corrplot)
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
stations <- tempMapping[CN=="FI"]
stations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
stations <- stations[,STAID]
# Load each one meteorological station
stationList <- list()
for (idx in stations) {
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
# Import station data into a list
stationList[[idx]] <- stationData
}
# Combine all datasets into one
totalData<-do.call(rbind,stationList)
# Make dataset data.table
totalData <- data.table(totalData)
# Convert rainfall series to numeric
totalData[,RR:=as.numeric(RR)]
# save results
# Manipulate dataset in the appropriate form
dcastDataset<-dcast.data.table(totalData,formula = "DATE ~ STAID", value.var="RR")
dcastDataset[,DATE:=NULL]
dcastDataset <-as.matrix(dcastDataset)
# Compute correlation
rm(totalData)
x<-dcastDataset
rm(dcastDataset)
library(ff)
library(data.table)
nblocks <- 40
ncols <- ceiling(ncol(x)/nblocks)
chunks <- rep(1:nblocks, each=ncols)[1:ncol(x)]
chunkMap <- data.table(Column=1:ncol(x), Chunk=chunks)
chunkCombination <- expand.grid(unique(chunks),unique(chunks))
chunkCombination <- t(apply(chunkCombination,1,sort))
chunkCombination <- unique(chunkCombination)
# corMAT <- ff(vmode = "single", dim = c(ncol(x),ncol(x)))
corMAT <- matrix(0, ncol = ncol(x), nrow = ncol(x) )
calcCorrel <- function(idx,x,chunkCombination,chunkMap,corMAT){
library(ff)
library(data.table)
group1 <- chunkMap[Chunk==chunkCombination[idx,1], Column]
group2 <- chunkMap[Chunk==chunkCombination[idx,2], Column]
cat("Chunk",chunkCombination[idx,1],"with chunk",chunkCombination[idx,2],"\n")
tempCor <- cor(x[,group1], x[,group2], use="pairwise.complete.obs")
corMAT[group1,group2] <- tempCor
corMAT[group2,group1] <- t(tempCor)
gc()
return(corMAT)
}
library(foreach)
library(doParallel)
cores <- 4 # Define number of cores
cl<-makeCluster(cores) # Create cores
registerDoParallel(cl) # Register cores
system.time(res <- foreach(idx=1:nrow(chunkCombination)) %dopar% calcCorrel(idx,x,chunkCombination,chunkMap,corMAT)) # Parallelize for loop
stopCluster(cl) # Stop cluster
In this section, the following packages must be downloaded:
install.packages("ggmap", dependencies = TRUE) # Install ggmap packageinstall.packages("stringr", dependencies = TRUE) # Install stringr packageinstall.packages("ggplot2", dependencies = TRUE) # Install ggplot2 packagelibrary(ggmap)
library(ggplot2)
# Get Map
map_Location <- get_map(location = c(lon = 28.12, lat = 36.4) , zoom = 8, scale = 2,
language = "en-EN", source = "google", maptype = "roadmap")
# Plot the points on the map
tempPlot <- ggmap(map_Location) +
geom_point(aes(x = 28.12, y = 36.4, fill = "red", alpha = 0.8),
size = 4, shape = 21) +
guides(fill=FALSE, alpha=FALSE, size=FALSE) +
xlab("Longitude") +
ylab("Latitude")
library(data.table)
library(ggplot2)
dataPath <- "./data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote=""))
# Subset greek meteorological stations
greekStations <- tempMapping[CN=="GR"]
# Manipulate longitude and latitude function
findLoc <- function(x) {
tempLoc <- str_split(x, pattern = ":") # Split character string based on ":"
# Manipulate Longitude and Latitude
x.out <- as.numeric(tempLoc[[1]][1]) + as.numeric(tempLoc[[1]][2])/60 +
as.numeric(tempLoc[[1]][3])/3600
return(x.out)
}
# Apply function in order to manipulate longitude and latitude
greekStations[,latUpd:=sapply(LAT,findLoc)][,longUpd:=sapply(LON,findLoc)]
greekStations[,LatLong:=paste0(latUpd,":",longUpd)]
# Find if there are duplicated stations as the data series are blended
greekStations <- greekStations[!duplicated(LatLong)]
# Get map
map_Location <- get_map(location = c(lon = mean(greekStations$longUpd),
lat = mean(greekStations$latUpd)) , zoom = 4, language = "en-EN",
source = "google", maptype = "roadmap")
# Plot all map points
tempPlot <- ggmap(map_Location) +
geom_point(data = greekStations, aes(x = longUpd, y = latUpd, fill = "red", alpha = 0.8),
size = 4, shape = 21, color="red") +
guides(fill=FALSE, alpha=FALSE, size=FALSE) +
xlab("Longitude") + ylab("Latitude")
ggmap - Mean Annual Rainfall Amountlibrary(data.table)
library(stringr)
library(leaflet)
library(ggmap)
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
greekStations <- tempMapping[CN=="GR"]
greekStations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
greekStationsIds <- greekStations[,STAID]
# Load each one meteorological station
rainfallList <- list()
for (idx in greekStationsIds) {
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
# Calculate Mean Annual Rainfall Amount
AnnualRainfall <- stationData[,list(AnnualRainfall=sum(RR)),
by=list(STAID,year(DATE))]
setnames(AnnualRainfall, "year", "Year")
rainfallList[[idx]] <- AnnualRainfall
}
# Combine all the results into one dataset:
totalAnnualRainfall <- rbindlist(rainfallList)
# Subset dataset
subsetAnnualRainfall <- totalAnnualRainfall[Year>=1955 & Year<=2004,
list(MeanAnnualRainfall=mean(AnnualRainfall)), by=c("STAID")]
# Function that mannipulates longitude and latitude
findLoc <- function(x) {
tempLoc <- str_split(x, pattern = ":")
x.out <- as.numeric(tempLoc[[1]][1]) + as.numeric(tempLoc[[1]][2])/60 +
as.numeric(tempLoc[[1]][3])/3600
return(x.out)
}
# Find out longitude and latitude for each station
subsetAnnualRainfall <- merge(greekStations,subsetAnnualRainfall, by=c("STAID"))
# Manipulate Longitude and latitude
subsetAnnualRainfall[,latUpd:=sapply(LAT,findLoc)][,longUpd:=sapply(LON,findLoc)]
subsetAnnualRainfall[,LatLong:=paste0(latUpd,":",longUpd)]
subsetAnnualRainfall[,STANAME:=gsub(" ","",STANAME)]
#Get Map
map_Location <- get_map(location = c(lon = mean(subsetAnnualRainfall$longUpd),
lat = mean(subsetAnnualRainfall$latUpd)) ,
zoom = 6, language = "en-EN",
source = "google", maptype = "roadmap")
# Plot all map points
tempPlot <- ggmap(map_Location) + geom_point(data = subsetAnnualRainfall,
aes(x = longUpd, y = latUpd, color=MeanAnnualRainfall), size=5) +
xlab("Longitude") + ylab("Latitude")
leaflet for RIn this section we introduce the leaflet package that creates javascript interactive maps. In order to be able to apply the code presented in this section you need to download and install the following packages:
install.packages(devtools, dependencies = TRUE)devtools::install_github("rstudio/leaflet")library(leaflet)
m <- leaflet() %>% setView(lng =28.1, lat = 36.4, zoom = 12)
m %>% addTiles()
library(leaflet)
m <- leaflet() %>% setView(lng =28.12, lat = 36.4, zoom = 14)
m %>% addTiles() %>% addMarkers(lng =28.12, lat = 36.4,
popup="Rhodes Meteorological Station")
leaflet for RIn this part we plot as before all the available meteorological stations of Greece and their mean annual rainfall magitude:
library(data.table)
library(stringr)
library(leaflet)
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
greekStations <- tempMapping[CN=="GR"]
greekStations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
greekStationsIds <- greekStations[,STAID]
# Load each one meteorological station
rainfallList <- list()
for (idx in greekStationsIds) {
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
# Calculate Mean Annual Rainfall Amount
AnnualRainfall <- stationData[,list(AnnualRainfall=sum(RR)),
by=list(STAID,year(DATE))]
setnames(AnnualRainfall, "year", "Year")
rainfallList[[idx]] <- AnnualRainfall
}
# Combine all the results into one dataset:
totalAnnualRainfall <- rbindlist(rainfallList)
# Subset dataset
subsetAnnualRainfall <- totalAnnualRainfall[Year>=1955 & Year<=2004,
list(MeanAnnualRainfall=mean(AnnualRainfall)), by=c("STAID")]
# Function that mannipulates longitude and latitude
findLoc <- function(x) {
tempLoc <- str_split(x, pattern = ":")
x.out <- as.numeric(tempLoc[[1]][1]) + as.numeric(tempLoc[[1]][2])/60 +
as.numeric(tempLoc[[1]][3])/3600
return(x.out)
}
# Find out longitude and latitude for each station
subsetAnnualRainfall <- merge(greekStations,subsetAnnualRainfall, by=c("STAID"))
# Manipulate Longitude and latitude
subsetAnnualRainfall[,latUpd:=sapply(LAT,findLoc)][,longUpd:=sapply(LON,findLoc)]
subsetAnnualRainfall[,LatLong:=paste0(latUpd,":",longUpd)]
subsetAnnualRainfall[,STANAME:=gsub(" ","",STANAME)]
stationsMap <- leaflet(data = subsetAnnualRainfall) %>% addTiles() %>%
addMarkers(~longUpd, ~latUpd, popup = ~STANAME)
stationsMap
leaflet for R - Adding CirclesstationsMap <- leaflet(data = subsetAnnualRainfall) %>% addTiles() %>%
addCircles(~longUpd, ~latUpd, popup = ~STANAME, color = c('red'))
stationsMap
leaflet for R - Plot Mean Annual Rainfall Magnitude countrymappping <- leaflet(subsetAnnualRainfall) %>%
addTiles(group = "OSM (default)") %>%
addProviderTiles("Stamen.Toner", group = "Toner") %>%
addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>%
# Overlay groups
addCircles(~longUpd, ~latUpd, ~10*MeanAnnualRainfall, stroke = F, group = "Mean Annual Rainfall", color="red") %>%
addMarkers(~longUpd, ~latUpd) %>%
addLayersControl(
baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
overlayGroups = c("Mean Annual Rainfall"),
options = layersControlOptions(collapsed = FALSE))
countrymappping
leaflet for R - Plot Number of Daily Rainfall EventsdataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote="", colClasses="character"))
# Subset greek meteorological stations
greekStations <- tempMapping[CN=="GR"]
greekStations[,STAID:=gsub(" ","",STAID)]
# Return greek stations
greekStationsIds <- greekStations[,STAID]
# Load each one meteorological station
rainfallList <- list()
for (idx in greekStationsIds) {
stationCodec <- str_pad(idx, pad = 0, width = 6, side = "left") # Create station code
# Create station full name of dataset
stationDataName <- paste0("RR_STAID",stationCodec,".txt")
# Import dataset
stationData <- data.table(read.table(paste0(dataPath,stationDataName), skip=20, sep=",",
header=T, stringsAsFactors = FALSE, colClasses = "character"))
stationData[,STAID:=gsub(" ","",STAID)]
# Make RR and Q_RR as numeric
stationData[,c("RR","Q_RR"):=lapply(.SD, as.numeric), .SDcols=c("RR","Q_RR")]
# Drop missing records
stationData <- stationData[RR!=-9999]
# Manipulate Date
stationData[,DATE:=as.Date(as.character(DATE), format("%Y%m%d"))]
# Calculate Number of Annual Rainfall Events
AnnualRainfallEvents <- stationData[RR>0, list(NumberofRainfallEvents=.N), by=list(STAID,year(DATE))]
rainfallList[[idx]] <- AnnualRainfallEvents
}
# Combine all the results into one dataset:
totalRainfallEvents <- rbindlist(rainfallList)
setnames(totalRainfallEvents, "year", "Year")
# Subset dataset
subsetRainfallEvents <- totalRainfallEvents[Year>=1955 & Year<=2004,
list(MeanNumberofRainfallEvents=mean(NumberofRainfallEvents)), by=c("STAID")]
# Function that mannipulates longitude and latitude
findLoc <- function(x) {
tempLoc <- str_split(x, pattern = ":")
x.out <- as.numeric(tempLoc[[1]][1]) + as.numeric(tempLoc[[1]][2])/60 +
as.numeric(tempLoc[[1]][3])/3600
return(x.out)
}
# Find out longitude and latitude for each station
subsetRainfallEvents <- merge(greekStations,subsetRainfallEvents, by=c("STAID"))
# Manipulate Longitude and latitude
subsetRainfallEvents[,latUpd:=sapply(LAT,findLoc)][,longUpd:=sapply(LON,findLoc)]
subsetRainfallEvents[,LatLong:=paste0(latUpd,":",longUpd)]
#Plot mean annual number of rainfall events
countrymappping <- leaflet(subsetRainfallEvents) %>%
addTiles(group = "OSM (default)") %>%
addProviderTiles("Stamen.Toner", group = "Toner") %>%
addProviderTiles("Stamen.TonerLite", group = "Toner Lite") %>%
# Overlay groups
addCircles(~longUpd, ~latUpd, ~500*MeanNumberofRainfallEvents, stroke = F, group = "Mean Annual Rainfall", color="red") %>%
addMarkers(~longUpd, ~latUpd) %>%
addLayersControl(
baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
overlayGroups = c("Mean Annual Number of Rainfall Events"),
options = layersControlOptions(collapsed = FALSE))
countrymappping
leaflet for R - Cluster Geospatial Data of Swedenlibrary(data.table)
library(ggplot2)
library(leaflet)
library(stringr)
dataPath <- "D:/ECA Data/" # Define the path where data is located
# Load Dataset with Locations of Meteorological Stations
tempMapping <- data.table(read.table(paste0(dataPath,"/stations.txt"), skip = 16 ,sep=",",
stringsAsFactors = FALSE, header=TRUE, quote=""))
# Subset greek meteorological stations
greekStations <- tempMapping[CN=="SE"]
# Manipulate longitude and latitude function
findLoc <- function(x) {
tempLoc <- str_split(x, pattern = ":") # Split character string based on ":"
# Manipulate Longitude and Latitude
x.out <- as.numeric(tempLoc[[1]][1]) + as.numeric(tempLoc[[1]][2])/60 +
as.numeric(tempLoc[[1]][3])/3600
return(x.out)
}
# Apply function in order to manipulate longitude and latitude
greekStations[,latUpd:=sapply(LAT,findLoc)][,longUpd:=sapply(LON,findLoc)]
greekStations[,LatLong:=paste0(latUpd,":",longUpd)]
# Find if there are duplicated stations as the data series are blended
greekStations <- greekStations[!duplicated(LatLong)]
clusterChart <- leaflet(greekStations) %>% addTiles() %>% addMarkers(~longUpd, ~latUpd,
clusterOptions = markerClusterOptions()
)
clusterChart