Data cleaning and summary
Spatial mapping
Spatial Interpolation
Please note, you might need to install some packages
First things first, let’s get organized and either use your existing RStudio project from a previous week or set-up a new RStudio project following the steps below.
Click the “File” menu button (top left corner), then “New Project” Click “New Directory” Click “New Project” In “Directory name”, type the name of your project, for example “EVB203” Browse and select a folder where to locate your project (~ is your home directory). For example, a folder called “R_tutorials” in “EVB203”: ~/EVB203/R_tutorials. Click the “Create Project” button Create three sub-folders/directories in your project. Run:
dir.create("scripts")
dir.create("data")
dir.create("plots")
Or click on the New Folder button in the Files pane in the lower right quadrant of RStudio and type in the folder names.
# move data to your data folder or you can use "setwd" to tell R where your data is stored
setwd("C:/Users/N11094427/Downloads") #change this to your directory
# import the class data
mydata <- read.csv("StudentDataSet_2023.csv")
# usually, I like to check my data before I dive into analysis
nrow(mydata)
## [1] 516
summary(mydata) ### have a look on your data
## wkt_geom Time Obs Light
## Length:516 Length:516 Length:516 Min. : 8
## Class :character Class :character Class :character 1st Qu.: 2260
## Mode :character Mode :character Mode :character Median : 13527
## Mean : 34012
## 3rd Qu.: 50706
## Max. :692141
## name date author Cover
## Length:516 Length:516 Length:516 Length:516
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Humidity Long Lat Temp
## Min. :21.43 Min. :153 Min. :-27470.00 Min. :21.08
## 1st Qu.:56.17 1st Qu.:153 1st Qu.: -27.47 1st Qu.:23.18
## Median :65.16 Median :153 Median : -27.47 Median :24.86
## Mean :62.60 Mean :153 Mean : -80.65 Mean :29.73
## 3rd Qu.:72.26 3rd Qu.:153 3rd Qu.: -27.47 3rd Qu.:27.38
## Max. :87.17 Max. :153 Max. : -27.46 Max. :75.90
We can find some problems here, for example, there are some points located far away from Australia, whose Lat is -27470. We can delete those points by using their Lats.
mydata <- subset(mydata, mydata$Lat > -30.0) # by using subset, we only keep data that lattitude is greater than -30
nrow(mydata)
## [1] 515
We can create histograms to see the distributions of temperature, light, and humidity
hist(mydata$Temp)
hist(mydata$Light)
hist(mydata$Humidity)
There are some temperatures ranging from 55 to 80 celsius, this is abnormal, check the data and think why is that and what should we do? Remove them or find a way to fix it?
Now, let’s create scatterplots to compare the relationship between humidity vs temperature, also we can test their covariances
plot(x = mydata$Humidity,y = mydata$Temp,
xlab = "Humidity",
ylab = "Temperature",
main = "Humidity vs Temperature"
)
We can also test the linerity between Humidity and Temperature
Hum_Temp <- lm(Temp ~ Humidity, data = mydata)
summary(Hum_Temp) # negatively related
##
## Call:
## lm(formula = Temp ~ Humidity, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.446 -6.343 -1.431 1.489 38.179
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.18351 2.12898 32.03 <2e-16 ***
## Humidity -0.61436 0.03319 -18.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.62 on 513 degrees of freedom
## Multiple R-squared: 0.4004, Adjusted R-squared: 0.3992
## F-statistic: 342.6 on 1 and 513 DF, p-value: < 2.2e-16
Let’s visualize it
library(ggplot2)
library(ggpmisc)
## Loading required package: ggpp
##
## Attaching package: 'ggpp'
## The following object is masked from 'package:ggplot2':
##
## annotate
ggplot(data = mydata, aes(x = Humidity, y = Temp)) +
stat_poly_line() +
stat_poly_eq() +
geom_point()
If we can remove those abnormal temperture, it seems that we can get more accurate regression.
We can also take the four types of cover into consideration
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mydata %>%
ggplot( aes(x=Humidity, y=Temp, group=Cover, color=Cover)) +
geom_point() +
ggtitle("Humidity - Temperature") +
ylab("Temerature")
The result shows that the column of cover needs to be unified, for example, we have five groups of concrete: Concete, concrete, Concrete, Concrete(canopy), Concrete(shade)
mydata$Cover[grepl("onc", mydata$Cover, ignore.case=FALSE)] <- "Concrete"
# I used function "grepl" to replace all strings that contain "onc" to Concrete
mydata %>%
ggplot( aes(x=Humidity, y=Temp, group=Cover, color=Cover)) +
geom_point() +
ggtitle("Humidity - Temperature") +
ylab("Temerature")
Thank god, we now only have one group of concrete, you can merge other groups in your way.
ggplot(mydata, aes(x=Cover, y=Humidity)) +
geom_boxplot(color="red", fill="orange", alpha=0.2)
After all cover types been unified, your box-plot will be cleaner
# load the spatial libraries
library(sp)
library(rgdal)
library(rgeos)
library(tmap)
library(sf)
library(spatstat)
library(maptools)
library(tidyverse)
library(gstat)
# we need to use the latitude and longtitude information
mydata_WSG84 <- st_as_sf(mydata, coords = c("Long", "Lat"), crs = 4326) # give the location to your points
# to make your map more accurate, we reproject your points to projection:WGS 84 / UTM zone 56S
# in r code , it is "+proj=utm +zone=56 +south +datum=WGS84 +units=m +no_defs +type=crs"
mycrs <- paste0("+proj=utm +zone=56 +south +datum=WGS84 +units=m +no_defs +type=crs")
mydata_zone56 <- st_transform(mydata_WSG84, crs=mycrs) # give the location to your points
have a look with a openstreet map as background
tmap_mode("view")
## tmap mode set to interactive viewing
points_map <- tm_shape(mydata_zone56) + tm_dots(col = "Humidity")
points_map #looks fine
We need a boundary to mask our study area. Let’s make one.
x_coord <- c(153.015, 153.02, 153.025, 153.029, 153.035, 153.033, 153.031, 153.027, 153.025, 153.020, 153.015)
y_coord <- c(-27.460, -27.470,-27.475, -27.480, -27.480, -27.475, -27.469, -27.465, -27.465, -27.46, -27.460)
xym <- cbind(x_coord, y_coord)
boundary = st_polygon(list(xym))
boundary_WSG84 = st_sfc(boundary, crs= 4326)
boundary_zone56 <- st_transform(boundary_WSG84, crs=mycrs)
### add boundary to our map
points_map_with_boundary <- points_map + tm_shape(boundary_zone56)+ tm_borders(lwd = 2)
points_map_with_boundary
here we can use the concept of voronoi diagram: this is type of the nearest neighbor analysis # more info: https://mathworld.wolfram.com/VoronoiDiagram.html#:~:text=points%20into%20convex%20polygons%20such,known%20as%20a%20Dirichlet%20tessellation.
library(terra)
mydata_zone56_V <- vect(mydata_zone56) # we’ll need to convert mydata_zone56 to S4 objects using vect()
boundary_zone56_V <- vect(boundary_zone56) # vectlise our boundary too
v <- voronoi(mydata_zone56_V)
plot(v)
points(mydata_zone56_V)
mydata_Voronoi <- crop(v, boundary_zone56_V) #we only care about the area within our boundary
plot(mydata_Voronoi, "Humidity")
# rasterize
r <- rast(mydata_Voronoi, res=10) # Builds a blank raster of given dimensions and resolution
# the unit of our projection is meter, so res = 10, means each grid is 100 squre metre big.
vr <- rasterize(mydata_Voronoi, r, "Humidity") # we want to rasterize our result to small pixels, so that our analysis can be more accurate
plot(vr)
let’s do some interpolation
library(gstat)
d <- data.frame(geom(mydata_zone56_V)[,c("x", "y")], as.data.frame(mydata_zone56_V)) #we transfer mydata_zone56_V to dataframe
gs <- gstat(formula=Humidity~1, locations=~x+y, data=d, nmax=Inf, set=list(idp=2))
idw <- interpolate(r, gs, debug.level=0) # r is the blank raster we just built
plot(idw)
idwr <- mask(idw, vr) # crop the result with the rasterised boundary vr
plot(idwr, 1)
It would be interesting to quantify some other factors that could influnce your results, for example, the relationship of humidity and the proximity to water (river or pond). Don’t be limited to our tutorials