Reference and Disclaimer: To create this cheatsheet, I referred extensively to online resources. It is a compilation of available resources for beginners in spatial analytics with R.
This article is for R beginners and covers very basic R operations. Many useful links are included for further practice. I learnt R by doing, and plenty of help is available online, for example on Stack Exchange.
## First specify the packages of interest
packages <- c("raster", "reshape2", "rgdal", "sp", "ggspatial", "ggplot2", "rasterVis", "cowplot")
## Now load, or install and then load, each package
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)
par(mfrow = c(1, 1))
# Setting up the working directory
setwd("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/")
# check the present working directory
getwd()
The best RMarkdown resource is the documentation provided by the RMarkdown project itself.
We will use a built-in dataset to walk through the EDA process. The same steps can be applied to a vector dataset.
data(airquality) # built-in dataset
To check the dimensions of the data:
dim(airquality)
To examine the dataset type and the variable types. Here, it is a data frame with 153 observations and 6 variables.
str(airquality)
To check the first few or last few rows:
head(airquality, 4) # displays the first 4 rows
#tail(airquality, 4) # displays the last 4 rows
Missing values are a big problem, and the right treatment depends on the requirement. To check whether the dataset has missing values, and in which variables, we proceed as follows.
#Summary of the Data
summary(airquality)
For Ozone there are 37 NAs and for Solar.R there are 7 NAs. The data is missing at random, and I prefer either interpolation or simply deleting the affected observations (deletion only when the dataset is large and a few scattered missing values do not matter much). The choice depends mostly on our understanding of the data and what we plan to do with it.
There are many ways to check for missing data. Here I simply remove all rows containing NAs, but make your own call. The mice package is a good option for missing value treatment.
colSums(is.na(airquality))
df <- na.omit(airquality)
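Since interpolation is mentioned above as an alternative to dropping rows, here is a minimal base-R sketch of it. It assumes the rows are in time order (which holds for `airquality`, ordered by Month and Day) and that linear interpolation suits the analysis. The names `oz`, `idx`, `ok` and `oz_filled` are only illustrative.

```r
data(airquality)

oz  <- airquality$Ozone
idx <- seq_along(oz)      # row position, used as a stand-in for time
ok  <- !is.na(oz)         # TRUE where Ozone was actually observed

# approx() interpolates linearly between observed neighbours.
# rule = 2 reuses the nearest observed value for gaps at either end.
oz_filled <- approx(x = idx[ok], y = oz[ok], xout = idx, rule = 2)$y

sum(is.na(oz))         # number of NAs before filling
sum(is.na(oz_filled))  # number of NAs after filling
```

Observed values are left untouched; only the gaps are filled, so the result can be compared directly against the `na.omit()` version.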
A boxplot makes it easier to detect outliers in the data.
par(mfrow=c(1,2))
boxplot(airquality$Ozone ,col = "grey",main = "Boxplot",outcol="red",outpch=18,boxwex=0.7,range = 1.3)
hist(airquality$Ozone ,col = "grey",main = "Histogram", xlab = "Obs.",breaks = 15)
par(mfrow=c(1,1)) #setting it back to default one plot
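To read the outliers as numbers rather than from the plot, one option is `boxplot.stats()`, which computes the values that `boxplot()` draws. The sketch below assumes the same whisker multiplier (1.3) as the plot above; `bs` is just an illustrative name.

```r
data(airquality)

# boxplot.stats() returns the numbers that boxplot() draws.
# coef plays the same role as the range argument of boxplot().
bs <- boxplot.stats(airquality$Ozone, coef = 1.3)

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # observations beyond the whiskers, i.e. the plotted outliers
```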
We can also check for any skewness in our data
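# Create a function for the repetitive job
skew <- function(x) {
  # Histogram of the data; deparse(substitute(x)) labels the axis with the argument as typed
  histogrm <- hist(x, col = "grey", main = "Skewness check", xlab = deparse(substitute(x)))
  # Normal curve with the same mean and sd, rescaled from density to counts
  xfit <- seq(min(x), max(x), length = 40)
  yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
  yfit <- yfit * diff(histogrm$mids[1:2]) * length(x)
  lines(xfit, yfit, col = "black", lwd = 2)
}
skew(df$Ozone)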
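The visual check can be complemented with a number. The helper below is a hypothetical sketch (not part of any package used here) of the moment-based sample skewness, the third central moment divided by the second central moment raised to the power 3/2. Values near 0 suggest symmetry, and positive values suggest a longer right tail.

```r
# Hypothetical helper: moment-based sample skewness, m3 / m2^(3/2)
sample_skewness <- function(x) {
  x  <- x[!is.na(x)]   # ignore missing values
  d  <- x - mean(x)    # deviations from the mean
  m2 <- mean(d^2)      # second central moment
  m3 <- mean(d^3)      # third central moment
  m3 / m2^(3/2)
}

sample_skewness(c(1, 2, 3))        # symmetric data gives 0
sample_skewness(airquality$Ozone)  # positive for a right-skewed variable
```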
Often we need only part of a dataset, and in that case we subset the data. dplyr is a wonderful package worth knowing; it also handles many other important data manipulation tasks.
airquality[] # the whole data frame (as a data.frame)
airquality[1, 6] # first element in the 6th column (as a vector)
airquality[1] # first column in the data frame (as a data.frame)
airquality[, 1] # first column in the data frame (as a vector)
airquality[1:3, 3] # first three elements in the 3rd column (as a vector)
airquality[3, ] # the 3rd row (as a data.frame)
airquality[c(1,4), ] # rows 1 and 4 only (as a data.frame)
airquality[c(1,4), c(1,3) ] # rows 1 and 4 and columns 1 and 3 (as a data.frame)
airquality[, -1] # the whole data frame, excluding the first column
airquality[-c(3:153), ] # equivalent to head(airquality, 2)
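Positional indexing aside, subsetting is more often done by condition and by column name. The sketch below uses only base R. The threshold of 90 and the names `hot` and `hot2` are arbitrary choices for illustration.

```r
data(airquality)

# Logical condition on rows, column names instead of positions
hot <- airquality[airquality$Temp > 90, c("Month", "Day", "Temp")]

# subset() expresses the same selection more compactly
hot2 <- subset(airquality, Temp > 90, select = c(Month, Day, Temp))

all.equal(hot, hot2)  # compare the two results

# drop = FALSE keeps a single column as a data.frame instead of a vector
class(airquality[, 1])                # "integer"
class(airquality[, 1, drop = FALSE])  # "data.frame"
```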
Data visualization plays a critical role in any analysis, and we can experiment with the plot() and ggplot() functions to understand the data better.
par(mfrow=c(1,3))
plot(x=airquality$Temp, y=airquality$Wind, main="Default is scatter plot")
plot(airquality$Temp, airquality$Wind, type="l",col = "red", main="plot type is line")# change plot type to 'line'
plot(airquality$Temp, airquality$Wind, type="b", main="plot type is line & dot")# change plot type to include both
par(mfrow=c(1,1)) #setting it back to default one plot
boxplot(Temp ~ Month, data=airquality, col = c(1,2,3,4,5), main = "Boxplot", outcol="red", outpch=18, boxwex=0.7, range = 1.3)
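The grouped boxplot can be backed with numbers. As a small sketch in base R, `tapply()` gives the per-month median (the thick line inside each box) and `table()` gives the number of observations behind each box. `temp_median` and `n_per_month` are illustrative names.

```r
data(airquality)

# Median temperature for each month (months are coded 5 to 9)
temp_median <- tapply(airquality$Temp, airquality$Month, median)
temp_median

# Number of observations that go into each box
n_per_month <- table(airquality$Month)
n_per_month
```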
g <- ggplot(df, aes(x=Day, y=Temp, col = as.factor(Month)))
g + geom_point() + labs(x = "Day", y = "Temperature (°F)", title = "Basic ggplot")
g <- ggplot(df, aes(x=as.factor(Month), y=Temp, col = Temp))
g + geom_point() + labs(x = "Month", y = "Temperature (°F)", title = "Check the default legend for continuous variable")
g <- ggplot(df, aes(x=Temp, y=Wind))
g + geom_point(aes(shape = as.factor(Month))) # map shape inside aes() so each month gets its own symbol
# Load the raster layer
land <- raster("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/ccap_SE_FL_11class.tif")
# Check the projection of the raster layer; proj4string() and projection() both do the job
proj4string(land)
# Save the projection for future use
crs.land <- projection(land)
If need be, we can re-project a layer, as done for the point data below.
# Looking inside the gdb
fgdb <- "D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/HR_exercise.gdb/"
# List all the available layers inside the .gdb folder
ogrListLayers(fgdb)
# Both ways work
point.data <- readOGR("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/HR_exercise.gdb","allpoints_XYTableToPoint")
# OR
point.data <- readOGR(fgdb,"allpoints_XYTableToPoint")
# Check the projection
projection(point.data)
# proj4string(land) and projection(point.data) are different; both layers need to be in the same projection
point.data <- spTransform(point.data, crs.land)
# Recheck the projection
projection(point.data)
We can inspect various properties of the raster layer, such as the number of cells, the resolution, and the extent.
res(land)
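The other properties can be queried in the same way. Because the `land` file is local to the author's machine, the sketch below builds a small hypothetical raster `r` instead. It assumes the raster package is installed; the dimensions and extent are made up for illustration.

```r
library(raster)  # assumes the raster package is installed

# Hypothetical toy raster: 10 rows x 20 columns covering x 0-20, y 0-10
r <- raster(nrows = 10, ncols = 20, xmn = 0, xmx = 20, ymn = 0, ymx = 10)

ncell(r)   # total number of cells (rows x columns)
res(r)     # cell size in x and y
extent(r)  # bounding box: xmin, xmax, ymin, ymax
dim(r)     # rows, columns, layers
```

The same calls work on `land` once it has been loaded.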
class(point.data)
df_1 <- point.data[point.data$IdNr==175, ]
df_2 <- point.data[point.data$IdNr==250, ]
plot(land, main="Basic plot")
points(point.data, col=point.data$IdNr)
par(mfrow = c(1, 2)) # display 2 plots in a row -(1 row and 2 column)
plot(land, main="First plot")
points(df_1, col="blue")
plot(land, main="Second plot")
#If we use plot() instead of points() to add to existing raster layer, need to include add=TRUE
plot(df_2, add=TRUE, pch = 24, cex=1, col="red", bg="darkgreen", lwd=2)#play with plot() to understand functionality better
par(mfrow = c(1, 1)) # good practice is to set it back to the default, which displays one plot at a time
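Resetting the layout by hand is easy to forget. One common base-R pattern, sketched here with a hypothetical helper `plot_side_by_side()`, is to save the settings that `par()` returns and restore them with `on.exit()`, so the layout is reset even if a plotting call fails.

```r
# Hypothetical helper: draws two histograms side by side, then restores the layout
plot_side_by_side <- function(x, y) {
  old <- par(mfrow = c(1, 2))  # par() returns the previous settings
  on.exit(par(old))            # restore them when the function exits
  hist(x, col = "grey", main = "First plot")
  hist(y, col = "grey", main = "Second plot")
  invisible(NULL)
}

plot_side_by_side(airquality$Temp, airquality$Wind)
par("mfrow")  # back to one plot per page
```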
The best way to learn is to run the code and, when an error appears, search for it online. That is how we fix errors, and there are many online resources to refer to.
Thanks!!