Reference and disclaimer: To create this cheatsheet, I referred extensively to online resources. It is a compilation of available resources for beginners in spatial analytics with R.
This article is for R beginners and covers very basic R operations. Many important links are included for further practice. I learnt R by doing, and plenty of help is available online, for example on Stack Exchange.
## First specify the packages of interest
## (rgdal was retired from CRAN in 2023, so installing it may fail on a recent R setup)
packages <- c("raster", "reshape2", "rgdal", "sp", "ggspatial", "ggplot2", "rasterVis", "cowplot")
## Now load each package, installing it first if it is missing
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)
par(mfrow = c(1, 1))
# Setting up the working directory
setwd("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/")
# check the present working directory
getwd()
The best RMarkdown resource is the documentation provided by RMarkdown itself.
We will use a built-in dataset to walk through the EDA process. The same steps can be applied to a vector dataset.
data(airquality) # built-in dataset
To check the dimensions of the data:
dim(airquality)
To examine the dataset type and variable types (here, it is a data frame with 153 observations and 6 variables):
str(airquality)
To check the first few or last few rows:
head(airquality, 4) # displays first 4 rows
#tail(airquality, 4) # displays last 4 rows
Missing values are a big problem, and the solution varies with the requirement. To check whether our dataset has any missing values, and in which variables, we proceed as follows:
# Summary of the data
summary(airquality)
Ozone has 37 NAs and Solar.R has 7. The data are missing at random, and I prefer interpolation, or simply deleting the affected observations when the dataset is large and the gaps are few and scattered. The choice depends mostly on our understanding of the data and on what we expect to do with the dataset.
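As a rough illustration of the interpolation option mentioned above, the sketch below fills the gaps in Ozone by linear interpolation with base R's approx(). It assumes the rows are in time order (airquality is a daily series), and the names ozone, known, and ozone_filled are made up for this example. It is one simple possibility, not a recommendation for every dataset.

```r
data(airquality)

ozone <- airquality$Ozone
idx   <- seq_along(ozone)   # row position, used as a stand-in for time
known <- !is.na(ozone)      # TRUE where a measurement exists

# approx() interpolates linearly between known points;
# rule = 2 repeats the nearest known value for NAs at either end
ozone_filled <- approx(x = idx[known], y = ozone[known],
                       xout = idx, rule = 2)$y

sum(is.na(ozone))         # missing values before filling
sum(is.na(ozone_filled))  # missing values after filling
```

The original column is left untouched, so the filled series can be compared against the result of na.omit() before deciding which one to keep.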
There are many ways to check for missing data. Here I simply remove all NAs, but make your own call. mice is a nice package for missing-value treatment.
colSums(is.na(airquality)) # number of NAs per column
df <- na.omit(airquality) # keep complete rows only
A boxplot makes it easier to detect any outliers present in the data.
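The points a boxplot draws as outliers can also be listed as numbers. A minimal sketch using boxplot.stats() from base R follows; coef = 1.3 is chosen only to match the range = 1.3 whisker setting used in the boxplot() call in this section, and the variable name bp is illustrative.

```r
data(airquality)

# boxplot.stats() returns the five-number summary and the values
# lying beyond the whiskers; NAs are dropped before computing
bp <- boxplot.stats(airquality$Ozone, coef = 1.3)

bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # observations flagged as outliers

# rows of the data frame that hold those flagged values
airquality[which(airquality$Ozone %in% bp$out), ]
```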
par(mfrow = c(1, 2))
boxplot(airquality$Ozone, col = "grey", main = "Boxplot", outcol = "red", outpch = 18, boxwex = 0.7, range = 1.3)
hist(airquality$Ozone, col = "grey", main = "Histogram", xlab = "Obs.", breaks = 15)
par(mfrow = c(1, 1)) # setting it back to the default of one plot
We can also check for any skewness in our data.
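Skewness can also be summarised as a single number next to the visual check. The sketch below computes the moment-based sample skewness in base R; skewness_coef is a hypothetical helper written for this example, not a function from any package. Values near 0 suggest a roughly symmetric distribution, positive values a longer right tail.

```r
data(airquality)

# Moment-based skewness: third central moment divided by
# the second central moment raised to the power 3/2
skewness_coef <- function(x) {
  x  <- x[!is.na(x)]             # ignore missing values
  n  <- length(x)
  m2 <- sum((x - mean(x))^2) / n
  m3 <- sum((x - mean(x))^3) / n
  m3 / m2^(3 / 2)
}

skewness_coef(airquality$Ozone)  # positive: Ozone has a long right tail
skewness_coef(airquality$Temp)
```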
# Create a function for the repetitive job
skew <- function(x) {
  label <- deparse(substitute(x)) # expression passed in, used as the x-axis label
  histogrm <- hist(x, col = "grey", main = "Skewness check", xlab = label)
  xfit <- seq(min(x), max(x), length = 40)
  # normal density with the sample mean and sd, rescaled from density to counts
  yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
  yfit <- yfit * diff(histogrm$mids[1:2]) * length(x)
  lines(xfit, yfit, col = "black", lwd = 2)
}
skew(df$Ozone)
Often we need only part of the dataset, in which case we subset the data. dplyr is a wonderful package worth knowing; it also handles many other important data manipulation tasks.
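Subsetting is often done by condition rather than by position. A minimal base R sketch follows; the thresholds and the names hot, hot_small, and july_windy are arbitrary choices for illustration. dplyr's filter() and select() cover the same ground if you prefer that style.

```r
data(airquality)

# rows where temperature exceeds 90 °F (logical indexing)
hot <- airquality[airquality$Temp > 90, ]

# same filter with subset(), keeping only three columns
hot_small <- subset(airquality, Temp > 90, select = c(Month, Day, Temp))

# combine conditions with & (and) or | (or)
july_windy <- airquality[airquality$Month == 7 & airquality$Wind > 10, ]

head(hot_small)
```

Note that logical indexing on a column containing NAs returns NA rows, whereas subset() drops them; Temp, Month, and Wind have no missing values, so both forms agree here.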
airquality[] # the whole data frame (as a data.frame)
airquality[1, 6] # first element in the 6th column (as a vector)
airquality[, 1] # first column in the data frame (as a vector)
airquality[1] # first column in the data frame (as a data.frame)
airquality[1:3, 3] # first three elements in the 3rd column (as a vector)
airquality[3, ] # the 3rd row (as a data.frame)
airquality[c(1,4), ] # rows 1 and 4 only (as a data.frame)
airquality[c(1,4), c(1,3) ] # rows 1 and 4 and columns 1 and 3 (as a data.frame)
airquality[, -1] # the whole data frame, excluding the first column
airquality[-c(3:153),] # equivalent to head(airquality, 2)
Data visualization plays a critical role in any analysis, and we can play with the plot() and ggplot() functions to understand them better.
par(mfrow = c(1, 3))
plot(x = airquality$Temp, y = airquality$Wind, main = "Default is scatter plot")
plot(airquality$Temp, airquality$Wind, type = "l", col = "red", main = "plot type is line") # change plot type to 'line'
plot(airquality$Temp, airquality$Wind, type = "b", main = "plot type is line & dot") # change plot type to include both
par(mfrow = c(1, 1)) # setting it back to the default of one plot
# one fill colour per month
boxplot(Temp ~ Month, data = airquality, col = c(1, 2, 3, 4, 5), main = "Boxplot", outcol = "red", outpch = 18, boxwex = 0.7, range = 1.3)
g <- ggplot(df, aes(x = Day, y = Temp, col = as.factor(Month)))
g + geom_point() + labs(x = "Day", y = "Temperature (°F)", title = "Basic ggplot")
g <- ggplot(df, aes(x = as.factor(Month), y = Temp, col = Temp))
g + geom_point() + labs(x = "Month", y = "Temperature (°F)", title = "Check the default legend for continuous variable")
g <- ggplot(df, aes(x = Temp, y = Wind))
g + geom_point(aes(shape = as.factor(Month))) # shape mapped inside aes() so each month gets its own symbol
#Load the raster layer
land <- raster("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/ccap_SE_FL_11class.tif")
# Check the projection of the raster layer; proj4string() and projection() both do the job
proj4string(land)
# Save the projection for future use
crs.land <- projection(land)
If need be, a layer can be re-projected, as is done with spTransform() for the point data further down.
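For a raster layer, re-projection in the raster package goes through projectRaster(). The sketch below uses a small synthetic raster rather than the land layer, so the extent, the 11 class codes, and the UTM zone 17 target are assumptions made purely for illustration. For categorical data such as land-cover classes, method = "ngb" (nearest neighbour) is the usual choice because it avoids averaging class codes.

```r
library(raster)

set.seed(1)
# hypothetical 10 x 10 raster in geographic coordinates (WGS84)
r <- raster(nrows = 10, ncols = 10,
            xmn = -84, xmx = -83, ymn = 30, ymx = 31,
            crs = "+proj=longlat +datum=WGS84")
values(r) <- sample(1:11, ncell(r), replace = TRUE)  # made-up class codes

# re-project to UTM zone 17N; nearest neighbour keeps class codes intact
r_utm <- projectRaster(r, crs = "+proj=utm +zone=17 +datum=WGS84",
                       method = "ngb")

projection(r)      # original projection
projection(r_utm)  # new projection
```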
# Looking inside gdb
fgdb <- "D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/HR_exercise.gdb/"
# To list all the available layers inside .gdb folder
ogrListLayers(fgdb)
# Both ways work
point.data <- readOGR("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/HR_exercise.gdb","allpoints_XYTableToPoint")
#OR
point.data <- readOGR(fgdb,"allpoints_XYTableToPoint")
#Check the projection
projection(point.data)
# proj4string(land) and projection(point.data) differ; both layers need to be in the same projection
point.data <- spTransform(point.data, crs.land)
# Recheck the projection
projection(point.data)
We can inspect various aspects of the raster layer, such as the number of cells, resolution, and extent.
ncell(land) # number of cells
res(land) # resolution
extent(land) # extent
class(point.data)
df_1 <- point.data[point.data$IdNr == 175, ]
df_2 <- point.data[point.data$IdNr == 250, ]
plot(land, main = "Basic plot")
points(point.data, col = point.data$IdNr)
par(mfrow = c(1, 2)) # display 2 plots in a row (1 row and 2 columns)
plot(land, main = "First plot")
points(df_1, col = "blue")
plot(land, main = "Second plot")
# If we use plot() instead of points() to add to an existing raster layer, we need to include add = TRUE
plot(df_2, add = TRUE, pch = 24, cex = 1, col = "red", bg = "darkgreen", lwd = 2) # play with plot() to understand its functionality better
par(mfrow = c(1, 1)) # good practice is to set it back to the default, which displays one plot at a time
The best way to learn is to run the code and, if an error appears, search for it online. That is how we fix errors, and there are many resources available online to refer to.
Thanks!!