Reference and Disclaimer: To create this cheatsheet, I referred extensively to online resources. It is a compilation of available resources for beginners in spatial analytics with R.

This article is for R beginners and covers very basic R operations. Many important links are included for further practice. I learnt R by doing, and plenty of help is available online, for example on Stack Exchange.

Load the necessary R packages

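## First specify the packages of interest
## Note: rgdal was retired from CRAN in 2023, so install.packages("rgdal") may fail on a recent R setup
packages <- c("raster", "reshape2", "rgdal", "sp", "ggspatial", "ggplot2", "rasterVis", "cowplot")

## Now load each package, installing it first if it is missing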
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)
par(mfrow = c(1, 1))
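
Before installing anything, it can help to see which of the listed packages are already available. The following is a minimal sketch using the same package names as above; `is_installed` and `missing_pkgs` are illustrative names, not part of the original script.

```r
# Packages of interest (same vector as in the loading step)
packages <- c("raster", "reshape2", "rgdal", "sp", "ggspatial",
              "ggplot2", "rasterVis", "cowplot")

# requireNamespace() checks availability without attaching the package
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Names of packages that would still need install.packages()
missing_pkgs <- packages[!is_installed]
print(missing_pkgs)
```

An empty result means everything is already installed and the loop above will only attach the packages.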

Set working directory

# Setting up the working directory 
setwd("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/")
# check the present working directory
getwd()
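
Hard-coded absolute paths only work on one machine. As a small sketch of an alternative, paths can be built with `file.path()` relative to a single project folder. Here `tempdir()` stands in for the real project directory, and the `Ex4` subfolder and file name are placeholders.

```r
# Stand-in for the project folder; replace with your own directory
project_dir <- tempdir()

# Build the path piece by piece instead of pasting strings together
data_dir  <- file.path(project_dir, "Ex4")
data_file <- file.path(data_dir, "airquality_sample.csv")

# Create the folder if needed and write a small example file into it
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(head(airquality), data_file, row.names = FALSE)

# Read it back without ever calling setwd()
sample_data <- read.csv(data_file)
print(dim(sample_data))
```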

RMarkdown Steps

  1. Open File >> New File and click R Markdown to create a new .Rmd file.

  2. Next, enter the document title and author name.

  3. Write code in the sky-blue code chunks and comments in the white space. We can add as many code chunks as we like by going to Code >> Insert Chunk.

  4. We can execute each code chunk by clicking the small Run button in its top right corner. To run the entire R Markdown file, click Knit in the top bar; we can knit either to PDF or to HTML.

  5. The file is now ready to publish. Knit to HTML gives an output HTML file. Click the Publish button at the top right.

  6. A dialog box opens and asks whether we want to publish to RPubs. Click it and open an account.

  7. Provide a document name and hit Continue to publish.

  8. This is how the document looks once it is published.

The best R Markdown resource is the official documentation provided by R Markdown itself.
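
The same workflow can also be driven from the console. The sketch below writes a minimal .Rmd file and knits it to HTML. The file name, title and author are placeholders, and the knitting step runs only if the rmarkdown package and pandoc are available on the machine.

```r
# Three backticks, built with strrep() so they can sit inside this example
fence <- strrep("`", 3)

# Contents of a minimal R Markdown file: YAML header, text, one code chunk
rmd_lines <- c(
  "---",
  "title: \"My first R Markdown\"",
  "author: \"Author name\"",
  "output: html_document",
  "---",
  "",
  "Comments go in the white space.",
  "",
  paste0(fence, "{r}"),
  "summary(airquality)",
  fence
)

# Write the file to a temporary location (placeholder for your own folder)
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from the console only if rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
}
```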

Exploratory Data Analysis

We will use a built-in dataset to walk through the EDA process. The same steps can be run on a vector dataset.

data(airquality) # built-in dataset

To check the dimension of the data

dim(airquality)

To examine the dataset type and the variable types. Here, it's a data frame with 153 observations and 6 variables.

str(airquality)
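
As a small complement to `dim()` and `str()`, the pieces can also be pulled out individually, which is handy when they are needed later in code. The variable names below are only illustrative.

```r
data(airquality)

# Number of rows and columns, taken separately
n_obs  <- nrow(airquality)
n_vars <- ncol(airquality)

# Class of every column in one call
col_classes <- sapply(airquality, class)
print(col_classes)

# Column names, useful before subsetting
print(names(airquality))
```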

To check first few or last few rows

head(airquality, 4) # displays first 4 rows
#tail(airquality, 4) # displays last 4 rows

Detecting missing values

Missing values are a big problem, and the right treatment depends on the requirement at hand. To check whether our dataset has any missing values, and in which variables, we proceed as follows.

#Summary of the Data
summary(airquality)

For Ozone there are 37 NAs and for Solar.R there are 7 NAs. The data is missing at random, and I prefer either interpolation or simply deleting the observations (deleting only when I have a large dataset, where a few missing values here and there don't bother me much). The choice depends mostly on our understanding of the data and what we expect to do with it.

There are many ways to check for missing data. Here I simply remove all NAs, but make your own call. mice is a nice package for missing-value treatment.

colSums(is.na(airquality))
df <- na.omit(airquality)
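
For the interpolation option mentioned above, here is a minimal base-R sketch using `approx()` for linear interpolation along the row order. Treating the rows as an evenly spaced daily series is an assumption, and `Ozone_filled` is an illustrative column name.

```r
data(airquality)

# Count complete rows before deciding how to treat the gaps
n_complete <- sum(complete.cases(airquality))

# Linear interpolation over the row index; rule = 2 carries the nearest
# observed value outward if the series starts or ends with NA
idx <- seq_along(airquality$Ozone)
ozone_filled <- approx(x = idx, y = airquality$Ozone, xout = idx, rule = 2)$y

# Keep the original column and add the filled one next to it
aq_interp <- transform(airquality, Ozone_filled = ozone_filled)
print(colSums(is.na(aq_interp)))
```

Keeping the original column next to the filled one makes it easy to compare results with and without interpolation.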

Detecting Outliers is easier by data visualization

A boxplot makes it easy to spot any outliers present in the data.

par(mfrow=c(1,2))
boxplot(airquality$Ozone ,col = "grey",main = "Boxplot",outcol="red",outpch=18,boxwex=0.7,range = 1.3)
hist(airquality$Ozone ,col = "grey",main = "Histogram", xlab = "Obs.",breaks = 15)

par(mfrow=c(1,1)) #setting it back to default one plot
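
To list the outlying values rather than only see them, `boxplot.stats()` returns the points drawn beyond the whiskers. This sketch uses `coef = 1.3` so that it follows the same rule as `range = 1.3` in the boxplot call above.

```r
data(airquality)

# coef plays the same role as 'range' in boxplot(); 1.3 matches the plot above
ozone_stats <- boxplot.stats(airquality$Ozone, coef = 1.3)

# Values lying beyond the whiskers
ozone_outliers <- ozone_stats$out
print(ozone_outliers)

# Rows of the data frame that hold those values
outlier_rows <- airquality[which(airquality$Ozone %in% ozone_outliers), ]
print(outlier_rows)
```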

We can also check for any skewness in our data

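# Create a function for the repetitive job
skew <- function(x) {
  # deparse(substitute(x)) turns the argument expression into a single label string
  histogrm <- hist(x, col = "grey", main = "Skewness check", xlab = deparse(substitute(x)))
  # Grid of x values and the matching normal density
  xfit <- seq(min(x), max(x), length.out = 40)
  yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
  # Scale the density to the count axis (bin width * number of observations)
  yfit <- yfit * diff(histogrm$mids[1:2]) * length(x)
  lines(xfit, yfit, col = "black", lwd = 2)
}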

skew(df$Ozone)
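
The visual check can be backed by a number. Below is a minimal sketch of the moment-based skewness coefficient in base R; `skewness_coef` is a hypothetical helper written for this example, not a function from any package.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skewness_coef <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^(3 / 2)
}

# Positive values indicate a longer right tail, negative a longer left tail
print(skewness_coef(airquality$Ozone))
print(skewness_coef(c(1, 2, 3, 4, 5)))  # symmetric data gives 0
```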

Subsetting dataframe

Often we need only part of a dataset, and in that case we subset our data. dplyr is a wonderful package that is worth knowing; it also handles many other important data manipulation tasks.

airquality[]       # the whole data frame (as a data.frame)
airquality[1, 6]   # first element in the 6th column (as a vector)
airquality[, 1]    # first column in the data frame (as a vector)
airquality[1]      # first column in the data frame (as a data.frame)
airquality[1:3, 3] # first three elements in the 3rd column (as a vector)
airquality[3, ]    # the 3rd row (as a data.frame)
airquality[c(1,4), ]  # rows 1 and 4 only (as a data.frame)
airquality[c(1,4), c(1,3) ] # rows 1 and 4 and columns 1 and 3 (as a data.frame)
airquality[, -1]   # the whole data frame, excluding the first column
airquality[-c(3:153),]  # equivalent to head(airquality, 2)
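
Besides positional indexing, rows are often selected by a condition. This is a short base-R sketch (dplyr offers its own equivalents); the object names are illustrative.

```r
data(airquality)

# Rows where Temp exceeds 90 (Temp has no NAs, so plain logical indexing is safe)
hot_days <- airquality[airquality$Temp > 90, ]

# Ozone contains NAs; which() drops them instead of returning all-NA rows
high_ozone <- airquality[which(airquality$Ozone > 100), ]

# subset() filters rows and picks columns in one call
may_data <- subset(airquality, Month == 5, select = c(Ozone, Temp, Day))

print(nrow(may_data))
```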

Play with params

Data visualization plays a critical role in any analysis, and we can play with the plot() and ggplot() functions to understand them better.

1. Simple plots

par(mfrow=c(1,3)) 
plot(x=airquality$Temp, y=airquality$Wind, main="Default is scatter plot")
plot(airquality$Temp, airquality$Wind, type="l",col = "red", main="plot type is line")# change plot type to 'line'
plot(airquality$Temp, airquality$Wind, type="b", main="plot type is line & dot")# change plot type to include both

par(mfrow=c(1,1)) #setting it back to default one plot
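
With `type = "l"` the points are joined in row order, so an unsorted x variable gives a zigzag. As a small sketch, the rows can be ordered by the x variable first; the plot is written to a temporary PDF here, which is only a placeholder for a real output path.

```r
data(airquality)

# Order the rows by Temp so the line moves left to right without zigzags
aq_sorted <- airquality[order(airquality$Temp), ]

# Draw into a temporary PDF; swap in your own file path to keep the figure
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file, width = 7, height = 4)
plot(aq_sorted$Temp, aq_sorted$Wind, type = "l", col = "red",
     xlab = "Temp (degrees F)", ylab = "Wind (mph)",
     main = "Line plot with x values sorted")
dev.off()
```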

2. Boxplots

# One colour per month; col takes a plain vector of colours
boxplot(Temp ~ Month, data = airquality, col = c(1, 2, 3, 4, 5), main = "Boxplot", outcol = "red", outpch = 18, boxwex = 0.7, range = 1.3)
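
The numbers behind each box can be computed directly. A minimal sketch with `tapply()`, grouping temperature by month:

```r
data(airquality)

# Median temperature for each month (the thick line inside each box)
temp_median <- tapply(airquality$Temp, airquality$Month, median)
print(temp_median)

# Five-number summary per month: min, lower hinge, median, upper hinge, max
temp_fivenum <- tapply(airquality$Temp, airquality$Month, fivenum)
print(temp_fivenum[["5"]])
```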

3. ggplot basic

g<-ggplot(df, aes(x=Day, y=Temp, col = as.factor(Month)))
g + geom_point() + labs(x = "Day", y = "Temperature (°F)", title = "Basic ggplot") 

4. ggplot - Continuous variable

g<-ggplot(df, aes(x=as.factor(Month), y=Temp, col = Temp))
g + geom_point() + labs(x = "Month", y = "Temperature (°F)", title = "Check the default legend for continuous variable") 

5. ggplot - Factor variable

g<-ggplot(df, aes(x=Temp, y=Wind))
# Map the factor inside aes() so each month gets its own shape and a legend entry
g + geom_point(aes(shape = as.factor(Month)))
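
Month numbers make for a cryptic legend. As a sketch, they can be converted to a labelled factor before plotting. `MonthName` is an illustrative column name, and the example assumes ggplot2 is installed.

```r
library(ggplot2)

# Same cleaned data frame as in the text, with NAs removed
df <- na.omit(airquality)

# Turn the month numbers into labelled factor levels (May to September)
df$MonthName <- factor(df$Month, levels = 5:9, labels = month.abb[5:9])

# Mapping the factor inside aes() produces a discrete legend automatically
p <- ggplot(df, aes(x = Day, y = Temp, colour = MonthName, shape = MonthName)) +
  geom_point() +
  labs(x = "Day", y = "Temperature (°F)", colour = "Month", shape = "Month")
print(p)
```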

Load GIS data and check projection

Let's load a raster layer and check its projection.

#Load the raster layer
land <- raster("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/ccap_SE_FL_11class.tif")
#Check the projection of the raster layer - proj4string() and projection() both do the job
proj4string(land)
#Saving the projection for future use
crs.land <- projection(land)

Load the vector layer and check projection.

If needed, we can re-project the layer the way it is done here.

# Looking inside gdb
fgdb <- "D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/HR_exercise.gdb/"
# To list all the available layers inside .gdb folder
ogrListLayers(fgdb)
# Both ways work
point.data <- readOGR("D:/UGA/Academic Course Work/Fall-2022/FANR-8400-AdvGIS/Assignments/Ex4/HR_exercise/HR_exercise.gdb","allpoints_XYTableToPoint")
#OR
point.data <- readOGR(fgdb,"allpoints_XYTableToPoint")
#Check the projection
projection(point.data)

Re-project the point layer to match the raster layer

#proj4string(land) and projection(point.data) are different, need to be in same projection
point.data <- spTransform(point.data, crs.land)
#Recheck the projection
projection(point.data)
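
The check-then-transform step can be tried without the course data. The sketch below builds a synthetic raster and two synthetic points (all coordinates are made up) and transforms the points only when the two coordinate reference systems differ. It assumes sp and raster are installed along with a working projection backend.

```r
library(sp)
library(raster)

# Synthetic raster in UTM zone 17N (stand-in for the land cover layer)
r <- raster(nrows = 10, ncols = 10, xmn = 500000, xmx = 501000,
            ymn = 2900000, ymx = 2901000,
            crs = "+proj=utm +zone=17 +datum=WGS84 +units=m")

# Synthetic points in geographic coordinates (stand-in for the point layer)
pts <- SpatialPoints(cbind(c(-81.0, -80.9), c(26.2, 26.3)),
                     proj4string = CRS("+proj=longlat +datum=WGS84"))

# Transform only when the two coordinate reference systems differ
if (!compareCRS(pts, r)) {
  pts <- spTransform(pts, crs(r))
}
print(compareCRS(pts, r))
```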

Raster data inspection

We can inspect various aspects of the raster layer like number of cells, resolution and extent

ncell(land)   # number of cells
res(land)     # cell resolution (x, y)
extent(land)  # spatial extent
class(point.data)
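
The same inspection functions can be explored on a small synthetic raster, where the answers are easy to verify by hand. The raster below is made up for illustration and assumes the raster package is installed.

```r
library(raster)

# Synthetic 10 x 10 raster covering a 1000 m x 1000 m square
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 1000, ymn = 0, ymx = 1000,
            crs = "+proj=utm +zone=17 +datum=WGS84 +units=m")
values(r) <- seq_len(ncell(r))  # fill cells with 1..100

print(ncell(r))              # number of cells
print(dim(r))                # rows, columns, layers
print(res(r))                # cell size in x and y
print(extent(r))             # xmin, xmax, ymin, ymax
print(cellStats(r, "mean"))  # summary statistic over all cell values
```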

Subsetting the point layer

df_1 <- point.data[point.data$IdNr==175, ]
df_2 <- point.data[point.data$IdNr==250, ]
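
Spatial point layers are subset with the same bracket syntax as data frames. Here is a self-contained sketch on a synthetic layer; the coordinates and `IdNr` values are made up, and `%in%` is shown as one way to select several ids at once.

```r
library(sp)

# Synthetic point layer with an IdNr attribute (values are made up)
coords <- cbind(x = c(1, 2, 3, 4, 5), y = c(5, 4, 3, 2, 1))
attrs  <- data.frame(IdNr = c(175, 175, 250, 250, 300))
pts    <- SpatialPointsDataFrame(coords, data = attrs)

# One id, as in the text
pts_175 <- pts[pts$IdNr == 175, ]

# Several ids at once with %in%
pts_sel <- pts[pts$IdNr %in% c(175, 250), ]

# The result is still a spatial object, so it can be plotted or transformed
print(class(pts_sel))
print(length(pts_sel))
```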

Spatial data plots

Raster plot

plot(land, main="Basic plot")
points(point.data, col=point.data$IdNr)

Multiple plot

par(mfrow = c(1, 2)) # display 2 plots in a row -(1 row and 2 column)
plot(land, main="First plot")
points(df_1, col="blue")
plot(land, main="Second plot")
#If we use plot() instead of points() to add to existing raster layer, need to include add=TRUE
plot(df_2, add=TRUE, pch = 24, cex=1, col="red", bg="darkgreen", lwd=2)#play with plot() to understand functionality better

par(mfrow = c(1, 1)) # good practice is to set it back to the default - displays one plot at a time

The best way to learn is to run the code and, when an error appears, search for it online. That is how we fix errors, and there are many online resources to refer to.

Resources:

  1. DA with R

  2. A ggplot2 Tutorial for Beautiful Plotting in R

Thanks!!