Visit: https://www.geocaceres.com/
—————————————————————————————————————————————————————————————————
Coding Club Workshop 1 - R Basics:
Learning how to import and explore data, and make graphs about Edinburgh’s biodiversity
Written by Gergana Daskalova 06/11/2016 University of Edinburgh, last updated 28th March 2019
Transcribed by Carlos Caceres 01/04/2020 National University of Colombia
Taken from: https://ourcodingclub.github.io/tutorials/intro-to-r/
Load the dplyr package (which provides the filter() function) and set the working directory
# If the dplyr package isn't installed yet, run: install.packages("dplyr")
# Note that the package name is in quotation marks when installing a package
library(dplyr) # Note that there are no quotation marks when loading a package
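The check-then-install step can be wrapped in a small function so that a script installs dplyr only when it is missing. This is a sketch, not part of the original tutorial: `ensure_package()` is a hypothetical helper name, and it relies only on base R's `requireNamespace()`, `install.packages()` and `library()`.

```r
# Hypothetical helper (not from the original tutorial): install a package
# only if it is missing, then attach it with library().
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of failing when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # note the plural: install.packages()
  }
  # character.only = TRUE lets library() accept a name stored in a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage (commented out so running this block never triggers a download):
# ensure_package("dplyr")
```

Because the name is passed as a string, this also works when looping over a character vector of several package names.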
# Set the working directory
setwd("C:/Users/GEOMATICS/PR/Getting_Started") # In R, use forward slashes in file paths, even on Windows ("C:/folder/data")
Import the Edinburgh biodiversity data. All the files needed are in the GitHub repository: https://github.com/ourcodingclub/CC-RBasics
edidiv <- read.csv("C:/Users/GEOMATICS/PR/Getting_Started/CC-RBasics-master/edidiv.csv")
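Whether text columns such as taxonGroup arrive as factors depends on the `stringsAsFactors` argument of `read.csv()`, whose default changed to `FALSE` in R 4.0.0. The sketch below writes a tiny made-up CSV to a temporary file (it is not the real edidiv.csv) to show the difference.

```r
# Write a tiny made-up CSV to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".csv")
writeLines(c("taxonName,taxonGroup,year",
             "Turdus merula,Bird,2001",
             "Vulpes vulpes,Mammal,2016"), tmp)

as_chr    <- read.csv(tmp)                           # R >= 4.0.0: text columns stay character
as_factor <- read.csv(tmp, stringsAsFactors = TRUE)  # text columns become factors

class(as_chr$taxonGroup)     # "character" on R >= 4.0.0
class(as_factor$taxonGroup)  # "factor" on any R version
class(as_factor$year)        # "integer": numeric columns are unaffected
```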
View(edidiv)
#Check that your data was imported without any mistakes
head(edidiv) # Displays the first few rows
## organisationName gridReference year taxonName
## 1 Joint Nature Conservation Committee NT265775 2000 Sterna hirundo
## 2 Joint Nature Conservation Committee NT235775 2000 Sterna hirundo
## 3 Joint Nature Conservation Committee NT235775 2000 Sterna paradisaea
## 4 British Trust for Ornithology NT27 2000 Branta canadensis
## 5 British Trust for Ornithology NT27 2000 Branta leucopsis
## 6 The Wildlife Information Centre NT27S 2001 Turdus merula
## taxonGroup
## 1 Bird
## 2 Bird
## 3 Bird
## 4 Bird
## 5 Bird
## 6 Bird
tail(edidiv) # Displays the last rows
## organisationName gridReference year
## 25679 The Mammal Society NT278745 2016
## 25680 The Mammal Society NT277724 2016
## 25681 The Mammal Society NT266728 2016
## 25682 The Mammal Society NT270728 2016
## 25683 The Mammal Society NT257762 2016
## 25684 People's Trust for Endangered Species NT2372 2016
## taxonName taxonGroup
## 25679 Sciurus carolinensis Mammal
## 25680 Capreolus capreolus Mammal
## 25681 Sciurus carolinensis Mammal
## 25682 Oryctolagus cuniculus Mammal
## 25683 Vulpes vulpes Mammal
## 25684 Erinaceus europaeus Mammal
str(edidiv) # Tells you whether the variables are continuous, integers, categorical or characters
## 'data.frame': 25684 obs. of 5 variables:
## $ organisationName: Factor w/ 28 levels "BATS & The Millennium Link",..: 14 14 14 8 8 28 28 28 28 28 ...
## $ gridReference : Factor w/ 1938 levels "NT200701","NT200712",..: 1314 569 569 1412 1412 1671 1671 1671 1671 1671 ...
## $ year : int 2000 2000 2000 2000 2000 2001 2001 2001 2001 2001 ...
## $ taxonName : Factor w/ 1275 levels "Acarospora fuscata",..: 1126 1126 1127 192 193 1202 365 977 472 947 ...
## $ taxonGroup : Factor w/ 11 levels "Beetle","Bird",..: 2 2 2 2 2 2 2 2 2 2 ...
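# In R 4.0.0 and later, read.csv() imports text columns as character by default, so taxonGroup may show as a character variable,
# but it should be a factor (categorical variable). The output above was produced with an older R version, where it was already a factor.
# So we'll convert it to a factor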
head(edidiv$taxonGroup) # Displays the first few rows of taxonGroup column only
## [1] Bird Bird Bird Bird Bird Bird
## 11 Levels: Beetle Bird Butterfly Dragonfly Flowering.Plants ... Mollusc
class(edidiv$taxonGroup) # Tells you what type of variable we're dealing with
## [1] "factor"
edidiv$taxonGroup <- as.factor(edidiv$taxonGroup) #This function turns whatever values you put inside into a factor
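Internally, a factor stores each value as an integer code that points into a set of levels, which is why str() above prints numbers such as 14 14 14 8 8 next to each factor column. A small sketch with a made-up vector:

```r
# Made-up character vector standing in for a column that was read as text
groups <- c("Bird", "Mammal", "Bird", "Beetle")
class(groups)           # "character"

groups_f <- as.factor(groups)
class(groups_f)         # "factor"
levels(groups_f)        # sorted alphabetically: "Beetle" "Bird" "Mammal"
nlevels(groups_f)       # 3
as.integer(groups_f)    # underlying integer codes: 2 3 2 1
```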
dim(edidiv) # Displays number of rows and columns
## [1] 25684 5
summary(edidiv) # Gives you a summary of the data
## organisationName gridReference
## Biological Records Centre :6744 NT2673 : 2741
## RSPB :5809 NT2773 : 2031
## Butterfly Conservation :3000 NT2873 : 1247
## Scottish Wildlife Trust :2070 NT2570 : 1001
## Conchological Society of Great Britain & Ireland:1998 NT27 : 888
## The Wildlife Information Centre :1860 NT2871 : 767
## (Other) :4203 (Other):17009
## year taxonName taxonGroup
## Min. :2000 Maniola jurtina : 1710 Butterfly :9670
## 1st Qu.:2006 Aphantopus hyperantus: 1468 Bird :7366
## Median :2009 Turdus merula : 1112 Flowering.Plants:2625
## Mean :2009 Lycaena phlaeas : 972 Mollusc :2226
## 3rd Qu.:2011 Aglais urticae : 959 Hymenopteran :1391
## Max. :2016 Aglais io : 720 Mammal : 960
## (Other) :18743 (Other) :1446
summary(edidiv$taxonGroup) # Gives you a summary of that particular variable (column)
## Beetle Bird Butterfly Dragonfly
## 426 7366 9670 421
## Flowering.Plants Fungus Hymenopteran Lichen
## 2625 334 1391 140
## Liverwort Mammal Mollusc
## 125 960 2226
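Note that summary() on a factor counts records (rows), not distinct species; that is why the next steps use unique(). For a single factor, table() returns the same counts. A sketch on a made-up data frame:

```r
# Made-up records: five observations in three taxonomic groups
toy <- data.frame(
  taxonGroup = factor(c("Bird", "Bird", "Mammal", "Beetle", "Bird"))
)

s <- summary(toy$taxonGroup)  # named integer vector: Beetle 1, Bird 3, Mammal 1
t <- table(toy$taxonGroup)    # same counts, stored as a "table" object
s
t
```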
The edidiv object contains occurrence records of various species collected in Edinburgh from 2000 to 2016. To explore Edinburgh’s biodiversity, we will create a graph showing how many species were recorded in each taxonomic group. We will first extract the records for each taxonomic group and then count the unique species within it.
# Remember to install dplyr with install.packages("dplyr") and then load it with library(dplyr)
Beetle <- filter(edidiv, taxonGroup == "Beetle")
Bird <- filter(edidiv, taxonGroup == "Bird")
Butterfly <- filter(edidiv, taxonGroup == "Butterfly")
Dragonfly <- filter(edidiv, taxonGroup == "Dragonfly")
Flowering.Plants <- filter(edidiv, taxonGroup == "Flowering.Plants")
Fungus <- filter(edidiv, taxonGroup == "Fungus")
Hymenopteran <- filter(edidiv, taxonGroup == "Hymenopteran")
Lichen <- filter(edidiv, taxonGroup == "Lichen")
Liverwort <- filter(edidiv, taxonGroup == "Liverwort")
Mammal <- filter(edidiv, taxonGroup == "Mammal")
Mollusc <- filter(edidiv, taxonGroup == "Mollusc")
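Writing one filter() call per group works, but base R's split() can produce all the subsets at once as a named list, with one data frame per factor level. A minimal sketch on made-up records that reuse the edidiv column names; it is an alternative to the tutorial's approach, not what the tutorial itself does:

```r
# Made-up records with the same column names as edidiv
toy <- data.frame(
  taxonName  = c("Turdus merula", "Turdus merula", "Vulpes vulpes", "Sterna hirundo"),
  taxonGroup = factor(c("Bird", "Bird", "Mammal", "Bird"))
)

# One data frame per taxonGroup level, stored in a list named by level
by_group <- split(toy, toy$taxonGroup)
names(by_group)            # "Bird" "Mammal"
nrow(by_group[["Bird"]])   # 3, the same rows filter(toy, taxonGroup == "Bird") keeps
```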
# Calculate the number of different species in each group,
# using unique(), which identifies the different species, and length(), which counts them
a <- length(unique(Beetle$taxonName))
b <- length(unique(Bird$taxonName))
c <- length(unique(Butterfly$taxonName))
d <- length(unique(Dragonfly$taxonName))
e <- length(unique(Flowering.Plants$taxonName))
f <- length(unique(Fungus$taxonName))
g <- length(unique(Hymenopteran$taxonName))
h <- length(unique(Lichen$taxonName))
i <- length(unique(Liverwort$taxonName))
j <- length(unique(Mammal$taxonName))
k <- length(unique(Mollusc$taxonName))
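The eleven length(unique(...)) lines can also be collapsed into one expression that counts the distinct taxonName values per taxonGroup. This sketch uses base R's split() and sapply() on made-up records; with the real data you would pass edidiv$taxonName and edidiv$taxonGroup instead. The result is already a named vector, so no separate names() step is needed.

```r
# Made-up records with the same column names as edidiv
toy <- data.frame(
  taxonName  = c("Turdus merula", "Turdus merula", "Vulpes vulpes", "Sterna hirundo"),
  taxonGroup = factor(c("Bird", "Bird", "Mammal", "Bird"))
)

# Split the species names by group, then count the distinct names in each piece
richness_by_group <- sapply(split(toy$taxonName, toy$taxonGroup),
                            function(x) length(unique(x)))
richness_by_group   # Bird 2, Mammal 1
```

Because split() keeps empty factor levels by default, a group with no records would appear with a count of 0 rather than disappearing.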
biodiv <- c(a,b,c,d,e,f,g,h,i,j,k) # Combine all those objects into one vector
names(biodiv) <- c("Beetle", # Add labels
"Bird",
"Butterfly",
"Dragonfly",
"Fl.Plants",
"Fungus",
"Hymenopteran",
"Lichen",
"Liverwort",
"Mammal",
"Mollusc")
barplot(biodiv,main="Species Richness") #Visualise species richness with the barplot() function
help(barplot) # For help with the barplot() function
## starting httpd help server ... done
help(par) # For help with plotting in general
#Save the plot
png("barplot.png", width=1600, height=600) # Set the image size in pixels (the res argument changes the resolution)
barplot(biodiv, xlab="Taxa", ylab="Number of species", ylim=c(0,600), cex.names= 1.5, cex.axis=1.5, cex.lab=1.5)
# The cex code increases the font size when greater than one (and decreases it when less than one).
dev.off() # Close the graphics device, which finishes writing barplot.png
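If an error occurs between png() and dev.off(), the PNG device stays open and later plots are drawn into the file instead of on screen. One way to guard against this is to close the device with on.exit() inside a function. `save_png()` below is a hypothetical helper, sketched with base R graphics and made-up counts written to a temporary file.

```r
# Hypothetical helper: open a PNG device, draw, and always close the device,
# even if the plotting code throws an error.
save_png <- function(file, draw, width = 1600, height = 600) {
  grDevices::png(file, width = width, height = height)  # size in pixels
  on.exit(grDevices::dev.off(), add = TRUE)             # runs whenever the function exits
  draw()
  invisible(file)
}

# Usage with made-up counts, written to a temporary file
counts <- c(Beetle = 3, Bird = 5, Mammal = 2)
out <- save_png(tempfile(fileext = ".png"),
                function() barplot(counts, xlab = "Taxa", ylab = "Number of species"))
file.exists(out)
```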
## png
## 2
Data frames are tables of values: they have a two-dimensional structure with rows and columns, where each column can have a different data type.
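A minimal sketch of a data frame whose columns have different types; the column names and values are made up for illustration:

```r
# Made-up data: each column holds a different type
df <- data.frame(
  taxon    = factor(c("Bird", "Mammal")),
  richness = c(5L, 2L),        # hypothetical counts
  surveyed = c(TRUE, FALSE)    # hypothetical logical column
)

sapply(df, class)  # taxon "factor", richness "integer", surveyed "logical"
dim(df)            # 2 rows, 3 columns
```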
taxa <- c("Beetle",
"Bird",
"Butterfly",
"Dragonfly",
"Flowering.Plants",
"Fungus",
"Hymenopteran",
"Lichen",
"Liverwort",
"Mammal",
"Mollusc")
taxa_f <- factor(taxa) # Turn the object into a factor
richness <- c(a,b,c,d,e,f,g,h,i,j,k) # Combining all the values for the number of species in an object called richness
biodata <- data.frame(taxa_f, richness) # Create the data frame from the two vectors
write.csv(biodata, file="biodata.csv") # Save the data frame as a .csv file in the working directory
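By default write.csv() also writes the row names (1, 2, 3, ...) as an unnamed first column, which read.csv() later reads back as an extra column called X. Passing row.names = FALSE avoids this. A sketch with made-up values and temporary files:

```r
# Made-up values standing in for biodata
biodata_toy <- data.frame(taxa_f = factor(c("Beetle", "Bird")),
                          richness = c(3L, 5L))

tmp_with    <- tempfile(fileext = ".csv")
tmp_without <- tempfile(fileext = ".csv")

write.csv(biodata_toy, file = tmp_with)                        # row names written as first column
write.csv(biodata_toy, file = tmp_without, row.names = FALSE)  # row names omitted

names(read.csv(tmp_with))     # "X" "taxa_f" "richness"
names(read.csv(tmp_without))  # "taxa_f" "richness"
```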
View(biodata)
png("barplot2.png", width=1600, height=600)
barplot(biodata$richness, names.arg=c("Beetle",
"Bird",
"Butterfly",
"Dragonfly",
"Flowering.Plants",
"Fungus",
"Hymenopteran",
"Lichen",
"Liverwort",
"Mammal",
"Mollusc"),
xlab="Taxa", ylab="Number of species", ylim=c(0,600))
dev.off()
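Typing the eleven labels out again for names.arg risks the labels getting out of step with the values, especially if the rows are reordered. A sketch that takes the labels from the data frame itself and sorts the bars from most to least species-rich, using made-up values and a temporary output file:

```r
# Made-up values standing in for biodata
biodata_toy <- data.frame(taxa_f = factor(c("Beetle", "Bird", "Mammal")),
                          richness = c(3L, 5L, 2L))

# Sort rows from most to least species-rich
biodata_sorted <- biodata_toy[order(biodata_toy$richness, decreasing = TRUE), ]

png(tempfile(fileext = ".png"), width = 1600, height = 600)
barplot(biodata_sorted$richness,
        names.arg = as.character(biodata_sorted$taxa_f),  # labels travel with their values
        xlab = "Taxa", ylab = "Number of species")
invisible(dev.off())
```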
## png
## 2