This next section of the practical will introduce you to the programming language R, the lingua franca of statistical computing. Unlike point and click software, everything you do in R you do from the command line. Although this can seem a little daunting at first, it’s actually not that hard to learn, and the benefits of doing so are enormous.If you’re very new to R I’d suggest checking out this link that introduces you to the interface and some basic stats
For those of you who want to develop your skills further, the School of Biological, Earth and Environ- mental Sciences and the School of Mathematics and Statistics has set up this very useful web site with many tools from simple manipulating data (getting started with R), to the complex statistical analyses of data. Also, the UNSW codeRs is a student and staff run community dedicated for ‘R’ users. They set up events that cater to learning R. They meet fortnightly, alternating between workshops, drop-in help sessions, and other social events. You can visit them here
In this section we will be walking you through the process of data analysis (using R and R studio) for Assessment 2 - Biodiversity Report. We will:
R is a free statistical program used commonly by (but not exclusive to) biologists and psychologists. R studio is the user friendly extension that makes the R console easier to use.
Think of R like a car engine and RStudio like the body of the car, RStudio will not run without R.
To find out more about R and RStudio, like how to customize your RStudio, you can visit this link.
In the default RStudio layout you’ll see four panes in RStudio:
Step 1: Create a folder on your computer and name it “Rdata”
Step 2: Download your class data from moodle: Week 5 > Class Data > Select your class folder > download the inverts.csv file. Save this file in the “Rdata” folder you just made
You must use your scheduled practicals data
Step 3: Create a new R Project File: File > new project > existing directory > browse and select your “Rdata” folder > create project
Step 4: Start a new Script. File > new file > R script.
Save this script now in the Rdata file
Step 5: Clear R’s memory - just in case it is holding any old data from a previous analysis.
Hint: press the “copy code” button in the left corner of the chunk below and paste it into your own script
rm(list=ls())
To run the code on any line, place your cursor on the line you wish to run and press ‘Ctrl + Enter’ to send the command to the R console. Alternatively, you can click the Run icon in the script panel
Step 6: Load & see your data into R
inverts_dat <- read.csv(file = "inverts.csv", header = TRUE)
In the code above, you’ve assigned the name of this file to inverts_dat. From now on, your data will be referred to as inverts_dat.
See the structure of your data:
str(inverts_dat)
## 'data.frame': 152 obs. of 4 variables:
## $ site : chr "CP" "CP" "CP" "CP" ...
## $ group : int 1 2 5 6 6 1 5 1 1 2 ...
## $ invert: chr "Amphipod A" "Amphipod A" "Amphipod A" "Amphipod A" ...
## $ count : int 1 1 1 1 2 3 8 6 24 30 ...
You can also view your data in a table:
This data will appear as a new tab in your RStudio
View(inverts_dat)
There are 5 variables (shown as columns) in your data:
Step 7: Install & Load packages
Packages are like mods for R. These may take a minute or so to load. Look for the small stop sign located in the right-hand corner of the R-consol window.
You can press “No” if a window pops up asking you if you want to re-start R.
Installing packages:
install.packages("tidyverse")
install.packages("ggthemes")
Loading packages:
library("tidyverse")
library("ggthemes")
Step 1: We will create two barplots that show the abundance if leaf litter intervebrates at each site.
Code for Centennial Park:
See Question 1 in the workbook
You can use the zoom button in the graphing panel to view the figure in more detail.
inverts_dat %>%
filter(site == "CP") %>%
group_by(invert) %>%
ggplot(aes(x = reorder(invert, -count), y = count, fill = site)) +
geom_bar(stat = "identity", width = .7) +
labs(x = "Invertebrate species", y = "Abundance") +
theme_bw() +
ggtitle("Abundance of leaf litter invertebrates at Centennial Park") +
theme(legend.position = "none") +
theme(axis.text.x = element_text(angle = -45,hjust = 0))
Code for Randwick Environment Park:
inverts_dat %>%
filter(site == "REP") %>%
group_by(invert) %>%
ggplot(aes(x = reorder(invert, -count), y = count, fill = site)) +
geom_bar(stat = "identity", width = .7) +
labs(x = "Invertebrate Species", y = "Abundance") +
theme_bw() +
ggtitle("Abundance of leaf litter invertebrates at REP") +
theme(legend.position = "none") +
theme(axis.text.x = element_text(angle = -45, hjust = 0, size = 6))
There are several ways to measure biodiversity. The most common is to assess species richness, which is the total number of distinct species within a local community.
Having many species generally coincides with having a diverse and healthy ecosystem. However, the relative abundance, the number of individuals per species, also needs to be considered.
An informative way to measure biodiversity that combines richness and abundance is the Shannon biodiversity index (H). The Shannon index measures how similar the abundances of different species are in the community. The Shannon diversity index provides more information about community composition than simply species richness (i.e. the number of species present) by accounting for the relative abundance of the different species. The higher the value of H, the higher the diversity of species in a particular community. The lower the value of H, the lower the diversity. A value of H = 0 indicates a community that only has one species.
Step 1: Creating an abundance table
abund_group <- summarise(group_by(inverts_dat, site, group),
abundance = sum(count))
We can see which group found the most invertebrates:
abund_group
## # A tibble: 12 × 3
## # Groups: site [2]
## site group abundance
## <chr> <int> <int>
## 1 CP 1 40
## 2 CP 2 157
## 3 CP 3 16
## 4 CP 4 113
## 5 CP 5 379
## 6 CP 6 38
## 7 REP 1 33
## 8 REP 2 84
## 9 REP 3 32
## 10 REP 4 36
## 11 REP 5 55
## 12 REP 6 43
Step 2: Making boxplot figures
If you would like some help understanding boxplots I suggest you read this blogpost.
To make a boxplot we’ll need to install another package:
install.packages("patchwork")
When the installation is done, load the package:
library(patchwork)
Now we are ready to make our plot:
See Question 2 in the workbook
abund_graph <-
abund_group %>%
ggplot(aes(x = site, y = abundance, fill = site)) +
geom_boxplot(width = 0.15, position = position_dodge(width = 10)) +
geom_jitter(shape = 1, position = position_jitter(0.05)) +
labs(x = "Site", y = "Number of invertebrates") +
theme_bw() +
theme(panel.border= element_blank(), axis.line.x = element_line(colour = "black",size = 1), axis.line.y = element_line(colour ="black", size = 1), legend.position = "none")
Let’s see the graph:
abund_graph
Let’s export and save this graph. You can find this graph in the “Rdata” file you created in the very beginning.
ggsave("Invertebrate_Abundance.png", plot=abund_graph, width=6, height=4, dpi=300)
Step 3: Calculating the mean and standard error for abundance
R doesn’t have a code for calculating the standard error, so we will make our own function:
standard_error <- function(x) sd(x) / sqrt(length(x))
Now we can run this function to calculate mean and standard error
abund_all <- summarise(group_by(abund_group, site),
Mean = mean(abundance),
SE = standard_error(abundance))
View our results:
See Question 3 in the workbook
abund_all
## # A tibble: 2 × 3
## site Mean SE
## <chr> <dbl> <dbl>
## 1 CP 124. 55.5
## 2 REP 47.2 8.15
Now Let’s export these stats:
write.csv(abund_all, "Mean&SE_abundance.csv")
Mean versus Median: Both are measures of where the center of a data set lies (called “Central Tendency” in stats), but they are usually different numbers. For example, take this list of numbers: 10, 10, 20, 40, 70. The mean (informally, the “average“) is found by adding all of the numbers together and dividing by the number of items in the set: 10 + 10 + 20 + 40 + 70 / 5 = 30. The median is found by ordering the set from lowest to highest and finding the exact middle. The median is just the middle number: 20.
Step 1: Creating a richness table
rich_group <- summarise(group_by(inverts_dat, site, group),
richness = length(unique(invert)))
Let’s view it:
rich_group
## # A tibble: 12 × 3
## # Groups: site [2]
## site group richness
## <chr> <int> <int>
## 1 CP 1 9
## 2 CP 2 11
## 3 CP 3 5
## 4 CP 4 20
## 5 CP 5 19
## 6 CP 6 13
## 7 REP 1 12
## 8 REP 2 10
## 9 REP 3 13
## 10 REP 4 13
## 11 REP 5 14
## 12 REP 6 13
Step 2: Making boxplot figures
See Question 4 in the workbook
rich_graph <- rich_group %>%
ggplot(aes(x = site, y = richness, fill = site)) +
geom_boxplot(width = 0.15, position = position_dodge(width = 10)) +
geom_jitter(shape = 1, position = position_jitter(0.05)) +
labs(x = "Site", y = "Species richness") +
theme_bw() +
theme(panel.border = element_blank(), axis.line.x = element_line(colour = "black", size = 1), axis.line.y = element_line(colour = "black", size = 1), legend.position = "none")
Let’s see the graph:
rich_graph
Let’s export and save this graph. You can find this graph in the “Rdata” file you created in the very beginning.
ggsave("Invertebrate_Richness.png", plot=rich_graph, width=6, height=4, dpi=300)
Step 3: Calculating the mean and standard error
rich_all <- summarize(group_by(rich_group, site),
Mean = mean(richness),
SE = standard_error(richness))
View our results:
See Question 5 in the workbook
rich_all
## # A tibble: 2 × 3
## site Mean SE
## <chr> <dbl> <dbl>
## 1 CP 12.8 2.37
## 2 REP 12.5 0.563
Now Let’s export these stats:
write.csv(rich_all, "Mean&SE_richness.csv")
Finally, calculate the Shannon diversity index (H) for each group in each habitat type. Remember that the higher the value of H, the higher the diversity of species in a particular community. The lower the value of H, the lower the diversity. A value of H = 0 indicates a community that only has one species.
See Question 6 in the workbook
Step 1: We need to load the package that has the Shannon index formula
install.packages("vegan")
When the package has installed, load it in:
library(vegan)
Step 2: Calculating Shannon index
div_group <- summarise(group_by(inverts_dat, site, group),
diversity = diversity(count, index = "shannon"))
To view it:
div_group
## # A tibble: 12 × 3
## # Groups: site [2]
## site group diversity
## <chr> <int> <dbl>
## 1 CP 1 1.41
## 2 CP 2 1.17
## 3 CP 3 1.13
## 4 CP 4 1.79
## 5 CP 5 1.27
## 6 CP 6 2.21
## 7 REP 1 2.16
## 8 REP 2 1.38
## 9 REP 3 1.79
## 10 REP 4 1.88
## 11 REP 5 1.93
## 12 REP 6 2.13
Step 3: Making a boxplot figure
shannon_graph <- div_group %>%
ggplot(aes(x = site, y = diversity, fill = site)) +
geom_boxplot(width = 0.15, position = position_dodge(width = 10)) +
geom_jitter(shape = 1, position = position_jitter(0.05)) +
labs(x = "Site", y = "Shannon index (H)")+
theme_bw() +
theme(panel.border = element_blank(), axis.line.x = element_line(colour = "black", size = 1), axis.line.y = element_line(colour = "black", size = 1), legend.position = "none")
Let’s see the graph:
shannon_graph
Let’s export and save this graph. You can find this graph in the “Rdata” file you created in the very beginning.
ggsave("ShannonIndex.png", plot=shannon_graph, width=6, height=4, dpi=300)
Step 4: Calculate the mean and standard error
div_all <- summarise(group_by(div_group, site),
Mean = mean(diversity),
SE = standard_error(diversity))
View our results:
div_all
## # A tibble: 2 × 3
## site Mean SE
## <chr> <dbl> <dbl>
## 1 CP 1.50 0.173
## 2 REP 1.88 0.116
Now Let’s export these stats:
write.csv(div_all, "Mean&SE_Shannon.csv")
See Question 7 in the workbook
All we have left to do is cite the packages and software we used.
Citing R (we do not cite RStudio as it’s just an interface, it doesn’t actually do any of the analysis)
citation()
## To cite R in publications use:
##
## R Core Team (2024). _R: A Language and Environment for Statistical
## Computing_. R Foundation for Statistical Computing, Vienna, Austria.
## <https://www.R-project.org/>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2024},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
Citing all of the packages we used today:
citation("tidyverse")
## To cite package 'tidyverse' in publications use:
##
## Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
## E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
## Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
## the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
## doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Welcome to the {tidyverse}},
## author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
## year = {2019},
## journal = {Journal of Open Source Software},
## volume = {4},
## number = {43},
## pages = {1686},
## doi = {10.21105/joss.01686},
## }
citation("ggthemes")
## To cite package 'ggthemes' in publications use:
##
## Arnold J (2024). _ggthemes: Extra Themes, Scales and Geoms for
## 'ggplot2'_. R package version 5.1.0,
## <https://CRAN.R-project.org/package=ggthemes>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {ggthemes: Extra Themes, Scales and Geoms for 'ggplot2'},
## author = {Jeffrey B. Arnold},
## year = {2024},
## note = {R package version 5.1.0},
## url = {https://CRAN.R-project.org/package=ggthemes},
## }
citation("vegan")
## To cite package 'vegan' in publications use:
##
## Oksanen J, Simpson G, Blanchet F, Kindt R, Legendre P, Minchin P,
## O'Hara R, Solymos P, Stevens M, Szoecs E, Wagner H, Barbour M,
## Bedward M, Bolker B, Borcard D, Carvalho G, Chirico M, De Caceres M,
## Durand S, Evangelista H, FitzJohn R, Friendly M, Furneaux B, Hannigan
## G, Hill M, Lahti L, McGlinn D, Ouellette M, Ribeiro Cunha E, Smith T,
## Stier A, Ter Braak C, Weedon J, Borman T (2025). _vegan: Community
## Ecology Package_. R package version 2.6-10,
## <https://CRAN.R-project.org/package=vegan>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {vegan: Community Ecology Package},
## author = {Jari Oksanen and Gavin L. Simpson and F. Guillaume Blanchet and Roeland Kindt and Pierre Legendre and Peter R. Minchin and R.B. O'Hara and Peter Solymos and M. Henry H. Stevens and Eduard Szoecs and Helene Wagner and Matt Barbour and Michael Bedward and Ben Bolker and Daniel Borcard and Gustavo Carvalho and Michael Chirico and Miquel {De Caceres} and Sebastien Durand and Heloisa Beatriz Antoniazi Evangelista and Rich FitzJohn and Michael Friendly and Brendan Furneaux and Geoffrey Hannigan and Mark O. Hill and Leo Lahti and Dan McGlinn and Marie-Helene Ouellette and Eduardo {Ribeiro Cunha} and Tyler Smith and Adrian Stier and Cajo J.F. {Ter Braak} and James Weedon and Tuomas Borman},
## year = {2025},
## note = {R package version 2.6-10},
## url = {https://CRAN.R-project.org/package=vegan},
## }