CW3: Data analysis exercise

Introduction
Format of submission
The dataset
Tasks
Mock task and answer
Marking criteria

Introduction

The aim of this coursework is for you to demonstrate that you can independently analyse a typical ecological dataset using the R statistical environment. The focus of the assignment is technical, i.e., on employing your R skills. While your ecological knowledge and skills will help you interpret the data, no in-depth discussion of the data is required. The tasks below will guide you through the exercise. Each task comes with details on what is asked of you and hints that will help you tackle the assignment.

There is always more than one way to do things in R. Our previous sessions have covered most of what is asked of you, but this doesn’t mean you have to do everything the way that it was demonstrated.

Make sure you pay close attention to what is asked of you and consult the hints below. For further assistance, use the R help pages, ask us, or consult online help pages (e.g., Stack Overflow).

Format of submission

You will submit a report that should start with a very brief introduction explaining the dataset and the broad area of study. The main part should address each task in turn. State each task and briefly explain how you intend to address it. This should be followed by a code block (Insert -> R) that contains your R code. Your code should be annotated so that it can be understood by people not familiar with what you have done. Every code block should result in some form of output, depending on the task and your approach. Typically this would be one or more graphs or tables (or both). After the code block there should be a short statement summarising what you have found. After that, you should address the next task. Please see the mock task and solution below for an example.

The coursework must be submitted as an R Markdown document; no other file format will be accepted. Such a document can be created in RStudio through File -> New File -> R Markdown.

Hint: If you press Knit in the RStudio window, R will compile your Markdown document into a webpage (HTML document). If there are any errors in your script, knitting will be unsuccessful. Observe the R console: it will tell you where any error lies (i.e., in which line of the code).

The dataset

You will be analysing a dataset characterising fish communities along an altitudinal gradient of the river Doubs in France. Chemical water parameters were collected together with abundances for several fish species from 30 locations.

Location of the river Doubs in France

# The dataset is part of the ade4 package, so this must be loaded first (install it if not present already)
library(ade4)

# This loads the dataset
data(doubs)

# The data consists of 1 dataframe for the species data (rows = sites, columns = species)...
fish <- doubs$fish
fish

# ... 1 dataframe with water chemistry ...
env <- doubs$env
env

# ... 1 dataframe with the species names (only FYI) ...
doubs$species

# and 1 dataframe with the coordinates of the sampling sites (doubs$xy; not important for you)

The environmental parameters explained:

Code	Description of the variable
dfs	Distance from the source [km]
alt	Altitude [m a.s.l.]
slo	Slope [per thousand]
flo	Mean minimum discharge [m³ s⁻¹]
pH	pH of water
har	Calcium concentration (hardness) [mg L⁻¹]
pho	Phosphate concentration [mg L⁻¹]
nit	Nitrate concentration [mg L⁻¹]
amm	Ammonium concentration [mg L⁻¹]
oxy	Dissolved oxygen [mg L⁻¹]
bdo	Biological oxygen demand [mg L⁻¹]

Tasks

1) Data exploration

Give an overview of how the environmental and species datasets are structured. This should contain information on how many sites, species, and environmental parameters were collected.

An overview of the characteristics of a dataset can be obtained using the base R functions str() and summary().

str(env)

## 'data.frame':    30 obs. of  11 variables:
##  $ dfs: num  3 22 102 185 215 324 268 491 705 990 ...
##  $ alt: num  934 932 914 854 849 846 841 792 752 617 ...
##  $ slo: num  6.18 3.43 3.64 3.5 3.18 ...
##  $ flo: num  84 100 180 253 264 286 400 130 480 1000 ...
##  $ pH : num  79 80 83 80 81 79 81 81 80 77 ...
##  $ har: num  45 40 52 72 84 60 88 94 90 82 ...
##  $ pho: num  1 2 5 10 38 20 7 20 30 6 ...
##  $ nit: num  20 20 22 21 52 15 15 41 82 75 ...
##  $ amm: num  0 10 5 0 20 0 0 12 12 1 ...
##  $ oxy: num  122 103 105 110 80 102 111 70 72 100 ...
##  $ bdo: num  27 19 35 13 62 53 22 81 52 43 ...

str(fish)

## 'data.frame':    30 obs. of  27 variables:
##  $ Cogo: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Satr: num  3 5 5 4 2 3 5 0 0 1 ...
##  $ Phph: num  0 4 5 5 3 4 4 0 1 4 ...
##  $ Neba: num  0 3 5 5 2 5 5 0 3 4 ...
##  $ Thth: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Teso: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chna: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chto: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Lele: num  0 0 0 0 5 1 1 0 0 2 ...
##  $ Lece: num  0 0 0 1 2 2 1 0 5 2 ...
##  $ Baba: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Spbi: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Gogo: num  0 0 0 1 2 1 0 0 0 1 ...
##  $ Eslu: num  0 0 1 2 4 1 0 0 0 0 ...
##  $ Pefl: num  0 0 0 2 4 1 0 0 0 0 ...
##  $ Rham: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Legi: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Scer: num  0 0 0 0 2 0 0 0 0 0 ...
##  $ Cyca: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Titi: num  0 0 0 1 3 2 0 0 1 0 ...
##  $ Abbr: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Icme: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Acce: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ruru: num  0 0 0 0 5 1 0 0 4 0 ...
##  $ Blbj: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Alal: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Anan: num  0 0 0 0 0 0 0 0 0 0 ...

summary(env)

##       dfs              alt             slo             flo      
##  Min.   :   3.0   Min.   :172.0   Min.   :1.099   Min.   :  84  
##  1st Qu.: 544.5   1st Qu.:248.0   1st Qu.:1.831   1st Qu.: 420  
##  Median :1752.0   Median :395.0   Median :2.565   Median :2210  
##  Mean   :1879.0   Mean   :481.5   Mean   :2.758   Mean   :2220  
##  3rd Qu.:3017.2   3rd Qu.:782.0   3rd Qu.:3.390   3rd Qu.:2858  
##  Max.   :4530.0   Max.   :934.0   Max.   :6.176   Max.   :6900  
##        pH             har              pho              nit       
##  Min.   :77.00   Min.   : 40.00   Min.   :  1.00   Min.   : 15.0  
##  1st Qu.:79.25   1st Qu.: 84.25   1st Qu.: 12.50   1st Qu.: 50.5  
##  Median :80.00   Median : 89.00   Median : 28.50   Median :160.0  
##  Mean   :80.50   Mean   : 86.10   Mean   : 55.77   Mean   :165.4  
##  3rd Qu.:81.00   3rd Qu.: 96.75   3rd Qu.: 56.00   3rd Qu.:242.5  
##  Max.   :86.00   Max.   :110.00   Max.   :422.00   Max.   :620.0  
##       amm              oxy              bdo        
##  Min.   :  0.00   Min.   : 41.00   Min.   : 13.00  
##  1st Qu.:  0.00   1st Qu.: 80.25   1st Qu.: 27.25  
##  Median : 10.00   Median :102.00   Median : 41.50  
##  Mean   : 20.93   Mean   : 93.90   Mean   : 51.17  
##  3rd Qu.: 20.00   3rd Qu.:109.00   3rd Qu.: 52.75  
##  Max.   :180.00   Max.   :124.00   Max.   :167.00

summary(fish)

##       Cogo           Satr           Phph            Neba            Thth     
##  Min.   :0.00   Min.   :0.00   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:0.00   1st Qu.:0.00   1st Qu.:0.000   1st Qu.:1.000   1st Qu.:0.00  
##  Median :0.00   Median :1.00   Median :3.000   Median :2.000   Median :0.00  
##  Mean   :0.50   Mean   :1.90   Mean   :2.267   Mean   :2.433   Mean   :0.50  
##  3rd Qu.:0.75   3rd Qu.:3.75   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:0.75  
##  Max.   :3.00   Max.   :5.00   Max.   :5.000   Max.   :5.000   Max.   :4.00  
##       Teso             Chna          Chto             Lele      
##  Min.   :0.0000   Min.   :0.0   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0   Median :0.0000   Median :1.000  
##  Mean   :0.6333   Mean   :0.6   Mean   :0.8667   Mean   :1.433  
##  3rd Qu.:0.7500   3rd Qu.:1.0   3rd Qu.:2.0000   3rd Qu.:2.000  
##  Max.   :5.0000   Max.   :3.0   Max.   :4.0000   Max.   :5.000  
##       Lece            Baba            Spbi          Gogo            Eslu      
##  Min.   :0.000   Min.   :0.000   Min.   :0.0   Min.   :0.000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:0.000   1st Qu.:0.0   1st Qu.:0.000   1st Qu.:0.000  
##  Median :2.000   Median :0.000   Median :0.0   Median :1.000   Median :1.000  
##  Mean   :1.867   Mean   :1.433   Mean   :0.9   Mean   :1.833   Mean   :1.333  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:1.0   3rd Qu.:3.750   3rd Qu.:2.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.0   Max.   :5.000   Max.   :5.000  
##       Pefl          Rham          Legi             Scer          Cyca       
##  Min.   :0.0   Min.   :0.0   Min.   :0.0000   Min.   :0.0   Min.   :0.0000  
##  1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.0   1st Qu.:0.0000  
##  Median :0.5   Median :0.0   Median :0.0000   Median :0.0   Median :0.0000  
##  Mean   :1.2   Mean   :1.1   Mean   :0.9667   Mean   :0.7   Mean   :0.8333  
##  3rd Qu.:2.0   3rd Qu.:2.0   3rd Qu.:1.7500   3rd Qu.:1.0   3rd Qu.:1.0000  
##  Max.   :5.0   Max.   :5.0   Max.   :5.0000   Max.   :5.0   Max.   :5.0000  
##       Titi          Abbr             Icme          Acce            Ruru    
##  Min.   :0.0   Min.   :0.0000   Min.   :0.0   Min.   :0.000   Min.   :0.0  
##  1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.0   1st Qu.:0.000   1st Qu.:0.0  
##  Median :1.0   Median :0.0000   Median :0.0   Median :0.000   Median :1.0  
##  Mean   :1.5   Mean   :0.8667   Mean   :0.6   Mean   :1.267   Mean   :2.1  
##  3rd Qu.:3.0   3rd Qu.:1.0000   3rd Qu.:0.0   3rd Qu.:2.000   3rd Qu.:5.0  
##  Max.   :5.0   Max.   :5.0000   Max.   :5.0   Max.   :5.000   Max.   :5.0  
##       Blbj            Alal          Anan     
##  Min.   :0.000   Min.   :0.0   Min.   :0.00  
##  1st Qu.:0.000   1st Qu.:0.0   1st Qu.:0.00  
##  Median :0.000   Median :0.0   Median :0.00  
##  Mean   :1.033   Mean   :1.9   Mean   :0.90  
##  3rd Qu.:1.750   3rd Qu.:5.0   3rd Qu.:1.75  
##  Max.   :5.000   Max.   :5.0   Max.   :5.00

#this shows the structure and summary of the different aspects of the data

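Both data frames have 30 rows, one per sampling site. The environmental dataset contains 11 numeric variables and the species dataset contains abundances for 27 fish species. summary() shows the minimum, quartiles, mean, and maximum of each column; all fish abundances lie between 0 and 5.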
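
The counts asked for in this task can also be read off directly with nrow() and ncol(), without scanning the str() output. This is a minimal sketch, assuming the ade4 package is installed; the names n_sites, n_species, and n_env are introduced here for illustration only.

```r
# Load the dataset (assumes ade4 is installed)
library(ade4)
data(doubs)
fish <- doubs$fish
env  <- doubs$env

# Rows are sites; columns are species (fish) or environmental variables (env)
n_sites   <- nrow(fish)
n_species <- ncol(fish)
n_env     <- ncol(env)

# Print the three counts in one line
cat("Sites:", n_sites, "| Species:", n_species, "| Environmental variables:", n_env, "\n")

# Check that both data frames describe the same number of sites
nrow(fish) == nrow(env)
```

The result should agree with the dimensions reported by str() for both data frames.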

Determine the altitudinal range across which the data was collected.

Values for the minimum, maximum, and range can be obtained from a dataset using min(), max(), and range().

min(env$alt)

## [1] 172

max(env$alt)

## [1] 934

range(env$alt)

## [1] 172 934

#this gives the min, max, range for altitude column in env data

The data were collected between 172 m and 934 m a.s.l., an altitudinal range of 762 m down the course of the river.

Determine the abundance of each fish species in the dataset. In terms of species abundance, would you describe the fish community as even across sampling sites? Does this match your expectations from such a dataset?

Column sums and means can be obtained with colSums() and colMeans(); rowSums() and rowMeans() give the same summaries per site.

#summarises column values, give abundances of species in the river
colSums(fish)

## Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham 
##   15   57   68   73   15   19   18   26   43   56   43   27   55   40   36   33 
## Legi Scer Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan 
##   29   21   25   45   26   18   38   63   31   57   27

#same call with the default arguments written out; the result is identical
colSums(fish, na.rm = FALSE, dims = 1)

## Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham 
##   15   57   68   73   15   19   18   26   43   56   43   27   55   40   36   33 
## Legi Scer Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan 
##   29   21   25   45   26   18   38   63   31   57   27

#total abundance of all species at each site
rowSums(fish, na.rm = FALSE, dims = 1)

##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  3 12 16 21 34 21 16  0 14 14 11 18 19 28 33 40 44 42 46 56 62 72  4 15 11 43 
## 27 28 29 30 
## 63 70 87 89

#mean abundance of each species across the 30 sites
colMeans(fish, na.rm = FALSE, dims = 1)

##      Cogo      Satr      Phph      Neba      Thth      Teso      Chna      Chto 
## 0.5000000 1.9000000 2.2666667 2.4333333 0.5000000 0.6333333 0.6000000 0.8666667 
##      Lele      Lece      Baba      Spbi      Gogo      Eslu      Pefl      Rham 
## 1.4333333 1.8666667 1.4333333 0.9000000 1.8333333 1.3333333 1.2000000 1.1000000 
##      Legi      Scer      Cyca      Titi      Abbr      Icme      Acce      Ruru 
## 0.9666667 0.7000000 0.8333333 1.5000000 0.8666667 0.6000000 1.2666667 2.1000000 
##      Blbj      Alal      Anan 
## 1.0333333 1.9000000 0.9000000

#mean abundance per species at each site
rowMeans(fish, na.rm = FALSE, dims = 1)

##         1         2         3         4         5         6         7         8 
## 0.1111111 0.4444444 0.5925926 0.7777778 1.2592593 0.7777778 0.5925926 0.0000000 
##         9        10        11        12        13        14        15        16 
## 0.5185185 0.5185185 0.4074074 0.6666667 0.7037037 1.0370370 1.2222222 1.4814815 
##        17        18        19        20        21        22        23        24 
## 1.6296296 1.5555556 1.7037037 2.0740741 2.2962963 2.6666667 0.1481481 0.5555556 
##        25        26        27        28        29        30 
## 0.4074074 1.5925926 2.3333333 2.5925926 3.2222222 3.2962963

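Total abundance differs considerably between species: it ranges from 15 (Cogo, Thth) to 73 (Neba). The community is also not even across the sampling sites. The row sums show that total abundance per site ranges from 0 (site 8) to 89 (site 30) and generally increases further from the source, with low values again at sites 23 to 25. This is plausible because the river offers distinct habitats along its course, with different niches, chemical compositions, and nutrient levels at different points.

This matches what I would expect from a dataset collected along a river gradient.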
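
To make the uneven abundances easier to see, the species totals can be sorted and plotted. The sketch below re-enters the colSums(fish) values printed above so that it runs on its own; in the report, colSums(fish) would be used directly. The names abund and abund.sorted are illustrative choices.

```r
# Species totals copied from the colSums(fish) output above
abund <- c(Cogo = 15, Satr = 57, Phph = 68, Neba = 73, Thth = 15, Teso = 19,
           Chna = 18, Chto = 26, Lele = 43, Lece = 56, Baba = 43, Spbi = 27,
           Gogo = 55, Eslu = 40, Pefl = 36, Rham = 33, Legi = 29, Scer = 21,
           Cyca = 25, Titi = 45, Abbr = 26, Icme = 18, Acce = 38, Ruru = 63,
           Blbj = 31, Alal = 57, Anan = 27)

# Sort from most to least abundant species
abund.sorted <- sort(abund, decreasing = TRUE)

# Share of the total abundance contributed by each species (in percent)
round(100 * abund.sorted / sum(abund.sorted), 1)

# Bar plot of the sorted totals; las = 2 rotates the species labels
barplot(abund.sorted, las = 2, ylab = "Total abundance across all sites")
```

If all species were equally abundant, the bars would be of similar height; a steep decline from left to right indicates an uneven community.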

2) Associations

Are any of the environmental variables associated with each other more strongly than expected by chance? Find at least 2 different means of exploring associations between environmental variables. For both methods, use best practices and comment on the strength of associations.

There are several different methods for finding the correlations between different numeric variables. The two used here are cor() and pairs().

#1. checks correlation between all the columns for the env data 

cor(env[, c("dfs", "alt", "slo", "flo", "pH", "har", "pho", "nit", "amm", "oxy", "bdo")])

##             dfs         alt        slo         flo          pH         har
## dfs  1.00000000 -0.94102219 -0.7557286  0.94904174  0.00472656  0.69790332
## alt -0.94102219  1.00000000  0.7637673 -0.86926914 -0.03726938 -0.74481167
## slo -0.75572859  0.76376732  1.0000000 -0.71571143 -0.27091451 -0.65375106
## flo  0.94904174 -0.86926914 -0.7157114  1.00000000  0.02042538  0.69678410
## pH   0.00472656 -0.03726938 -0.2709145  0.02042538  1.00000000  0.08886897
## har  0.69790332 -0.74481167 -0.6537511  0.69678410  0.08886897  1.00000000
## pho  0.47789736 -0.44204914 -0.4037680  0.38528236 -0.08323950  0.36379811
## nit  0.74671936 -0.76054593 -0.6108798  0.60707232 -0.04887849  0.51073526
## amm  0.40866509 -0.38132330 -0.3514402  0.29490860 -0.12412055  0.29074449
## oxy -0.51035396  0.36190401  0.4637083 -0.35789468  0.17700293 -0.38239140
## bdo  0.39573704 -0.33784820 -0.3170900  0.25320534 -0.15181290  0.34496636
##            pho         nit        amm        oxy        bdo
## dfs  0.4778974  0.74671936  0.4086651 -0.5103540  0.3957370
## alt -0.4420491 -0.76054593 -0.3813233  0.3619040 -0.3378482
## slo -0.4037680 -0.61087984 -0.3514402  0.4637083 -0.3170900
## flo  0.3852824  0.60707232  0.2949086 -0.3578947  0.2532053
## pH  -0.0832395 -0.04887849 -0.1241205  0.1770029 -0.1518129
## har  0.3637981  0.51073526  0.2907445 -0.3823914  0.3449664
## pho  1.0000000  0.80025065  0.9695215 -0.7236924  0.8855369
## nit  0.8002507  1.00000000  0.7976855 -0.6290729  0.6422816
## amm  0.9695215  0.79768545  1.0000000 -0.7208146  0.8857985
## oxy -0.7236924 -0.62907291 -0.7208146  1.0000000 -0.8431211
## bdo  0.8855369  0.64228156  0.8857985 -0.8431211  1.0000000

#same thing with fewer steps: cor() uses all columns of a data frame by default
cor(env)

##             dfs         alt        slo         flo          pH         har
## dfs  1.00000000 -0.94102219 -0.7557286  0.94904174  0.00472656  0.69790332
## alt -0.94102219  1.00000000  0.7637673 -0.86926914 -0.03726938 -0.74481167
## slo -0.75572859  0.76376732  1.0000000 -0.71571143 -0.27091451 -0.65375106
## flo  0.94904174 -0.86926914 -0.7157114  1.00000000  0.02042538  0.69678410
## pH   0.00472656 -0.03726938 -0.2709145  0.02042538  1.00000000  0.08886897
## har  0.69790332 -0.74481167 -0.6537511  0.69678410  0.08886897  1.00000000
## pho  0.47789736 -0.44204914 -0.4037680  0.38528236 -0.08323950  0.36379811
## nit  0.74671936 -0.76054593 -0.6108798  0.60707232 -0.04887849  0.51073526
## amm  0.40866509 -0.38132330 -0.3514402  0.29490860 -0.12412055  0.29074449
## oxy -0.51035396  0.36190401  0.4637083 -0.35789468  0.17700293 -0.38239140
## bdo  0.39573704 -0.33784820 -0.3170900  0.25320534 -0.15181290  0.34496636
##            pho         nit        amm        oxy        bdo
## dfs  0.4778974  0.74671936  0.4086651 -0.5103540  0.3957370
## alt -0.4420491 -0.76054593 -0.3813233  0.3619040 -0.3378482
## slo -0.4037680 -0.61087984 -0.3514402  0.4637083 -0.3170900
## flo  0.3852824  0.60707232  0.2949086 -0.3578947  0.2532053
## pH  -0.0832395 -0.04887849 -0.1241205  0.1770029 -0.1518129
## har  0.3637981  0.51073526  0.2907445 -0.3823914  0.3449664
## pho  1.0000000  0.80025065  0.9695215 -0.7236924  0.8855369
## nit  0.8002507  1.00000000  0.7976855 -0.6290729  0.6422816
## amm  0.9695215  0.79768545  1.0000000 -0.7208146  0.8857985
## oxy -0.7236924 -0.62907291 -0.7208146  1.0000000 -0.8431211
## bdo  0.8855369  0.64228156  0.8857985 -0.8431211  1.0000000

#2. pairs plot: scatterplot matrix of every pair of environmental variables

pairs(env)

Comments on strong associations:

alt and dfs: strong negative (r = -0.94); this is to be expected as water flows downhill.
flo and dfs: strong positive (r = 0.95); this is to be expected as more water drains into the river system further from the source, increasing discharge.
alt and flo: strong negative (r = -0.87); discharge increases as the river descends and collects more water.
nit and dfs: strong positive (r = 0.75); nitrate concentration increases further from the source.
oxy and bdo: strong negative (r = -0.84); dissolved oxygen decreases as biological oxygen demand increases.

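Another way of analysing correlation numerically in R is Spearman's rank correlation coefficient, which can also be tested for significance:

#Spearman's rank correlation test for one pair of variables (distance from source vs altitude)
corr <- cor.test(x = env$dfs, y = env$alt, method = "spearman")
#print the coefficient (rho) and the p-value
corr
#cor.test() handles one pair at a time, so running it for every pair would take a lot of time
#other options: scatterplot matrix, Pearson's correlation coefficient, pairwise correlation tests

cor.test() tests one pair of variables at a time, so covering every combination of the 11 environmental variables requires 55 separate tests. cor(env, method = "spearman") returns all the coefficients at once, but without p-values.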
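
The pairwise testing described above can be automated with a short loop. This is a sketch rather than a prescribed solution: it assumes the ade4 package is installed, the names pair.names and res are illustrative, exact = FALSE is set because tied values prevent exact p-values, and the Holm correction for multiple testing is a choice made here, not something the task specifies.

```r
# Load the environmental data (assumes ade4 is installed)
library(ade4)
data(doubs)
env <- doubs$env

# All unique pairs of variable names (one pair per column)
pair.names <- combn(names(env), 2)

# One row per pair; rho and p are filled in by the loop
res <- data.frame(var1 = pair.names[1, ],
                  var2 = pair.names[2, ],
                  rho  = NA_real_,
                  p    = NA_real_,
                  stringsAsFactors = FALSE)

for (i in seq_len(nrow(res))) {
  # Spearman's rank correlation test for the i-th pair
  test <- cor.test(env[[res$var1[i]]], env[[res$var2[i]]],
                   method = "spearman", exact = FALSE)
  res$rho[i] <- unname(test$estimate)
  res$p[i]   <- test$p.value
}

# Adjust the p-values for the number of tests performed
res$p.holm <- p.adjust(res$p, method = "holm")

# Show the ten pairs with the smallest adjusted p-values
head(res[order(res$p.holm), ], 10)
```

Pairs with a small adjusted p-value are associated more strongly than expected by chance; the size and sign of rho describe the strength and direction of the association.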

3) Classification of sites

Use cluster analysis to explore which sites are similar in terms of their chemical composition and species composition. How would you describe the concordance of groupings between these two clustering approaches, and what could this mean?

Hint: check out the R script from week 10 for more details on cluster analysis.

The relevant packages need to be installed and loaded first. dist() then generates a distance matrix, fviz_dist() displays it as a heatmap, and hclust() with fviz_dend() produces a dendrogram for the environmental data.

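#dplyr provides select() and the %>% pipe
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
#keep four water chemistry variables: oxygen, hardness, phosphate, and nitrate
env.chem <- env %>% select(oxy, har, pho, nit)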


#need to install packages
install.packages("factoextra")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

library(factoextra)

## Loading required package: ggplot2

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

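#standardise the variables (mean 0, sd 1) so that no single variable dominates the distances
be.scale <- scale(env.chem)
#Euclidean distance matrix between sites, based on the standardised data
dist.mat <- dist(be.scale, method = "euclidean")

#heatmap of the distance matrix
fviz_dist(dist.mat, lab_size = 2)

#hierarchical clustering of the sites
clust.mat <- hclust(dist.mat)
#suggests the optimal number of clusters from the average silhouette width (k = 2 is used below)
fviz_nbclust(be.scale, kmeans, method = "silhouette")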

#dendrogram of the environmental data, cut into 2 clusters marked by rectangles
fviz_dend(clust.mat, 
          k = 2,
          horiz = FALSE,
          rect = TRUE,
          palette = "d3")

## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
##   Please report the issue at <https://github.com/kassambara/factoextra/issues>.

Generate dendrogram for species data:

#Euclidean distance matrix between sites, based on the fish abundances
dist.mat2 <- dist(fish, method = "euclidean")

#generates heatmap for fish data
fviz_dist(dist.mat2, lab_size = 2)

#hierarchical clustering of the sites
clust.mat2 <- hclust(dist.mat2) 

#suggests the optimal number of clusters
fviz_nbclust(fish, kmeans, method = "silhouette")

#gives dendrogram 
fviz_dend(clust.mat2, 
          k = 2,
          horiz = FALSE,
          rect = TRUE,
          palette = "d3")

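The two cluster analyses do not produce the same groupings, so similarity in water chemistry does not simply mirror similarity in species composition. In the species dendrogram, sites 20, 21, 22, 26, 27, 28, 29, and 30 form a group that is clearly separated from the other sites. In the chemistry dendrogram, site 25 is the only site that separates clearly from the rest. This weak concordance could mean that the fish community responds to factors other than the four chemical variables used here.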
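
The concordance between the two groupings can also be checked numerically by cross-tabulating the cluster memberships. This is a minimal sketch that assumes the ade4 package is installed and uses only base R for the clustering; the names grp.chem, grp.fish, and concordance are illustrative, and k = 2 follows the choice made above.

```r
# Load the data (assumes ade4 is installed)
library(ade4)
data(doubs)

# Standardised chemistry variables, as selected above
chem <- scale(doubs$env[, c("oxy", "har", "pho", "nit")])

# Hierarchical clustering of each dataset, cut into 2 groups
grp.chem <- cutree(hclust(dist(chem)), k = 2)
grp.fish <- cutree(hclust(dist(doubs$fish)), k = 2)

# Cross-tabulate the memberships: rows = chemistry groups, columns = species groups
concordance <- table(chemistry = grp.chem, species = grp.fish)
concordance
```

If the two classifications agreed well, most sites would fall into two cells of the table (one per group); sites spread over all four cells indicate weak concordance. Group numbers are arbitrary, so only the pattern matters.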

4) Species richness and diversity

Calculate the species diversity for each of the sites using at least two different measures, and provide a brief summary of how and why the measures differ. Does the species diversity correlate with any measured ecological parameters?

Hint: Use the function diversity() from the vegan package to calculate species diversity.

The vegan package needs to be installed to access the diversity() function, which is used here to calculate the Shannon and Simpson diversity indices for each site.

install.packages("vegan")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

library(vegan)

## Loading required package: permute

## Loading required package: lattice

## Registered S3 method overwritten by 'vegan':
##   method     from      
##   rev.hclust dendextend

## This is vegan 2.6-4

#method 1: Shannon diversity index (0 = a single species or no fish; higher values = more diverse)
diversity(fish, "shannon")

##        1        2        3        4        5        6        7        8 
## 0.000000 1.077556 1.263741 1.882039 2.329070 2.108294 1.420116 0.000000 
##        9       10       11       12       13       14       15       16 
## 1.432757 1.648847 1.594167 1.673142 1.705013 2.125904 2.322898 2.643290 
##       17       18       19       20       21       22       23       24 
## 2.941232 3.023328 2.962449 2.992018 3.038689 3.015832 1.039721 1.894312 
##       25       26       27       28       29       30 
## 1.972247 2.904931 2.952539 2.986392 3.144175 2.996777

#method 2: Simpson diversity index (values near 1 = high, 0 = low; site 8 has no fish, so its value of 1 is an artefact)
diversity(fish, "simpson")

##         1         2         3         4         5         6         7         8 
## 0.0000000 0.6527778 0.7031250 0.8253968 0.8961938 0.8571429 0.7343750 1.0000000 
##         9        10        11        12        13        14        15        16 
## 0.7346939 0.7857143 0.7603306 0.7962963 0.8033241 0.8673469 0.8962351 0.9175000 
##        17        18        19        20        21        22        23        24 
## 0.9400826 0.9467120 0.9395085 0.9451531 0.9479709 0.9483025 0.6250000 0.8177778 
##        25        26        27        28        29        30 
## 0.8429752 0.9378042 0.9428068 0.9461224 0.9538909 0.9486176
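
The two indices can be reproduced by hand, which helps to show how they differ. The sketch below uses the counts for site 2 from the str(fish) output above (Satr = 5, Phph = 4, Neba = 3; all other species are 0) and needs only base R. The variable names are illustrative. The last lines show how a value of 1 can arise for a site without fish when undefined proportions are dropped from the sum; that vegan handles empty sites in this way is an assumption here.

```r
# Non-zero counts at site 2 (zeros are left out because 0 * log(0) is NaN in R)
site2 <- c(Satr = 5, Phph = 4, Neba = 3)

# Relative abundance of each species at the site
p <- site2 / sum(site2)

# Shannon index: H = -sum(p * ln(p))
shannon <- -sum(p * log(p))

# Simpson index as 1 - D, where D = sum(p^2)
simpson <- 1 - sum(p^2)

# Both values match those printed for site 2 above
round(c(shannon = shannon, simpson = simpson), 6)

# A site with no fish: 0 / 0 gives NaN proportions
p.empty <- c(0, 0, 0) / 0
# Dropping the NaN values leaves an empty sum (0), so 1 - D becomes 1
simpson.empty <- 1 - sum(p.empty^2, na.rm = TRUE)
simpson.empty
```

Because the Simpson index squares the proportions, rare species contribute very little to it, whereas the logarithm in the Shannon index gives them more weight.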

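Both indices increase with the number of species at a site, but they weight species differently. The Shannon index is sensitive to species richness and rare species and has no fixed upper limit, whereas the Simpson index is dominated by the most abundant species and is bounded between 0 and 1, so it levels off at the species-rich sites (about 0.94 to 0.95 for sites 17 to 22 and 26 to 30). Site 8 contains no fish, so its Simpson value of 1 is an artefact and should be ignored. Species diversity generally increases further down the river course, with a dip at sites 23 to 25. Since nitrate increases and oxygen decreases with distance from the source (see the correlations in task 2), diversity would be expected to correlate positively with nitrate and negatively with oxygen.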
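
Whether diversity correlates with the environmental variables can be checked directly rather than inferred. This is a minimal sketch, assuming the ade4 and vegan packages are installed; the names H and rho are illustrative, and Spearman's rank correlation is a choice made here because it does not assume a linear relationship.

```r
# Load packages and data (assumes ade4 and vegan are installed)
library(ade4)
library(vegan)
data(doubs)

# Shannon diversity for each of the 30 sites
H <- diversity(doubs$fish, index = "shannon")

# Spearman correlation between diversity and each environmental variable
rho <- sapply(doubs$env, function(v) cor(H, v, method = "spearman"))

# Sorted from the most negative to the most positive association
round(sort(rho), 2)
```

Positive values indicate variables that increase with diversity and negative values indicate variables that decrease with it; cor.test() can be used on individual pairs to obtain p-values.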

Mock task and answer

Task

Provide an overview of the geographic locations of the sampling sites.

Answer

The sampling locations are given as coordinates in the dataset (doubs$xy) and can thus be plotted as a scatterplot to create a visual overview of the sampling sites.

# I will use ggplot2 to map the locations, so I am loading the package here
library(ggplot2)

# First, load the data 
data(doubs)

# Store the sampling coordinates in a new dataframe
coords <- doubs$xy

# plot using ggplot (note the extensive annotations that will make this code easier to understand)
# this calls ggplot and sets the aesthetics (x and y of my dataframe)
ggplot(coords, aes(x = x, y = y)) +        
  # Plot a line between all consecutive points (light blue and thick).
  geom_path(col = "lightblue", lwd = 5) +
  # Add labels for the sites 1–30
  geom_label(label = c(1:30)) +
  # Remove all axes and grids
  theme_void() +
  # Add a title
  labs(title = "Location of the sampling sites for the Doubs dataset")

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.

There are 30 sites that cover the Doubs at more or less regular intervals, beginning upstream (sample 1) to downstream (sample 30). The shape of the plot closely resembles the outline of the river in a map, which suggests that the original location measurements were accurate and my plotting approach was appropriate.

Marking criteria

Criterion	Max score	Weight per question	Max score for all questions
Format follows description (contains brief explanation of approach, code block, output, and summary)	10	0.25	10
Approach appropriate and complete	10	0.5	20
Code without errors (i.e., does what it’s supposed to; code “elegance” will not be assessed!)	10	1	40
Code annotations (quantity and quality)	10	0.25	10
Interpretation of outcome	10	0.25	10
Overall format & structure; usage of markdown syntax	10	–	10