The aim of this coursework is for you to demonstrate that you can
independently analyse a typical ecological dataset using the
R statistical environment. The focus of the assignment is
technical, i.e., on employing your R skills. While your
ecological knowledge and skills will help you with interpreting the
data, no in-depth discussion of the data will be required. There are
tasks that will guide you through the exercise. Each of the tasks comes
with details on what is asked of you and hints that will help you tackle
the assignment.
There is always more than one way to do things in
R. Our previous sessions have covered most of what is asked of you, but this doesn’t mean you have to do everything the way that it was demonstrated.
Make sure you pay close attention to what is asked of you and
consult the hints below. For further assistance, use the
R help pages, ask us, or consult online help pages (e.g.,
Stack Overflow).
You will submit a report that should start with a very brief
introduction explaining the dataset and the broad area of study. The
main part should address each task in turn. Each task should be stated
and it should be explained briefly how you want to address this. This
should be followed by a code block (Insert ->
R) that contains your R code. Your code should
be annotated so that it can be understood by people not familiar with
what you have done. Every code block should result in some form of
output, depending on the task and your approach. Typically this would be
a single or multiple graphs or tables (or both). After the code block
there should be a short summarising statement on what you have
found. After that, you should address the next task. Please see the mock
task and solution below for an example.
The coursework must be submitted as an R Markdown
document – no other file format will be accepted. Such a document can be
created in RStudio through File ->
New File -> R Markdown.
Hint: If you press Knit in the RStudio window, R will compile your Markdown document into a webpage (HTML document). If there are any errors in your script, knitting will be unsuccessful. Observe the R console: it will tell you exactly where any potential error lies (i.e., in which line of the code).
You will be analysing a dataset characterising fish communities along an altitudinal gradient of the river Doubs in France. Chemical water parameters were collected together with abundances for several fish species from 30 locations.
Location of the river Doubs in France
# The dataset is part of the ade4 package, so this must be loaded first (install it if not present already)
library(ade4)
# This loads the dataset
data(doubs)
# The data consists of 1 dataframe for the species data (rows = sites, columns = species)...
fish <- doubs$fish
fish
# ... 1 dataframe with water chemistry ...
env <- doubs$env
env
# ... 1 dataframe with the species names (only FYI) ...
doubs$species
# and 1 dataframe with the coordinates of the sampling sites (not important for you)
The environmental parameters explained:
| Code | Description of the variable [unit as stored in the dataset, per the ade4 documentation] |
|---|---|
| dfs | Distance from the source [km × 10] |
| alt | Altitude [m a.s.l.] |
| slo | Slope [ln(x + 1), where x is the slope in per thousand × 100] |
| flo | Mean minimum discharge [m³ s⁻¹ × 100] |
| pH | pH of water [× 10] |
| har | Calcium concentration (hardness) [mg L⁻¹] |
| pho | Phosphate concentration [mg L⁻¹ × 100] |
| nit | Nitrate concentration [mg L⁻¹ × 100] |
| amm | Ammonium concentration [mg L⁻¹ × 100] |
| oxy | Dissolved oxygen [mg L⁻¹ × 10] |
| bdo | Biological oxygen demand [mg L⁻¹ × 10] |
An overview of the characteristics of a dataset can be found using the base R functions str() and summary().
str(env)
## 'data.frame': 30 obs. of 11 variables:
## $ dfs: num 3 22 102 185 215 324 268 491 705 990 ...
## $ alt: num 934 932 914 854 849 846 841 792 752 617 ...
## $ slo: num 6.18 3.43 3.64 3.5 3.18 ...
## $ flo: num 84 100 180 253 264 286 400 130 480 1000 ...
## $ pH : num 79 80 83 80 81 79 81 81 80 77 ...
## $ har: num 45 40 52 72 84 60 88 94 90 82 ...
## $ pho: num 1 2 5 10 38 20 7 20 30 6 ...
## $ nit: num 20 20 22 21 52 15 15 41 82 75 ...
## $ amm: num 0 10 5 0 20 0 0 12 12 1 ...
## $ oxy: num 122 103 105 110 80 102 111 70 72 100 ...
## $ bdo: num 27 19 35 13 62 53 22 81 52 43 ...
str(fish)
## 'data.frame': 30 obs. of 27 variables:
## $ Cogo: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Satr: num 3 5 5 4 2 3 5 0 0 1 ...
## $ Phph: num 0 4 5 5 3 4 4 0 1 4 ...
## $ Neba: num 0 3 5 5 2 5 5 0 3 4 ...
## $ Thth: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Teso: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Chna: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Chto: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Lele: num 0 0 0 0 5 1 1 0 0 2 ...
## $ Lece: num 0 0 0 1 2 2 1 0 5 2 ...
## $ Baba: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Spbi: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Gogo: num 0 0 0 1 2 1 0 0 0 1 ...
## $ Eslu: num 0 0 1 2 4 1 0 0 0 0 ...
## $ Pefl: num 0 0 0 2 4 1 0 0 0 0 ...
## $ Rham: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Legi: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Scer: num 0 0 0 0 2 0 0 0 0 0 ...
## $ Cyca: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Titi: num 0 0 0 1 3 2 0 0 1 0 ...
## $ Abbr: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Icme: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Acce: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Ruru: num 0 0 0 0 5 1 0 0 4 0 ...
## $ Blbj: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Alal: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Anan: num 0 0 0 0 0 0 0 0 0 0 ...
summary(env)
## dfs alt slo flo
## Min. : 3.0 Min. :172.0 Min. :1.099 Min. : 84
## 1st Qu.: 544.5 1st Qu.:248.0 1st Qu.:1.831 1st Qu.: 420
## Median :1752.0 Median :395.0 Median :2.565 Median :2210
## Mean :1879.0 Mean :481.5 Mean :2.758 Mean :2220
## 3rd Qu.:3017.2 3rd Qu.:782.0 3rd Qu.:3.390 3rd Qu.:2858
## Max. :4530.0 Max. :934.0 Max. :6.176 Max. :6900
## pH har pho nit
## Min. :77.00 Min. : 40.00 Min. : 1.00 Min. : 15.0
## 1st Qu.:79.25 1st Qu.: 84.25 1st Qu.: 12.50 1st Qu.: 50.5
## Median :80.00 Median : 89.00 Median : 28.50 Median :160.0
## Mean :80.50 Mean : 86.10 Mean : 55.77 Mean :165.4
## 3rd Qu.:81.00 3rd Qu.: 96.75 3rd Qu.: 56.00 3rd Qu.:242.5
## Max. :86.00 Max. :110.00 Max. :422.00 Max. :620.0
## amm oxy bdo
## Min. : 0.00 Min. : 41.00 Min. : 13.00
## 1st Qu.: 0.00 1st Qu.: 80.25 1st Qu.: 27.25
## Median : 10.00 Median :102.00 Median : 41.50
## Mean : 20.93 Mean : 93.90 Mean : 51.17
## 3rd Qu.: 20.00 3rd Qu.:109.00 3rd Qu.: 52.75
## Max. :180.00 Max. :124.00 Max. :167.00
summary(fish)
## Cogo Satr Phph Neba Thth
## Min. :0.00 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.00
## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.00
## Median :0.00 Median :1.00 Median :3.000 Median :2.000 Median :0.00
## Mean :0.50 Mean :1.90 Mean :2.267 Mean :2.433 Mean :0.50
## 3rd Qu.:0.75 3rd Qu.:3.75 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:0.75
## Max. :3.00 Max. :5.00 Max. :5.000 Max. :5.000 Max. :4.00
## Teso Chna Chto Lele
## Min. :0.0000 Min. :0.0 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.0 Median :0.0000 Median :1.000
## Mean :0.6333 Mean :0.6 Mean :0.8667 Mean :1.433
## 3rd Qu.:0.7500 3rd Qu.:1.0 3rd Qu.:2.0000 3rd Qu.:2.000
## Max. :5.0000 Max. :3.0 Max. :4.0000 Max. :5.000
## Lece Baba Spbi Gogo Eslu
## Min. :0.000 Min. :0.000 Min. :0.0 Min. :0.000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.0 1st Qu.:0.000 1st Qu.:0.000
## Median :2.000 Median :0.000 Median :0.0 Median :1.000 Median :1.000
## Mean :1.867 Mean :1.433 Mean :0.9 Mean :1.833 Mean :1.333
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1.0 3rd Qu.:3.750 3rd Qu.:2.000
## Max. :5.000 Max. :5.000 Max. :5.0 Max. :5.000 Max. :5.000
## Pefl Rham Legi Scer Cyca
## Min. :0.0 Min. :0.0 Min. :0.0000 Min. :0.0 Min. :0.0000
## 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0000 1st Qu.:0.0 1st Qu.:0.0000
## Median :0.5 Median :0.0 Median :0.0000 Median :0.0 Median :0.0000
## Mean :1.2 Mean :1.1 Mean :0.9667 Mean :0.7 Mean :0.8333
## 3rd Qu.:2.0 3rd Qu.:2.0 3rd Qu.:1.7500 3rd Qu.:1.0 3rd Qu.:1.0000
## Max. :5.0 Max. :5.0 Max. :5.0000 Max. :5.0 Max. :5.0000
## Titi Abbr Icme Acce Ruru
## Min. :0.0 Min. :0.0000 Min. :0.0 Min. :0.000 Min. :0.0
## 1st Qu.:0.0 1st Qu.:0.0000 1st Qu.:0.0 1st Qu.:0.000 1st Qu.:0.0
## Median :1.0 Median :0.0000 Median :0.0 Median :0.000 Median :1.0
## Mean :1.5 Mean :0.8667 Mean :0.6 Mean :1.267 Mean :2.1
## 3rd Qu.:3.0 3rd Qu.:1.0000 3rd Qu.:0.0 3rd Qu.:2.000 3rd Qu.:5.0
## Max. :5.0 Max. :5.0000 Max. :5.0 Max. :5.000 Max. :5.0
## Blbj Alal Anan
## Min. :0.000 Min. :0.0 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.0 1st Qu.:0.00
## Median :0.000 Median :0.0 Median :0.00
## Mean :1.033 Mean :1.9 Mean :0.90
## 3rd Qu.:1.750 3rd Qu.:5.0 3rd Qu.:1.75
## Max. :5.000 Max. :5.0 Max. :5.00
# this shows the structure and summary statistics of both data frames
This gives an overview of the structure and summary statistics of the dataset: both data frames have 30 rows (sites), with 11 environmental variables in env and 27 fish species in fish.
Values for the minimum, maximum, and range can be obtained from a dataset using min(), max(), and range().
min(env$alt)
## [1] 172
max(env$alt)
## [1] 934
range(env$alt)
## [1] 172 934
#this gives the min, max, range for altitude column in env data
This shows the range of altitudes (172 to 934 m) across which the data were collected along the course of the river.
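Note that `range()` returns the minimum and maximum rather than a single span. If the width of the altitudinal gradient is wanted as one number, `diff(range())` is one way to get it. The sketch below uses only the first ten altitude values shown in the `str(env)` output, so its numbers are illustrative; for the full `env$alt` column the minimum and maximum reported above (172 and 934) give a span of 762 m.

```r
# First ten altitude values copied from the str(env) output (illustrative subset only)
alt_subset <- c(934, 932, 914, 854, 849, 846, 841, 792, 752, 617)

# range() returns a vector of length two: c(minimum, maximum)
alt_range <- range(alt_subset)
alt_range

# diff() of that vector gives the span covered by the values
alt_span <- diff(alt_range)
alt_span
```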
You can obtain column and row sums and means of a dataset using the colSums(), rowSums(), colMeans(), and rowMeans() functions.
#summarises column values, give abundances of species in the river
colSums(fish)
## Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham
## 15 57 68 73 15 19 18 26 43 56 43 27 55 40 36 33
## Legi Scer Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan
## 29 21 25 45 26 18 38 63 31 57 27
colSums(fish, na.rm = FALSE, dims = 1)
## Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham
## 15 57 68 73 15 19 18 26 43 56 43 27 55 40 36 33
## Legi Scer Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan
## 29 21 25 45 26 18 38 63 31 57 27
rowSums(fish, na.rm = FALSE, dims = 1)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 3 12 16 21 34 21 16 0 14 14 11 18 19 28 33 40 44 42 46 56 62 72 4 15 11 43
## 27 28 29 30
## 63 70 87 89
colMeans(fish, na.rm = FALSE, dims = 1)
## Cogo Satr Phph Neba Thth Teso Chna Chto
## 0.5000000 1.9000000 2.2666667 2.4333333 0.5000000 0.6333333 0.6000000 0.8666667
## Lele Lece Baba Spbi Gogo Eslu Pefl Rham
## 1.4333333 1.8666667 1.4333333 0.9000000 1.8333333 1.3333333 1.2000000 1.1000000
## Legi Scer Cyca Titi Abbr Icme Acce Ruru
## 0.9666667 0.7000000 0.8333333 1.5000000 0.8666667 0.6000000 1.2666667 2.1000000
## Blbj Alal Anan
## 1.0333333 1.9000000 0.9000000
rowMeans(fish, na.rm = FALSE, dims = 1)
## 1 2 3 4 5 6 7 8
## 0.1111111 0.4444444 0.5925926 0.7777778 1.2592593 0.7777778 0.5925926 0.0000000
## 9 10 11 12 13 14 15 16
## 0.5185185 0.5185185 0.4074074 0.6666667 0.7037037 1.0370370 1.2222222 1.4814815
## 17 18 19 20 21 22 23 24
## 1.6296296 1.5555556 1.7037037 2.0740741 2.2962963 2.6666667 0.1481481 0.5555556
## 25 26 27 28 29 30
## 0.4074074 1.5925926 2.3333333 2.5925926 3.2222222 3.2962963
The fish community is not evenly distributed across the sampling sites: total abundance per site ranges from 0 (site 8) to 89 (site 30). This is plausible because the river offers distinct habitats and niches along its course, as well as different chemical compositions and nutrient levels.
Total fish abundance generally increases with distance from the source, although it drops sharply at sites 23 to 25.
This fits with what I would expect to see.
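Row sums give the total abundance per site, which is not the same thing as the number of species present. One common way to count species per site (species richness) is to sum a logical presence matrix. The sketch below uses a small hypothetical abundance table (`toy_fish`, which is not part of the Doubs dataset); with the real data the equivalent call would presumably be `rowSums(fish > 0)`.

```r
# Hypothetical abundance table: rows = sites, columns = species (not the Doubs data)
toy_fish <- data.frame(
  sp_a = c(3, 0, 5, 0),
  sp_b = c(0, 0, 4, 1),
  sp_c = c(0, 0, 5, 2)
)

# Total abundance per site (what rowSums() on the raw table returns)
abundance <- rowSums(toy_fish)

# Species richness per site: count the species with abundance > 0
richness <- rowSums(toy_fish > 0)

# Side-by-side comparison of the two measures
data.frame(site = seq_len(nrow(toy_fish)), abundance, richness)
```

In this toy example, sites 1 and 4 have the same total abundance but different richness, which is why the two quantities are worth reporting separately.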
Are any of the environmental variables associated with each other more strongly than expected by chance? Find at least 2 different means of exploring associations between environmental variables. For both methods, use best practices and comment on the strength of associations.
There are several different methods for finding the correlations between different numeric variables. The two used here are cor() and pairs().
#1. checks correlation between all the columns for the env data
cor(env[, c("dfs", "alt", "slo", "flo", "pH", "har", "pho", "nit", "amm", "oxy", "bdo")])
## dfs alt slo flo pH har
## dfs 1.00000000 -0.94102219 -0.7557286 0.94904174 0.00472656 0.69790332
## alt -0.94102219 1.00000000 0.7637673 -0.86926914 -0.03726938 -0.74481167
## slo -0.75572859 0.76376732 1.0000000 -0.71571143 -0.27091451 -0.65375106
## flo 0.94904174 -0.86926914 -0.7157114 1.00000000 0.02042538 0.69678410
## pH 0.00472656 -0.03726938 -0.2709145 0.02042538 1.00000000 0.08886897
## har 0.69790332 -0.74481167 -0.6537511 0.69678410 0.08886897 1.00000000
## pho 0.47789736 -0.44204914 -0.4037680 0.38528236 -0.08323950 0.36379811
## nit 0.74671936 -0.76054593 -0.6108798 0.60707232 -0.04887849 0.51073526
## amm 0.40866509 -0.38132330 -0.3514402 0.29490860 -0.12412055 0.29074449
## oxy -0.51035396 0.36190401 0.4637083 -0.35789468 0.17700293 -0.38239140
## bdo 0.39573704 -0.33784820 -0.3170900 0.25320534 -0.15181290 0.34496636
## pho nit amm oxy bdo
## dfs 0.4778974 0.74671936 0.4086651 -0.5103540 0.3957370
## alt -0.4420491 -0.76054593 -0.3813233 0.3619040 -0.3378482
## slo -0.4037680 -0.61087984 -0.3514402 0.4637083 -0.3170900
## flo 0.3852824 0.60707232 0.2949086 -0.3578947 0.2532053
## pH -0.0832395 -0.04887849 -0.1241205 0.1770029 -0.1518129
## har 0.3637981 0.51073526 0.2907445 -0.3823914 0.3449664
## pho 1.0000000 0.80025065 0.9695215 -0.7236924 0.8855369
## nit 0.8002507 1.00000000 0.7976855 -0.6290729 0.6422816
## amm 0.9695215 0.79768545 1.0000000 -0.7208146 0.8857985
## oxy -0.7236924 -0.62907291 -0.7208146 1.0000000 -0.8431211
## bdo 0.8855369 0.64228156 0.8857985 -0.8431211 1.0000000
# same thing with fewer steps
cor(env)
## dfs alt slo flo pH har
## dfs 1.00000000 -0.94102219 -0.7557286 0.94904174 0.00472656 0.69790332
## alt -0.94102219 1.00000000 0.7637673 -0.86926914 -0.03726938 -0.74481167
## slo -0.75572859 0.76376732 1.0000000 -0.71571143 -0.27091451 -0.65375106
## flo 0.94904174 -0.86926914 -0.7157114 1.00000000 0.02042538 0.69678410
## pH 0.00472656 -0.03726938 -0.2709145 0.02042538 1.00000000 0.08886897
## har 0.69790332 -0.74481167 -0.6537511 0.69678410 0.08886897 1.00000000
## pho 0.47789736 -0.44204914 -0.4037680 0.38528236 -0.08323950 0.36379811
## nit 0.74671936 -0.76054593 -0.6108798 0.60707232 -0.04887849 0.51073526
## amm 0.40866509 -0.38132330 -0.3514402 0.29490860 -0.12412055 0.29074449
## oxy -0.51035396 0.36190401 0.4637083 -0.35789468 0.17700293 -0.38239140
## bdo 0.39573704 -0.33784820 -0.3170900 0.25320534 -0.15181290 0.34496636
## pho nit amm oxy bdo
## dfs 0.4778974 0.74671936 0.4086651 -0.5103540 0.3957370
## alt -0.4420491 -0.76054593 -0.3813233 0.3619040 -0.3378482
## slo -0.4037680 -0.61087984 -0.3514402 0.4637083 -0.3170900
## flo 0.3852824 0.60707232 0.2949086 -0.3578947 0.2532053
## pH -0.0832395 -0.04887849 -0.1241205 0.1770029 -0.1518129
## har 0.3637981 0.51073526 0.2907445 -0.3823914 0.3449664
## pho 1.0000000 0.80025065 0.9695215 -0.7236924 0.8855369
## nit 0.8002507 1.00000000 0.7976855 -0.6290729 0.6422816
## amm 0.9695215 0.79768545 1.0000000 -0.7208146 0.8857985
## oxy -0.7236924 -0.62907291 -0.7208146 1.0000000 -0.8431211
## bdo 0.8855369 0.64228156 0.8857985 -0.8431211 1.0000000
# 2. pairs plot: scatterplot matrix of all pairs of environmental variables
pairs(env)
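Comments on strong associations: the strongest correlations are between pho and amm (r = 0.97), dfs and flo (r = 0.95), and dfs and alt (r = -0.94). pho, amm, and bdo are all strongly positively correlated with each other (r = 0.89 to 0.97) and negatively correlated with oxy (r = -0.72 to -0.84). pH is only weakly correlated with every other variable (|r| ≤ 0.27).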
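Reading strong associations off a large correlation matrix by eye is error-prone, so it can help to list the pairs programmatically. The following is a minimal sketch, assuming an arbitrary cut-off of |r| ≥ 0.8 for "strong" (the `threshold` value is my assumption, not part of the task). It runs on the first ten rows of four variables copied from the `str(env)` output, so the coefficients differ from the full-dataset values; with the real data, `env_subset` would simply be replaced by `env`.

```r
# First ten rows of four variables, copied from the str(env) output (illustrative subset)
env_subset <- data.frame(
  dfs = c(3, 22, 102, 185, 215, 324, 268, 491, 705, 990),
  alt = c(934, 932, 914, 854, 849, 846, 841, 792, 752, 617),
  flo = c(84, 100, 180, 253, 264, 286, 400, 130, 480, 1000),
  oxy = c(122, 103, 105, 110, 80, 102, 111, 70, 72, 100)
)

# Pearson correlation matrix (the default method of cor())
r <- cor(env_subset)

# Keep each pair once by looking only at the upper triangle
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs_tbl <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)

# Keep pairs above the chosen threshold and sort them by absolute strength
threshold <- 0.8  # assumption: an arbitrary cut-off for "strong"
strong <- pairs_tbl[abs(pairs_tbl$r) >= threshold, ]
strong[order(-abs(strong$r)), ]
```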
Another method for analysing correlation numerically in R is Spearman's rank correlation coefficient:
# Spearman's rank correlation test for one pair of variables
corr <- cor.test(x = env$dfs, y = env$alt, method = "spearman")
# this would have to be repeated for each pair of variables, which would take a lot of time
# other options: scatterplot matrix, correlation tests for significance; cor() above already gives Pearson's correlation coefficient
cor.test() handles one pair of variables at a time, so testing every combination this way means repeating the call for each pair. The coefficients alone can be obtained for all pairs in one step with cor(env, method = "spearman").
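The pair-by-pair Spearman approach does not have to be typed out by hand. The sketch below shows one possible way to get the full coefficient matrix in a single call and then loop `cor.test()` over every pair to collect p-values. It uses the first ten rows of three variables copied from the `str(env)` output, so the results are illustrative only. The Holm adjustment via `p.adjust()` is my addition, on the assumption that some correction for multiple tests is wanted.

```r
# First ten rows of three variables, copied from the str(env) output (illustrative subset)
env_subset <- data.frame(
  dfs = c(3, 22, 102, 185, 215, 324, 268, 491, 705, 990),
  alt = c(934, 932, 914, 854, 849, 846, 841, 792, 752, 617),
  oxy = c(122, 103, 105, 110, 80, 102, 111, 70, 72, 100)
)

# All pairwise Spearman coefficients in one call
rho <- cor(env_subset, method = "spearman")
rho

# cor.test() handles one pair at a time, so loop over all pairs for p-values
var_pairs <- combn(names(env_subset), 2, simplify = FALSE)
spearman_tests <- do.call(rbind, lapply(var_pairs, function(p) {
  # exact = FALSE avoids warnings when the data contain ties
  test <- cor.test(env_subset[[p[1]]], env_subset[[p[2]]],
                   method = "spearman", exact = FALSE)
  data.frame(var1 = p[1], var2 = p[2],
             rho = unname(test$estimate), p_value = test$p.value)
}))

# Adjust the p-values for the number of tests carried out (Holm method)
spearman_tests$p_adjusted <- p.adjust(spearman_tests$p_value, method = "holm")
spearman_tests
```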
Use cluster analysis to explore which sites are similar in terms of their chemical composition and species composition. How would you describe the concordance of groupings between these two clustering approaches, and what could this mean?
Hint: check out the R script from week 10 for more details on cluster analysis.
First, the relevant packages need to be installed and loaded. The dist() and fviz_dist() functions are then used to generate a distance matrix and a heatmap of it, and hclust() and fviz_dend() to generate a dendrogram for the environmental data.
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
# keep only the water chemistry variables used for clustering
env.chem <- env %>% select(oxy, har, pho, nit)
#need to install packages
install.packages("factoextra")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
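# standardised copy of the chemistry data (note: not used below; the distances are computed on the unscaled env.chem)
be.scale <- scale(env.chem)
# Euclidean distance matrix
dist.mat <- dist(env.chem, method = "euclidean")
# generates a heatmap of the distance matrix
fviz_dist(dist.mat, lab_size = 2)
# hierarchical clustering
clust.mat <- hclust(dist.mat)
# suggests the optimal number of clusters by average silhouette width (here 2)
fviz_nbclust(env.chem, kmeans, method = "silhouette")
# draws the dendrogram, cut into 2 clusters
fviz_dend(clust.mat,
          k = 2,
          horiz = FALSE,
          rect = TRUE,
          palette = "d3")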
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
## Please report the issue at <https://github.com/kassambara/factoextra/issues>.
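The code above computes a standardised copy of the chemistry data (`be.scale`) but passes the unscaled `env.chem` to `dist()`. Because the variables differ widely in magnitude (for example, `nit` ranges from 15 to 620 while `har` ranges from 40 to 110), unscaled Euclidean distances are dominated by the variables with the largest values. The following is a hedged sketch of the scaled variant in base R, using the first ten rows copied from the `str(env)` output. It is illustrative only and does not reproduce the clustering reported here.

```r
# First ten rows of the four chemistry variables, copied from the str(env) output
chem_subset <- data.frame(
  har = c(45, 40, 52, 72, 84, 60, 88, 94, 90, 82),
  pho = c(1, 2, 5, 10, 38, 20, 7, 20, 30, 6),
  nit = c(20, 20, 22, 21, 52, 15, 15, 41, 82, 75),
  oxy = c(122, 103, 105, 110, 80, 102, 111, 70, 72, 100)
)

# Standardise each column to mean 0 and standard deviation 1
chem_scaled <- scale(chem_subset)

# Euclidean distances between sites, computed on the standardised values
dist_scaled <- dist(chem_scaled, method = "euclidean")

# Hierarchical clustering (hclust() defaults to complete linkage)
clust_scaled <- hclust(dist_scaled)

# Cut the tree into two groups and show the membership per site
groups_scaled <- cutree(clust_scaled, k = 2)
groups_scaled

# Base R dendrogram of the result
plot(clust_scaled, main = "Sites clustered on standardised chemistry (subset)")
```

Whether scaling is appropriate depends on the question being asked. It gives every variable equal weight, which is usually what is wanted when the units are not comparable.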
Generate dendrogram for species data:
# Euclidean distance matrix for the fish data
dist.mat2 <- dist(fish, method = "euclidean")
# generates heatmap for fish data
fviz_dist(dist.mat2, lab_size = 2)
# hierarchical clustering
clust.mat2 <- hclust(dist.mat2)
# suggests the optimal number of clusters by average silhouette width
fviz_nbclust(fish, kmeans, method = "silhouette")
# draws the dendrogram, cut into 2 clusters
fviz_dend(clust.mat2,
          k = 2,
          horiz = FALSE,
          rect = TRUE,
          palette = "d3")
The groupings from the two clustering approaches show little concordance:
sites do not cluster in the same way by chemical composition as by
species composition. Sites 20, 21, 22, 26, 27, 28, 29, and 30 form a
cluster that is distinct from the other sites in terms of species
composition, whereas site 25 is the only site that stands apart in terms
of chemical composition.
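One simple way to put a number on the concordance between two clusterings is to cross-tabulate the group memberships. This is a minimal sketch in base R using `cutree()` and `table()`. The data are the first ten rows of four chemistry variables and five species copied from the `str()` output, so the table is illustrative and is not the result for the full dataset.

```r
# First ten rows of four chemistry variables, copied from the str(env) output
chem_subset <- data.frame(
  har = c(45, 40, 52, 72, 84, 60, 88, 94, 90, 82),
  pho = c(1, 2, 5, 10, 38, 20, 7, 20, 30, 6),
  nit = c(20, 20, 22, 21, 52, 15, 15, 41, 82, 75),
  oxy = c(122, 103, 105, 110, 80, 102, 111, 70, 72, 100)
)

# First ten rows of five species, copied from the str(fish) output
fish_subset <- data.frame(
  Satr = c(3, 5, 5, 4, 2, 3, 5, 0, 0, 1),
  Phph = c(0, 4, 5, 5, 3, 4, 4, 0, 1, 4),
  Neba = c(0, 3, 5, 5, 2, 5, 5, 0, 3, 4),
  Lele = c(0, 0, 0, 0, 5, 1, 1, 0, 0, 2),
  Lece = c(0, 0, 0, 1, 2, 2, 1, 0, 5, 2)
)

# Two-group membership from each hierarchical clustering
groups_chem <- cutree(hclust(dist(scale(chem_subset))), k = 2)
groups_fish <- cutree(hclust(dist(fish_subset)), k = 2)

# Cross-tabulate the memberships: rows = chemistry groups, columns = fish groups
concordance <- table(chemistry = groups_chem, fish = groups_fish)
concordance
```

Cluster labels are arbitrary, so agreement shows up as sites concentrating in one cell per row (in either diagonal), while an even spread across all cells indicates little concordance.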
Calculate the species diversity for each of the sites using at least two different measures, and provide a brief summary of how and why the measures differ. Does the species diversity correlate with any measured ecological parameters?
Hint: Use the function diversity() from the vegan package to calculate species diversity.
The vegan package needs to be installed and loaded to access the diversity() function, which is used here to calculate the Shannon and Simpson diversity indices for each site.
install.packages("vegan")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## Registered S3 method overwritten by 'vegan':
## method from
## rev.hclust dendextend
## This is vegan 2.6-4
# method 1: Shannon diversity index (0 = a single species or no fish at a site; higher values = more diverse)
diversity(fish, "shannon")
## 1 2 3 4 5 6 7 8
## 0.000000 1.077556 1.263741 1.882039 2.329070 2.108294 1.420116 0.000000
## 9 10 11 12 13 14 15 16
## 1.432757 1.648847 1.594167 1.673142 1.705013 2.125904 2.322898 2.643290
## 17 18 19 20 21 22 23 24
## 2.941232 3.023328 2.962449 2.992018 3.038689 3.015832 1.039721 1.894312
## 25 26 27 28 29 30
## 1.972247 2.904931 2.952539 2.986392 3.144175 2.996777
# method 2: Simpson diversity index (0 = low, values near 1 = high); site 8 has no fish, so its value of 1 is an artefact
diversity(fish, "simpson")
## 1 2 3 4 5 6 7 8
## 0.0000000 0.6527778 0.7031250 0.8253968 0.8961938 0.8571429 0.7343750 1.0000000
## 9 10 11 12 13 14 15 16
## 0.7346939 0.7857143 0.7603306 0.7962963 0.8033241 0.8673469 0.8962351 0.9175000
## 17 18 19 20 21 22 23 24
## 0.9400826 0.9467120 0.9395085 0.9451531 0.9479709 0.9483025 0.6250000 0.8177778
## 25 26 27 28 29 30
## 0.8429752 0.9378042 0.9428068 0.9461224 0.9538909 0.9486176
Both indices generally increase further down the river course, with a dip at sites 23 to 25. Species diversity correlates positively with levels of nitrogen, but negatively with levels of oxygen.
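The two indices differ in how they weight species. Shannon, H = -Σ p·ln(p), is unbounded and responds more to rare species. Simpson, 1 - Σ p², is bounded between 0 and 1 and is driven mainly by the dominant species. The sketch below writes both out by hand and correlates the Shannon values with two environmental variables. It uses the first ten rows of five species, `nit`, and `oxy` copied from the `str()` output, so only sites whose fish all fall within those five species (such as site 2) reproduce the values printed above. Returning `NA` for a site without fish is my choice here; as the output above shows, `diversity()` returns 0 (Shannon) and 1 (Simpson) for site 8.

```r
# First ten rows of five species, copied from the str(fish) output (illustrative subset)
fish_subset <- data.frame(
  Satr = c(3, 5, 5, 4, 2, 3, 5, 0, 0, 1),
  Phph = c(0, 4, 5, 5, 3, 4, 4, 0, 1, 4),
  Neba = c(0, 3, 5, 5, 2, 5, 5, 0, 3, 4),
  Lele = c(0, 0, 0, 0, 5, 1, 1, 0, 0, 2),
  Lece = c(0, 0, 0, 1, 2, 2, 1, 0, 5, 2)
)

# Shannon index: H = -sum(p * log(p)) over the species present at a site
shannon <- function(counts) {
  if (sum(counts) == 0) return(NA_real_)  # no fish: treated as undefined in this sketch
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Simpson index in its 1 - sum(p^2) form
simpson <- function(counts) {
  if (sum(counts) == 0) return(NA_real_)
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Apply both indices row by row (one value per site)
H <- apply(fish_subset, 1, shannon)
D <- apply(fish_subset, 1, simpson)

# Matching rows of two environmental variables from the str(env) output
env_subset <- data.frame(
  nit = c(20, 20, 22, 21, 52, 15, 15, 41, 82, 75),
  oxy = c(122, 103, 105, 110, 80, 102, 111, 70, 72, 100)
)

# Spearman correlation of Shannon diversity with each variable, skipping NA sites
div_cor <- sapply(env_subset, function(v) {
  cor(H, v, method = "spearman", use = "complete.obs")
})
div_cor
```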
Provide an overview of the geographic locations of the sampling sites.
The sampling locations are given as coordinates in the dataset
(doubs$xy) and can thus be plotted as scatterplot to create
a visual overview of the sampling sites.
# I will use ggplot2 to map the locations, so I am loading the package here
library(ggplot2)
# First, load the data
data(doubs)
# Store the sampling coordinates in a new dataframe
coords <- doubs$xy
# plot using ggplot (note the extensive annotations that will make this code easier to understand)
# this calls ggplot and sets the aesthetics (x and y of my dataframe)
ggplot(coords, aes(x = x, y = y)) +
  # Plot a line between all consecutive points (light blue and thick).
  geom_path(colour = "lightblue", linewidth = 5) +
  # Add labels for the sites 1–30
  geom_label(label = 1:30) +
  # Remove all axes and grids
  theme_void() +
  # Add a title
  labs(title = "Location of the sampling sites for the Doubs dataset")
There are 30 sites that cover the Doubs at more or less regular intervals, running from upstream (site 1) to downstream (site 30). The shape of the plot closely resembles the outline of the river on a map, which suggests that the original location measurements were accurate and my plotting approach was appropriate.
| Criterion | Max score | Weight per question | Max score for all questions |
|---|---|---|---|
| Format follows description (contains brief explanation of approach, code block, output, and summary) | 10 | 0.25 | 10 |
| Approach appropriate and complete | 10 | 0.5 | 20 |
| Code without errors (i.e., does what it’s supposed to; code “elegance” will not be assessed!) | 10 | 1 | 40 |
| Code annotations (quantity and quality) | 10 | 0.25 | 10 |
| Interpretation of outcome | 10 | 0.25 | 10 |
| Overall format & structure; usage of markdown syntax | 10 | – | 10 |