Feel free to mark this on https://rpubs.com/PontSatyre11119/eeb313-assignment-2
To submit this assignment, upload the full document to Quercus,
including the original questions, your code, and the output. Submit your
assignment as a knitted .pdf (preferred),
.docx, or .html file.
Run the code below to create a categorical variable of the
activ column. This will make dplyr recognize that there are
only two levels of activity (0 and 1), rather than a continuous range
0-1, which will facilitate plotting.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
beaver1_f <- beaver1 %>%
mutate(factor_activ = factor(activ)) %>%
group_by(day) %>%
mutate(count = n_distinct(activ))
(unique(beaver1_f$activ)) # here we have only two factors now
## [1] 0 1
Hint: You need to use the fill parameter rather than
color.
library(ggplot2)
library(dplyr)
ggplot(beaver1_f, aes(fill = factor_activ)) +
geom_histogram(aes(x = temp), bins = 20) + # this makes a bar plot using the factors made beforehand
labs(title = "Beaver Temperature vs Activity", x = "Temperature C", y = "Count", fill = "Activity") +
theme_classic()
Hint: To separate measurements per day, you could use
filter() and two chunks of code, but try the more efficient
way of facetting into subplots, which we talked about in the
lecture.
ggplot(beaver1_f, aes(color = factor_activ)) +
geom_point(aes(x = time, y = temp)) + # this makes a scatterplot with x-axis being time and y-axis being temperature
facet_wrap(vars(day), scales = "free_x") + # we are facetting by day
theme_classic() +
labs(title = "Beaver Temperature over Time", subtitle = "For days 346 and 347", color = "Activity", x = "Time (hhmm)", y = "Temperature C")
Ok, that’s enough about beaver body temperatures. Now you will apply your data wrangling skills on the yearly change in biomass of plants in the beautiful Abisko national park in northern Sweden. We have preprocessed this data and made it available as a csv-file via this link. You can find the original data and a short readme on figshare and dryad. The original study1 is available un an open access license. Reading through the readme on figshare, and the study abstract will increase your understanding for working with the data.
plant_biomass and display the first six rows. (0.25
mark)plant_biomass <- read.csv("plant-biomass-preprocess.csv") # reads the csv
head(plant_biomass) # displays the top 6
Hint: Search online to find out which Latin and English names pair up.
colnames(plant_biomass) <- (c("Year", "Site", "Habitat", "Treatment", "Bilberry", "Bog Bilberry", "Crowberry", "Dwarf Birch", "Lingonberry", "Wavy Hair Grass")) # Here I changed the column names to the common names with correct formatting. It makes it easier for plotting later on.
plant_biomass
species) and the measurement values into another column
(biomass). Assign it to the variable
plant_long. (0.5 marks)plant_long <- pivot_longer(plant_biomass, names_to = "species", values_to = "biomass", cols = -c(Year, Site, Habitat, Treatment)) # I am making the data from wide to long format
head(plant_long) # here is the top 6
Now that our data is in a tidy format, we can start exploring it!
bio_filt <- plant_long %>% # I removed all NAs and values less than 0. You can't have biomass that is less than 0.
filter(!is.na(biomass) & biomass > 0)
bio_filt %>% # this is my new df
summarize(mean(biomass)) # this calculates the mean for everything
bio_filt %>%
group_by(Treatment) %>% # we want to calculate by treatment
summarize(mean(biomass)) # this performs the mean calculation
bio_filt %>%
group_by(Year) %>% # we want to calculate by year
summarize(mean = mean(biomass)) # i am mean
grazedcontrol and rodentexclosure groups
(spread these variables as separate columns in a table)? (0.5
marks)bio_filt %>%
group_by(Treatment, Year, species) %>% # this makes summarize work
summarize(mean = mean(biomass)) %>% # calculate means biomass for each species based on treatment
pivot_wider(names_from = Treatment, values_from = mean) # makes the long df wider with treatment as two separate columns
## `summarise()` has grouped output by 'Treatment', 'Year'. You can override using
## the `.groups` argument.
grazedcontrol with that of
rodentexclosure graphically in a line plot. What could
explain the big dip in biomass year 2012? (0.75 marks)library(ggplot2)
bio_filt %>%
group_by(Year, Treatment) %>% # we want to compare the biomass over time separated by treatment
summarize(mean = mean(biomass)) %>% # calculates mean biomass
ggplot(aes(x = Year, y = mean, color = Treatment)) +
geom_line() + # makes a line plot with our requested analysis
labs(title = "Average Biomass by Treatment", x = "Year", y = "Yearly Biomas", color = "Treatment", subtitle = "The dip in biomass is explained as a comment.") +
scale_colour_discrete(labels = c("Grazed Control", "Rodent Exclosure")) +
theme_classic()
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
# Note the dip in 2012 is caused by an outbreak of geometrid moths causing herbivory on plant buds and leaves. An increase in pathogenic fungus also ensued, killing many plants.
Hint: The published study might be able to help with the second question…
bio_filt %>%
group_by(species) %>%
tally()
# There is not an equal number of observations per species. However, the data is good enough for graphing. We might want to do an ANOVA or calculate variances.
bio_filt %>%
group_by(Year, species) %>% # makes the summarize function usable
summarize(mean = mean(biomass)) %>% # calculates mean biomass
ggplot(aes(x = Year, y = mean, color = species)) +
geom_line() + # make a line diagram for each species's mean biomass over time
labs(title = "Change in Biomass per Species", x = "Year", y = "Mean Biomass", color = "Species") +
scale_colour_discrete(labels = c("Bilberry", "Bog Bilberry", "Crowberry", "Dwarf Birch", "Lingonberry", "Wavy Hair Grass")) +
theme_classic()
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
rodentexclosure and crowberry are of similar
shape. Coincidence? Let’s find out! Use a facetted line plot to explore
whether all plant species are impacted equally by grazing. (0.75
mark)bio_filt %>%
group_by(Year, species, Treatment) %>% # makes the summarize function usable
summarize(mean = mean(biomass)) %>% # calculates the mean
ggplot(aes(x = Year, y = mean, color = Treatment)) +
geom_line() + # make x-axis time and y-axis mean biomass, with points being treatment
facet_wrap(vars(species), scales = "free_x") +
labs(title = "Change in Biomass by Treatment", x = "Year", y = "Mean Biomass", color = "Species") +
scale_colour_discrete(labels = c("Grazed Control", "Rodent Exclosure")) +
theme_classic()
## `summarise()` has grouped output by 'Year', 'species'. You can override using
## the `.groups` argument.
bio_filt %>%
group_by(Year, species, Habitat) %>% # again makes the summarize function usable
summarize(mean = mean(biomass)) %>% # calculates the mean
ggplot(aes(x = Year, y = mean, color = Habitat)) +
geom_line() + # make x-axis time and y-axis mean biomass, with points being habitat
facet_wrap(vars(species), scales = "free_x") +
labs(title = "Change in Biomass by Habitat", x = "Year", y = "Mean Biomass", color = "Habitat") +
scale_colour_discrete(labels = c("Forest", "Tundra")) +
theme_classic()
## `summarise()` has grouped output by 'Year', 'species'. You can override using
## the `.groups` argument.
bio_filt %>%
group_by(Year, species, Habitat, Treatment) %>% # essentially treating this as select() by this point
summarize(mean = mean(biomass)) %>% # add a mean biomass column
ggplot(aes(x = Year, y = mean, color = species), labels = (c("Grazed Control", "Rodent Exclosure"))) +
geom_line() + # plot x-axis as time and y-axis as mean biomass, with species as points
facet_grid(Habitat ~ Treatment) + # compare habitat with treatment
labs(title = "Change in Biomass by Habitat & Treatment", x = "Year", y = "Mean Biomass", color = "Species") +
theme_classic()
## `summarise()` has grouped output by 'Year', 'species', 'Habitat'. You can
## override using the `.groups` argument.
Hint: This is a hard one! You may want to explore R’s
documentation for ggplot’s facet_grid
squared_biomass observations in
descending order. Only include the columns year,
squared_biomass and species and only
observations between the years 2003 and 2008 from the forest
habitat.Hint: Break this down into single criteria and add one at a time. You will be able to obtain the desired result with five operations (1 mark)
(bio_filt %>%
mutate(squared_biomass = biomass^2) %>% # add a column of squared biomass
filter(Year == c(2003:2008) & Habitat == "Forest") %>% # filter by years and habitat
select(Year, squared_biomass, species) %>% # only show the three columns
arrange(desc(squared_biomass)) # arrange by ascending and sort by descending
)[1:3,] # select the first three
Olofsson J, te Beest M, Ericson L (2013) Complex biotic interactions drive long-term vegetation dynamics in a subarctic ecosystem. Philosophical Transactions of the Royal Society B 368(1624): 20120486. https://dx.doi.org/10.1098/rstb.2012.0486↩︎