cat(paste0("My SID is: ", SID)) # DO NOT EDIT THIS LINE
My SID is: 540741612
ENVX2001 Applied Statistical Methods
crop
# A tibble: 42 × 4
   System      Fertiliser Yield Abundance
   <chr>       <chr>      <dbl>     <int>
 1 diversified yes        7835.         3
 2 monoculture no         3874.         9
 3 monoculture yes        5890.        11
 4 diversified yes        7559.         2
 5 monoculture yes        4708.        10
 6 monoculture yes        6446.        10
 7 diversified yes        7966.         1
 8 monoculture no         4402.         9
 9 monoculture no         3683.        12
10 monoculture no         3882.         9
# ℹ 32 more rows
str(crop)
tibble [42 × 4] (S3: tbl_df/tbl/data.frame)
 $ System    : chr [1:42] "diversified" "monoculture" "monoculture" "diversified" ...
 $ Fertiliser: chr [1:42] "yes" "no" "yes" "yes" ...
 $ Yield     : num [1:42] 7835 3874 5890 7559 4708 ...
 $ Abundance : int [1:42] 3 9 11 2 10 10 1 9 12 9 ...
fertiliser <- as.factor(crop$Fertiliser)
culture <- as.factor(crop$System)
crop # display the data
# A tibble: 42 × 4
   System      Fertiliser Yield Abundance
   <chr>       <chr>      <dbl>     <int>
 1 diversified yes        7835.         3
 2 monoculture no         3874.         9
 3 monoculture yes        5890.        11
 4 diversified yes        7559.         2
 5 monoculture yes        4708.        10
 6 monoculture yes        6446.        10
 7 diversified yes        7966.         1
 8 monoculture no         4402.         9
 9 monoculture no         3683.        12
10 monoculture no         3882.         9
# ℹ 32 more rows
library(ggplot2)
# Yield distribution by System
ggplot(crop, aes(x = Yield, fill = System)) +
  geom_histogram(bins = 15, alpha = 0.7, position = "identity") +
  labs(title = "Yield Distribution by System FIG. 1", x = "Yield", y = "Count") +
  theme_minimal()

# Abundance distribution by System
ggplot(crop, aes(x = Abundance, fill = System)) +
  geom_bar(position = "dodge") +
  labs(title = "Abundance Distribution by System FIG. 2", x = "Abundance", y = "Count") +
  theme_minimal()

# Yield vs Abundance scatter plot
ggplot(crop, aes(x = Abundance, y = Yield, color = System, shape = Fertiliser)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(title = "Yield vs Abundance FIG. 3", x = "Abundance", y = "Yield") +
  theme_minimal() +
  scale_color_brewer(palette = "Set2")

# Yield vs Fertiliser
ggplot(crop, aes(x = Fertiliser, y = Yield, fill = Fertiliser)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Yield by Fertiliser Use FIG. 4", x = "Fertiliser", y = "Yield") +
  theme_minimal()

# Abundance vs Fertiliser
ggplot(crop, aes(x = Fertiliser, y = Abundance, fill = Fertiliser)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Abundance by Fertiliser Use FIG. 5", x = "Fertiliser", y = "Abundance") +
  theme_minimal()

# Analysis of skew
# Histogram for Yield
ggplot(crop, aes(x = Yield)) +
  geom_histogram(bins = 15, fill = "steelblue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Yield FIG. 6", x = "Yield", y = "Count") +
  theme_minimal()

# Histogram for Abundance
ggplot(crop, aes(x = Abundance)) +
  geom_histogram(bins = 15, fill = "tomato", color = "black", alpha = 0.7) +
  labs(title = "Histogram of Abundance FIG. 7", x = "Abundance", y = "Count") +
  theme_minimal()

## Discuss the implications of the data structure and distribution for data analysis.
Points to consider:
The first pattern in the data is that fertiliser does not appear to affect abundance in either monoculture or diversified crops (FIG. 5). It does, however, appear to affect yield: in FIG. 4, crops with fertiliser have a noticeably higher yield than crops without fertiliser.
Comparing the two systems, monoculture crops have a lower yield and a higher abundance overall, whereas diversified crops have a markedly greater yield and a markedly lower abundance than monoculture crops.
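The group differences described here are read from the plots, so a numerical check may help. The following is a minimal sketch that uses only the ten rows printed earlier in this report, not the full 42-row dataset. The name `crop_head` is hypothetical, the yields are the rounded values shown in the printout, and the Welch t-test is one possible choice of test rather than a prescribed method. Results from this subset should not be read as results for the full data.

```r
# Illustrative subset: the first ten rows of `crop` as printed in this report.
# `crop_head` is a hypothetical name; Yield values are rounded as displayed.
crop_head <- data.frame(
  System     = c("diversified", "monoculture", "monoculture", "diversified",
                 "monoculture", "monoculture", "diversified", "monoculture",
                 "monoculture", "monoculture"),
  Fertiliser = c("yes", "no", "yes", "yes", "yes", "yes", "yes", "no", "no", "no"),
  Yield      = c(7835, 3874, 5890, 7559, 4708, 6446, 7966, 4402, 3683, 3882),
  Abundance  = c(3, 9, 11, 2, 10, 10, 1, 9, 12, 9)
)

# Mean yield and abundance for each farming system
system_means <- aggregate(cbind(Yield, Abundance) ~ System,
                          data = crop_head, FUN = mean)
print(system_means)

# Mean yield and abundance for each fertiliser level
fert_means <- aggregate(cbind(Yield, Abundance) ~ Fertiliser,
                        data = crop_head, FUN = mean)
print(fert_means)

# Welch two-sample t-test comparing yield between fertiliser levels
yield_test <- t.test(Yield ~ Fertiliser, data = crop_head)
print(yield_test)
```

Replacing `crop_head` with `crop` would run the same summaries on the full data. The t-test assumes roughly normal yields within each group, which the histograms can help judge.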
The potential challenges in analysing this dataset include the variance introduced by fertiliser, because fertiliser may not act equally across systems and could affect one system more than the other. Furthermore, abundance may affect yield indirectly. Finally, one level of a variable may be substantially underrepresented, e.g. in “Fertiliser” (yes/no); if this occurs, the higher variance in the smaller group may skew results.
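Whether a group is underrepresented can be checked directly by counting observations in each combination of the two factors. The following is a minimal sketch that again uses only the ten rows printed earlier in this report; `crop_head` is a hypothetical name, and counts from this subset do not describe the balance of the full 42-row dataset.

```r
# Illustrative subset: System and Fertiliser for the first ten printed rows.
# `crop_head` is a hypothetical name used only for this example.
crop_head <- data.frame(
  System     = c("diversified", "monoculture", "monoculture", "diversified",
                 "monoculture", "monoculture", "diversified", "monoculture",
                 "monoculture", "monoculture"),
  Fertiliser = c("yes", "no", "yes", "yes", "yes", "yes", "yes", "no", "no", "no")
)

# Cross-tabulate the two factors to count observations in each cell
cell_counts <- table(System = crop_head$System,
                     Fertiliser = crop_head$Fertiliser)
print(cell_counts)

# Locate any System x Fertiliser combination with no observations
empty_cells <- which(cell_counts == 0, arr.ind = TRUE)
print(empty_cells)
```

Running `table(crop$System, crop$Fertiliser)` on the full data would show whether any cell is sparse enough to make comparisons between systems unreliable.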
I received assistance from ChatGPT to write the code used to visualise this dataset and to fix a few issues with my code when running and saving.
b. If you used AI tools, create a list of the tools you used and provide a brief description of how you used them, including your prompts and questions.
I used ChatGPT; the questions I asked it are listed with the resources below.
This report was written in RStudio and Quarto. I started by reading the data paper by Jones et al. (2021) to understand the context of the data. Then I used ChatGPT to visualise the data, which allowed me to explore the values in more detail, and to help me fix bugs in my code.
To complete the report, the following resources and tools were used:
Jones et al. (2021), tutorials and lectures, and ChatGPT. ChatGPT prompts after loading the data: “Hi CHATGPT can you visualise this dataset. Include at least one histogram, one boxplot and one geometric plot.”, “can you change the yield vs abundance plot to differenciate fertiliser by colour and system by shape”, “can you compare fertiliser to yield and fertiliser to abundance in a 1 to 1 rationalised format”
Failure to journal your use of AI tools appropriately will result in a fail. However, the use of these tools is entirely optional and you can complete the report without them, since we provide enough information from lectures, tutorials and labs for you to complete the report.
Provide a list of references, if any, that you used in completing the report. There is no specific reference style required, but you should be consistent in your formatting.
Jones, S. K., Sánchez, A. C., Juventia, S. D., & Estrada-Carmona, N. (2021). A global database of diversified farming effects on biodiversity and yield. Scientific Data, 8(1). https://doi.org/10.1038/s41597-021-01000-y
OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chatgpt.com
Do not delete this section. We need this information for reproducibility and integrity checks.
=== Integrity Check Report ===
Time of execution: 2025-03-28 13:32:27
Last modified: 2025-03-28 13:32:22
File creation: 2025-03-28 13:32:22
Data hash: a70c4842bf2394ce5cd665dcd388b5b52ae0e9de7bb8bad68d9c7324e0e2fabd
File hash: 5b390a83fff2720db34297a77aa89b262f228fb616e257e11df985c32dd0ef1a
=== Environment Information ===
Working directory: C:/Users/liamw/OneDrive/Desktop/ENVX2001/ENVX2001-project1-template
User:
Home directory: C:\Users\liamw\OneDrive\Documents
Language:
=== R Session Information ===
R version: R version 4.3.2 (2023-10-31 ucrt)
RStudio version: Not running in RStudio
=== System Information ===
Operating system: Windows
OS version: build 26100
Machine type: x86-64
Node name: LAPTOP-SI6KI3AM
=== Loaded Packages ===
          Package   Version Attached
digest    digest    0.6.35  Yes
lubridate lubridate 1.9.3   Yes
forcats   forcats   1.0.0   Yes
stringr   stringr   1.5.1   Yes
dplyr     dplyr     1.1.4   Yes
purrr     purrr     1.0.2   Yes
readr     readr     2.1.5   Yes
tidyr     tidyr     1.3.1   Yes
tibble    tibble    3.2.1   Yes
ggplot2   ggplot2   3.5.1   Yes
tidyverse tidyverse 2.0.0   Yes