proj01

Quarto

My SID is: 490290987

Introduction

Data Summary

head(crop)
# A tibble: 6 × 4
  System      Fertiliser  Yield Abundance
  <chr>       <chr>       <dbl>     <int>
1 monoculture yes         6352.        10
2 monoculture yes         5814.         8
3 monoculture no          5433.         9
4 diversified yes         9007.         2
5 diversified yes        10507.         1
6 monoculture no          4995.        10

Response variables - Yield, Abundance

Predictor variables - System, Fertiliser

# Mean and standard deviation of each response within every System × Fertiliser cell
crop %>%
  group_by(System, Fertiliser) %>%
  summarise(
    Mean_Yield = mean(Yield),
    SD_Yield = sd(Yield),
    Mean_Abundance = mean(Abundance),
    SD_Abundance = sd(Abundance),
    .groups = "drop"  # return an ungrouped tibble and silence the grouping message
  )
# A tibble: 4 × 6
  System      Fertiliser Mean_Yield SD_Yield Mean_Abundance SD_Abundance
  <chr>       <chr>           <dbl>    <dbl>          <dbl>        <dbl>
1 diversified no              7211.     983.           1.6          1.65
2 diversified yes             9528.     537.           2.17         1.47
3 monoculture no              4513.     875.           9.58         2.50
4 monoculture yes             5695.     813.           9.79         3.04

ggplot(crop, aes(x = System, y = Yield, fill = System)) +
  geom_boxplot() +
  labs(title = "Yield by Cropping System", x = "System", y = "Yield")

ggplot(crop, aes(x = Fertiliser, y = Yield, fill = Fertiliser)) +
  geom_boxplot() +
  labs(title = "Yield by Fertiliser", x = "Fertiliser", y = "Yield")

ggplot(crop, aes(x = Abundance, y = Yield, color = System)) +
  geom_point(alpha = 0.6) +
  labs(title = "Yield vs. Abundance", x = "Abundance", y = "Yield")

Discussion

a. Implications of the Data Structure and Distribution for Analysis

Based on the table output, we can derive several key insights:

  1. Impact of Fertiliser on Yield
    • For monoculture, adding fertiliser raises mean yield from 4513.31 to 5695.32, a gain of about 1182 (~26%).
    • For diversified systems, fertiliser raises mean yield from 7211.35 to 9528.02, a gain of about 2317 (~32%).
    • Fertiliser is therefore associated with higher mean yield in both systems, with the larger gain in diversified systems. These are descriptive differences; no significance test has been run at this stage.
  2. Effect of Cropping System on Yield
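    • Diversified farming without fertiliser (7211.35) yields more on average than monoculture with fertiliser (5695.32).
    • Diversified farming with fertiliser has a mean (9528.02) close to 9600. That value cannot be a hard upper bound, however, because head(crop) shows a fertilised diversified plot yielding 10507, so a ceiling effect should be checked against the full data rather than assumed.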
  3. Abundance Differences Between Systems
    • Monoculture has much higher mean abundance (~9.6-9.8) than diversified farming (~1.6-2.2).
    • This matches the simulation setup: monoculture was designed to have λ = 10 and diversified λ = 2.
    • Abundance is a count variable and should not be read as species richness. Because the data are simulated, the gap reflects the chosen λ values rather than an ecological mechanism such as competitive exclusion.
  4. Yield Variability (Standard Deviation Analysis)
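    • Yield SD does not separate cleanly by system: diversified ranges from ~537 (fertilised) to ~983 (unfertilised), which brackets the monoculture values (~813-875).
    • The most consistent yields are in fertilised diversified plots and the most variable in unfertilised diversified plots, so the table gives no evidence that either system is more consistent overall.
    • Abundance SD is higher in monoculture (~2.5-3.0) than in diversified systems (~1.5-1.7). This is expected for Poisson counts, where the variance equals the mean (SD ≈ √10 ≈ 3.2 versus √2 ≈ 1.4).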
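
The arithmetic behind these observations can be reproduced from the summary table alone. The sketch below is illustrative: `summ` is a hypothetical data frame typed in by hand from the rounded values printed above, not an object from the original analysis.

```r
# Group summaries copied by hand from the summary table (rounded as printed).
summ <- data.frame(
  System         = c("diversified", "diversified", "monoculture", "monoculture"),
  Fertiliser     = c("no", "yes", "no", "yes"),
  Mean_Yield     = c(7211, 9528, 4513, 5695),
  Mean_Abundance = c(1.6, 2.17, 9.58, 9.79),
  SD_Abundance   = c(1.65, 1.47, 2.50, 3.04)
)

# Fertiliser gain within each system: absolute difference and percentage change.
gain <- sapply(split(summ, summ$System), function(g) {
  no  <- g$Mean_Yield[g$Fertiliser == "no"]
  yes <- g$Mean_Yield[g$Fertiliser == "yes"]
  c(absolute = yes - no, percent = 100 * (yes - no) / no)
})
print(round(gain, 1))

# For a Poisson count the SD equals sqrt(mean), so the two can be compared directly.
summ$Poisson_SD <- sqrt(summ$Mean_Abundance)
print(summ[, c("System", "Fertiliser", "SD_Abundance", "Poisson_SD")])
```

If the observed abundance SDs sit close to `Poisson_SD`, the difference in abundance variability between systems is what a Poisson process would produce by itself.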

b. Potential Challenges in Analysing the Dataset

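1. Possible Ceiling Effect in Yield

  • The mean yield of fertilised diversified systems (9528) is close to 9600, but 9600 is not a hard maximum: head(crop) includes a yield of 10507.
  • If yields are capped or compressed near the top of the range in the full data, the fertiliser effect in diversified systems would be underestimated.
  • A log transformation would not correct a ceiling. Inspecting the upper tail of Yield (histogram, group maxima) comes first; a model for censored data is an option only if a cap is confirmed.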
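
A quick way to check whether 9600 really acts as a cap is to look for observations above it. The sketch below uses only the six rows printed by head(crop), retyped by hand with yields rounded as shown; `crop_head` and `cap` are illustrative names, not objects from the original analysis.

```r
# The six rows printed by head(crop), with yields rounded as displayed there.
crop_head <- data.frame(
  System     = c("monoculture", "monoculture", "monoculture",
                 "diversified", "diversified", "monoculture"),
  Fertiliser = c("yes", "yes", "no", "yes", "yes", "no"),
  Yield      = c(6352, 5814, 5433, 9007, 10507, 4995)
)

cap <- 9600  # bound quoted in the discussion

# Any row above the quoted bound shows that it is not a hard maximum.
above_cap <- crop_head[crop_head$Yield > cap, ]
print(above_cap)
print(max(crop_head$Yield))
```

On the full data set the same idea would be `sum(crop$Yield > 9600)`, together with a histogram of Yield for the fertilised diversified group to see whether values pile up near the top.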

2. Discrete Count Data for Abundance

  • Abundance values are discrete counts (Poisson-distributed by design), with monoculture having much higher counts than diversified.
  • Counts with a small mean (λ = 2) are right-skewed and their variance grows with the mean, so t-tests and ANOVA, which assume normal errors with constant variance, are less reliable here.
  • A Poisson generalised linear model fits this structure directly. A rank-based test such as Kruskal-Wallis is a fallback, but it cannot estimate the System × Fertiliser interaction.
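
For count data such as Abundance, a Poisson regression is one model-based option. The sketch below is a minimal example on simulated stand-in data, since the real `crop` object is not reproduced here: the group size `n = 24` is a hypothetical choice, and the rates λ = 10 and λ = 2 are taken from the simulation setup described earlier.

```r
set.seed(1)
n <- 24  # hypothetical group size, not taken from the real data

# Stand-in data: one row per plot in a 2 x 2 design.
sim <- expand.grid(rep = seq_len(n),
                   System = c("monoculture", "diversified"),
                   Fertiliser = c("no", "yes"))
sim$System     <- factor(sim$System, levels = c("monoculture", "diversified"))
sim$Fertiliser <- factor(sim$Fertiliser, levels = c("no", "yes"))

# Poisson counts with the rates quoted in the discussion (10 vs 2).
sim$Abundance <- rpois(nrow(sim), ifelse(sim$System == "monoculture", 10, 2))

# Poisson GLM with log link; coefficients are on the log-rate scale.
fit_pois <- glm(Abundance ~ System * Fertiliser,
                family = poisson(link = "log"), data = sim)
print(summary(fit_pois)$coefficients)

# Pearson dispersion statistic: values well above 1 hint at overdispersion,
# in which case a negative binomial model would be worth considering.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)
```

Exponentiating a coefficient gives a rate ratio, so `exp(coef(fit_pois)["Systemdiversified"])` estimates how many times lower the expected count is in diversified plots.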

3. Unequal Variability (Heteroscedasticity)

  • Yield SD differs across groups, from ~537 (fertilised diversified) to ~983 (unfertilised diversified), a ratio of about 1.8.
  • ANOVA and ordinary linear regression assume equal error variances, so this spread may affect standard errors and p-values and should be tested formally.
  • Possible solutions:
    • Use weighted least squares or heteroscedasticity-consistent (robust) standard errors.
    • Apply a transformation (e.g., log or square root) to stabilise the variance.
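
Equality of variances can be tested with Levene's test, which in its median-centred (Brown-Forsythe) form is a one-way ANOVA on absolute deviations from the group medians. The sketch below implements that definition in base R on simulated stand-in data: the group means and SDs are taken from the summary table, while `n = 24` and the normal distribution are assumptions made for illustration.

```r
set.seed(2)
n <- 24  # hypothetical group size

# Cell means and SDs from the summary table (rounded as printed).
groups <- data.frame(
  Group = c("diversified.no", "diversified.yes", "monoculture.no", "monoculture.yes"),
  mean  = c(7211, 9528, 4513, 5695),
  sd    = c(983, 537, 875, 813)
)

# Stand-in yields drawn from a normal distribution per group (an assumption).
sim <- data.frame(
  Group = factor(rep(groups$Group, each = n)),
  Yield = rnorm(4 * n, mean = rep(groups$mean, each = n), sd = rep(groups$sd, each = n))
)

# Brown-Forsythe variant of Levene's test:
# one-way ANOVA on |y - group median|.
sim$AbsDev <- abs(sim$Yield - ave(sim$Yield, sim$Group, FUN = median))
levene <- anova(lm(AbsDev ~ Group, data = sim))
print(levene)
```

A small p-value in the `Group` row indicates unequal spread across the four cells. If the car package is installed, `car::leveneTest()` performs the same test directly.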

4. Interaction Effects Between Fertiliser and System

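  • The table points to an interaction: fertiliser adds about 2317 to mean yield in diversified systems but only about 1182 in monoculture.
  • A two-way ANOVA or linear regression with a System × Fertiliser interaction term can quantify this difference and test whether it exceeds sampling noise.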
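
A linear model with an interaction term can be sketched as follows. The data are a simulated stand-in built from the cell means and SDs in the summary table; the group size `n = 24` and the normal errors are assumptions, so the printed numbers illustrate the method rather than the real result.

```r
set.seed(3)
n <- 24  # hypothetical group size

# Stand-in data for the 2 x 2 design, with explicit reference levels.
sim <- expand.grid(rep = seq_len(n),
                   System = c("monoculture", "diversified"),
                   Fertiliser = c("no", "yes"))
sim$System     <- factor(sim$System, levels = c("monoculture", "diversified"))
sim$Fertiliser <- factor(sim$Fertiliser, levels = c("no", "yes"))

# Cell means and SDs from the summary table, looked up by "System.Fertiliser".
cell_mean <- c(monoculture.no = 4513, monoculture.yes = 5695,
               diversified.no = 7211, diversified.yes = 9528)
cell_sd   <- c(monoculture.no = 875, monoculture.yes = 813,
               diversified.no = 983, diversified.yes = 537)
key <- paste(sim$System, sim$Fertiliser, sep = ".")
sim$Yield <- rnorm(nrow(sim), mean = cell_mean[key], sd = cell_sd[key])

# Two-way model: main effects plus the System x Fertiliser interaction.
fit <- lm(Yield ~ System * Fertiliser, data = sim)
print(anova(fit))

# The interaction coefficient is the fertiliser gain in diversified plots
# minus the fertiliser gain in monoculture plots.
print(coef(fit)["Systemdiversified:Fertiliseryes"])
```

Note that `anova()` reports sequential sums of squares; the order of terms does not matter in a balanced design but can matter when group sizes differ.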

5. Sample Size Considerations

  • The summary table does not report group sizes. If the groups are small, statistical power may be limited, making real effects harder to detect.
  • A power analysis based on the observed effect sizes and SDs can indicate whether the current sample is adequate.
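
A rough power calculation needs only an effect size and an SD. The sketch below uses `power.t.test()` from base R with the smaller of the two fertiliser gains (monoculture) and a pooled SD from the summary table. Treating the comparison as a simple two-sample t-test is a simplifying assumption, and `n = 24` in the second call is a hypothetical group size.

```r
# Smaller fertiliser gain (monoculture) and pooled SD, from the summary table.
delta     <- 5695 - 4513
sd_pooled <- sqrt((875^2 + 813^2) / 2)

# Sample size per group needed for 80% power at the 5% level (two-sided).
pw_n <- power.t.test(delta = delta, sd = sd_pooled,
                     sig.level = 0.05, power = 0.80)
print(ceiling(pw_n$n))

# Power achieved at a hypothetical group size of 24.
pw_24 <- power.t.test(n = 24, delta = delta, sd = sd_pooled, sig.level = 0.05)
print(pw_24$power)
```

The actual group sizes can be read from the data with `dplyr::count(crop, System, Fertiliser)` and substituted for the hypothetical value.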

Conclusion: Best Next Steps for Analysis

  1. Check normality of the Yield model residuals (not the raw response) using a Q-Q plot and a Shapiro-Wilk test.
  2. Test for heteroscedasticity in Yield using Levene’s test.
  3. Use a two-way ANOVA or regression model with an interaction term to assess the combined effects of System and Fertiliser on Yield.
  4. Use Poisson regression to analyse Abundance, since it is count data, switching to negative binomial regression if the counts are overdispersed.
  5. Consider a variance-stabilising transformation (e.g., log transformation for Yield) if the residual checks call for it.


Acknowledgements & statement of originality

ChatGPT - Used to explain how the synthetic data is generated and what it would look like. Used to generate the code for the summary table, from the prompt: “Given [head(crop) output], summarize yield and abundance depending on system and fertiliser”. Used to provide ideas for the discussion section by prompting it with the data from table1 with the question “What are the implications of the data structure and distribution for data analysis?”. This output was used to provide a starting point and was not used verbatim.