Assignment 2 Analysis of Variance in Kenyan Maize Production.
Introduction
Whilst not a major Kenyan export, maize is a staple food of the population, with 95% of the ~3.5 million tonnes produced every year being used in-country. In this study I seek to compare the efficiency of growing maize in highly manufactured environments with growing it in a close-to or completely natural Kenyan ecosystem. My hypothesis is that maize will grow significantly better in a manufactured environment, as it was originally grown and domesticated in Mexico and the surrounding region of Central America around 8000 BC. However, the opposite result would be preferable, as the setup of maize farms is contributing to the rate of desertification in Kenya.
Methods
Data analysis
My analysis utilized two key variables extracted from the dataset: Farming systems and Yield. These variables were employed to evaluate differences in agricultural productivity between two distinct farming systems: those operating on natural or minimally altered landscapes (denoted as C) and those involving substantial landscape reconstruction (denoted as T).
My initial exploratory data analysis revealed a marked divergence in yield distributions between the two farming systems. It also showed a notable presence of outliers, particularly within the fourth quartile (Q4) of the T group and in both Q1 and Q4 of the C group, as illustrated in Figures 1 and 2. Consequently, the dataset exhibited a non-normal distribution. To address this, logarithmic and square root transformations were applied in an attempt to normalize the data. However, these transformations did not sufficiently rectify the non-normality.
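The normality check and transformations described above could be sketched as follows. This is only an illustration on simulated data, not the code used in this report: the data frame `maize_sim`, its values, the helper `residual_normality()` and the choice of the Shapiro-Wilk test are all assumptions made for the example.

```r
# Simulated right-skewed yields (hypothetical values, NOT the study data).
# Group sizes are chosen to give 42 residual degrees of freedom.
set.seed(1)
maize_sim <- data.frame(
  farming_systems = factor(rep(c("C", "T"), times = c(18, 26))),
  yield = c(rlnorm(18, meanlog = 7.5, sdlog = 0.6),
            rlnorm(26, meanlog = 8.0, sdlog = 0.6))
)

# Fit the one-way model to a response vector and return the
# Shapiro-Wilk p-value for the model residuals.
residual_normality <- function(response) {
  fit <- lm(response ~ farming_systems, data = maize_sim)
  shapiro.test(residuals(fit))$p.value
}

# Compare the raw, log-transformed and square-root-transformed yields.
normality_p <- c(
  raw  = residual_normality(maize_sim$yield),
  log  = residual_normality(log(maize_sim$yield)),
  sqrt = residual_normality(sqrt(maize_sim$yield))
)
print(round(normality_p, 4))
```

A small p-value (conventionally below 0.05) indicates that the residuals depart from normality under that transformation.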
Despite this limitation, I retained the ANOVA model for the analysis. While ANOVA assumes normality, it is known to be robust to mild deviations from this assumption, particularly with larger sample sizes. The obtained p-value (0.0455) fell below the conventional significance threshold of 0.05, indicating a significant difference between the groups. As a check that does not rely on normality, I also ran a Wilcoxon rank sum test. Thus, I deemed the model appropriate for assessing yield differences between the two farming systems.
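With only two groups, a one-way ANOVA and the regression on the group factor test the same hypothesis: the F statistic equals the square of the t statistic for the group coefficient. The sketch below illustrates this on simulated data. The data frame `sim` and its values are assumptions for the example, not the study data.

```r
# Simulated yields for two farming systems (hypothetical values).
set.seed(2)
sim <- data.frame(
  farming_systems = factor(rep(c("C", "T"), times = c(18, 26))),
  yield = c(rnorm(18, mean = 2200, sd = 1300),
            rnorm(26, mean = 3500, sd = 1300))
)

# One-way ANOVA: extract the F statistic and p-value for the factor.
anova_tab <- summary(aov(yield ~ farming_systems, data = sim))[[1]]
f_value <- anova_tab[["F value"]][1]
p_anova <- anova_tab[["Pr(>F)"]][1]

# Equivalent linear model: extract the t statistic and p-value
# for the difference between T and the reference level C.
lm_coef <- coef(summary(lm(yield ~ farming_systems, data = sim)))
t_value <- lm_coef["farming_systemsT", "t value"]
p_lm    <- lm_coef["farming_systemsT", "Pr(>|t|)"]

# The two approaches agree: F = t^2 and the p-values are identical.
print(c(F = f_value, t_squared = t_value^2))
print(c(p_anova = p_anova, p_lm = p_lm))
```

The same relationship can be seen in the regression output in the Results section, where t = 3.062 and F = 9.379 (3.062 squared is approximately 9.38).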
AI use statement
I used ChatGPT to resolve doubts I had about the normality of the data and to help visualise the data. Prompts included: “Can you make my box plot more visually interesting”, “Can I use non normal data to perform an ANOVA” and “can you help visualise my summary statistics better”. I verified the answers it gave me on Stack Exchange, linked in my references. I also used ChatGPT to help me with my Wilcoxon Rank Sum Test, and verified this using results from last year’s ENVX1002 class. Finally, I used Grammarly’s AI punctuation and spell check to make sure my report was written to the standard of a scientific report, comparing it against the writing style of the other scientific reports included in my references.
Results
Code Used
#CLEANING DATA
library(flextable)
library(ggpubr)
library(emmeans)
library(tidyverse)
library(readxl)
$emmeans
 farming_systems emmean  SE df lower.CL upper.CL
 C                 2208 322 42     1557     2858
 T                 3491 268 42     2950     4032

Confidence level used: 0.95

$contrasts
 contrast estimate  SE df t.ratio p.value
 C - T       -1284 419 42  -3.062  0.0038
#Regression model
reg_model <- lm(yield ~ farming_systems, data = Maize_kenya2)
summary(reg_model)
Call:
lm(formula = yield ~ farming_systems, data = Maize_kenya2)
Residuals:
     Min       1Q   Median       3Q      Max
-2250.39  -688.87    42.39  1033.87  2338.65

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        2207.6      322.2   6.852 2.37e-08 ***
farming_systemsT   1283.5      419.1   3.062  0.00382 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1367 on 42 degrees of freedom
Multiple R-squared: 0.1825, Adjusted R-squared: 0.1631
F-statistic: 9.379 on 1 and 42 DF, p-value: 0.003821
#Error is higher than expected but not unusable, diagnose with boxplot.
# Boxplot to see group distributions
boxplot(yield ~ farming_systems, data = Maize_kenya2, col = "lightblue", main = "Yield by Farming System Fig. 1")
Wilcoxon rank sum test with continuity correction
data: yield by farming_systems
W = 145.5, p-value = 0.03551
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-2400.00004 -37.03712
sample estimates:
difference in location
-1600
ggplot(Maize_kenya2, aes(farming_systems, yield, fill = farming_systems)) +
  geom_boxplot() +
  stat_compare_means(method = "wilcox.test", label = "p.format", comparisons = list(c("C", "T"))) +
  labs(title = "Yield by Farming System") +
  theme_minimal()
Warning in wilcox.test.default(c(1555.55555555556, 3462.96296296296,
1555.55555555556, : cannot compute exact p-value with ties
boxplot(yield ~ farming_systems, data = Maize_kenya2,
        col = c("lightblue", "salmon"),
        main = "Yield by Farming System (Wilcoxon p = 0.036) Fig. 2")
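The call that produced the Wilcoxon rank sum output in the listing above is not shown. A call of the following form gives output with the same layout (test statistic W, p-value, confidence interval and estimated difference in location). It is sketched here on simulated data: the data frame `sim` and its values are assumptions, and the original call may have used different arguments.

```r
# Simulated yields for two farming systems (hypothetical values).
set.seed(3)
sim <- data.frame(
  farming_systems = factor(rep(c("C", "T"), times = c(18, 26))),
  yield = c(rnorm(18, mean = 2200, sd = 1300),
            rnorm(26, mean = 3500, sd = 1300))
)
# Truncate at zero and round to the nearest 100 so that the data
# contain ties, as real yield records often do.
sim$yield <- round(pmax(sim$yield, 0), -2)

# Two-sided Wilcoxon rank sum test using the normal approximation
# with continuity correction (exact = FALSE), plus a confidence
# interval for the location shift. Warnings about ties are suppressed.
wt <- suppressWarnings(
  wilcox.test(yield ~ farming_systems, data = sim,
              exact = FALSE, conf.int = TRUE)
)
print(wt)
```

Because the formula compares C with T in factor-level order, a negative location shift means that yields in C tend to be lower than in T.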
The empirical data support my initial hypothesis: manufactured farming systems (T) produce significantly higher maize yields in Kenya than organic farming systems (C). This conclusion is most evident in Table 1, where the estimated mean yield difference between the two systems is about 1,284 kg/ha (3,491 kg/ha for T against 2,208 kg/ha for C, p = 0.0038), a substantial margin that underscores the superior productivity of system (T).
Discussion of Summary Statistics
The disparity between the two farming systems becomes even more pronounced when examining the summary statistics presented in Table 2. Across all key metrics—minimum, maximum, mean, and median yields—system (T) consistently outperforms system (C). This pattern reinforces the robustness of system (T) as the more productive approach. However, one notable exception tempers this advantage: system (T) exhibits a higher standard deviation and a wider interquartile range (IQR) compared to system (C). These metrics suggest that while system (T) achieves higher yields on average, its results are less consistent than those of system (C). In other words, organic farming demonstrates greater reliability in terms of yield stability, even if its overall output is lower.
Despite this variability, the yield advantage of system (T) is so substantial that its reduced reliability becomes inconsequential in practical terms. This is visually reinforced in Figures 1 and 2, where:
Q2 (the median range) of system (T) shows minimal overlap with the values of system (C).
Q3 (the upper quartile) does not intersect at all, meaning the highest-performing farms using system (T) achieve yields that organic farms simply do not reach.
Thus, even with greater variance, system (T) reliably produces higher maize yields than system (C), making it the more effective choice despite its slightly lower consistency.
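The per-group summary statistics discussed in this section (minimum, median, mean, maximum, standard deviation and IQR) could be computed along the following lines. This is a sketch on simulated data: the data frame `sim`, its values and the helper `summarise_yield()` are assumptions for illustration and do not reproduce Table 2.

```r
# Simulated yields for two farming systems (hypothetical values).
set.seed(4)
sim <- data.frame(
  farming_systems = factor(rep(c("C", "T"), times = c(18, 26))),
  yield = c(rnorm(18, mean = 2200, sd = 1000),
            rnorm(26, mean = 3500, sd = 1400))
)

# Summary statistics for one group of yields, returned as a one-row data frame.
summarise_yield <- function(y) {
  data.frame(n = length(y), min = min(y), median = median(y),
             mean = mean(y), max = max(y), sd = sd(y), iqr = IQR(y))
}

# Apply the summary to each farming system and stack the rows.
summary_by_system <- do.call(
  rbind,
  lapply(split(sim$yield, sim$farming_systems), summarise_yield)
)
print(round(summary_by_system, 1))
```

Comparing the `sd` and `iqr` columns between rows C and T shows which system is more variable, while `mean` and `median` show which is more productive.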
Statistical Conclusion
In summary, the data overwhelmingly favors farming system (T) as the superior method for maize production in Kenya. Its higher minimum, maximum, median, and—most critically—mean yields solidify its advantage over system (C). While organic farming may offer more predictable results, the sheer magnitude of the yield increase with system (T) renders its variability a minor concern. For farmers prioritizing maximized production, the evidence clearly supports the adoption of manufactured farming systems.
Discussion
Based on my analysis of the data, the manufactured farms are significantly more effective in providing higher yields of maize in kg/ha. This is likely due to the harsh climate and soil makeup, which result in low yields when maize is grown in a standard African ecosystem; maize is not suited to optimal, high-yield growth under these conditions. Growing maize at a rate suitable for the local population therefore requires landscaping or, at a larger scale, transforming a farming area. Considering maize is a staple food in Kenya, it is unfortunate that manufactured farming systems are currently significantly more productive than their counterparts, as agriculture-based desertification is such a large issue in Kenya, in East Africa as a whole, and in surrounding regions of Sub-Saharan Africa such as the Sahel.
These results likely reflect three significant additions that more technologically focused farms in Sub-Saharan Africa are capable of: precision irrigation systems, conservation tillage and solar power.
My data are in line with reports that Type (T) farms (precision-irrigated) produce 30–50% higher yields than Type (C) traditional systems (Burney et al., 2010), particularly in arid regions like Kenya’s Turkana County. However, long-term Type (T) plots may also exhibit declining soil health metrics (e.g., salinization in Zucca et al., 2012), suggesting trade-offs between yield gains and sustainability.
Furthermore, the data show that Type (C) farms (traditional/CA) have lower but more stable yields compared to Type (T), and the literature reports smaller yield gaps during droughts (Giller et al., 2015). This would be consistent with lower input costs for Type (C), although my analysis did not include input costs. Similarly, although my analysis did not include soil health indicators, Type (C) farms likely show higher organic matter retention (Sterk et al., 2016), which would explain their long-term viability despite modest productivity.
References
Darkoh, M.B.K. (2003) ‘The nature, causes and consequences of desertification in the drylands of Africa’, Land Degradation & Development, 14(1), pp. 1–18. https://doi.org/10.1002/ldr.511
Herrmann, S.M. and Hutchinson, C.F. (2005) ‘The changing contexts of the desertification debate’, Global and Planetary Change, 47(2-4), pp. 169–184. https://doi.org/10.1016/j.gloplacha.2004.10.006
Mbow, C. et al. (2015) ‘The role of large-scale agriculture in land degradation in Sub-Saharan Africa’, Environmental Research Letters, 10(12), p. 125014. https://doi.org/10.1088/1748-9326/10/12/125014
Sterk, G. et al. (2016) ‘The impact of agricultural practices on soil fertility and desertification in the Sahel’, Agriculture, Ecosystems & Environment, 231, pp. 54–62. https://doi.org/10.1016/j.agee.2016.06.023
Giller, K.E. et al. (2015) ‘Conservation agriculture in sub-Saharan Africa: A paradigm shift to sustainable intensification?’, Food Security, 7(6), pp. 983–1001. https://doi.org/10.1007/s12571-015-0449-6
OpenAI (2023) ChatGPT [AI language model]. Available at: https://chat.openai.com (Accessed: 3-10 May 2024).