Load libraries

Load data into R

edible_plants_2 <- read.csv("edible_plants_cleaned.csv", header = TRUE)
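As an optional shortcut, `read.csv()` accepts `stringsAsFactors = TRUE`. This converts every character column to a factor when the file is loaded, so the `is.factor()` checks below would return TRUE immediately. The minimal sketch below passes a tiny made-up CSV through the `text` argument, so it runs without `edible_plants_cleaned.csv`. Be aware that this option converts all character columns, including any free-text ones. That is one reason to convert the analysis columns individually, as the report does below.

```r
# Tiny made-up CSV (illustrative rows, not taken from the EPD file)
csv_text <- "sunlight,water
Full sun,High
Partial Shade,Low
Full sun,Medium"

# stringsAsFactors = TRUE turns every character column into a factor on load
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

is.factor(toy$sunlight)
levels(toy$sunlight)
```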
Climate Areas

Part 1 - Introduction

The Edible Plant Database (EPD), created as a component of the GROW Observatory, is the source of the data examined in this study. The main objectives of this European citizen science project were sustainable food growing, land monitoring, and soil moisture sensing. The EPD compiles data on 146 edible plant species to give growers in 12 European climate zones regionally relevant guidance.

Food security and environmental sustainability depend on understanding the specific growing needs of different crops. Novice gardeners often know little about how local soil conditions and microclimates affect plant health. By examining the connections between environmental factors such as sunlight and water, and management factors such as cultivation group and nutrient requirements, we can forecast crop growth more accurately and use resources more efficiently across different geographic areas.

Part 2 - Main Research Questions

  1. Do edible plants that require more sunlight also require more water?

  2. Are certain cultivation groups more likely to need high nutrient levels to thrive?

  3. Are certain cultivation groups more likely to be hardier than others, as measured by temperature class?

Part 3 - Exploring the Data (Descriptive Statistics)

Do edible plants that prefer more sun also prefer more water?

## Does R know sunlight is a categorical variable?

is.factor(edible_plants_2$sunlight)
## [1] FALSE
edible_plants_2$sunlight <- as.factor(edible_plants_2$sunlight)
levels(edible_plants_2$sunlight)
## [1] "Any Sunlight"  "Full sun"      "Partial Shade"
## Does R know if water is a categorical variable?

is.factor(edible_plants_2$water)
## [1] FALSE
edible_plants_2$water <- as.factor(edible_plants_2$water)
levels(edible_plants_2$water)
## [1] "High"      "Low"       "Medium"    "Very High" "Very Low"
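Because `factor()` sorts levels alphabetically, the water categories above appear as High, Low, Medium, Very High, Very Low, which is not their natural order. The minimal sketch below sets the levels explicitly so that tables and mosaic plots read from driest to wettest. It assumes the five labels printed above are the only values in the column, and the example vector is illustrative rather than taken from the data.

```r
# Natural low-to-high order of the water categories printed above
water_levels <- c("Very Low", "Low", "Medium", "High", "Very High")

# Example values using those labels (illustrative, not rows from the data)
water <- c("High", "Low", "Medium", "Very High", "Very Low", "Medium")

# Any value not listed in `levels` would silently become NA
water <- factor(water, levels = water_levels, ordered = TRUE)

levels(water)
water[1] > water[2]  # "High" is ranked above "Low"
```

Applied to the real column, the same call would be `edible_plants_2$water <- factor(edible_plants_2$water, levels = water_levels)`. Reordering levels only rearranges the rows and columns of tables and plots. It does not change the counts or the result of the Fisher test.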
# Mosaic Plot

water_table <- table(edible_plants_2$sunlight, edible_plants_2$water)
water_table
##                
##                 High Low Medium Very High Very Low
##   Any Sunlight     6   4     36         0        1
##   Full sun        16  14     54         2        1
##   Partial Shade    3   0      3         0        0
# One colour per water category (5 levels)
mosaicplot(water_table,
           color = c("lightyellow", "steelblue", "darkblue", "lightblue", "gray80"),
           las = 2, cex.axis = 0.55,
           main = "Mosaic Plot of Water vs Sunlight",
           xlab = "Sunlight Preference", ylab = "Water Preference")
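The sunlight groups differ greatly in size (Partial Shade has only 6 plants), so raw counts are hard to compare by eye. The sketch below uses `prop.table()` to compute row proportions, which show the mix of water preferences within each sunlight group. To keep it self-contained, the table is rebuilt from the counts printed above. In the analysis itself, `prop.table(water_table, margin = 1)` could be called on the existing object directly.

```r
# Rebuild the sunlight x water table from the counts printed above
water_table <- as.table(matrix(
  c(6, 4, 36, 0, 1,
    16, 14, 54, 2, 1,
    3, 0, 3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    sunlight = c("Any Sunlight", "Full sun", "Partial Shade"),
    water = c("High", "Low", "Medium", "Very High", "Very Low")
  )
))

# Proportion of each water category within each sunlight group (rows sum to 1)
water_props <- prop.table(water_table, margin = 1)
round(water_props, 2)
```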

Are some cultivation groups more likely to require high nutrients in order to grow?

## Does R know cultivation is a categorical variable?

is.factor(edible_plants_2$cultivation)
## [1] FALSE
edible_plants_2$cultivation <- as.factor(edible_plants_2$cultivation)
levels(edible_plants_2$cultivation)
##  [1] "Allium"         "Brassica"       "Chenopodiaceae" "Cucurbit"      
##  [5] "Lamiaceae"      "Legume"         "Miscellaneous"  "Salad"         
##  [9] "Solanaceae"     "Umbelliferae"
## Does R know if nutrients is a categorical variable?
is.factor(edible_plants_2$nutrients)
## [1] FALSE
edible_plants_2$nutrients <- as.factor(edible_plants_2$nutrients)
levels(edible_plants_2$nutrients)
## [1] "High"   "Low"    "Medium"
# Mosaic Plot

nutrient_table <- table(edible_plants_2$cultivation, edible_plants_2$nutrients)
nutrient_table
##                 
##                  High Low Medium
##   Allium            0   5      4
##   Brassica         13   4      5
##   Chenopodiaceae    0   2      1
##   Cucurbit          8   0      0
##   Lamiaceae         0   4      0
##   Legume            1   1      8
##   Miscellaneous     2  28     35
##   Salad             1   1      1
##   Solanaceae        5   2      1
##   Umbelliferae      1   3      4
mosaicplot(nutrient_table,
           color = c("palegreen3", "khaki", "coral1"),
           las = 2, cex.axis = 0.55,
           main = "Mosaic Plot of Cultivation Group vs Nutrients",
           xlab = "Cultivation Group", ylab = "Nutrient Necessity")
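To address the second question directly, it helps to look at the share of each cultivation group whose nutrient requirement is "High", alongside the group size. Several groups have only 3 or 4 plants, so their shares rest on very few observations. The sketch below rebuilds the table from the counts printed above and summarises it in a small data frame. The `summary_df` layout is an illustrative choice.

```r
# Rebuild the cultivation x nutrients table from the counts printed above
nutrient_table <- as.table(matrix(
  c(0, 5, 4,
    13, 4, 5,
    0, 2, 1,
    8, 0, 0,
    0, 4, 0,
    1, 1, 8,
    2, 28, 35,
    1, 1, 1,
    5, 2, 1,
    1, 3, 4),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    cultivation = c("Allium", "Brassica", "Chenopodiaceae", "Cucurbit",
                    "Lamiaceae", "Legume", "Miscellaneous", "Salad",
                    "Solanaceae", "Umbelliferae"),
    nutrients = c("High", "Low", "Medium")
  )
))

# Row proportions: nutrient mix within each cultivation group
props <- prop.table(nutrient_table, margin = 1)

# Group size and share of "High" nutrient plants per group
summary_df <- data.frame(
  group = rownames(nutrient_table),
  n = as.vector(rowSums(nutrient_table)),
  high_share = round(as.vector(props[, "High"]), 2)
)

# Largest share of high-nutrient plants first
summary_df[order(-summary_df$high_share), ]
```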

Are some cultivation groups more likely to be hardier than others?

## Does R know cultivation is a categorical variable? 

is.factor(edible_plants_2$cultivation)
## [1] TRUE
edible_plants_2$cultivation <- as.factor(edible_plants_2$cultivation)
levels(edible_plants_2$cultivation)
##  [1] "Allium"         "Brassica"       "Chenopodiaceae" "Cucurbit"      
##  [5] "Lamiaceae"      "Legume"         "Miscellaneous"  "Salad"         
##  [9] "Solanaceae"     "Umbelliferae"
## Does R know if temperature class is a categorical variable?
is.factor(edible_plants_2$temperature_class)
## [1] FALSE
edible_plants_2$temperature_class <- as.factor(edible_plants_2$temperature_class)
levels(edible_plants_2$temperature_class)
## [1] "Half hardy"  "Hardy"       "Tender"      "Very hardy"  "Very tender"
# Mosaic Plot

temp_class_table <- table(edible_plants_2$cultivation, edible_plants_2$temperature_class)
temp_class_table
##                 
##                  Half hardy Hardy Tender Very hardy Very tender
##   Allium                  0     3      0          6           0
##   Brassica                3    10      4          5           0
##   Chenopodiaceae          1     0      0          2           0
##   Cucurbit                0     1      7          0           0
##   Lamiaceae               0     2      0          1           1
##   Legume                  3     3      3          1           0
##   Miscellaneous           3    41     13          7           1
##   Salad                   0     1      2          0           0
##   Solanaceae              0     0      8          0           0
##   Umbelliferae            2     6      0          0           0
mosaicplot(temp_class_table,
           color = c("steelblue1", "royalblue3", "cyan3", "blue4", "lightskyblue1"),
           las = 2, cex.axis = 0.55,
           main = "Mosaic Plot of Cultivation vs Temp. Class",
           xlab = "Cultivation", ylab = "Temperature Class")
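One way to make "hardier" concrete is to collapse the five temperature classes into hardy and not hardy, then compare the hardy share across cultivation groups. The sketch below assumes that "Hardy" and "Very hardy" count as hardy and the other three classes do not. This grouping is an assumption for illustration, not a definition taken from the EPD. The table is rebuilt from the counts printed above.

```r
# Rebuild the cultivation x temperature class table from the printed counts
temp_class_table <- as.table(matrix(
  c(0, 3, 0, 6, 0,
    3, 10, 4, 5, 0,
    1, 0, 0, 2, 0,
    0, 1, 7, 0, 0,
    0, 2, 0, 1, 1,
    3, 3, 3, 1, 0,
    3, 41, 13, 7, 1,
    0, 1, 2, 0, 0,
    0, 0, 8, 0, 0,
    2, 6, 0, 0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    cultivation = c("Allium", "Brassica", "Chenopodiaceae", "Cucurbit",
                    "Lamiaceae", "Legume", "Miscellaneous", "Salad",
                    "Solanaceae", "Umbelliferae"),
    temperature_class = c("Half hardy", "Hardy", "Tender",
                          "Very hardy", "Very tender")
  )
))

# Assumption: treat "Hardy" and "Very hardy" as the hardy classes
hardy_cols <- c("Hardy", "Very hardy")
hardy_n <- rowSums(temp_class_table[, hardy_cols])
total_n <- rowSums(temp_class_table)

# Share of hardy plants in each cultivation group, highest first
hardy_share <- round(hardy_n / total_n, 2)
sort(hardy_share, decreasing = TRUE)
```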

Part 4 - Statistical Tests (Inferential Statistics)

Do edible plants that prefer more sun also prefer more water?

## Fisher Test

water_fisher.test.results <- fisher.test(water_table)
water_fisher.test.results 
## 
##  Fisher's Exact Test for Count Data
## 
## data:  water_table
## p-value = 0.3099
## alternative hypothesis: two.sided
# p value = 0.3099 (> 0.05); no significant association detected
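A common rule of thumb is that the chi-squared approximation is reliable only when expected counts are at least 5. That helps explain why Fisher's exact test suits this table. The sketch below computes the expected counts under independence (row total × column total / grand total) from the counts printed above. It does this by hand with `outer()`, so no test warning is triggered.

```r
# Rebuild the sunlight x water table from the counts printed above
water_table <- as.table(matrix(
  c(6, 4, 36, 0, 1,
    16, 14, 54, 2, 1,
    3, 0, 3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    sunlight = c("Any Sunlight", "Full sun", "Partial Shade"),
    water = c("High", "Low", "Medium", "Very High", "Very Low")
  )
))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(water_table), colSums(water_table)) / sum(water_table)
round(expected, 2)

# Number of cells (out of 15) with an expected count below 5
sum(expected < 5)
```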

Are some cultivation groups more likely to require high nutrients in order to grow?

## Chi Square Test

chisq.test(nutrient_table)
## Warning in chisq.test(nutrient_table): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  nutrient_table
## X-squared = 85.462, df = 18, p-value = 9.331e-11
# p-value = 9.331e-11 (< 0.05): significant association between cultivation group and nutrient requirements
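R's warning above indicates that some expected counts are small. One option is `chisq.test()`'s `simulate.p.value = TRUE`, which estimates the p-value by Monte Carlo simulation instead of relying on the chi-squared approximation. The standardized residuals (`stdres`) then show which cells drive the association. In the sketch below, the seed and `B = 10000` are arbitrary choices, and the table is rebuilt from the printed counts.

```r
# Rebuild the cultivation x nutrients table from the counts printed above
nutrient_table <- as.table(matrix(
  c(0, 5, 4,  13, 4, 5,  0, 2, 1,  8, 0, 0,  0, 4, 0,
    1, 1, 8,  2, 28, 35,  1, 1, 1,  5, 2, 1,  1, 3, 4),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    cultivation = c("Allium", "Brassica", "Chenopodiaceae", "Cucurbit",
                    "Lamiaceae", "Legume", "Miscellaneous", "Salad",
                    "Solanaceae", "Umbelliferae"),
    nutrients = c("High", "Low", "Medium")
  )
))

set.seed(42)  # the simulation is random; fix the seed for reproducibility
sim <- chisq.test(nutrient_table, simulate.p.value = TRUE, B = 10000)
sim$p.value

# Standardized residuals: cells beyond roughly +/-2 contribute most
round(sim$stdres, 2)
```

With simulation, the smallest attainable p-value is 1 / (B + 1), so a very strong association is reported as roughly 1e-04 rather than 9.331e-11.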

Are some cultivation groups more likely to be hardier than others?

## Chi Square Test

chisq.test(temp_class_table)
## Warning in chisq.test(temp_class_table): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  temp_class_table
## X-squared = 107.83, df = 36, p-value = 4.296e-09
# p value = 4.296e-09 (< 0.05): significant association between cultivation group and temperature class
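This table also triggers the small-expected-count warning. The sketch below shows two complementary checks, each offered as one reasonable option rather than the report's method. The first is a Monte Carlo version of Fisher's exact test via `fisher.test(..., simulate.p.value = TRUE)`, which avoids the chi-squared approximation. The second is Cramér's V, an effect size that puts the strength of the association on a 0 to 1 scale: V = sqrt(X² / (n · (min(rows, cols) − 1))). The seed and `B` are arbitrary, and the table is rebuilt from the printed counts.

```r
# Rebuild the cultivation x temperature class table from the printed counts
temp_class_table <- as.table(matrix(
  c(0, 3, 0, 6, 0,   3, 10, 4, 5, 0,   1, 0, 0, 2, 0,   0, 1, 7, 0, 0,
    0, 2, 0, 1, 1,   3, 3, 3, 1, 0,    3, 41, 13, 7, 1, 0, 1, 2, 0, 0,
    0, 0, 8, 0, 0,   2, 6, 0, 0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    cultivation = c("Allium", "Brassica", "Chenopodiaceae", "Cucurbit",
                    "Lamiaceae", "Legume", "Miscellaneous", "Salad",
                    "Solanaceae", "Umbelliferae"),
    temperature_class = c("Half hardy", "Hardy", "Tender",
                          "Very hardy", "Very tender")
  )
))

# Monte Carlo Fisher test (no chi-squared approximation involved)
set.seed(42)
fisher_sim <- fisher.test(temp_class_table, simulate.p.value = TRUE, B = 10000)
fisher_sim$p.value

# Cramer's V from the Pearson X^2 statistic (warning suppressed on purpose)
x2 <- suppressWarnings(chisq.test(temp_class_table))$statistic
n <- sum(temp_class_table)
k <- min(dim(temp_class_table)) - 1  # min(10, 5) - 1 = 4
cramers_v <- sqrt(unname(x2) / (n * k))
round(cramers_v, 2)
```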

Part 5 - Discussion

According to the inferential tests, there was no significant association between edible plants' sunlight preferences and their water needs (Fisher's Exact Test, p = 0.3099). In contrast, a chi-square test found a significant association between a plant's cultivation group and its nutrient requirements (p = 9.331e-11). Similarly, the test of cultivation group against temperature class produced a p-value well below 0.05 (p = 4.296e-09), indicating an association. For both chi-square tests, R warned that the chi-squared approximation may be incorrect because many cells in these tables have small expected counts, so those p-values should be treated as approximate.

These findings can help growers make more informed decisions by focusing on cultivation group as a predictor of nutrient demand and plant hardiness, while treating sunlight and water preferences as separate considerations. However, other unmeasured factors, such as soil type, climate variability, and farming practices, can influence plant growth. Therefore, while the statistical tests reveal significant associations, they do not imply causation, and the results should be interpreted with these limitations in mind.

Part 6 - Conclusion

In summary, our analysis of the Edible Plant Database found no significant association between sunlight preference and water requirements. These preferences may vary independently, although a non-significant result does not by itself demonstrate independence. On the other hand, there were statistically significant associations between cultivation group and nutrient requirements, as well as between cultivation group and temperature class (p < 0.05). These results indicate that different cultivation groups tend to have distinct nutrient needs and temperature tolerances. Overall, while sunlight and water preferences do not appear to be directly linked in this dataset, cultivation group is an important factor when predicting plant nutrient demands and hardiness.

Part 7 - References


Rfordatascience. (2026, February 3). Tidytuesday/Data/2026/2026-02-03 at main · rfordatascience/tidytuesday. GitHub. https://github.com/rfordatascience/tidytuesday/tree/main/data/2026/2026-02-03