Part 1 - Introduction

knitr::include_graphics("https://formnutrition.com/wp-content/uploads/2021/04/AdobeStock_41953715-1.jpeg")

Seed germination is a critical phase in the plant life cycle, serving as the transition from a dormant embryo to an active seedling. This process is highly sensitive to environmental triggers, with temperature acting as one of the most significant regulatory factors. For edible plants, maintaining an optimal temperature range is essential for enzymatic activity and metabolic processes, such as the breakdown of stored nutrients like starch into sugar. While extreme temperatures can lead to seed dormancy or death, stable and appropriate conditions ensure rapid and uniform growth.

In agricultural and horticultural practices, cultivation methods vary significantly depending on whether a crop is considered “cool-season” or “warm-season”. Different cultivation types, such as indoor greenhouse starts versus outdoor field planting, often require different thermal environments to maximize emergence rates. Understanding the relationship between how a plant is cultivated and its specific temperature requirements is vital for optimizing crop yields and ensuring food stability.

This project explores the relationship between plant cultivation types and their required germination temperatures. In the dataset, cultivation type is a categorical variable that assigns each plant to one of ten crop groups, such as Allium, Brassica, Cucurbit, and Legume. Specifically, we aim to determine whether cultivation type has a statistically significant effect on the temperature required for germination. Using data from a variety of edible plant species, we use descriptive statistics and linear modeling to assess these patterns.

Part 2 - Main Research Question

Question: Does the type of cultivation significantly affect the germination temperature of edible plants?

Part 3 - Exploring the Data (Descriptive Statistics)

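library(dplyr)  # provides %>%, group_by(), summarise(), and n()

# Group size, mean, and standard deviation of germination temperature per cultivation type
edible_stats <- edible_plants_clean %>%
  group_by(cultivation) %>%
  summarise(
    n = n(),
    mean_temp = mean(temp_numeric, na.rm = TRUE),
    sd_temp = sd(temp_numeric, na.rm = TRUE)
  )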

print(edible_stats)
## # A tibble: 10 × 4
##    cultivation        n mean_temp sd_temp
##    <chr>          <int>     <dbl>   <dbl>
##  1 Allium             8      23.2    1.39
##  2 Brassica          17      24.6    5.12
##  3 Chenopodiaceae     3      27      5.20
##  4 Cucurbit           8      30.1    6.79
##  5 Legume             9      19.2    7.34
##  6 Miscellaneous     16      21      7.07
##  7 Salad              3      22      1.73
##  8 Solanaceae         3      30      0   
##  9 Solanum            2      26      0   
## 10 Umbelliferae       5      23.2    4.76

ggplot(edible_plants_clean, aes(x = cultivation, y = temp_numeric, fill = cultivation)) +
  geom_boxplot() +
  labs(
    title = "Germination Temperature by Cultivation Type",
    x = "Cultivation Type",
    y = "Temperature (°C)"
  ) +
  theme_minimal() +
  theme(
    # Hide x-axis tick labels; the fill legend identifies each cultivation type
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )

The summary table reports the sample size, mean, and standard deviation of germination temperature within each group, while the boxplot provides a clear comparison of medians, spread, and potential outliers across groups.

Part 4 - Statistical Tests (Inferential Statistics)

# We are testing if cultivation (categorical) predicts temp_numeric (continuous)
plant_lm <- lm(temp_numeric ~ cultivation, data = edible_plants_clean)

# Check the normality of the residuals with a histogram and a Q-Q plot
hist(residuals(plant_lm), main = "Histogram of Residuals", xlab = "Residuals")

qqnorm(residuals(plant_lm))
qqline(residuals(plant_lm))

summary(plant_lm)
## 
## Call:
## lm(formula = temp_numeric ~ cultivation, data = edible_plants_clean)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11.59  -2.25   0.75   3.00  14.00 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 23.250      2.020  11.511   <2e-16 ***
## cultivationBrassica          1.338      2.449   0.546   0.5867    
## cultivationChenopodiaceae    3.750      3.868   0.970   0.3359    
## cultivationCucurbit          6.875      2.856   2.407   0.0190 *  
## cultivationLegume           -4.028      2.776  -1.451   0.1517    
## cultivationMiscellaneous    -2.250      2.474  -0.910   0.3665    
## cultivationSalad            -1.250      3.868  -0.323   0.7476    
## cultivationSolanaceae        6.750      3.868   1.745   0.0857 .  
## cultivationSolanum           2.750      4.517   0.609   0.5448    
## cultivationUmbelliferae     -0.050      3.257  -0.015   0.9878    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.713 on 64 degrees of freedom
## Multiple R-squared:  0.2805, Adjusted R-squared:  0.1793 
## F-statistic: 2.772 on 9 and 64 DF,  p-value: 0.008351

anova(plant_lm)
## Analysis of Variance Table
## 
## Response: temp_numeric
##             Df  Sum Sq Mean Sq F value   Pr(>F)   
## cultivation  9  814.25  90.472   2.772 0.008351 **
## Residuals   64 2088.85  32.638                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We chose a linear model with a single categorical predictor, which is equivalent to a one-way ANOVA, because we are testing the effect of a categorical explanatory variable (cultivation type) on a continuous response variable (germination temperature).

Null Hypothesis: The mean germination temperature is the same across all cultivation types.

Alternative Hypothesis: At least one cultivation type has a mean germination temperature that differs from the others.
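As a check on the test described above, the F-statistic and its p-value can be recomputed by hand from the sums of squares and degrees of freedom printed in the ANOVA table. The sketch below uses only base R; the variable names are illustrative and are not part of the original analysis.

```r
# Values copied from the anova(plant_lm) table above
ss_cultivation <- 814.25   # between-group sum of squares
ss_residual    <- 2088.85  # residual (within-group) sum of squares
df_cultivation <- 9        # 10 cultivation groups - 1
df_residual    <- 64       # 74 observations - 10 groups

# Mean squares are sums of squares divided by their degrees of freedom
ms_cultivation <- ss_cultivation / df_cultivation
ms_residual    <- ss_residual / df_residual

# F is the ratio of the between-group to the residual mean square
f_stat <- ms_cultivation / ms_residual

# The upper tail of the F distribution gives the p-value
p_value <- pf(f_stat, df_cultivation, df_residual, lower.tail = FALSE)

print(round(f_stat, 3))
print(signif(p_value, 3))
```

The printed values should agree, up to rounding, with the F-statistic and p-value reported by summary(plant_lm) and anova(plant_lm).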

The R-squared value from our regression output (0.2805) indicates that about 28% of the variance in germination temperature is explained by cultivation type; the adjusted R-squared, which penalizes the model for the number of groups, is 0.1793. We use a significance level of 0.05, so an overall p-value below 0.05 leads us to reject the null hypothesis.
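To make the link between R-squared and the ANOVA table explicit, the following sketch recomputes both R-squared values from the reported sums of squares. It assumes 74 observations in 10 groups, as implied by the group sizes and degrees of freedom shown earlier; the variable names are illustrative.

```r
# Sums of squares copied from the anova(plant_lm) table
ss_cultivation <- 814.25
ss_residual    <- 2088.85

n_obs    <- 74  # total plants (sum of the group sizes)
n_groups <- 10  # number of cultivation types

# R-squared: share of the total variation explained by cultivation type
ss_total  <- ss_cultivation + ss_residual
r_squared <- ss_cultivation / ss_total

# Adjusted R-squared rescales by the residual and total degrees of freedom
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_groups)

print(round(r_squared, 4))
print(round(adj_r_squared, 4))
```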

Part 5 - Discussion

Statistical Evidence and Conclusions: Our analysis yielded an overall p-value of 0.008351 (F = 2.772 on 9 and 64 degrees of freedom), providing strong evidence to reject the null hypothesis and conclude that mean germination temperature differs among cultivation types. The Q-Q plot and residual histogram indicated that the model residuals are approximately normal, consistent with the assumptions of a linear model. The significant result suggests that these agricultural classifications are not arbitrary and may reflect differing thermal requirements for processes such as the enzymatic breakdown of starch into sugar during germination, although this analysis does not test that mechanism directly. For instance, Cucurbits (mean 30.1 °C) and Solanaceae (mean 30 °C) had the highest mean germination temperatures, while Legumes (mean 19.2 °C) had the lowest. In the coefficient table, only Cucurbit differed significantly from the Allium baseline (p = 0.0190), and cultivation type explained about 28% of the variance in germination temperature.

Importance and Limitations: This analysis is important for agricultural planning, as it provides a data-driven basis for setting greenhouse thermostats or timing outdoor planting. By understanding these thermal thresholds, growers can ensure more uniform emergence and higher crop yields. However, a limitation of this study is that “cultivation type” is a broad category; other factors like soil moisture, pH, and light duration also interact with temperature to influence germination. Several groups are also very small (for example, Solanum has 2 plants and Solanaceae has 3), so their estimated means are imprecise. Future research should use a multi-variable model to explore how these environmental factors collectively influence seedling success.
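The visual residual checks discussed above could be supplemented with formal tests. The sketch below is not part of the original analysis: it fits the same kind of model to simulated stand-in data (the toy_plants data frame and its group labels are hypothetical) and applies the Shapiro-Wilk test to the residuals and the Bartlett test to the group variances, both from base R.

```r
set.seed(42)  # make the simulated example reproducible

# Simulated stand-in data: NOT the edible plants dataset
toy_plants <- data.frame(
  cultivation = factor(rep(c("GroupA", "GroupB", "GroupC"), times = c(8, 9, 8))),
  temp_numeric = c(
    rnorm(8, mean = 23, sd = 2),
    rnorm(9, mean = 19, sd = 7),
    rnorm(8, mean = 30, sd = 7)
  )
)

toy_lm <- lm(temp_numeric ~ cultivation, data = toy_plants)

# Shapiro-Wilk: the null hypothesis is that the residuals are normally distributed
sw_result <- shapiro.test(residuals(toy_lm))

# Bartlett: the null hypothesis is that all groups share the same variance
bt_result <- bartlett.test(temp_numeric ~ cultivation, data = toy_plants)

print(sw_result$p.value)
print(bt_result$p.value)
```

With the real data, the same two calls would be applied to plant_lm and edible_plants_clean. A small p-value from either test would signal a departure from the corresponding assumption.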

Part 6 - Conclusion

This analysis examined the effect of cultivation type on germination temperature. The results, supported by a significant p-value of 0.008, showed that cultivation type has a measurable association with germination temperature, with different categories requiring different mean temperatures. Specifically, Legumes had the lowest mean temperature (19.2 °C) and Cucurbits the highest (30.1 °C); however, because of high standard deviations (7.34 and 6.79, respectively), there is notable overlap between these groups. These results could be useful for farmers looking to germinate multiple types of plants simultaneously, as they indicate which cultivation types might share a single climate setting. Future directions for this research should include examining optimal temperatures for later growth stages and water requirements for these specific cultivation types to better support large-scale crop production.
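Deciding which cultivation types could share a climate setting requires pairwise comparisons, which the overall F-test does not provide. One possible follow-up, sketched here as an assumption rather than as part of the original analysis, is Tukey's honest significant difference test. The data are simulated from the group sizes, means, and standard deviations in the summary table for three groups, so the printed numbers illustrate the output format only and are not results for the real dataset.

```r
set.seed(42)  # make the simulated example reproducible

# Simulated values based on the reported n, mean, and sd for three groups
toy_plants <- data.frame(
  cultivation = factor(rep(c("Allium", "Legume", "Cucurbit"), times = c(8, 9, 8))),
  temp_numeric = c(
    rnorm(8, mean = 23.2, sd = 1.39),  # Allium
    rnorm(9, mean = 19.2, sd = 7.34),  # Legume
    rnorm(8, mean = 30.1, sd = 6.79)   # Cucurbit
  )
)

# TukeyHSD() needs an aov fit with a factor predictor
toy_aov <- aov(temp_numeric ~ cultivation, data = toy_plants)

# Pairwise mean differences with family-wise 95% confidence intervals
tukey_result <- TukeyHSD(toy_aov, conf.level = 0.95)

print(tukey_result$cultivation)
```

Pairs whose confidence interval includes zero are not distinguishable at this confidence level and would be candidates for a shared temperature setting.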

Part 7 - References

Locke, H. (2025). Do Some Mammals Intellectually Punch Above Their Weight Class? [RMarkdown Example File]. La Salle University.

TidyTuesday. (2026). Edible Plants Dataset. Retrieved from https://github.com/rfordatascience/tidytuesday

Whitlock, M. C., & Schluter, D. (2020). The Analysis of Biological Data (3rd ed.). Roberts and Company Publishers.