Is the average soil moisture level significantly different across different soil temperature conditions? Understanding the relationship between soil temperature and soil moisture is important in environmental and agricultural studies, as temperature directly affects evaporation rates and water retention in soil. This study examines whether soil moisture levels differ significantly when soil temperature conditions vary.
The dataset used in this analysis is the Hyperspectral Benchmark Dataset on Soil Moisture, collected during a five-day field campaign in Karlsruhe, Germany, in 2017. Soil moisture was measured with a time-domain reflectometry (TDR) sensor, and surface reflectance was recorded with a hyperspectral snapshot camera. Each observation represents a single time-stamped soil measurement. The dataset contains 679 observations and 129 columns, including soil moisture percentage, soil temperature in degrees Celsius, and hyperspectral reflectance values across multiple wavelengths. For this analysis, the focus is on soil moisture as the response variable and soil temperature as the explanatory variable. Soil temperature is split at its median into low and high temperature groups so that mean soil moisture can be compared between the two conditions.
Data Analysis
In this section, exploratory data analysis (EDA) is conducted to better understand the distribution of soil moisture and its relationship with soil temperature. The dataset is first cleaned by selecting the relevant variables and removing missing values. Summary statistics are used to examine central tendency and variability. Soil temperature is then categorized into low and high temperature groups to facilitate comparison of soil moisture levels across conditions. Visualizations, including a histogram and boxplots, are created to assess the distribution of soil moisture and to explore differences between temperature groups. These steps prepare the data for the subsequent hypothesis test.
library(tidyverse)
soil <- read.csv("soilmoisture_dataset.csv")
dim(soil)
## [1] 679 129
# Keep only the two variables of interest and drop rows with missing values
soil_clean <- soil %>%
  select(soil_moisture, soil_temperature) %>%
  filter(!is.na(soil_moisture), !is.na(soil_temperature))
summary(soil_clean)
## soil_moisture soil_temperature
## Min. :25.50 Min. :26.40
## 1st Qu.:28.25 1st Qu.:33.60
## Median :31.77 Median :36.70
## Mean :31.57 Mean :37.50
## 3rd Qu.:34.19 3rd Qu.:41.15
## Max. :42.50 Max. :47.10
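# Median split: values strictly below the median are "Low Temperature";
# values equal to or above the median are "High Temperature"
soil_clean <- soil_clean %>%
  mutate(
    temperature_group = ifelse(
      soil_temperature < median(soil_temperature),
      "Low Temperature",
      "High Temperature"
    )
  )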
table(soil_clean$temperature_group)
##
## High Temperature Low Temperature
## 341 338
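Because the comparison in the split is strict (`<`), any observation exactly equal to the median falls into the high temperature group, which may explain why the two groups are not the same size (341 versus 338). A minimal sketch of that behaviour, using a made-up vector (the values below are illustrative only and are not taken from the dataset):

```r
# Toy temperatures (hypothetical values, not from the soil dataset)
temps <- c(30.1, 33.6, 36.7, 36.7, 41.2, 47.1)
med <- median(temps)

# Same rule as in the analysis: strictly below the median is "Low Temperature"
group <- ifelse(temps < med, "Low Temperature", "High Temperature")

# Ties at the median are counted in the high group
table(group)
```

If balanced groups matter, the ties could be assigned differently, but the analysis here keeps the rule shown above.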
ggplot(soil_clean, aes(x = soil_moisture)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Distribution of Soil Moisture",
    x = "Soil Moisture (%)",
    y = "Frequency"
  )
ggplot(soil_clean, aes(x = temperature_group, y = soil_moisture)) +
  geom_boxplot() +
  labs(
    title = "Soil Moisture by Soil Temperature Group",
    x = "Soil Temperature Group",
    y = "Soil Moisture (%)"
  )
Statistical Analysis
To determine whether the mean soil moisture differs between low and high soil temperature conditions, an independent two-sample t-test is conducted. This method is appropriate because soil moisture is a quantitative variable and each observation belongs to exactly one temperature group, so the two groups are treated as independent samples. Because R's t.test() does not assume equal variances by default, the test that is run is Welch's two-sample t-test. The hypotheses are evaluated at a significance level of α = 0.05.
Null Hypothesis (H₀): μ₁ = μ₂
Alternative Hypothesis (H₁): μ₁ ≠ μ₂
Where:
μ₁ = mean soil moisture under low soil temperature
μ₂ = mean soil moisture under high soil temperature
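For reference, R's `t.test()` runs Welch's version of the two-sample test by default (`var.equal = FALSE`). The standard form of that statistic and its approximate degrees of freedom can be written as below, where the sample means, standard deviations, and group sizes (x̄, s, n with subscripts 1 and 2) are notation introduced here to match μ₁ and μ₂:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

With the formula interface, R orders the groups alphabetically, so it computes the difference as high minus low. That flips the sign of t relative to the expression above but leaves the two-sided p-value unchanged.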
t_test_result <- t.test(
  soil_moisture ~ temperature_group,
  data = soil_clean,
  alternative = "two.sided"
)
t_test_result
##
## Welch Two Sample t-test
##
## data: soil_moisture by temperature_group
## t = -25.326, df = 671.52, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group High Temperature and group Low Temperature is not equal to 0
## 95 percent confidence interval:
## -5.474287 -4.686530
## sample estimates:
## mean in group High Temperature mean in group Low Temperature
## 29.03935 34.11976
soil_clean %>%
  group_by(temperature_group) %>%
  summarise(
    mean_soil_moisture = mean(soil_moisture),
    sd_soil_moisture = sd(soil_moisture),
    n = n()
  )
## # A tibble: 2 × 4
## temperature_group mean_soil_moisture sd_soil_moisture n
## <chr> <dbl> <dbl> <int>
## 1 High Temperature 29.0 2.74 341
## 2 Low Temperature 34.1 2.48 338
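As a check on the test output, the Welch statistic can be recomputed by hand from the group summaries. The sketch below copies the means, standard deviations, and counts from the output above; because the standard deviations are rounded to three significant digits, the results should agree with t.test() only approximately (t close to -25.3 and roughly 671 degrees of freedom). The variable names are chosen here for illustration.

```r
# Group summaries copied from the output above (SDs are rounded)
mean_high <- 29.03935; sd_high <- 2.74; n_high <- 341
mean_low  <- 34.11976; sd_low  <- 2.48; n_low  <- 338

# Variance of each group's sample mean
v_high <- sd_high^2 / n_high
v_low  <- sd_low^2 / n_low

# Welch t statistic (high minus low, matching R's alphabetical group order)
se_diff <- sqrt(v_high + v_low)
t_stat  <- (mean_high - mean_low) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_high + v_low)^2 /
  (v_high^2 / (n_high - 1) + v_low^2 / (n_low - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean_high - mean_low) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_stat, df = df_welch), 2)
ci
```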
The Welch two-sample t-test compares the mean soil moisture between the low and high soil temperature groups. The decision rule at α = 0.05 is to reject the null hypothesis if the p-value is less than 0.05 and to fail to reject it otherwise. The test gives t = -25.326 on 671.52 degrees of freedom, and the 95% confidence interval for the difference in means (high minus low) runs from -5.47 to -4.69 percentage points, which excludes zero.
The p-value is less than 2.2 × 10⁻¹⁶, which is far smaller than α. Therefore, the null hypothesis is rejected. This provides strong statistical evidence that mean soil moisture differs between low and high soil temperature conditions: the high temperature group averages 29.0% soil moisture, compared with 34.1% in the low temperature group.
Conclusion
This study examined whether mean soil moisture levels differ significantly between low and high soil temperature conditions using data from a hyperspectral soil moisture dataset. The results of the independent two-sample t-test showed a statistically significant difference in mean soil moisture between the two temperature groups. Specifically, mean soil moisture was about 5.1 percentage points lower in the high temperature group (29.0%) than in the low temperature group (34.1%), indicating that soil temperature is closely associated with soil water content in this dataset.
These findings have important implications for environmental monitoring and agricultural management, as understanding how soil temperature affects moisture can improve irrigation planning and soil conservation strategies. Future research could expand this analysis by treating soil temperature as a continuous variable and using regression methods to better model the relationship between temperature and soil moisture. Additionally, incorporating other environmental factors, such as soil type or weather conditions, could provide a more comprehensive understanding of the mechanisms affecting soil moisture variability.
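The regression approach suggested above could look like the following minimal sketch. It uses synthetic data as a stand-in (the data frame `sim`, its size, and the slope, intercept, and noise level used to generate it are all assumptions made for illustration and say nothing about the real measurements); with the actual data, `soil_clean` would take the place of `sim`.

```r
# Synthetic stand-in data (hypothetical; NOT the Karlsruhe measurements)
set.seed(1)
n <- 200
soil_temperature <- runif(n, min = 26, max = 47)
soil_moisture <- 45 - 0.35 * soil_temperature + rnorm(n, sd = 2)
sim <- data.frame(soil_moisture, soil_temperature)

# Simple linear regression with temperature kept as a continuous predictor
fit <- lm(soil_moisture ~ soil_temperature, data = sim)

# The slope estimates the change in soil moisture (%) per 1 degree Celsius
coef(summary(fit))
```

Keeping temperature continuous avoids the information lost by a median split and gives a per-degree estimate rather than a single group difference.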