Project_3: Predicting Soil Moisture from Hyperspectral Reflectance Data

Description

This project analyzes whether soil moisture (%) can be predicted from hyperspectral reflectance data using a Multiple Linear Regression model.
The dataset used is the Hyperspectral Benchmark Soil Moisture Dataset obtained from Zenodo.org.
The dataset contains reflectance values across wavelengths 454–950 nm, soil moisture (%), and soil temperature (°C).

Objective

The primary goal is to determine whether hyperspectral reflectance values can reliably predict soil moisture content.
A Multiple Linear Regression model is used, and all five regression assumptions are examined using diagnostic plots and a Durbin–Watson test.

Introduction

Soil moisture is a critical environmental variable influencing plant growth, surface energy balance, hydrology, and agricultural productivity. Hyperspectral reflectance sensors measure the reflectance of soils at extremely narrow and contiguous wavelength intervals, making them powerful tools for estimating soil properties.
This project focuses on determining whether soil moisture can be predicted using reflectance values from 454–950 nm, combined with soil temperature.
Multiple Linear Regression is used due to the continuous nature of the variables, and all model assumptions are thoroughly checked.
Hyperspectral imaging collects reflectance at numerous wavelengths across the electromagnetic spectrum.
This enables the separation of distinct materials, because their ability to absorb and reflect light differs.
Healthier soils are reported to reflect more light than unhealthy soils, and the water a soil holds is key to its reflective properties. As noted by Jambhali et al., “Water dominates the optical reflectance properties of water bearing materials.”
Hyperspectral data often contain hundreds of highly correlated bands, which can lead to multicollinearity. However, regression models can still produce reliable predictions even when individual coefficients may be unstable.

Fields Used

454 – 950 (one column per wavelength, named by its value in nm): Quantitative (reflectance at each wavelength)
soil_moisture: Quantitative (%)
soil_temperature: Quantitative (°C)

Regression Model Hypotheses

\(H_0\): Hyperspectral reflectance does not predict soil moisture.
\(H_a\): Hyperspectral reflectance does predict soil moisture.

Level of Significance: \(\alpha = 0.05\)
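
Stated formally, and assuming the hypotheses refer to the overall F-test of the regression (an interpretation, since the report states them only in words), the model with \(p\) predictors and its hypotheses can be written as:

\[
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n
\]

\[
H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0 \qquad H_a: \beta_j \neq 0 \text{ for at least one } j
\]

Here \(y_i\) is soil moisture for observation \(i\), the \(x_{ij}\) are the predictors (soil temperature and the reflectance bands), and \(\varepsilon_i\) is the error term.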

Packages and Data Load

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(car)

## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some

library(broom)

Load dataset

setwd("~/Downloads/25_Semesters/Fall/DATA101")
soil_data <- read_csv("soilmoisture_dataset.csv")

## Rows: 679 Columns: 129
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (128): index, soil_moisture, soil_temperature, 454, 458, 462, 466, 470,...
## dttm   (1): datetime
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Working copy

df <- soil_data

# Structure

str(df)

## spc_tbl_ [679 × 129] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ index           : num [1:679] 0 1 2 3 4 5 6 7 8 9 ...
##  $ datetime        : POSIXct[1:679], format: "2017-05-23 14:06:17" "2017-05-23 14:08:17" ...
##  $ soil_moisture   : num [1:679] 33.5 33.5 33.5 33.3 33.3 ...
##  $ soil_temperature: num [1:679] 34.8 35.2 35.4 35 35.3 35.5 35.4 35.1 35 34.8 ...
##  $ 454             : num [1:679] 0.0821 0.0795 0.0806 0.078 0.08 ...
##  $ 458             : num [1:679] 0.0559 0.0553 0.0541 0.055 0.0553 ...
##  $ 462             : num [1:679] 0.05 0.0491 0.0492 0.0491 0.0493 ...
##  $ 466             : num [1:679] 0.0479 0.0476 0.0475 0.0479 0.0474 ...
##  $ 470             : num [1:679] 0.0475 0.0467 0.0465 0.0469 0.047 ...
##  $ 474             : num [1:679] 0.0465 0.0468 0.046 0.0468 0.047 ...
##  $ 478             : num [1:679] 0.0467 0.0463 0.0463 0.0468 0.0468 ...
##  $ 482             : num [1:679] 0.0468 0.047 0.0469 0.047 0.0471 ...
##  $ 486             : num [1:679] 0.0475 0.0477 0.0472 0.0476 0.0481 ...
##  $ 490             : num [1:679] 0.0486 0.0483 0.0486 0.0485 0.0482 ...
##  $ 494             : num [1:679] 0.0493 0.0491 0.0492 0.0487 0.049 ...
##  $ 498             : num [1:679] 0.0503 0.0503 0.0499 0.0499 0.0501 ...
##  $ 502             : num [1:679] 0.0513 0.0515 0.0511 0.0514 0.0516 ...
##  $ 506             : num [1:679] 0.0532 0.0528 0.0523 0.052 0.053 ...
##  $ 510             : num [1:679] 0.0543 0.0543 0.0539 0.0544 0.0544 ...
##  $ 514             : num [1:679] 0.0559 0.0554 0.0553 0.056 0.0558 ...
##  $ 518             : num [1:679] 0.0575 0.0572 0.0572 0.0573 0.0575 ...
##  $ 522             : num [1:679] 0.0593 0.0588 0.059 0.0589 0.0593 ...
##  $ 526             : num [1:679] 0.061 0.0607 0.0606 0.0609 0.0611 ...
##  $ 530             : num [1:679] 0.0625 0.0621 0.0619 0.0624 0.0625 ...
##  $ 534             : num [1:679] 0.0641 0.0639 0.0635 0.0643 0.0642 ...
##  $ 538             : num [1:679] 0.0662 0.0656 0.066 0.0652 0.0655 ...
##  $ 542             : num [1:679] 0.0678 0.0677 0.0672 0.0672 0.0674 ...
##  $ 546             : num [1:679] 0.0695 0.0691 0.0691 0.0688 0.0693 ...
##  $ 550             : num [1:679] 0.0713 0.0709 0.0712 0.0709 0.0714 ...
##  $ 554             : num [1:679] 0.0729 0.0729 0.0727 0.0733 0.0732 ...
##  $ 558             : num [1:679] 0.075 0.0746 0.0744 0.0748 0.0749 ...
##  $ 562             : num [1:679] 0.0773 0.0764 0.0767 0.0769 0.077 ...
##  $ 566             : num [1:679] 0.0786 0.0787 0.0783 0.0785 0.0789 ...
##  $ 570             : num [1:679] 0.0808 0.0802 0.0801 0.0804 0.0808 ...
##  $ 574             : num [1:679] 0.0823 0.0821 0.0824 0.082 0.0824 ...
##  $ 578             : num [1:679] 0.0849 0.0839 0.0839 0.0838 0.0847 ...
##  $ 582             : num [1:679] 0.0865 0.0859 0.0854 0.0858 0.0864 ...
##  $ 586             : num [1:679] 0.0879 0.0874 0.0869 0.0871 0.0878 ...
##  $ 590             : num [1:679] 0.0893 0.0884 0.0887 0.0886 0.0893 ...
##  $ 594             : num [1:679] 0.0908 0.09 0.0903 0.0901 0.0906 ...
##  $ 598             : num [1:679] 0.0919 0.0909 0.0915 0.0916 0.0918 ...
##  $ 602             : num [1:679] 0.0932 0.0924 0.0927 0.0927 0.0935 ...
##  $ 606             : num [1:679] 0.0944 0.0937 0.094 0.094 0.0943 ...
##  $ 610             : num [1:679] 0.0955 0.0944 0.0945 0.0949 0.0955 ...
##  $ 614             : num [1:679] 0.0965 0.0957 0.0955 0.0958 0.0962 ...
##  $ 618             : num [1:679] 0.0973 0.0965 0.0962 0.0969 0.0971 ...
##  $ 622             : num [1:679] 0.0986 0.0975 0.0972 0.0979 0.0982 ...
##  $ 626             : num [1:679] 0.0997 0.0984 0.0985 0.0989 0.099 ...
##  $ 630             : num [1:679] 0.1003 0.0991 0.0993 0.0998 0.0999 ...
##  $ 634             : num [1:679] 0.101 0.1 0.1 0.1 0.1 ...
##  $ 638             : num [1:679] 0.102 0.101 0.101 0.102 0.102 ...
##  $ 642             : num [1:679] 0.103 0.102 0.102 0.103 0.103 ...
##  $ 646             : num [1:679] 0.104 0.103 0.104 0.104 0.104 ...
##  $ 650             : num [1:679] 0.106 0.104 0.104 0.105 0.105 ...
##  $ 654             : num [1:679] 0.107 0.105 0.106 0.106 0.107 ...
##  $ 658             : num [1:679] 0.108 0.107 0.107 0.107 0.108 ...
##  $ 662             : num [1:679] 0.109 0.107 0.107 0.108 0.109 ...
##  $ 666             : num [1:679] 0.11 0.109 0.109 0.109 0.11 ...
##  $ 670             : num [1:679] 0.111 0.11 0.11 0.111 0.111 ...
##  $ 674             : num [1:679] 0.112 0.112 0.111 0.112 0.112 ...
##  $ 678             : num [1:679] 0.114 0.113 0.113 0.113 0.114 ...
##  $ 682             : num [1:679] 0.115 0.114 0.114 0.115 0.115 ...
##  $ 686             : num [1:679] 0.117 0.115 0.116 0.116 0.116 ...
##  $ 690             : num [1:679] 0.118 0.117 0.116 0.117 0.117 ...
##  $ 694             : num [1:679] 0.119 0.118 0.118 0.118 0.119 ...
##  $ 698             : num [1:679] 0.12 0.119 0.119 0.119 0.12 ...
##  $ 702             : num [1:679] 0.122 0.121 0.12 0.12 0.121 ...
##  $ 706             : num [1:679] 0.123 0.122 0.121 0.122 0.123 ...
##  $ 710             : num [1:679] 0.124 0.123 0.123 0.123 0.124 ...
##  $ 714             : num [1:679] 0.125 0.124 0.124 0.125 0.125 ...
##  $ 718             : num [1:679] 0.127 0.125 0.126 0.126 0.127 ...
##  $ 722             : num [1:679] 0.128 0.127 0.127 0.127 0.128 ...
##  $ 726             : num [1:679] 0.129 0.128 0.128 0.129 0.129 ...
##  $ 730             : num [1:679] 0.131 0.13 0.13 0.13 0.13 ...
##  $ 734             : num [1:679] 0.132 0.131 0.131 0.131 0.131 ...
##  $ 738             : num [1:679] 0.133 0.131 0.132 0.132 0.133 ...
##  $ 742             : num [1:679] 0.134 0.133 0.133 0.134 0.134 ...
##  $ 746             : num [1:679] 0.135 0.135 0.135 0.135 0.135 ...
##  $ 750             : num [1:679] 0.137 0.136 0.136 0.136 0.137 ...
##  $ 754             : num [1:679] 0.138 0.137 0.137 0.137 0.138 ...
##  $ 758             : num [1:679] 0.139 0.137 0.138 0.138 0.139 ...
##  $ 762             : num [1:679] 0.14 0.138 0.138 0.138 0.14 ...
##  $ 766             : num [1:679] 0.14 0.139 0.139 0.139 0.14 ...
##  $ 770             : num [1:679] 0.141 0.14 0.14 0.14 0.141 ...
##  $ 774             : num [1:679] 0.142 0.141 0.14 0.141 0.141 ...
##  $ 778             : num [1:679] 0.142 0.142 0.141 0.142 0.142 ...
##  $ 782             : num [1:679] 0.143 0.142 0.142 0.142 0.143 ...
##  $ 786             : num [1:679] 0.144 0.142 0.143 0.143 0.144 ...
##  $ 790             : num [1:679] 0.145 0.143 0.143 0.144 0.144 ...
##  $ 794             : num [1:679] 0.146 0.144 0.144 0.144 0.145 ...
##  $ 798             : num [1:679] 0.146 0.145 0.145 0.145 0.146 ...
##  $ 802             : num [1:679] 0.146 0.145 0.145 0.145 0.146 ...
##  $ 806             : num [1:679] 0.147 0.145 0.146 0.146 0.147 ...
##  $ 810             : num [1:679] 0.147 0.146 0.146 0.146 0.147 ...
##  $ 814             : num [1:679] 0.147 0.146 0.146 0.147 0.147 ...
##  $ 818             : num [1:679] 0.148 0.146 0.146 0.147 0.147 ...
##  $ 822             : num [1:679] 0.148 0.147 0.146 0.147 0.147 ...
##  $ 826             : num [1:679] 0.148 0.147 0.146 0.147 0.147 ...
##  $ 830             : num [1:679] 0.148 0.147 0.147 0.147 0.148 ...
##   [list output truncated]
##  - attr(*, "spec")=
##   .. cols(
##   ..   index = col_double(),
##   ..   datetime = col_datetime(format = ""),
##   ..   soil_moisture = col_double(),
##   ..   soil_temperature = col_double(),
##   ..   `454` = col_double(),
##   ..   `458` = col_double(),
##   ..   `462` = col_double(),
##   ..   `466` = col_double(),
##   ..   `470` = col_double(),
##   ..   `474` = col_double(),
##   ..   `478` = col_double(),
##   ..   `482` = col_double(),
##   ..   `486` = col_double(),
##   ..   `490` = col_double(),
##   ..   `494` = col_double(),
##   ..   `498` = col_double(),
##   ..   `502` = col_double(),
##   ..   `506` = col_double(),
##   ..   `510` = col_double(),
##   ..   `514` = col_double(),
##   ..   `518` = col_double(),
##   ..   `522` = col_double(),
##   ..   `526` = col_double(),
##   ..   `530` = col_double(),
##   ..   `534` = col_double(),
##   ..   `538` = col_double(),
##   ..   `542` = col_double(),
##   ..   `546` = col_double(),
##   ..   `550` = col_double(),
##   ..   `554` = col_double(),
##   ..   `558` = col_double(),
##   ..   `562` = col_double(),
##   ..   `566` = col_double(),
##   ..   `570` = col_double(),
##   ..   `574` = col_double(),
##   ..   `578` = col_double(),
##   ..   `582` = col_double(),
##   ..   `586` = col_double(),
##   ..   `590` = col_double(),
##   ..   `594` = col_double(),
##   ..   `598` = col_double(),
##   ..   `602` = col_double(),
##   ..   `606` = col_double(),
##   ..   `610` = col_double(),
##   ..   `614` = col_double(),
##   ..   `618` = col_double(),
##   ..   `622` = col_double(),
##   ..   `626` = col_double(),
##   ..   `630` = col_double(),
##   ..   `634` = col_double(),
##   ..   `638` = col_double(),
##   ..   `642` = col_double(),
##   ..   `646` = col_double(),
##   ..   `650` = col_double(),
##   ..   `654` = col_double(),
##   ..   `658` = col_double(),
##   ..   `662` = col_double(),
##   ..   `666` = col_double(),
##   ..   `670` = col_double(),
##   ..   `674` = col_double(),
##   ..   `678` = col_double(),
##   ..   `682` = col_double(),
##   ..   `686` = col_double(),
##   ..   `690` = col_double(),
##   ..   `694` = col_double(),
##   ..   `698` = col_double(),
##   ..   `702` = col_double(),
##   ..   `706` = col_double(),
##   ..   `710` = col_double(),
##   ..   `714` = col_double(),
##   ..   `718` = col_double(),
##   ..   `722` = col_double(),
##   ..   `726` = col_double(),
##   ..   `730` = col_double(),
##   ..   `734` = col_double(),
##   ..   `738` = col_double(),
##   ..   `742` = col_double(),
##   ..   `746` = col_double(),
##   ..   `750` = col_double(),
##   ..   `754` = col_double(),
##   ..   `758` = col_double(),
##   ..   `762` = col_double(),
##   ..   `766` = col_double(),
##   ..   `770` = col_double(),
##   ..   `774` = col_double(),
##   ..   `778` = col_double(),
##   ..   `782` = col_double(),
##   ..   `786` = col_double(),
##   ..   `790` = col_double(),
##   ..   `794` = col_double(),
##   ..   `798` = col_double(),
##   ..   `802` = col_double(),
##   ..   `806` = col_double(),
##   ..   `810` = col_double(),
##   ..   `814` = col_double(),
##   ..   `818` = col_double(),
##   ..   `822` = col_double(),
##   ..   `826` = col_double(),
##   ..   `830` = col_double(),
##   ..   `834` = col_double(),
##   ..   `838` = col_double(),
##   ..   `842` = col_double(),
##   ..   `846` = col_double(),
##   ..   `850` = col_double(),
##   ..   `854` = col_double(),
##   ..   `858` = col_double(),
##   ..   `862` = col_double(),
##   ..   `866` = col_double(),
##   ..   `870` = col_double(),
##   ..   `874` = col_double(),
##   ..   `878` = col_double(),
##   ..   `882` = col_double(),
##   ..   `886` = col_double(),
##   ..   `890` = col_double(),
##   ..   `894` = col_double(),
##   ..   `898` = col_double(),
##   ..   `902` = col_double(),
##   ..   `906` = col_double(),
##   ..   `910` = col_double(),
##   ..   `914` = col_double(),
##   ..   `918` = col_double(),
##   ..   `922` = col_double(),
##   ..   `926` = col_double(),
##   ..   `930` = col_double(),
##   ..   `934` = col_double(),
##   ..   `938` = col_double(),
##   ..   `942` = col_double(),
##   ..   `946` = col_double(),
##   ..   `950` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

# Column names

names(df)

##   [1] "index"            "datetime"         "soil_moisture"   
##   [4] "soil_temperature" "454"              "458"             
##   [7] "462"              "466"              "470"             
##  [10] "474"              "478"              "482"             
##  [13] "486"              "490"              "494"             
##  [16] "498"              "502"              "506"             
##  [19] "510"              "514"              "518"             
##  [22] "522"              "526"              "530"             
##  [25] "534"              "538"              "542"             
##  [28] "546"              "550"              "554"             
##  [31] "558"              "562"              "566"             
##  [34] "570"              "574"              "578"             
##  [37] "582"              "586"              "590"             
##  [40] "594"              "598"              "602"             
##  [43] "606"              "610"              "614"             
##  [46] "618"              "622"              "626"             
##  [49] "630"              "634"              "638"             
##  [52] "642"              "646"              "650"             
##  [55] "654"              "658"              "662"             
##  [58] "666"              "670"              "674"             
##  [61] "678"              "682"              "686"             
##  [64] "690"              "694"              "698"             
##  [67] "702"              "706"              "710"             
##  [70] "714"              "718"              "722"             
##  [73] "726"              "730"              "734"             
##  [76] "738"              "742"              "746"             
##  [79] "750"              "754"              "758"             
##  [82] "762"              "766"              "770"             
##  [85] "774"              "778"              "782"             
##  [88] "786"              "790"              "794"             
##  [91] "798"              "802"              "806"             
##  [94] "810"              "814"              "818"             
##  [97] "822"              "826"              "830"             
## [100] "834"              "838"              "842"             
## [103] "846"              "850"              "854"             
## [106] "858"              "862"              "866"             
## [109] "870"              "874"              "878"             
## [112] "882"              "886"              "890"             
## [115] "894"              "898"              "902"             
## [118] "906"              "910"              "914"             
## [121] "918"              "922"              "926"             
## [124] "930"              "934"              "938"             
## [127] "942"              "946"              "950"

# Preview rows

head(df, 5)

## # A tibble: 5 × 129
##   index datetime            soil_moisture soil_temperature  `454`  `458`  `462`
##   <dbl> <dttm>                      <dbl>            <dbl>  <dbl>  <dbl>  <dbl>
## 1     0 2017-05-23 14:06:17          33.5             34.8 0.0821 0.0559 0.0500
## 2     1 2017-05-23 14:08:17          33.5             35.2 0.0795 0.0553 0.0491
## 3     2 2017-05-23 14:10:17          33.5             35.4 0.0806 0.0541 0.0492
## 4     3 2017-05-23 14:12:17          33.3             35   0.0780 0.0550 0.0491
## 5     4 2017-05-23 14:14:17          33.3             35.3 0.0800 0.0553 0.0493
## # ℹ 122 more variables: `466` <dbl>, `470` <dbl>, `474` <dbl>, `478` <dbl>,
## #   `482` <dbl>, `486` <dbl>, `490` <dbl>, `494` <dbl>, `498` <dbl>,
## #   `502` <dbl>, `506` <dbl>, `510` <dbl>, `514` <dbl>, `518` <dbl>,
## #   `522` <dbl>, `526` <dbl>, `530` <dbl>, `534` <dbl>, `538` <dbl>,
## #   `542` <dbl>, `546` <dbl>, `550` <dbl>, `554` <dbl>, `558` <dbl>,
## #   `562` <dbl>, `566` <dbl>, `570` <dbl>, `574` <dbl>, `578` <dbl>,
## #   `582` <dbl>, `586` <dbl>, `590` <dbl>, `594` <dbl>, `598` <dbl>, …

tail(df, 5)

## # A tibble: 5 × 129
##   index datetime            soil_moisture soil_temperature  `454`  `458`  `462`
##   <dbl> <dttm>                      <dbl>            <dbl>  <dbl>  <dbl>  <dbl>
## 1   677 2017-05-26 14:00:10          30.0             40.5 0.0956 0.0633 0.0549
## 2   678 2017-05-26 14:02:10          29.8             39.5 0.0952 0.0642 0.0548
## 3   679 2017-05-26 14:04:10          29.8             39.5 0.0956 0.0645 0.0558
## 4   680 2017-05-26 14:06:10          29.9             39.5 0.0950 0.0642 0.0550
## 5   681 2017-05-26 14:08:10          29.8             39.7 0.0977 0.0654 0.0561
## # ℹ 122 more variables: `466` <dbl>, `470` <dbl>, `474` <dbl>, `478` <dbl>,
## #   `482` <dbl>, `486` <dbl>, `490` <dbl>, `494` <dbl>, `498` <dbl>,
## #   `502` <dbl>, `506` <dbl>, `510` <dbl>, `514` <dbl>, `518` <dbl>,
## #   `522` <dbl>, `526` <dbl>, `530` <dbl>, `534` <dbl>, `538` <dbl>,
## #   `542` <dbl>, `546` <dbl>, `550` <dbl>, `554` <dbl>, `558` <dbl>,
## #   `562` <dbl>, `566` <dbl>, `570` <dbl>, `574` <dbl>, `578` <dbl>,
## #   `582` <dbl>, `586` <dbl>, `590` <dbl>, `594` <dbl>, `598` <dbl>, …

# Choose a subset of bands

df_clean <- soil_data |>
  select(soil_moisture,
         soil_temperature,
         `454`, `550`, `650`, `750`, `850`, `950`) |>
  drop_na()

head(df_clean)

## # A tibble: 6 × 8
##   soil_moisture soil_temperature  `454`  `550` `650` `750` `850` `950`
##           <dbl>            <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          33.5             34.8 0.0821 0.0713 0.106 0.137 0.150 0.154
## 2          33.5             35.2 0.0795 0.0709 0.104 0.136 0.147 0.157
## 3          33.5             35.4 0.0806 0.0712 0.104 0.136 0.148 0.154
## 4          33.3             35   0.0780 0.0709 0.105 0.136 0.148 0.158
## 5          33.3             35.3 0.0800 0.0714 0.105 0.137 0.148 0.156
## 6          33.2             35.5 0.0815 0.0707 0.105 0.136 0.148 0.155

summary(df_clean$soil_moisture)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   25.50   28.25   31.77   31.57   34.19   42.50

summary(df_clean$soil_temperature)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26.40   33.60   36.70   37.50   41.15   47.10

Justification

The analysis uses a multiple linear regression model that predicts soil moisture from:
- soil temperature
- reflectance at 454, 550, 650, 750, 850, and 950 nm
The hyperspectral dataset contains reflectance values from 454–950 nm at 4-nm intervals, which gives 125 bands. Bands 454, 550, 650, 750, 850, and 950 nm were selected because they are spread roughly 100 nm apart across the measured range, from visible blue (454 nm) to the near-infrared (950 nm). Choosing spaced-apart wavelengths is intended to reduce multicollinearity relative to using all adjacent bands while preserving the overall shape of the spectrum. The selected bands can still be correlated with one another, so this is a reduction rather than a removal of multicollinearity.
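
Whether spacing the bands apart actually limits multicollinearity can be checked with variance inflation factors (VIFs). The sketch below is illustrative only: `vif_from_cor` is a hypothetical helper name and the toy data are simulated, not taken from the soil dataset. It relies on the identity that the diagonal of the inverse predictor correlation matrix equals the VIFs.

```r
# Sketch: VIFs from the inverse of the predictor correlation matrix.
# `vif_from_cor` and the toy data are hypothetical, not part of the report.
vif_from_cor <- function(X) {
  R <- cor(X)                 # pairwise correlations between predictors
  v <- diag(solve(R))         # j-th diagonal entry equals 1 / (1 - R_j^2)
  names(v) <- colnames(X)
  v
}

set.seed(1)
shared <- rnorm(200)          # common signal that makes b1 and b2 nearly collinear
toy <- data.frame(
  b1 = shared + rnorm(200, sd = 0.1),
  b2 = shared + rnorm(200, sd = 0.1),
  b3 = rnorm(200)             # independent of the other two
)

vifs <- vif_from_cor(toy)
print(round(vifs, 1))         # b1 and b2 are inflated, b3 stays near 1
```

On the report's data, the same helper could be applied to the predictor columns of df_clean (everything except soil_moisture). Since car is already loaded, vif() on the fitted model should give equivalent values for a model without factor terms.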

# Fit the model

multiple_lm <- lm(soil_moisture ~ soil_temperature + `454` + `550` + `650` + `750` + `850` + `950`,
                  data = df_clean)

# View summary

summary(multiple_lm)

## 
## Call:
## lm(formula = soil_moisture ~ soil_temperature + `454` + `550` + 
##     `650` + `750` + `850` + `950`, data = df_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6085 -1.0473 -0.2183  1.0229  8.5218 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        57.45896    0.57407 100.091  < 2e-16 ***
## soil_temperature   -0.41870    0.02365 -17.704  < 2e-16 ***
## `454`              62.81845   10.96500   5.729 1.53e-08 ***
## `550`             133.74798   35.41145   3.777 0.000173 ***
## `650`            -133.93142   41.26045  -3.246 0.001229 ** 
## `750`             181.59719   48.85625   3.717 0.000218 ***
## `850`            -177.82961   35.87794  -4.957 9.10e-07 ***
## `950`             -50.62741   15.59957  -3.245 0.001231 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.616 on 671 degrees of freedom
## Multiple R-squared:  0.8055, Adjusted R-squared:  0.8034 
## F-statistic: 396.9 on 7 and 671 DF,  p-value: < 2.2e-16

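When all predictors are zero, the predicted soil moisture is roughly 57%. This intercept is an extrapolation far outside the observed data (soil temperature ranged from 26.4 to 47.1 °C), so it has no direct physical meaning.
- As soil temperature increases, predicted soil moisture decreases. For each 1 °C increase in temperature, predicted moisture drops by about 0.42 percentage points, holding the reflectance bands fixed.
- Positive estimate (454, 550, 750 nm): higher reflectance increases predicted soil moisture, holding the other predictors fixed.
- Negative estimate (650, 850, 950 nm): higher reflectance decreases predicted soil moisture, holding the other predictors fixed.
- Significant p-values: every predictor has a p-value below the level of significance (0.05), indicating that each is a significant predictor in the model.
The R-squared value shows that the model explains about 80% of the variation in soil moisture, suggesting a strong and meaningful relationship.
The residual standard error shows that predictions are typically off by about 1.6 percentage points of soil moisture. The overall F-test (p < 2.2e-16) shows that the model is highly significant, so the null hypothesis is rejected at \(\alpha = 0.05\).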
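
As a worked example of how the coefficients combine, the sketch below recomputes the prediction for the first row of head(df_clean) by hand. The coefficients are copied from the summary output and the predictor values are the rounded values as printed, so the result is only approximate. The names `coefs`, `x_first`, `pred_first`, and `resid_first` are illustrative.

```r
# Coefficients copied from the summary(multiple_lm) output in this report.
coefs <- c(
  intercept        =   57.45896,
  soil_temperature =   -0.41870,
  b454             =   62.81845,
  b550             =  133.74798,
  b650             = -133.93142,
  b750             =  181.59719,
  b850             = -177.82961,
  b950             =  -50.62741
)

# First row of head(df_clean), with a leading 1 for the intercept.
# These are the rounded values as printed, so the result is approximate.
x_first <- c(1, 34.8, 0.0821, 0.0713, 0.106, 0.137, 0.150, 0.154)

pred_first  <- sum(coefs * x_first)  # linear predictor: b0 + sum(b_j * x_j)
resid_first <- 33.5 - pred_first     # observed minus predicted

print(round(c(predicted = pred_first, residual = resid_first), 2))
```

With these rounded inputs the prediction is roughly 33.8% against an observed 33.5%, a residual of about -0.3 percentage points. Because reflectance values lie between roughly 0.05 and 0.15, a one-unit change in reflectance never occurs in the data; a change of 0.01 in the 454 nm band, for instance, corresponds to about 0.63 percentage points of predicted moisture.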

Linearity

Ideal outcome: Residuals vs Fitted plot should show random scatter around 0.
Violation: Indication of a pattern or curvature.

# Linearity 
plot(multiple_lm, which = 1)

Interpretation
- The model fits well for most of the data. At the low end of the fitted values the residuals show a noticeable pattern, which suggests some nonlinearity there. However, because the residuals are randomly scattered across the majority of the range (middle to higher fitted values), the assumption is not critically violated.

Independence of Observations

Ideal outcome: Durbin–Watson statistic close to 2, meaning residuals are independent.
Violation: A Durbin–Watson statistic near 0 (positive autocorrelation) or near 4 (negative autocorrelation).

# Independence of Observations
durbinWatsonTest(multiple_lm)

##  lag Autocorrelation D-W Statistic p-value
##    1       0.8774503     0.2423303       0
##  Alternative hypothesis: rho != 0

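Interpretation
- The Durbin–Watson statistic is 0.24, far below the ideal value of 2, which indicates strong positive autocorrelation in the residuals. The estimated lag-1 autocorrelation (0.88) is very high, and the reported p-value of 0 shows that this autocorrelation is statistically significant. The independence assumption is therefore violated, which is plausible because the measurements were recorded in sequence about two minutes apart. As a consequence, the standard errors and p-values in the model summary are likely too optimistic and should be read with caution.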
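
To make the statistic less of a black box, the sketch below computes it directly from a residual vector and checks the usual approximation DW ≈ 2(1 − r) against the lag-1 autocorrelation reported in the test output. `dw_stat` is a hypothetical helper name, not a function from car, and the short input vectors are toy values chosen for illustration.

```r
# Sketch: Durbin-Watson statistic computed directly from a residual vector.
# `dw_stat` is a hypothetical helper name, not a function from car.
dw_stat <- function(e) {
  sum(diff(e)^2) / sum(e^2)   # squared successive differences over squared residuals
}

# Perfectly alternating residuals give a value above 2 (negative autocorrelation).
dw_alternating <- dw_stat(c(1, -1, 1, -1))

# Steadily drifting residuals give a value near 0 (positive autocorrelation).
dw_drifting <- dw_stat(c(1, 2, 3))

# Approximation DW ~ 2 * (1 - r), using the lag-1 autocorrelation from the test output.
dw_approx <- 2 * (1 - 0.8774503)

print(round(c(alternating = dw_alternating,
              drifting    = dw_drifting,
              approx      = dw_approx), 3))
```

The approximation gives about 0.245, close to the reported statistic of 0.242. Applying dw_stat() to resid(multiple_lm) should reproduce the reported value, assuming the residuals are in their original time order.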

Homoscedasticity (constant variance)

Ideal outcome: Residuals have roughly equal spread across fitted values.
Violation: Funnel shape indicates heteroscedasticity.

# Homoscedasticity (constant variance)
plot(multiple_lm, which = 3)

Interpretation
- The model meets the homoscedasticity assumption. The roughly even spread of residuals indicates that the error variance is stable across the predicted values.

Normality of Residuals

Ideal outcome: Points follow the diagonal line → residuals are approximately normal.
Violation: Curved pattern or extreme tail deviations.

# Normality of Residuals
plot(multiple_lm, which = 2)

Interpretation
- The residuals closely follow the diagonal line, showing no strong curvature or deviations, which means the residuals are approximately normally distributed.

Residuals vs Leverage

Ideal outcome: No points with extremely high leverage or Cook’s distance.
Violation: One or more points with extremely high leverage or Cook’s distance.

# Residuals vs Leverage 
plot(multiple_lm, which = 5)

Interpretation
- All data points fall within safe limits for leverage and influence, meaning no single observation distorts the model or affects the regression results.
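
The visual check can be backed by numbers. The sketch below uses the built-in mtcars data as a stand-in, since the soil dataset is not bundled with R, and flags observations whose Cook's distance exceeds 4/n. That cutoff is a common rule of thumb and an assumption here, not a threshold used in this report; `fit_demo` and the other names are illustrative.

```r
# Sketch: flag influential observations numerically with Cook's distance.
# mtcars is a built-in stand-in dataset; the 4/n cutoff is a rule of thumb.
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

cd  <- cooks.distance(fit_demo)    # one influence value per observation
lev <- hatvalues(fit_demo)         # leverage of each observation

cutoff  <- 4 / nrow(mtcars)
flagged <- names(cd)[cd > cutoff]  # row names whose influence exceeds the cutoff

print(flagged)
print(round(max(cd), 3))
```

The same calls applied to multiple_lm would list any soil observations worth inspecting individually.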

residuals_model <- resid(multiple_lm)

rmse <- sqrt(mean(residuals_model^2))
rmse

## [1] 1.606691

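The model results in a root mean square error (RMSE) of about 1.6, meaning that soil moisture predictions on this dataset are typically off by about 1.6 percentage points. Because the RMSE is computed on the same data used to fit the model, it may understate the error on new measurements.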
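
The RMSE (1.607) and the residual standard error in the model summary (1.616) differ only in their denominators: the first divides the residual sum of squares by n, the second by the residual degrees of freedom. The short check below converts one into the other using the values printed in this report; the variable names are illustrative.

```r
# Values taken from the report: RMSE from resid(), n rows, 8 estimated coefficients.
rmse <- 1.606691
n    <- 679
p    <- 8    # intercept + 7 slopes, so n - p = 671 residual degrees of freedom

# RMSE divides the residual sum of squares by n; the residual standard error by n - p.
rse_from_rmse <- rmse * sqrt(n / (n - p))
print(round(rse_from_rmse, 3))
```

The result agrees with the residual standard error of 1.616 shown in the summary output.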

Conclusion

Reflectance at some wavelengths has a positive coefficient, so predicted moisture rises with it (454, 550, 750 nm).
Reflectance at others has a negative coefficient, so predicted moisture falls with it (650, 850, 950 nm).
Overall, the regression model shows a good fit (R-squared of about 0.81) and meets most of the key assumptions, with one important exception. The linearity plot indicates that residuals are mostly randomly dispersed across fitted values, with only slight non-linearity at the lower end. The Durbin–Watson test reveals strong positive autocorrelation in the residuals, so the independence assumption is violated, most likely because the measurements were taken sequentially over time. This means the reported standard errors and p-values should be treated with caution. The homoscedasticity check shows that residuals have a fairly constant spread across predicted values, supporting the assumption of equal residual variance. The normal Q-Q plot indicates that residuals are approximately normally distributed. Finally, the residuals vs leverage plot shows a few points that stand out, but none has leverage or influence large enough to materially affect the model.

Marvellous Onajobi

2025-12-07
