This project applies dimension reduction to a crop dataset, the Paddy (Rice) Dataset from the UC Irvine Machine Learning Repository, which contains multiple agronomic, environmental, and crop‑related features. Modern agricultural research increasingly relies on large, feature‑rich datasets to understand crop performance, optimize cultivation practices, and support data‑driven decision‑making. As farming conditions, climate patterns, and crop varieties evolve, the volume and complexity of agricultural data continue to grow.
For this project, we use the full Paddy Dataset because all feature groups (soil characteristics, climate variables, crop variety or traits, and management practices) contribute to understanding paddy growth patterns. Small variations in a few features can significantly influence yield, which makes dimension reduction a valuable tool for uncovering underlying structure in the dataset. The results can then be used to identify the parameters that most influence paddy production volume and to inform real‑world cultivation recommendations.
Keywords: Paddy Dataset, Agriculture, Crop Features, Dimensionality Reduction, PCA, MCA, UMAP, Isomap
library(tidyverse) # loads ggplot2, dplyr, readr, and the other core tidyverse packages
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'tidyr' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
## Warning: package 'purrr' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
Data Set Name - Paddy Dataset
We load the dataset, inspect its structure with str() and summary(), and then view the first five rows for reference.
df <- read_csv("./paddydataset.csv");
## Rows: 2789 Columns: 45
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Agriblock, Variety, Soil Types, Nursery, Wind Direction_D1_D30, Wi...
## dbl (37): Hectares, Seedrate(in Kg), LP_Mainfield(in Tonnes), Nursery area (...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
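The column-specification message above can be silenced in the way it suggests, by passing `show_col_types = FALSE`. The following is a minimal sketch that assumes a small stand-in file rather than the real `paddydataset.csv`; the temporary file and the object name `toy` are illustrative, and the two rows reuse values that appear in this report's output.

```r
library(readr)

# Stand-in CSV using a few of the real column names (illustrative only)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Hectares,Agriblock,Soil Types,Paddy yield(in Kg)",
  "6,Cuddalore,alluvial,35028",
  "6,Kurinjipadi,clay,35412"
), tmp)

# show_col_types = FALSE suppresses the "Column specification" message
toy <- read_csv(tmp, show_col_types = FALSE)

# The full specification is still available on request
spec(toy)
```

Column names containing spaces or parentheses are kept as-is by `read_csv()`, so they need backticks when referenced, as in `` toy$`Soil Types` ``.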
str(df);
## spc_tbl_ [2,789 × 45] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Hectares : num [1:2789] 6 6 6 6 6 6 6 6 6 6 ...
## $ Agriblock : chr [1:2789] "Cuddalore" "Kurinjipadi" "Panruti" "Kallakurichi" ...
## $ Variety : chr [1:2789] "CO_43" "ponmani" "delux ponni" "CO_43" ...
## $ Soil Types : chr [1:2789] "alluvial" "clay" "alluvial" "clay" ...
## $ Seedrate(in Kg) : num [1:2789] 150 150 150 150 150 150 150 150 150 150 ...
## $ LP_Mainfield(in Tonnes) : num [1:2789] 75 75 75 75 75 75 75 75 75 75 ...
## $ Nursery : chr [1:2789] "dry" "wet" "dry" "wet" ...
## $ Nursery area (Cents) : num [1:2789] 120 120 120 120 120 120 120 120 120 120 ...
## $ LP_nurseryarea(in Tonnes) : num [1:2789] 6 6 6 6 6 6 6 6 6 6 ...
## $ DAP_20days : num [1:2789] 240 240 240 240 240 240 240 240 240 240 ...
## $ Weed28D_thiobencarb : num [1:2789] 12 12 12 12 12 12 12 12 12 12 ...
## $ Urea_40Days : num [1:2789] 163 163 163 163 163 ...
## $ Potassh_50Days : num [1:2789] 62.3 62.3 62.3 62.3 62.3 ...
## $ Micronutrients_70Days : num [1:2789] 90 90 90 90 90 90 90 90 90 90 ...
## $ Pest_60Day(in ml) : num [1:2789] 3600 3600 3600 3600 3600 3600 3600 3600 3600 3600 ...
## $ 30DRain( in mm) : num [1:2789] 19.6 19.6 18.5 18.5 18.1 18.1 19.6 19.6 18.5 18.5 ...
## $ 30DAI(in mm) : num [1:2789] 20.4 20.4 21.5 21.5 21.9 21.9 20.4 20.4 21.5 21.5 ...
## $ 30_50DRain( in mm) : num [1:2789] 187 187 185 185 186 ...
## $ 30_50DAI(in mm) : num [1:2789] 271 271 273 273 272 ...
## $ 51_70DRain(in mm) : num [1:2789] 167 167 165 165 166 ...
## $ 51_70AI(in mm) : num [1:2789] 250 250 252 252 251 ...
## $ 71_105DRain(in mm) : num [1:2789] 61 61 60 60 60.2 60.2 61 61 60 60 ...
## $ 71_105DAI(in mm) : num [1:2789] 64 64 65 65 64.8 64.8 64 64 65 65 ...
## $ Min temp_D1_D30 : num [1:2789] 18.5 19.5 20 19 20.5 18 18.5 19.5 20 19 ...
## $ Max temp_D1_D30 : num [1:2789] 34 34 35 33 32 31 34 34 35 33 ...
## $ Min temp_D31_D60 : num [1:2789] 16 18.5 18 17 17.5 15.5 16 18.5 18 17 ...
## $ Max temp_D31_D60 : num [1:2789] 30 35 30 32 28 34 30 35 30 32 ...
## $ Min temp_D61_D90 : num [1:2789] 15.5 17 17.5 16.5 18 15 15.5 17 17.5 16.5 ...
## $ Max temp_D61_D90 : num [1:2789] 31 32.5 33.5 31.5 34 33 31 32.5 33.5 31.5 ...
## $ Min temp_D91_D120 : num [1:2789] 16 16 18 15.5 16.5 15 16 16 18 15.5 ...
## $ Max temp_D91_D120 : num [1:2789] 33 30.5 33 32.5 35 31.5 33 30.5 33 32.5 ...
## $ Inst Wind Speed_D1_D30(in Knots) : num [1:2789] 4 10 4 8 10 6 4 10 4 8 ...
## $ Inst Wind Speed_D31_D60(in Knots) : num [1:2789] 10 4 12 6 12 6 10 4 12 6 ...
## $ Inst Wind Speed_D61_D90(in Knots) : num [1:2789] 8 10 4 8 10 8 8 10 4 8 ...
## $ Inst Wind Speed_D91_D120(in Knots): num [1:2789] 10 6 12 6 12 10 10 6 12 6 ...
## $ Wind Direction_D1_D30 : chr [1:2789] "SW" "NW" "ENE" "W" ...
## $ Wind Direction_D31_D60 : chr [1:2789] "W" "S" "NE" "WNW" ...
## $ Wind Direction_D61_D90 : chr [1:2789] "NNW" "SE" "NNE" "SE" ...
## $ Wind Direction_D91_D120 : chr [1:2789] "WSW" "SSE" "W" "S" ...
## $ Relative Humidity_D1_D30 : num [1:2789] 72 64.6 85 88.5 72.7 78.6 72 64.6 85 88.5 ...
## $ Relative Humidity_D31_D60 : num [1:2789] 78 85 96 95 91 80 78 85 96 95 ...
## $ Relative Humidity_D61_D90 : num [1:2789] 88 84 84 81 83 92 88 84 84 81 ...
## $ Relative Humidity_D91_D120 : num [1:2789] 85 87 79 84 81 88 85 87 79 84 ...
## $ Trash(in bundles) : num [1:2789] 540 600 600 540 600 480 540 480 600 540 ...
## $ Paddy yield(in Kg) : num [1:2789] 35028 35412 36300 35016 34044 ...
## - attr(*, "spec")=
## .. cols(
## .. Hectares = col_double(),
## .. Agriblock = col_character(),
## .. Variety = col_character(),
## .. `Soil Types` = col_character(),
## .. `Seedrate(in Kg)` = col_double(),
## .. `LP_Mainfield(in Tonnes)` = col_double(),
## .. Nursery = col_character(),
## .. `Nursery area (Cents)` = col_double(),
## .. `LP_nurseryarea(in Tonnes)` = col_double(),
## .. DAP_20days = col_double(),
## .. Weed28D_thiobencarb = col_double(),
## .. Urea_40Days = col_double(),
## .. Potassh_50Days = col_double(),
## .. Micronutrients_70Days = col_double(),
## .. `Pest_60Day(in ml)` = col_double(),
## .. `30DRain( in mm)` = col_double(),
## .. `30DAI(in mm)` = col_double(),
## .. `30_50DRain( in mm)` = col_double(),
## .. `30_50DAI(in mm)` = col_double(),
## .. `51_70DRain(in mm)` = col_double(),
## .. `51_70AI(in mm)` = col_double(),
## .. `71_105DRain(in mm)` = col_double(),
## .. `71_105DAI(in mm)` = col_double(),
## .. `Min temp_D1_D30` = col_double(),
## .. `Max temp_D1_D30` = col_double(),
## .. `Min temp_D31_D60` = col_double(),
## .. `Max temp_D31_D60` = col_double(),
## .. `Min temp_D61_D90` = col_double(),
## .. `Max temp_D61_D90` = col_double(),
## .. `Min temp_D91_D120` = col_double(),
## .. `Max temp_D91_D120` = col_double(),
## .. `Inst Wind Speed_D1_D30(in Knots)` = col_double(),
## .. `Inst Wind Speed_D31_D60(in Knots)` = col_double(),
## .. `Inst Wind Speed_D61_D90(in Knots)` = col_double(),
## .. `Inst Wind Speed_D91_D120(in Knots)` = col_double(),
## .. `Wind Direction_D1_D30` = col_character(),
## .. `Wind Direction_D31_D60` = col_character(),
## .. `Wind Direction_D61_D90` = col_character(),
## .. `Wind Direction_D91_D120` = col_character(),
## .. `Relative Humidity_D1_D30` = col_double(),
## .. `Relative Humidity_D31_D60` = col_double(),
## .. `Relative Humidity_D61_D90` = col_double(),
## .. `Relative Humidity_D91_D120` = col_double(),
## .. `Trash(in bundles)` = col_double(),
## .. `Paddy yield(in Kg)` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
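Because `str()` reports both numeric and character columns, later steps will probably need to handle the two groups separately (PCA operates on numeric variables, while MCA operates on categorical ones). Below is a minimal sketch of such a split, assuming a toy tibble with a handful of the real column names; `toy`, `num_part`, and `cat_part` are hypothetical names, not objects from this report.

```r
library(dplyr)

# Toy stand-in for df, using values from the first rows shown in this report
toy <- tibble(
  Hectares = c(6, 6, 6),
  Agriblock = c("Cuddalore", "Kurinjipadi", "Panruti"),
  `Soil Types` = c("alluvial", "clay", "alluvial"),
  `Paddy yield(in Kg)` = c(35028, 35412, 36300)
)

# Numeric columns (candidates for PCA-style methods)
num_part <- toy |> select(where(is.numeric))

# Character columns converted to factors (candidates for MCA-style methods)
cat_part <- toy |>
  select(where(is.character)) |>
  mutate(across(everything(), as.factor))

names(num_part)
names(cat_part)
```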
summary(df);
## Hectares Agriblock Variety Soil Types
## Min. :1.000 Length:2789 Length:2789 Length:2789
## 1st Qu.:3.000 Class :character Class :character Class :character
## Median :4.000 Mode :character Mode :character Mode :character
## Mean :3.717
## 3rd Qu.:5.000
## Max. :6.000
## Seedrate(in Kg) LP_Mainfield(in Tonnes) Nursery
## Min. : 25.00 Min. :12.50 Length:2789
## 1st Qu.: 75.00 1st Qu.:37.50 Class :character
## Median :100.00 Median :50.00 Mode :character
## Mean : 92.94 Mean :46.47
## 3rd Qu.:125.00 3rd Qu.:62.50
## Max. :150.00 Max. :75.00
## Nursery area (Cents) LP_nurseryarea(in Tonnes) DAP_20days
## Min. : 20.00 Min. :1.000 Min. : 40.0
## 1st Qu.: 60.00 1st Qu.:3.000 1st Qu.:120.0
## Median : 80.00 Median :4.000 Median :160.0
## Mean : 74.35 Mean :3.717 Mean :148.7
## 3rd Qu.:100.00 3rd Qu.:5.000 3rd Qu.:200.0
## Max. :120.00 Max. :6.000 Max. :240.0
## Weed28D_thiobencarb Urea_40Days Potassh_50Days Micronutrients_70Days
## Min. : 2.000 Min. : 27.13 Min. :10.38 Min. :15.00
## 1st Qu.: 6.000 1st Qu.: 81.39 1st Qu.:31.14 1st Qu.:45.00
## Median : 8.000 Median :108.52 Median :41.52 Median :60.00
## Mean : 7.435 Mean :100.85 Mean :38.59 Mean :55.76
## 3rd Qu.:10.000 3rd Qu.:135.65 3rd Qu.:51.90 3rd Qu.:75.00
## Max. :12.000 Max. :162.78 Max. :62.28 Max. :90.00
## Pest_60Day(in ml) 30DRain( in mm) 30DAI(in mm) 30_50DRain( in mm)
## Min. : 600 Min. :18.10 Min. :20.40 Min. :185.2
## 1st Qu.:1800 1st Qu.:18.10 1st Qu.:20.40 1st Qu.:185.2
## Median :2400 Median :18.50 Median :21.50 Median :185.6
## Mean :2230 Mean :18.72 Mean :21.28 Mean :186.0
## 3rd Qu.:3000 3rd Qu.:19.60 3rd Qu.:21.90 3rd Qu.:187.2
## Max. :3600 Max. :19.60 Max. :21.90 Max. :187.2
## 30_50DAI(in mm) 51_70DRain(in mm) 51_70AI(in mm) 71_105DRain(in mm)
## Min. :270.8 Min. :165.3 Min. :250.0 Min. :60.00
## 1st Qu.:270.8 1st Qu.:165.3 1st Qu.:250.0 1st Qu.:60.00
## Median :272.4 Median :166.1 Median :250.9 Median :60.20
## Mean :272.0 Mean :166.2 Mean :250.8 Mean :60.41
## 3rd Qu.:272.8 3rd Qu.:167.0 3rd Qu.:251.7 3rd Qu.:61.00
## Max. :272.8 Max. :167.0 Max. :251.7 Max. :61.00
## 71_105DAI(in mm) Min temp_D1_D30 Max temp_D1_D30 Min temp_D31_D60
## Min. :64.00 Min. :18.00 Min. :31.00 Min. :15.50
## 1st Qu.:64.00 1st Qu.:18.50 1st Qu.:32.00 1st Qu.:16.00
## Median :64.80 Median :19.50 Median :33.00 Median :17.50
## Mean :64.59 Mean :19.34 Mean :33.13 Mean :17.14
## 3rd Qu.:65.00 3rd Qu.:20.00 3rd Qu.:34.00 3rd Qu.:18.00
## Max. :65.00 Max. :20.50 Max. :35.00 Max. :18.50
## Max temp_D31_D60 Min temp_D61_D90 Max temp_D61_D90 Min temp_D91_D120
## Min. :28.00 Min. :15.00 Min. :31.00 Min. :15.00
## 1st Qu.:30.00 1st Qu.:15.50 1st Qu.:31.50 1st Qu.:15.50
## Median :30.00 Median :17.00 Median :33.00 Median :16.00
## Mean :31.32 Mean :16.68 Mean :32.66 Mean :16.19
## 3rd Qu.:34.00 3rd Qu.:17.50 3rd Qu.:33.50 3rd Qu.:16.50
## Max. :35.00 Max. :18.00 Max. :34.00 Max. :18.00
## Max temp_D91_D120 Inst Wind Speed_D1_D30(in Knots)
## Min. :30.5 Min. : 4.000
## 1st Qu.:31.5 1st Qu.: 4.000
## Median :33.0 Median : 8.000
## Mean :32.7 Mean : 7.233
## 3rd Qu.:33.0 3rd Qu.:10.000
## Max. :35.0 Max. :10.000
## Inst Wind Speed_D31_D60(in Knots) Inst Wind Speed_D61_D90(in Knots)
## Min. : 4.000 Min. : 4.000
## 1st Qu.: 6.000 1st Qu.: 8.000
## Median :10.000 Median : 8.000
## Mean : 8.513 Mean : 8.173
## 3rd Qu.:12.000 3rd Qu.:10.000
## Max. :12.000 Max. :10.000
## Inst Wind Speed_D91_D120(in Knots) Wind Direction_D1_D30
## Min. : 6.000 Length:2789
## 1st Qu.: 6.000 Class :character
## Median :10.000 Mode :character
## Mean : 9.449
## 3rd Qu.:12.000
## Max. :12.000
## Wind Direction_D31_D60 Wind Direction_D61_D90 Wind Direction_D91_D120
## Length:2789 Length:2789 Length:2789
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Relative Humidity_D1_D30 Relative Humidity_D31_D60 Relative Humidity_D61_D90
## Min. :64.60 Min. :78.00 Min. :81.00
## 1st Qu.:72.00 1st Qu.:80.00 1st Qu.:83.00
## Median :72.70 Median :91.00 Median :84.00
## Mean :76.26 Mean :87.59 Mean :85.16
## 3rd Qu.:85.00 3rd Qu.:95.00 3rd Qu.:88.00
## Max. :88.50 Max. :96.00 Max. :92.00
## Relative Humidity_D91_D120 Trash(in bundles) Paddy yield(in Kg)
## Min. :79.00 Min. : 80.0 Min. : 5410
## 1st Qu.:81.00 1st Qu.:240.0 1st Qu.:16389
## Median :84.00 Median :360.0 Median :24636
## Mean :83.86 Mean :335.5 Mean :22518
## 3rd Qu.:87.00 3rd Qu.:450.0 3rd Qu.:31035
## Max. :88.00 Max. :600.0 Max. :38814
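In the summary above, several input columns have quantiles that are exact multiples of those of `Hectares` (for example, `Seedrate(in Kg)` is 25 times and `DAP_20days` is 40 times the corresponding `Hectares` value at every quantile shown). This suggests, but does not prove, that these columns are nearly collinear. The sketch below shows one rough way such redundancy could be checked before dimension reduction. It assumes a toy data frame built from those summary quantiles; the temperature values, the simplified column names, and the 0.99 threshold are illustrative assumptions.

```r
# Toy data: first three columns follow the quantiles in the summary output;
# Min_temp is an arbitrary, non-proportional column for contrast
toy <- data.frame(
  Hectares   = c(1, 3, 4, 5, 6),
  Seedrate   = c(25, 75, 100, 125, 150),
  DAP_20days = c(40, 120, 160, 200, 240),
  Min_temp   = c(18.5, 19.5, 20, 19, 20.5)
)

# Pairwise Pearson correlations between all numeric columns
cm <- cor(toy)

# Keep only the lower triangle so each pair is reported once
cm[upper.tri(cm, diag = TRUE)] <- NA

# Pairs whose absolute correlation exceeds the (assumed) threshold
high <- which(abs(cm) > 0.99, arr.ind = TRUE)
redundant <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]]
)
redundant
```

Columns flagged this way carry almost the same information, so they would tend to dominate the leading components of a PCA unless some of them are removed or the result is interpreted with that in mind.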
Per the dataset description, the FILENAME dimension is not needed for the analysis and can be dropped. None of the 45 columns loaded above is named FILENAME, so the drop is left commented out.
# df <- df[ , !(names(df) %in% c("FILENAME"))];
print(head(df, 5));
## # A tibble: 5 × 45
## Hectares Agriblock Variety `Soil Types` `Seedrate(in Kg)`
## <dbl> <chr> <chr> <chr> <dbl>
## 1 6 Cuddalore CO_43 alluvial 150
## 2 6 Kurinjipadi ponmani clay 150
## 3 6 Panruti delux ponni alluvial 150
## 4 6 Kallakurichi CO_43 clay 150
## 5 6 Sankarapuram ponmani alluvial 150
## # ℹ 40 more variables: `LP_Mainfield(in Tonnes)` <dbl>, Nursery <chr>,
## # `Nursery area (Cents)` <dbl>, `LP_nurseryarea(in Tonnes)` <dbl>,
## # DAP_20days <dbl>, Weed28D_thiobencarb <dbl>, Urea_40Days <dbl>,
## # Potassh_50Days <dbl>, Micronutrients_70Days <dbl>,
## # `Pest_60Day(in ml)` <dbl>, `30DRain( in mm)` <dbl>, `30DAI(in mm)` <dbl>,
## # `30_50DRain( in mm)` <dbl>, `30_50DAI(in mm)` <dbl>,
## # `51_70DRain(in mm)` <dbl>, `51_70AI(in mm)` <dbl>, …
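The optional FILENAME drop mentioned above can be written so that it is harmless when the column is absent. The following is a small sketch using `select()` with `any_of()`, which skips names that do not exist; the toy tibble stands in for `df`, and the `"a.csv"` value is purely hypothetical.

```r
library(dplyr)

# Toy stand-in for df (no FILENAME column, like the loaded data)
toy <- tibble(
  Hectares = c(6, 6),
  Agriblock = c("Cuddalore", "Kurinjipadi")
)

# any_of() ignores names that are not present, so this leaves toy unchanged
cleaned <- toy |> select(-any_of("FILENAME"))

# With the column present, the same call removes it
with_file <- toy |> mutate(FILENAME = "a.csv")
cleaned2 <- with_file |> select(-any_of("FILENAME"))

names(cleaned)
names(cleaned2)
```

Compared with indexing by `names(df) %in% ...`, this form states the intent directly and stays valid whether or not the column exists in a given copy of the file.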