library(tidyverse)
library(corrplot)
library(lme4)
library(multcomp)
library(MASS)
library(arm)
library(ade4)
library(Hmisc)
library(labdsv)
library(vegan)
library(cowplot)
library(ggpubr)
library(rstatix)
library(patchwork)
library(multcompView)
library(ggsignif)
library(grid)
library(FactoMineR)
library(factoextra)
library(explore)
library(ggrepel)
library(naniar)
library(outliers)
library(leaps)
library(fastDummies)
library(caret) # for model training
library(mgcv)
library(ggeffects)
library(gratia)
library(GGally) # for ggpairs
library(openxlsx)
library(readxl)
library(leaflet) # for mapping
library(quarto)
library(raster)
library(knitr)
library(kableExtra)
library(stringr)
library(plotly)
library(vcd) # for the distribution of the response variables
library(prospectr) # for splitting the data with kenStone()
library(randomForest)
library(gbm)
library(kernlab)
library(ggforce)
library(keras)
library(tensorflow)
library(neuralnet)
library(iml) # for model interpretability: https://cran.r-project.org/web/packages/iml/vignettes/intro.html
library(stats)
library(bestNormalize)
library(rmarkdown)
library(DT)
library(gtExtras) # for the
library(reshape2)
library(sf)
library(ggplot2)
library(maptools)
library(ggsn)
library(spThin)
library(sp)
library(gstat)
rm("summary_df")

# Shorten the land-use level names
levels(prodij$Details_Milieu_Niv3)[levels(prodij$Details_Milieu_Niv3) == "111_Forêt de feuillus"] <- "OS_111"
levels(prodij$Details_Milieu_Niv3)[levels(prodij$Details_Milieu_Niv3) == "210_Prairie agricole permanente"] <- "OS_210"
levels(prodij$Details_Milieu_Niv3)[levels(prodij$Details_Milieu_Niv3) == "214_Culture annuelle"] <- "OS_214"
levels(prodij$Details_Milieu_Niv3)[levels(prodij$Details_Milieu_Niv3) == "218_Vignes et autres Cultures pérennes"] <- "OS_218"

# Correspondence table between original levels and abbreviations
df_l <- data.frame(
  Niveau_Original = c("111_Forêt de feuillus", "210_Prairie agricole permanente", "214_Culture annuelle", "218_Vignes et autres Cultures pérennes"),
  Abréviations = c("OS_111", "OS_210", "OS_214", "OS_218")
)
kable(df_l)
| Niveau_Original | Abréviations |
|---|---|
| 111_Forêt de feuillus | OS_111 |
| 210_Prairie agricole permanente | OS_210 |
| 214_Culture annuelle | OS_214 |
| 218_Vignes et autres Cultures pérennes | OS_218 |
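The renaming above repeats the same assignment once per level. As a minimal sketch, the same mapping can be driven by a named lookup vector. The helper `recode_levels()` and the toy factor `land_use` are hypothetical names introduced for illustration; they stand in for `prodij$Details_Milieu_Niv3` and are not part of the original analysis.

```r
# Toy factor standing in for prodij$Details_Milieu_Niv3 (illustrative data only)
land_use <- factor(c(
  "111_Forêt de feuillus",
  "214_Culture annuelle",
  "210_Prairie agricole permanente",
  "218_Vignes et autres Cultures pérennes",
  "214_Culture annuelle"
))

# Lookup vector: names are the original levels, values are the abbreviations
abbreviations <- c(
  "111_Forêt de feuillus"                  = "OS_111",
  "210_Prairie agricole permanente"        = "OS_210",
  "214_Culture annuelle"                   = "OS_214",
  "218_Vignes et autres Cultures pérennes" = "OS_218"
)

# Hypothetical helper: rename only the levels found in the lookup,
# leaving any other level unchanged
recode_levels <- function(f, lookup) {
  old <- levels(f)
  hit <- old %in% names(lookup)
  levels(f)[hit] <- unname(lookup[old[hit]])
  f
}

land_use_short <- recode_levels(land_use, abbreviations)
table(land_use_short)
```

Keeping the mapping in one vector also means the correspondence table can be built from the same object, for example with `data.frame(Niveau_Original = names(abbreviations), Abréviations = unname(abbreviations))`.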
The database therefore changes from 530 to 416 observations.
3 Earthworms data
3.1 Total abundance
Removal of outliers
The database therefore changes from 416 to 414 observations.
Conclusion: 10 numeric explanatory variables therefore remain.
4.3 VIF
2 variables from the 10 input variables have collinearity problem:
SableF SableG
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( pH_eau ~ C_org ): 0.009204932
max correlation ( pH_eau ~ C.N ): -0.5074557
---------- VIFs of the remained variables --------
Variables VIF
1 GPS_X 1.723465
2 GPS_Y 1.718448
3 LimonF 1.654946
4 LimonG 1.994132
5 Argile 1.312747
6 C_org 1.896763
7 C.N 2.243654
8 pH_eau 1.957370
SableG and SableF are multicollinear, so one of the two is removed: SableG.
No variable from the 9 input variables has collinearity problem.
The linear correlation coefficients ranges between:
min correlation ( pH_eau ~ C_org ): 0.009204932
max correlation ( LimonF ~ SableF ): -0.6233721
---------- VIFs of the remained variables --------
Variables VIF
1 GPS_X 1.745123
2 GPS_Y 1.773541
3 SableF 4.029032
4 LimonF 3.301790
5 LimonG 2.008510
6 Argile 2.422163
7 C_org 2.003050
8 C.N 2.260907
9 pH_eau 1.959802
Conclusion: 9 numeric explanatory variables and 3 factor variables therefore remain.
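For readers who want to see what the VIF column measures, here is a minimal base-R sketch. For predictor j, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. The function `compute_vif()` and the toy data are hypothetical and introduced only for illustration; this is not the routine that produced the output above, and no exclusion threshold is assumed.

```r
# Hypothetical helper: VIF of each column of a numeric data frame,
# computed as 1 / (1 - R^2) from regressing that column on all the others
compute_vif <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  vapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Toy data (not the prodij dataset): x3 is almost a linear combination of x1 and x2
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$x3 <- 100 - toy$x1 - toy$x2 + rnorm(n, sd = 0.05)

vif_all     <- compute_vif(toy)                   # all three VIFs are very large
vif_reduced <- compute_vif(toy[, c("x1", "x2")])  # close to 1 once x3 is dropped

round(vif_all, 1)
round(vif_reduced, 2)
```

The toy example mirrors the pattern in the output above: removing one member of a nearly collinear group lowers the VIFs of the variables that remain.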
4.4 Outlier checks
# Loop over each variable in the list
# for (var in variables_num) {
#   # Generate the code for each variable
#   cat(paste0(
#     "## ", var, "\n\n",
#     "```{r fig_", var, ",fig.align='center',fig.height=10}\n",
#     "df_suivi = prodij\n",
#     "n_line = nrow(df_suivi)\n\n",
#     "# summary(prodij$", var, ")\n",
#     "df_cleaned = prodij\n\n",
#     "df_cleaned$", var, " = as.numeric(df_cleaned$", var, ")\n",
#     "explo_num(nom_col = '", var, "', titre = '", var, " (before cleaning)', df = df_cleaned)\n",
#     "df_cleaned <- test_grub(df_cleaned, '", var, "', direction = 'maxi')\n",
#     "df_cleaned <- test_grub(df_cleaned, '", var, "', direction = 'mini')\n",
#     "cat('Removal of outliers')\n",
#     "explo_num(nom_col = '", var, "', titre = '", var, " (after cleaning)', df = df_cleaned)\n",
#     "# summary(df_cleaned$", var, ")\n",
#     "# prodij = df_cleaned\n",
#     "```\n\n",
#     "The database therefore changes from **`r n_line`** to **`r nrow(df_cleaned)`** observations.\n\n\n"
#   ))
# }
4.4.1 GPS_X
Removal of outliers
The database therefore changes from 319 to 319 observations.
4.4.2 GPS_Y
Removal of outliers
The database therefore changes from 319 to 319 observations.
4.4.3 SableF
Removal of outliers
The database therefore changes from 319 to 313 observations.
4.4.4 LimonF
Removal of outliers
The database therefore changes from 313 to 313 observations.
4.4.5 LimonG
Removal of outliers
The database therefore changes from 313 to 313 observations.
4.4.6 Argile
Removal of outliers
The database therefore changes from 313 to 313 observations.
4.4.7 C_org
Removal of outliers
The database therefore changes from 313 to 302 observations.
4.4.8 C.N
Removal of outliers
The database therefore changes from 302 to 296 observations.
4.4.9 pH_eau
Removal of outliers
The database therefore changes from 296 to 296 observations.
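The helper `test_grub()` used in this section is not shown in the report. As a rough sketch of what Grubbs-based cleaning could look like, the base-R code below applies a one-sided Grubbs test to one end of a column and drops the flagged row, repeating until nothing is flagged. The functions `grubbs_flag()` and `remove_outliers()`, the significance level of 0.05, the iterative removal, and the toy data are all assumptions made for illustration; the author's actual helper may differ.

```r
# Hypothetical helper: one-sided Grubbs test on the largest or smallest value.
# Returns the index of the flagged value, or NA if nothing is flagged.
# Assumes x is numeric with no missing values.
grubbs_flag <- function(x, direction = c("max", "min"), alpha = 0.05) {
  direction <- match.arg(direction)
  n <- length(x)
  if (n < 3 || sd(x) == 0) return(NA_integer_)
  idx <- if (direction == "max") which.max(x) else which.min(x)
  g <- abs(x[idx] - mean(x)) / sd(x)  # Grubbs statistic
  # One-sided critical value derived from the t distribution
  t_crit <- qt(1 - alpha / n, df = n - 2)
  g_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))
  if (g > g_crit) idx else NA_integer_
}

# Hypothetical helper: drop flagged rows one at a time until none is flagged
remove_outliers <- function(df, col, direction = "max", alpha = 0.05) {
  repeat {
    idx <- grubbs_flag(df[[col]], direction, alpha)
    if (is.na(idx)) break
    df <- df[-idx, , drop = FALSE]
  }
  df
}

# Toy data (not the prodij dataset): 50 evenly spaced values plus one extreme value
toy <- data.frame(C_org = c(seq(10, 30, length.out = 50), 80))
n_line <- nrow(toy)

cleaned <- remove_outliers(toy, "C_org", direction = "max")
cleaned <- remove_outliers(cleaned, "C_org", direction = "min")

cat("The database therefore changes from", n_line, "to", nrow(cleaned), "observations.\n")
```

Because each removal changes the mean and standard deviation, the test is recomputed after every dropped row, so a variable can lose several observations in one pass.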
5 Relationships between earthworms and the explanatory variables