[Hidden] Loading/pre-processing

[Hidden] Interpretation functions

[Hidden] Selection of the variables of interest

Option 1: Hierarchical clustering without PCA, excluding popMuni2020 + QPV CoVe, CoVe, 4 QPV

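# Within-cluster sum of squares (WSS) for k = 1..10 cuts of the tree `hc`
wss <- numeric(10)
for (k in 1:10) {
  clusters_k <- cutree(hc, k = k)
  wss[k] <- sum(sapply(unique(clusters_k), function(cl) {
    points <- data_option1[clusters_k == cl, , drop = FALSE]
    # A singleton cluster contributes no within-cluster variance
    if (nrow(points) <= 1) return(0)
    # Squared distances of each point to its cluster centroid
    sum(scale(points, center = TRUE, scale = FALSE)^2)
  }))
}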

wss_df <- data.frame(k = 1:10, WSS = wss)
ggplot(wss_df, aes(x = k, y = WSS)) +
  geom_point(size = 3) +
  geom_line() +
  labs(title = "Elbow method",
       x = "Number of clusters", y = "WSS") +
  theme_minimal()
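
The elbow computation above relies on `hc` and `data_option1`, which are built in hidden chunks. The following is a minimal, self-contained sketch of the same idea, not the report's actual code: it assumes Ward linkage on standardized data (the report does not show the linkage used) and uses the built-in `USArrests` dataset as a stand-in. The helper `wss_for` is hypothetical.

```r
# Stand-in data: built-in USArrests, standardized (not the report's data_option1)
x <- scale(USArrests)

# Assumed linkage: Ward's criterion on Euclidean distances
hc <- hclust(dist(x), method = "ward.D2")

# Hypothetical helper: within-cluster sum of squares for a given partition
wss_for <- function(x, cl) {
  sum(sapply(split(as.data.frame(x), cl), function(g) {
    # Center each cluster on its own centroid, then sum the squared deviations
    sum(scale(as.matrix(g), center = TRUE, scale = FALSE)^2)
  }))
}

wss <- sapply(1:10, function(k) wss_for(x, cutree(hc, k = k)))
plot(1:10, wss, type = "b", xlab = "Number of clusters", ylab = "WSS")
```

Because the partitions obtained by cutting a single tree are nested, WSS can only decrease as k grows; the "elbow" is the point after which the decrease becomes marginal.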

Mapping the profiles from plain hierarchical clustering (HAC)

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored

Although weighting may, by default, lead to groupings driven solely by commune size, it is statistically worthwhile to give communes a weight in order to spread out the point cloud. It also gives communes an 'importance': two communes with identical vulnerabilities will not necessarily receive the same aid, so as to favour the greatest number of people (targeting large communes on the basis of the indicators is also more reliable).
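
To illustrate how row weights change a PCA, here is a minimal base-R sketch comparing population weights, square-root population weights and uniform weights, which mirror the weighting schemes of Options 2, 3 and 4. The data, the population figures and the helper `weighted_pca_cumvar` are invented for illustration. FactoMineR's `PCA()` accepts row weights through its `row.w` argument, but the report's actual call is hidden, so this sketch only reproduces the principle with `stats::cov.wt`.

```r
# Toy data (invented): 6 hypothetical communes, 3 indicators
set.seed(1)
x <- matrix(rnorm(18), nrow = 6,
            dimnames = list(paste0("commune", 1:6), c("ind1", "ind2", "ind3")))
pop <- c(30000, 5000, 4000, 1200, 800, 300)  # hypothetical populations

# Hypothetical helper: cumulative % of variance of a PCA on the weighted correlation matrix
weighted_pca_cumvar <- function(x, w) {
  w  <- w / sum(w)                                      # normalize weights to sum to 1
  r  <- cov.wt(x, wt = w, cor = TRUE, method = "ML")$cor
  ev <- eigen(r, symmetric = TRUE)$values               # variances of the components
  cumsum(ev) / sum(ev) * 100
}

cumvar <- rbind(
  population      = weighted_pca_cumvar(x, pop),
  sqrt_population = weighted_pca_cumvar(x, sqrt(pop)),
  uniform         = weighted_pca_cumvar(x, rep(1, nrow(x)))
)
print(round(cumvar, 1))
```

Taking the square root of the population is a compromise: large communes still count more, but a single very large commune dominates the axes less than under raw population weights.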

Option 2: PCA weighted by popMuni2020 with supplementary individuals: CoVe + QPV CoVe + 4 QPV

PCA

## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Interpretation of the principal components

## 
## === INTERPRETATION OF THE DIMENSIONS ===
## 
##  Dim1 :
## Structuring variables: partPopEt, partPopImmi, partEmpDurLim, partOuvr, txChom, partPopSansDip, partMenProp, partResPrincApp, DISP_MED_A21, partEmpSal
## Positive pole: partPopEt, partPopImmi, partEmpDurLim, partOuvr, txChom, partPopSansDip, partResPrincApp, partEmpSal
## Negative pole: partMenProp, DISP_MED_A21
## 
##  Dim2 :
## Structuring variables: partPop25_59, partPop75p, indiceJeunesse, partPopBepCap, partMen1p, nbPersResPrinc
## Positive pole: partPop25_59, indiceJeunesse, partPopBepCap, nbPersResPrinc
## Negative pole: partPop75p, partMen1p
FactoInvestigate::description(res.PCA_option2)
## 
## * * *
## 
## The **dimension 1** opposes individuals characterized by a strongly positive coordinate on the axis (to the right of the graph)
## to individuals such as *Venasque*, *Saint-Didier* and *Bédoin* (to the left of the graph, characterized by a strongly negative coordinate on the axis).
## 
## The group 1 (characterized by a positive coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## The group in which the individuals *Venasque*, *Saint-Didier* and *Bédoin* stand (characterized by a negative coordinate on the axis) is sharing :
## 
## - high values for the variables *partResPrincSurocc*, *partEmpDurLim*, *txChom*, *partFamMono*, *partPopEt* and *partPopImmi* (variables are sorted from the strongest).
## - low values for the variables *txEmp*, *partPopBac*, *txScol15_24* and *DISP_MED_A21* (variables are sorted from the weakest).
## 
## Note that the variables *partPopSansDip*, *partMenProp*, *partResPrincApp* and *DISP_MED_A21* are highly correlated with this dimension (respective correlation of 0.94, 0.95, 0.93, 0.91). These variables could therefore summarize themselve the dimension 1.
## 
## * * *
## 
## The **dimension 2** opposes individuals such as *Carpentras*, *Loriol-du-Comtat*, *Aubignan*, *Modène* and *Mazan* (to the top of the graph, characterized by a strongly positive coordinate on the axis)
## to individuals such as *Venasque*, *Saint-Didier* and *Bédoin* (to the bottom of the graph, characterized by a strongly negative coordinate on the axis).
## 
## The group in which the individuals *Carpentras*, *Loriol-du-Comtat*, *Aubignan*, *Modène* and *Mazan* stand (characterized by a positive coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## The group in which the individuals *Venasque*, *Saint-Didier* and *Bédoin* stand (characterized by a negative coordinate on the axis) is sharing :
## 
## - high values for the variables *partResPrincSurocc*, *partEmpDurLim*, *txChom*, *partFamMono*, *partPopEt* and *partPopImmi* (variables are sorted from the strongest).
## - low values for the variables *txEmp*, *partPopBac*, *txScol15_24* and *DISP_MED_A21* (variables are sorted from the weakest).

Clustering: HAC

Choosing the number of clusters

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 3 as the best number of clusters 
## * 3 proposed 4 as the best number of clusters 
## * 3 proposed 5 as the best number of clusters 
## * 1 proposed 6 as the best number of clusters 
## * 4 proposed 7 as the best number of clusters 
## * 5 proposed 8 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************
## 
## === CLASSIFICATION ===
## Optimal number of clusters: 3

#Factoshiny::HCPCshiny(res.PCA_option2)

Cluster profiles from HAC

plot_cluster_profils_CAH(
  data_cluster = data_clean[-ind_sup_option2, ], 
  res.HCPC = res.HCPC_option2,
  main_title = "Mean cluster profiles, HAC on population-weighted PCA (Option 2)"
)
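
`plot_cluster_profils_CAH` is defined in a hidden chunk, so its internals are not visible here. As a rough idea of what a "mean cluster profile" could be, the sketch below standardizes the indicators and averages them within each cluster. The data values and the `clust` vector are invented; with a FactoMineR `HCPC` result, the cluster labels would typically be read from `res.HCPC$data.clust$clust`, though that is an assumption about the hidden code.

```r
# Toy stand-ins (invented): 6 communes, 2 indicators, one cluster label per commune
data_cluster <- data.frame(txChom = c(8, 9, 15, 16, 7, 14),
                           txEmp  = c(70, 68, 55, 57, 72, 58))
clust <- factor(c(1, 1, 2, 2, 1, 2))

# Standardize so that indicators on different scales are comparable,
# then average the standardized values within each cluster
z <- scale(data_cluster)
profiles <- aggregate(as.data.frame(z), by = list(cluster = clust), FUN = mean)
print(profiles)
```

A positive value means the cluster sits above the overall mean for that indicator, and a negative value means it sits below.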

Mapping the cluster profiles from HAC

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
FactoInvestigate::classif(res.HCPC_option2)
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## drawn <-
## c("Carpentras", "Venasque", "Aubignan", "Bédoin", "Loriol-du-Comtat", 
## "Saint-Didier", "Sarrians", "Mazan", "Modène")
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = 'tree', title = '')
## ```
## 
## **Figure.1 - Hierarchical tree.**
## 
## The classification made on individuals reveals 3 clusters.
## 
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = 'map', draw.tree = FALSE, title = '')
## ```
## 
## **Figure.2 - Ascending Hierarchical Classification of the individuals.**
## 
## The **cluster 1** is made of individuals such as *Bédoin*, *Saint-Didier* and *Venasque*. This group is characterized by :
## 
## - high values for variables like *partPop60_74*, *partArtCadr*, *partPopBacSup*, *partResSec*, *partPop75p*, *partMenProp*, *DISP_MED_A21*, *partTpsPart*, *partMenVoit* and *partPopBac* (variables are sorted from the strongest).
## - low values for variables like *indiceJeunesse*, *partPop0_14*, *partPop15_24*, *partResVac*, *partEmpl*, *nbPersResPrinc*, *partPopSansDip*, *partOuvr*, *partEmpSal* and *partPopBepCap* (variables are sorted from the weakest).
## 
## The **cluster 2** is made of individuals such as *Aubignan*, *Loriol-du-Comtat*, *Mazan*, *Modène* and *Sarrians*. This group is characterized by :
## 
## - high values for variables like *txEmp*, *partMenVoit*, *partPop25_59*, *partPopBepCap*, *DISP_MED_A21*, *partMenProp*, *nbPersResPrinc*, *partPopBac*, *partPI* and *txScol15_24* (variables are sorted from the strongest).
## - low values for variables like *partPopEt*, *partMen1p*, *partPopImmi*, *txChom*, *partResPrincApp*, *partResPrincSurocc*, *partEmpDurLim*, *partPopSansDip*, *partEmpSal* and *partOuvr* (variables are sorted from the weakest).
## 
## The **cluster 3** is made of individuals such as *Carpentras*. This group is characterized by :
## 
## - high values for variables like *partResPrincApp*, *partPopImmi*, *partPopEt*, *txChom*, *partPopSansDip*, *partOuvr*, *partEmpDurLim*, *partEmpSal*, *partResPrincSurocc* and *partMen1p* (variables are sorted from the strongest).
## - low values for variables like *partMenProp*, *partMenVoit*, *DISP_MED_A21*, *txEmp*, *partPopBac*, *partArtCadr*, *partResSec*, *partPopBacSup*, *partPop60_74* and *partPop25_59* (variables are sorted from the weakest).
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = '3D.map', ind.names=FALSE, title = '')
## ```
## 
## **Figure.3 - Hierarchical tree on the factorial map.**
## 
## The hierarchical tree can be drawn on the factorial map with the individuals colored according to their clusters.

K-means

## 
## === CLASSIFICATION K-MEANS ===
## Optimal number of clusters: 3

Cluster profiles from k-means

plot_cluster_profils_KMEANS(
  data_cluster = data_clean[-ind_sup_option2,],
  res.kmeans = res.km_option2,
  main_title = "Mean cluster profiles, k-means (Option 2)"
)

Mapping the cluster profiles from k-means

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored

Option 3: PCA weighted by sqrt(popMuni2020) with supplementary individuals: CoVe + QPV CoVe + 4 QPV

PCA

## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Interpretation of the principal components

## 
## === INTERPRETATION OF THE DIMENSIONS ===
## 
##  Dim1 :
## Structuring variables: partPopImmi, partEmpDurLim, partOuvr, txChom, partPopSansDip, partMenProp, partResPrincApp, DISP_MED_A21, partEmpSal
## Positive pole: partPopImmi, partEmpDurLim, partOuvr, txChom, partPopSansDip, partResPrincApp, partEmpSal
## Negative pole: partMenProp, DISP_MED_A21
## 
##  Dim2 :
## Structuring variables: partPop25_59, partPop75p, indiceJeunesse, partMen1p, nbPersResPrinc
## Positive pole: partPop25_59, indiceJeunesse, nbPersResPrinc
## Negative pole: partPop75p, partMen1p
FactoInvestigate::description(res.PCA_option3)
## 
## * * *
## 
## The **dimension 1** opposes individuals characterized by a strongly positive coordinate on the axis (to the right of the graph)
## to individuals such as *Venasque*, *Saint-Didier*, *Crillon-le-Brave*, *Le Barroux* and *Bédoin* (to the left of the graph, characterized by a strongly negative coordinate on the axis).
## 
## The group 1 (characterized by a positive coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## The group in which the individuals *Venasque*, *Saint-Didier*, *Crillon-le-Brave*, *Le Barroux* and *Bédoin* stand (characterized by a negative coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## 
## * * *
## 
## The **dimension 2** opposes individuals such as *Loriol-du-Comtat* and *Modène* (to the top of the graph, characterized by a strongly positive coordinate on the axis)
## to individuals characterized by a strongly negative coordinate on the axis (to the bottom of the graph).
## 
## The group in which the individuals *Loriol-du-Comtat* and *Modène* stand (characterized by a positive coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## The group 2 (characterized by a negative coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.

Clustering: HAC

Choosing the number of clusters

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 3 as the best number of clusters 
## * 6 proposed 4 as the best number of clusters 
## * 2 proposed 5 as the best number of clusters 
## * 1 proposed 6 as the best number of clusters 
## * 7 proposed 8 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  8 
##  
##  
## *******************************************************************
## 
## === CLASSIFICATION ===
## Optimal number of clusters: 8
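
The "majority rule" in the summary above is a plurality vote across indices, so it is worth checking how decisive the vote is. The sketch below tallies the Option 3 votes exactly as printed in that summary (its layout matches NbClust output, although the call itself is hidden); the variable names are illustrative.

```r
# Votes copied from the Option 3 summary above: candidate k -> number of indices
votes <- c("3" = 6, "4" = 6, "5" = 2, "6" = 1, "8" = 7)

best_k <- as.integer(names(votes)[which.max(votes)])       # k with the most votes
margin <- max(votes) - sort(votes, decreasing = TRUE)[[2]] # lead over the runner-up
share  <- max(votes) / sum(votes)                          # fraction of indices backing best_k

cat("best k:", best_k, "| margin:", margin, "| share:", round(share, 2), "\n")
```

Here k = 8 wins by a single vote over k = 3 and k = 4, with fewer than a third of the indices behind it, so the choice is far from unanimous.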

Cluster profiles from HAC

plot_cluster_profils_CAH(
  data_cluster = data_clean[-ind_sup_option3, ], 
  res.HCPC = res.HCPC_option3,
  main_title = "Mean cluster profiles, HAC on sqrt(population)-weighted PCA (Option 3)"
)

Mapping the cluster profiles from HAC

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
FactoInvestigate::classif(res.HCPC_option3)
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## drawn <-
## c("Carpentras", "Venasque", "Modène", "Loriol-du-Comtat", "Aubignan", 
## "Bédoin", "La Roque-sur-Pernes", "Saint-Didier", "Sarrians", 
## "Crillon-le-Brave", "Le Barroux")
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = 'tree', title = '')
## ```
## 
## **Figure.1 - Hierarchical tree.**
## 
## The classification made on individuals reveals 3 clusters.
## 
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = 'map', draw.tree = FALSE, title = '')
## ```
## 
## **Figure.2 - Ascending Hierarchical Classification of the individuals.**
## 
## The **cluster 1** is made of individuals such as *Le Barroux*, *Bédoin*, *Crillon-le-Brave*, *La Roque-sur-Pernes*, *Saint-Didier* and *Venasque*. This group is characterized by :
## 
## - high values for variables like *partPop60_74*, *partPopBacSup*, *partResSec*, *partArtCadr*, *partMenProp*, *partPop75p*, *partTpsPart*, *DISP_MED_A21*, *partMenVoit* and *txEmp* (variables are sorted from the strongest).
## - low values for variables like *indiceJeunesse*, *partPop0_14*, *partResVac*, *partPopBepCap*, *partPop15_24*, *partPopSansDip*, *nbPersResPrinc*, *partEmpl*, *partOuvr* and *partPop25_59* (variables are sorted from the weakest).
## 
## The **cluster 2** is made of individuals such as *Aubignan*, *Loriol-du-Comtat*, *Modène* and *Sarrians*. This group is characterized by :
## 
## - high values for variables like *partPopBepCap*, *partPop25_59*, *nbPersResPrinc*, *partMenVoit*, *txEmp*, *indiceJeunesse*, *partResVac*, *partEmpl*, *partPop0_14* and *DISP_MED_A21* (variables are sorted from the strongest).
## - low values for variables like *partPopEt*, *partMen1p*, *partPopImmi*, *partPop60_74*, *partPop75p*, *txChom*, *partTpsPart*, *partResPrincApp*, *partPopBacSup* and *partArtCadr* (variables are sorted from the weakest).
## 
## The **cluster 3** is made of individuals such as *Carpentras*. This group is characterized by :
## 
## - high values for variables like *partResPrincApp*, *partPopEt*, *partPopImmi*, *txChom*, *partPopSansDip*, *partEmpSal*, *partOuvr*, *partEmpDurLim*, *partPop0_14* and *partMen1p* (variables are sorted from the strongest).
## - low values for variables like *partMenProp*, *partMenVoit*, *DISP_MED_A21*, *txEmp*, *partResSec*, *partPopBacSup*, *partArtCadr*, *partPopBac*, *partPop60_74* and *partPop25_59* (variables are sorted from the weakest).
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = '3D.map', ind.names=FALSE, title = '')
## ```
## 
## **Figure.3 - Hierarchical tree on the factorial map.**
## 
## The hierarchical tree can be drawn on the factorial map with the individuals colored according to their clusters.

K-means

## 
## === CLASSIFICATION K-MEANS ===
## Optimal number of clusters: 3

Cluster profiles from k-means

plot_cluster_profils_KMEANS(
  data_cluster = data_clean[-ind_sup_option3,],
  res.kmeans = res.km_option3,
  main_title = "Mean cluster profiles, k-means (Option 3)"
)

Mapping the cluster profiles from k-means

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored

Option 4: Unweighted PCA with supplementary individuals: CoVe + QPV CoVe + 4 QPV

## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Interpretation of the principal components

## 
## === INTERPRETATION OF THE DIMENSIONS ===
## 
##  Dim1 :
## Structuring variables: partPop60_74, partArtCadr, partOuvr, partPopBacSup, partPopSansDip, partMenProp, partResSec, partResPrincApp, DISP_MED_A21
## Positive pole: partOuvr, partPopSansDip, partResPrincApp
## Negative pole: partPop60_74, partArtCadr, partPopBacSup, partMenProp, partResSec, DISP_MED_A21
## 
##  Dim2 :
## Structuring variables: partPopEt, partPopImmi, partMenVoit, partMen1p, nbPersResPrinc
## Positive pole: partPopEt, partPopImmi, partMen1p
## Negative pole: partMenVoit, nbPersResPrinc
FactoInvestigate::description(res.PCA_option4)
## 
## * * *
## 
## The **dimension 1** opposes individuals characterized by a strongly positive coordinate on the axis (to the right of the graph)
## to individuals such as *Aubignan*, *Loriol-du-Comtat*, *Sarrians*, *La Roque-Alric* and *Vacqueyras* (to the left of the graph, characterized by a strongly negative coordinate on the axis).
## 
## The group 1 (characterized by a positive coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## The group in which the individual *La Roque-Alric* stands (characterized by a negative coordinate on the axis) is sharing :
## 
## - high values for the variable *partResVac*.
## 
## The group in which the individuals *Aubignan*, *Loriol-du-Comtat*, *Sarrians* and *Vacqueyras* stand (characterized by a negative coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## 
## * * *
## 
## The **dimension 2** opposes individuals characterized by a strongly positive coordinate on the axis (to the top of the graph)
## to individuals such as *Aubignan*, *Loriol-du-Comtat*, *Sarrians* and *Vacqueyras* (to the bottom of the graph, characterized by a strongly negative coordinate on the axis).
## 
## The group 1 (characterized by a positive coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.
## 
## The group in which the individuals *Aubignan*, *Loriol-du-Comtat*, *Sarrians* and *Vacqueyras* stand (characterized by a negative coordinate on the axis) is sharing :
## 
## - variables whose values do not differ significantly from the mean.

Clustering: HAC

Choosing the number of clusters

## Warning in pf(beale, pp, df2): NaNs produced

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 3 proposed 3 as the best number of clusters 
## * 7 proposed 4 as the best number of clusters 
## * 8 proposed 6 as the best number of clusters 
## * 5 proposed 8 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  6 
##  
##  
## *******************************************************************
## 
## === CLASSIFICATION ===
## Optimal number of clusters: 6

Cluster profiles from HAC

plot_cluster_profils_CAH(
  data_cluster = data_clean[-ind_sup_option4, ], 
  res.HCPC = res.HCPC_option4,
  main_title = "Mean cluster profiles, HAC on unweighted PCA (Option 4)"
)

Mapping the cluster profiles from HAC

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
FactoInvestigate::classif(res.HCPC_option4)
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## drawn <-
## c("La Roque-Alric", "Carpentras", "Modène", "Venasque", "La Roque-sur-Pernes", 
## "Loriol-du-Comtat", "Suzette", "Aubignan", "Vacqueyras", "Sarrians", 
## "Flassan", "Saint-Hippolyte-le-Graveyron", "Crillon-le-Brave"
## )
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = 'tree', title = '')
## ```
## 
## **Figure.1 - Hierarchical tree.**
## 
## The classification made on individuals reveals 3 clusters.
## 
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = 'map', draw.tree = FALSE, title = '')
## ```
## 
## **Figure.2 - Ascending Hierarchical Classification of the individuals.**
## 
## The **cluster 1** is made of individuals such as *Crillon-le-Brave*, *La Roque-Alric*, *La Roque-sur-Pernes*, *Saint-Hippolyte-le-Graveyron*, *Suzette* and *Venasque*. This group is characterized by :
## 
## - high values for the variables *partPop60_74*, *partPopBacSup*, *partArtCadr*, *partResSec*, *DISP_MED_A21*, *partMenProp*, *txScol15_24* and *partPopBac* (variables are sorted from the strongest).
## - low values for the variables *partPopBepCap*, *partPopSansDip*, *partPop15_24*, *partResVac*, *indiceJeunesse*, *partOuvr*, *txChom*, *partResPrincApp* and *partEmpDurLim* (variables are sorted from the weakest).
## 
## The **cluster 2** is made of individuals such as *Aubignan*, *Flassan*, *Loriol-du-Comtat*, *Modène*, *Sarrians* and *Vacqueyras*. This group is characterized by :
## 
## - high values for the variables *partPopBepCap*, *partResVac* and *partPop15_24* (variables are sorted from the strongest).
## - low values for the variables *partPop60_74*, *partArtCadr*, *partPopBacSup*, *partResSec* and *txScol15_24* (variables are sorted from the weakest).
## 
## The **cluster 3** is made of individuals such as *Carpentras*. This group is characterized by :
## 
## - high values for the variables *partResPrincApp*, *partPopEt*, *txChom*, *partPopImmi*, *partPopSansDip* and *partEmpSal* (variables are sorted from the strongest).
## - low values for the variables *partMenProp*, *partMenVoit*, *txEmp* and *DISP_MED_A21* (variables are sorted from the weakest).
## 
## ```{r, echo = FALSE, fig.align = 'center', fig.height = 5.5, fig.width = 5.5}
## par(mar = c(4.1, 4.1, 1.1, 2.1))
## plot.HCPC(res, choice = '3D.map', ind.names=FALSE, title = '')
## ```
## 
## **Figure.3 - Hierarchical tree on the factorial map.**
## 
## The hierarchical tree can be drawn on the factorial map with the individuals colored according to their clusters.

K-means

## 
## === CLASSIFICATION K-MEANS ===
## Optimal number of clusters: 6

Cluster profiles from k-means

plot_cluster_profils_KMEANS(
  data_cluster = data_clean[-ind_sup_option4,],
  res.kmeans = res.km_option4,
  main_title = "Mean cluster profiles, k-means (Option 4)"
)

Mapping the cluster profiles from k-means

## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored

Overall comparison of the 4 methods

Distribution of communes by cluster

# Drop the first 6 rows, keep columns 87 to 92, and draw the heatmap
test <- df_numeric_copy[-(1:6), 87:92]
pheatmap(test)

Distribution of the CoVe communes

Stability

#hclust

stab_hclust1$bootmean
## [1] 0.7250033 0.7188883 0.6900000
stab_hclust2$bootmean
## [1] 0.6001466 0.8479253 0.6135808
#km 
stab_km2$bootmean
## [1] 0.7499325 0.8851627 0.5774087
res.PCA_option2$eig[1:5, 3]
##   comp 1   comp 2   comp 3   comp 4   comp 5 
## 54.35630 72.84269 79.13745 83.10581 85.91948

Options 3 and 4

stab_hclust3$bootmean
## [1] 0.5762327 0.8766981 0.5653070
stab_hclust4$bootmean
## [1] 0.6468886 0.5953831 0.7301869
stab_km3$bootmean
## [1] 0.7351310 0.9083254 0.5950130
stab_km4$bootmean
## [1] 0.6268201 0.6799787 0.5977082
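
The `bootmean` field name suggests these objects come from `fpc::clusterboot`, although the report does not show the call. As a rough illustration of what a per-cluster bootstrap mean measures, the base-R sketch below resamples the rows, re-clusters each sample, and records for every original cluster its best Jaccard similarity with a bootstrap cluster. It is a simplified stand-in, not the fpc implementation; the data, `cluster_fun` and `jaccard` are invented for illustration.

```r
set.seed(42)
# Toy data (invented): three well-separated groups of 20 points in 2D
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))
k <- 3

# Hypothetical helper: Ward hierarchical clustering cut into k groups
cluster_fun <- function(m) cutree(hclust(dist(m), method = "ward.D2"), k = k)
ref <- cluster_fun(x)

# Jaccard similarity between two sets of row indices
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

B <- 50
best <- matrix(NA_real_, nrow = B, ncol = k)
for (b in seq_len(B)) {
  idx  <- unique(sample(nrow(x), replace = TRUE))  # rows drawn, duplicates dropped
  boot <- cluster_fun(x[idx, , drop = FALSE])
  for (cl in seq_len(k)) {
    members <- intersect(which(ref == cl), idx)    # original cluster, restricted to the sample
    # Best match between the original cluster and any bootstrap cluster
    best[b, cl] <- max(sapply(seq_len(k), function(j) jaccard(members, idx[boot == j])))
  }
}
bootmean <- colMeans(best)
print(round(bootmean, 3))
```

Values close to 1 mean a cluster is recovered almost identically in every resample. The fpc documentation suggests treating mean Jaccard values below 0.6 as unreliable and values of 0.75 or more as indicating a stable cluster; by that yardstick several of the clusters reported here would be borderline.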
## cumulative variance
res.PCA_option3$eig[1:5, 3]
##   comp 1   comp 2   comp 3   comp 4   comp 5 
## 42.08157 60.10879 67.08409 73.11820 77.91091
res.PCA_option4$eig[1:5, 3]
##   comp 1   comp 2   comp 3   comp 4   comp 5 
## 28.31289 44.41551 59.27615 66.70796 73.23771