## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
Bien que cela puisse regrouper par défaut uniquement en fonction de l’envergure de la commune, il est intéressant d’un point de vue statistique de donner un poids aux communes pour mieux étaler le nuage de points. De plus, cela permettra de donner une ‘importance’ aux communes : deux communes avec des fragilités identiques n’obtiendront pas forcément les mêmes aides pour privilégier au plus grand nombre (c’est aussi plus fiable d’aider les grandes communes selon les indicateurs).
## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
##
## === INTERPRÉTATION DES DIMENSIONS ===
##
## Dim1 :
## Variables structurantes : partPopEt, partPopImmi, txEmp, partEmpDurLim, txChom, partMenVoit, partMen1p, partMenProp, partResPrincApp, DISP_MED_A21
## Pôle positif : partPopEt, partPopImmi, partEmpDurLim, txChom, partMen1p, partResPrincApp
## Pôle négatif : txEmp, partMenVoit, partMenProp, DISP_MED_A21
##
## Dim2 :
## Variables structurantes : partPop75p, partResSec, nbPersResPrinc
## Pôle positif : partPop75p, partResSec
## Pôle négatif : nbPersResPrinc
FactoInvestigate::description(res.PCA_option2)
##
## * * *
##
## The **dimension 1** opposes individuals such as *Avignon*, *Sorgues*, *Monteux*, *Jonquières*, *Vedène*, *Carpentras*, *Le Pontet* and *Cavaillon* (to the right of the graph, characterized by a strongly positive coordinate on the axis)
## to individuals such as *Gordes* and *Vaison-la-Romaine* (to the left of the graph, characterized by a strongly negative coordinate on the axis).
##
## The group in which the individuals *Sorgues*, *Monteux*, *Jonquières*, *Vedène*, *Carpentras*, *Le Pontet* and *Cavaillon* stand (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individual *Avignon* stands (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individuals *Gordes* and *Vaison-la-Romaine* stand (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group 4 (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## Note that the variable *partMenProp* is highly correlated with this dimension (correlation of 0.91). This variable could therefore summarize itself the dimension 1.
##
## * * *
##
## The **dimension 2** opposes individuals such as *Gordes* and *Vaison-la-Romaine* (to the top of the graph, characterized by a strongly positive coordinate on the axis)
## to individuals such as *Sorgues*, *Monteux*, *Jonquières*, *Vedène*, *Carpentras*, *Le Pontet* and *Cavaillon* (to the bottom of the graph, characterized by a strongly negative coordinate on the axis).
##
## The group in which the individuals *Gordes* and *Vaison-la-Romaine* stand (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group 2 (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individuals *Sorgues*, *Monteux*, *Jonquières*, *Vedène*, *Carpentras*, *Le Pontet* and *Cavaillon* stand (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
test_option2 = res.PCA_option2$ind$coord[rownames(res.PCA_option2$ind$coord) %in% cog_cove, 1:2]
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 3 as the best number of clusters
## * 8 proposed 4 as the best number of clusters
## * 2 proposed 6 as the best number of clusters
## * 1 proposed 7 as the best number of clusters
## * 7 proposed 8 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 4
##
##
## *******************************************************************
##
## === CLASSIFICATION ===
## Nombre optimal de clusters : 4
dist_mat <- dist(test_option2, method = "euclidean")
hc <- hclust(dist_mat, method = "ward.D2")
plot(hc, main = "Arbre hiérarchique (Ward.D2)")
# Découpage en k clusters
clusters <- cutree(hc, k = nb_optimal_option2)
# Stocker résultat dans un objet "style HCPC"
res.HCPC_option2 <- list()
res.HCPC_option2$call <- list()
res.HCPC_option2$data.clust <- data.frame(clust = clusters)
rownames(res.HCPC_option2$data.clust) <- rownames(test_option2)
plot_cluster_profils_CAH(
data_cluster = data_clean[rownames(data_clean) %in% cog_cove, ],
res.HCPC = res.HCPC_option2,
main_title = "Profils moyens des clusters CAH avec ACP pondérée par population (Option 2)"
)
df_numeric_copy$cluster_cah_option2 <- NA
df_numeric_copy[rownames(res.HCPC_option2$data.clust), "cluster_cah_option2"] <- res.HCPC_option2$data.clust$clust
pal <- colorFactor(
palette = rainbow(length(unique(df_numeric_copy$cluster_cah_option2))),
domain = df_numeric_copy$cluster_cah_option2
)
leaflet(df_numeric_copy) %>%
addTiles() %>%
addCircleMarkers(
lng = ~long, lat = ~lat,
color = ~pal(cluster_cah_option2),
radius = 5,
stroke = FALSE,
fillOpacity = 0.8,
label = ~paste("Cluster:", cluster_cah_option2, rownames(df_numeric_copy))
) %>%
addLegend(
"bottomright",
pal = pal,
values = ~cluster_cah_option2,
title = "Cluster"
) %>%
# Ajouter le périmètre CoVe
addPolygons(
data = perimetre_com_cove,
color = "black",
weight = 1.5,
fill = FALSE
)
## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
##
## === CLASSIFICATION K-MEANS ===
## Nombre optimal de clusters : 3
## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
stab_hclust2$bootmean
## [1] 0.6413452 0.7356786 0.6600487 0.7882024
stab_km2$bootmean
## [1] 0.6636984 0.8145455 0.5609221
##
## === INTERPRÉTATION DES DIMENSIONS ===
##
## Dim1 :
## Variables structurantes : partPopEt, partPopImmi, txEmp, partEmpDurLim, txChom, partMenVoit, partPopSansDip, partMenProp, partResPrincApp, DISP_MED_A21
## Pôle positif : partPopEt, partPopImmi, partEmpDurLim, txChom, partPopSansDip, partResPrincApp
## Pôle négatif : txEmp, partMenVoit, partMenProp, DISP_MED_A21
##
## Dim2 :
## Variables structurantes : partPop0_14, partPop60_74, partPop75p, indiceJeunesse, nbPersResPrinc
## Pôle positif : partPop0_14, indiceJeunesse, nbPersResPrinc
## Pôle négatif : partPop60_74, partPop75p
##
## * * *
##
## The **dimension 1** opposes individuals such as *Carpentras*, *Apt*, *Orange*, *Cavaillon*, *Le Pontet* and *Avignon* (to the right of the graph, characterized by a strongly positive coordinate on the axis)
## to individuals such as *Gordes*, *Venasque* and *Vaison-la-Romaine* (to the left of the graph, characterized by a strongly negative coordinate on the axis).
##
## The group in which the individuals *Carpentras*, *Apt*, *Orange*, *Cavaillon*, *Le Pontet* and *Avignon* stand (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group 2 (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individuals *Gordes*, *Venasque* and *Vaison-la-Romaine* stand (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group 4 (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
##
## * * *
##
## The **dimension 2** opposes individuals characterized by a strongly positive coordinate on the axis (to the top of the graph)
## to individuals such as *Gordes*, *Venasque* and *Vaison-la-Romaine* (to the bottom of the graph, characterized by a strongly negative coordinate on the axis).
##
## The group 1 (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individuals *Gordes*, *Venasque* and *Vaison-la-Romaine* stand (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
test_option3 = res.PCA_option3$ind$coord[rownames(res.PCA_option3$ind$coord) %in% cog_cove, 1:2]
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 3 as the best number of clusters
## * 8 proposed 4 as the best number of clusters
## * 2 proposed 5 as the best number of clusters
## * 1 proposed 6 as the best number of clusters
## * 7 proposed 8 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 4
##
##
## *******************************************************************
##
## === CLASSIFICATION ===
## Nombre optimal de clusters : 4
dist_mat <- dist(test_option3, method = "euclidean")
hc <- hclust(dist_mat, method = "ward.D2")
plot(hc, main = "Arbre hiérarchique (Ward.D2)")
# Découpage en k clusters
clusters <- cutree(hc, k = nb_optimal_option3)
# Stocker résultat dans un objet "style HCPC"
res.HCPC_option3 <- list()
res.HCPC_option3$call <- list()
res.HCPC_option3$data.clust <- data.frame(clust = clusters)
rownames(res.HCPC_option3$data.clust) <- rownames(test_option3)
plot_cluster_profils_CAH(
data_cluster = data_clean[rownames(data_clean) %in% cog_cove, ],
res.HCPC = res.HCPC_option3,
main_title = "Profils moyens des clusters CAH avec ACP pondérée par population (Option 3)"
)
df_numeric_copy$cluster_cah_option3 <- NA
df_numeric_copy[rownames(res.HCPC_option3$data.clust), "cluster_cah_option3"] <- res.HCPC_option3$data.clust$clust
pal <- colorFactor(
palette = rainbow(length(unique(df_numeric_copy$cluster_cah_option3))),
domain = df_numeric_copy$cluster_cah_option3
)
leaflet(df_numeric_copy) %>%
addTiles() %>%
addCircleMarkers(
lng = ~long, lat = ~lat,
color = ~pal(cluster_cah_option3),
radius = 5,
stroke = FALSE,
fillOpacity = 0.8,
label = ~paste("Cluster:", cluster_cah_option3, rownames(df_numeric_copy))
) %>%
addLegend(
"bottomright",
pal = pal,
values = ~cluster_cah_option3,
title = "Cluster"
) %>%
# Ajouter le périmètre CoVe
addPolygons(
data = perimetre_com_cove,
color = "black",
weight = 1.5,
fill = FALSE
)
## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
##
## === CLASSIFICATION K-MEANS ===
## Nombre optimal de clusters : 7
## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
stab_hclust3$bootmean
## [1] 0.5006587 0.7719286 0.6986270 0.7420967
stab_km3$bootmean
## [1] 0.7415000 0.7561667 0.7183333 0.8306667 0.6998333 0.8283333 0.7410476
##
## === INTERPRÉTATION DES DIMENSIONS ===
##
## Dim1 :
## Variables structurantes : txEmp, txChom, partMenVoit, partPopBacSup, partPopSansDip, partMenProp, partResPrincApp, DISP_MED_A21
## Pôle positif : txChom, partPopSansDip, partResPrincApp
## Pôle négatif : txEmp, partMenVoit, partPopBacSup, partMenProp, DISP_MED_A21
##
## Dim2 :
## Variables structurantes : partPop0_14, partPop60_74, indiceJeunesse, nbPersResPrinc
## Pôle positif : partPop0_14, indiceJeunesse, nbPersResPrinc
## Pôle négatif : partPop60_74
##
## * * *
##
## The **dimension 1** opposes individuals such as *Le Pontet*, *Cavaillon*, *Carpentras*, *Orange*, *Apt*, *Avignon* and *Saint-Christol* (to the right of the graph, characterized by a strongly positive coordinate on the axis)
## to individuals such as *Gordes*, *Savoillan* and *Saint-Léger-du-Ventoux* (to the left of the graph, characterized by a strongly negative coordinate on the axis).
##
## The group in which the individuals *Le Pontet*, *Cavaillon*, *Carpentras*, *Orange*, *Apt* and *Avignon* stand (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individual *Saint-Christol* stands (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individuals *Gordes*, *Savoillan* and *Saint-Léger-du-Ventoux* stand (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group 4 (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
##
## * * *
##
## The **dimension 2** opposes individuals characterized by a strongly positive coordinate on the axis (to the top of the graph)
## to individuals such as *Gordes*, *Saint-Christol*, *Savoillan* and *Saint-Léger-du-Ventoux* (to the bottom of the graph, characterized by a strongly negative coordinate on the axis).
##
## The group 1 (characterized by a positive coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individuals *Gordes*, *Savoillan* and *Saint-Léger-du-Ventoux* stand (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
##
## The group in which the individual *Saint-Christol* stands (characterized by a negative coordinate on the axis) is sharing :
##
## - variables whose values do not differ significantly from the mean.
test_option4 = res.PCA_option4$ind$coord[rownames(res.PCA_option4$ind$coord) %in% cog_cove, 1:2]
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 3 as the best number of clusters
## * 4 proposed 4 as the best number of clusters
## * 7 proposed 5 as the best number of clusters
## * 7 proposed 8 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 5
##
##
## *******************************************************************
##
## === CLASSIFICATION ===
## Nombre optimal de clusters : 5
dist_mat <- dist(test_option4, method = "euclidean")
hc <- hclust(dist_mat, method = "ward.D2")
plot(hc, main = "Arbre hiérarchique (Ward.D2)")
# Découpage en k clusters
clusters <- cutree(hc, k = nb_optimal_option4)
# Stocker résultat dans un objet "style HCPC"
res.HCPC_option4 <- list()
res.HCPC_option4$call <- list()
res.HCPC_option4$data.clust <- data.frame(clust = clusters)
rownames(res.HCPC_option4$data.clust) <- rownames(test_option4)
plot_cluster_profils_CAH(
data_cluster = data_clean[rownames(data_clean) %in% cog_cove, ],
res.HCPC = res.HCPC_option4,
main_title = "Profils moyens des clusters CAH avec ACP pondérée par population (Option 4)"
)
df_numeric_copy$cluster_cah_option4 <- NA
df_numeric_copy[rownames(res.HCPC_option4$data.clust), "cluster_cah_option4"] <- res.HCPC_option4$data.clust$clust
pal <- colorFactor(
palette = rainbow(length(unique(df_numeric_copy$cluster_cah_option4))),
domain = df_numeric_copy$cluster_cah_option4
)
leaflet(df_numeric_copy) %>%
addTiles() %>%
addCircleMarkers(
lng = ~long, lat = ~lat,
color = ~pal(cluster_cah_option4),
radius = 5,
stroke = FALSE,
fillOpacity = 0.8,
label = ~paste("Cluster:", cluster_cah_option4, rownames(df_numeric_copy))
) %>%
addLegend(
"bottomright",
pal = pal,
values = ~cluster_cah_option4,
title = "Cluster"
) %>%
# Ajouter le périmètre CoVe
addPolygons(
data = perimetre_com_cove,
color = "black",
weight = 1.5,
fill = FALSE
)
## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
##
## === CLASSIFICATION K-MEANS ===
## Nombre optimal de clusters : 5
## Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
## missing or invalid lat/lon values and will be ignored
stab_hclust4$bootmean
## [1] 0.7549563 0.8030714 0.7166071 0.6333333 0.5915873
stab_km4$bootmean
## [1] 0.6482262 0.8338333 0.6545000 0.7391667 0.6559405
## comp 1 comp 2 comp 3 comp 4 comp 5
## 47.63647 63.15946 75.10828 79.39328 83.14553
## [1] 0.6307086 0.6219088 0.5648448
## [1] 0.6413452 0.7356786 0.6600487 0.7882024
## [1] 0.6636984 0.8145455 0.5609221
stab_hclust3$bootmean
## [1] 0.5006587 0.7719286 0.6986270 0.7420967
stab_hclust4$bootmean
## [1] 0.7549563 0.8030714 0.7166071 0.6333333 0.5915873
stab_km3$bootmean
## [1] 0.7415000 0.7561667 0.7183333 0.8306667 0.6998333 0.8283333 0.7410476
stab_km4$bootmean
## [1] 0.6482262 0.8338333 0.6545000 0.7391667 0.6559405
## variance cumulée
res.PCA_option3$eig[1:5, 3]
## comp 1 comp 2 comp 3 comp 4 comp 5
## 38.25453 55.64312 64.14563 69.39901 73.85061
res.PCA_option4$eig[1:5, 3]
## comp 1 comp 2 comp 3 comp 4 comp 5
## 25.22745 43.06824 50.24575 56.15625 61.84930