required_packages <- c("ggplot2", "dplyr", "factoextra")
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
lapply(required_packages, library, character.only = TRUE)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
## [[1]]
## [1] "ggplot2" "stats" "graphics" "grDevices" "utils" "datasets"
## [7] "methods" "base"
##
## [[2]]
## [1] "dplyr" "ggplot2" "stats" "graphics" "grDevices" "utils"
## [7] "datasets" "methods" "base"
##
## [[3]]
## [1] "factoextra" "dplyr" "ggplot2" "stats" "graphics"
## [6] "grDevices" "utils" "datasets" "methods" "base"
car_data <- read.csv("C:/Users/Manuel/Desktop/carprice.csv", sep = ",", header = TRUE)
The variable “Type” (column 2) is removed as it is not necessary for further analysis.
car_data_clean <- car_data[, -c(2)]
The numeric columns used for clustering are selected by name, so that no categorical variable enters the distance computation.
numeric_columns <- car_data_clean[, c("Min.Price", "Price", "Max.Price", "Range.Price", "gpm100", "MPG.city", "MPG.highway")]
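The selection above picks columns by name but does not itself confirm their types. A minimal sketch of such a check follows, assuming a small made-up data frame (`example_columns` is hypothetical and only stands in for the real data read from carprice.csv):

```r
# Hypothetical stand-in for the selected columns; values are made up.
example_columns <- data.frame(
  Price = c(13.8, 26.4, 21.9),
  MPG.city = c(23L, 18L, 18L),
  Type = c("Small", "Large", "Van")
)

# TRUE for each column stored as numeric (integer or double).
is_numeric <- sapply(example_columns, is.numeric)
print(is_numeric)

# Names of any columns that would break scale() or dist().
non_numeric <- names(example_columns)[!is_numeric]
print(non_numeric)
```

On the real data, `stopifnot(all(sapply(numeric_columns, is.numeric)))` would stop the script early if a text column slipped in.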
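The variables are scaled so that no single variable dominates the distance, and the Euclidean distance between cars is then computed. Because center = FALSE, scale() divides each column by its root mean square instead of standardizing it to mean 0 and standard deviation 1.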
scaled_data <- scale(numeric_columns, center = FALSE, scale = TRUE)
distance_matrix <- dist(scaled_data)
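To make the effect of `center = FALSE` concrete, the sketch below reproduces the scaling by hand on a toy matrix (the values are made up and only illustrate the arithmetic) and compares it with the standardized variant:

```r
# Toy matrix standing in for numeric_columns (values are made up).
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# Same call as in the analysis: no centering, scaling only.
scaled_rms <- scale(x, center = FALSE, scale = TRUE)

# Root mean square per column, with n - 1 in the denominator as scale() uses.
rms <- apply(x, 2, function(col) sqrt(sum(col^2) / (length(col) - 1)))
manual <- sweep(x, 2, rms, "/")

# Both approaches should agree up to floating-point error.
print(all.equal(scaled_rms, manual, check.attributes = FALSE))

# Standardized variant (z-scores) for comparison: column means become 0.
scaled_z <- scale(x)
print(colMeans(scaled_z))
```

Whether to center is a modelling choice. Centering shifts every car by the same amount, so it does not change Euclidean distances by itself, but it changes the divisor from the root mean square to the standard deviation, which can alter the relative weight of the variables.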
Hierarchical clustering is applied using the complete linkage method.
cluster_complete <- hclust(distance_matrix, method = "complete")
plot(cluster_complete, main = "Car Clustering")
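As an illustration of what complete linkage does, the sketch below clusters four made-up points on a line. Under complete linkage the distance between two groups is the distance between their farthest members, which shows up in the final merge height:

```r
# Four made-up points on a line.
points_1d <- c(0, 1, 5, 7)

hc_complete <- hclust(dist(points_1d), method = "complete")
hc_single <- hclust(dist(points_1d), method = "single")

# Complete linkage: the last merge joins {0, 1} and {5, 7} at the
# largest pairwise distance between the groups, |7 - 0| = 7.
print(hc_complete$height)

# Single linkage, for contrast, uses the smallest distance, |5 - 1| = 4.
print(hc_single$height)
```

Complete linkage tends to produce compact clusters of similar diameter, which is one common reason to prefer it over single linkage for segmentation.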
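The number of clusters is estimated with the elbow method: fviz_nbclust() plots the total within-cluster sum of squares (WSS) of k-means solutions for increasing k, and the bend in the curve marks the point where adding clusters stops paying off. Note that this call uses the unscaled numeric_columns, while the hierarchical clustering above uses scaled_data.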
ncluster <- fviz_nbclust(numeric_columns, kmeans, method = "wss")
ncluster
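The quantity behind the elbow plot can also be computed directly with base R. The following is a rough sketch under stated assumptions: it uses the built-in `mtcars` data as a stand-in because carprice.csv is not available here, and the choices `max_k = 8` and `nstart = 25` are illustrative rather than taken from the analysis:

```r
set.seed(42)  # k-means starts from random centers

# Built-in data as a stand-in for the car price variables.
x <- scale(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Total within-cluster sum of squares for k = 1, ..., max_k.
max_k <- 8
wss <- sapply(seq_len(max_k), function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens.
plot(seq_len(max_k), wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

For k = 1 the WSS equals the total sum of squares of the centered data, which gives a quick sanity check on the curve's starting point.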
The elbow plot suggests 3 or 4 clusters; 3 is used for the rest of the analysis.
num_clusters <- 3
hc_clusters <- hclust(distance_matrix)  # default method is "complete", so this matches cluster_complete
plot(as.dendrogram(hc_clusters))
rect.hclust(hc_clusters, k = num_clusters)
A summary of clusters is generated to analyze their characteristics.
cluster_membership <- cutree(hc_clusters, k = num_clusters)
car_data$Cluster <- cluster_membership
cluster_summary <- aggregate(. ~ Cluster, data = car_data[, c("Cluster", "Min.Price", "Price", "Max.Price", "Range.Price", "gpm100", "MPG.city", "MPG.highway")], FUN = mean)
print(cluster_summary)
##   Cluster Min.Price    Price Max.Price Range.Price   gpm100 MPG.city MPG.highway
## 1       1  11.80714 13.76786  15.75357    3.946429 3.846429 23.10714    29.92857
## 2       2  25.43333 26.44667  27.46667    2.033333 4.560000 17.93333    26.00000
## 3       3  16.32000 21.86000  27.40000   11.080000 4.780000 18.00000    24.60000
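Cluster means are easier to interpret alongside cluster sizes, since a mean over a handful of cars is much less stable than one over dozens. The sketch below shows the pattern on made-up one-dimensional data (`toy` and its values are hypothetical), combining `cutree()`, `table()`, and `aggregate()`:

```r
# Made-up values forming three obvious groups.
toy <- data.frame(value = c(0, 1, 5, 7, 20, 21))

hc_toy <- hclust(dist(toy$value), method = "complete")
toy$Cluster <- cutree(hc_toy, k = 3)

# Number of observations per cluster.
cluster_sizes <- table(toy$Cluster)
print(cluster_sizes)

# Mean of the variable within each cluster.
cluster_means <- aggregate(value ~ Cluster, data = toy, FUN = mean)
print(cluster_means)
```

Applied to the analysis, `table(cluster_membership)` would report how many cars fall into each of the three clusters.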
Conclusion: The clustering analysis of the car price data yields three groups with distinct profiles. Cluster 1 groups low-cost cars (mean Price 13.77) with the best fuel efficiency (about 23 MPG in the city and 30 MPG on the highway), appealing to budget-conscious consumers. Cluster 2 contains the most expensive cars on average (mean Price 26.45) with a narrow price range (2.03) and lower fuel efficiency, which points to a premium segment. Cluster 3 sits between the two on mean Price (21.86) but has by far the widest gap between minimum and maximum price (Range.Price 11.08) and the lowest highway efficiency (24.6 MPG), which may reflect models sold in widely varying configurations. Understanding these segments can help stakeholders with marketing strategies and inventory management.
canada_data <- read.csv("C:/Users/Manuel/Desktop/CES11.csv", sep = ",", header = TRUE)
canada_clean <- canada_data[, c("id", "province", "population", "gender", "abortion", "importance", "education", "urban")]
A dummy (0/1) variable is created for respondents whose education level is "bachelors", and rows with missing values are then dropped.
canada_clean$education_bachelors <- ifelse(canada_clean$education == "bachelors", 1, 0)
canada_clean <- na.omit(canada_clean)
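The `ifelse()` call above encodes a single education level. If indicators for every level of a categorical variable are wanted, `model.matrix()` is one option. The sketch below uses a small made-up data frame (`toy` and its level names are illustrative assumptions, not rows from CES11.csv):

```r
# Made-up respondents; level names are illustrative.
toy <- data.frame(
  education = factor(c("bachelors", "HS", "college", "bachelors")),
  urban = factor(c("urban", "rural", "urban", "rural"))
)

# One indicator column per education level; "- 1" drops the intercept
# so that no level is absorbed as the reference category.
education_dummies <- model.matrix(~ education - 1, data = toy)
print(colnames(education_dummies))

# Each row has exactly one indicator set to 1.
print(rowSums(education_dummies))
```

By default `model.matrix()` drops rows with missing values, so running `na.omit()` first, or checking the row count afterwards, helps keep the dummies aligned with the original data.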
Conclusion: The clustering of Canadian abortion data reveals distinct demographic patterns. Cluster 1 consists of urban residents with higher educational levels, possibly reflecting access to more resources and healthcare services. Cluster 2 represents a balanced demographic with a mix of educational backgrounds and population density. Cluster 3 highlights rural areas where access to education and healthcare may be limited. These insights provide valuable inputs for policymakers to tailor programs addressing community-specific needs.