Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
Running Code
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
library(cluster)
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.2
Warning: package 'ggplot2' was built under R version 4.4.2
Warning: package 'tibble' was built under R version 4.4.2
Warning: package 'tidyr' was built under R version 4.4.2
Warning: package 'readr' was built under R version 4.4.2
Warning: package 'purrr' was built under R version 4.4.2
Warning: package 'dplyr' was built under R version 4.4.2
Warning: package 'stringr' was built under R version 4.4.2
Warning: package 'forcats' was built under R version 4.4.2
Warning: package 'lubridate' was built under R version 4.4.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(kableExtra)
Warning: package 'kableExtra' was built under R version 4.4.2
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
Importing datasets for visualisation
subtest <- read_csv("sub_testing.csv")
Rows: 150 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): renewed, gender
dbl (7): id, num_contacts, contact_recency, num_complaints, spend, lor, age
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 850 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): renewed, gender
dbl (7): id, num_contacts, contact_recency, num_complaints, spend, lor, age
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(subtrain)
The echo: false option disables the printing of code (only output is displayed).
subtest_2 <- select(subtest, renewed, contact_recency, lor, spend, gender, age)
d1 <- dist(subtest_2)
Warning in dist(subtest_2): NAs introduced by coercion
subtrain_2 <- select(subtrain, renewed, contact_recency, lor, spend, gender, age)
d2 <- dist(subtrain_2)
Warning in dist(subtrain_2): NAs introduced by coercion
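The "NAs introduced by coercion" warnings suggest that dist() could not convert the character columns (renewed, gender) to numbers, so those columns probably contribute nothing to the distances. One possible alternative, sketched below on made-up data, is the Gower dissimilarity from daisy() in the already loaded cluster package, which accepts mixed numeric and categorical columns. The toy data frame and its values are hypothetical, and this is not necessarily the method intended for the analysis above.

```r
library(cluster)

# Hypothetical stand-in for subtest_2: mixed numeric and categorical columns
set.seed(1)
toy <- data.frame(
  renewed = sample(c("Yes", "No"), 10, replace = TRUE),
  gender  = sample(c("Female", "Male"), 10, replace = TRUE),
  spend   = round(runif(10, 0, 700)),
  age     = sample(20:70, 10),
  stringsAsFactors = FALSE
)

# daisy() expects factors, not character vectors, for categorical columns
toy[] <- lapply(toy, function(col) if (is.character(col)) factor(col) else col)

# Gower dissimilarity: each column is rescaled to [0, 1] and then averaged,
# so no column has to be coerced to numeric
d_gower <- daisy(toy, metric = "gower")
```

The resulting object inherits from "dist", so it could be passed to hclust() and silhouette() in the same way as d1 and d2.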
clusters1 <- cutree(h1, k = 3)
clusters2 <- cutree(h2, k = 3)
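The objects h1 and h2 are not defined in the code shown here; they are presumably hierarchical clustering trees built from d1 and d2 with hclust(). A minimal sketch of that step on made-up data follows. The toy data, the standardisation with scale(), and the "average" linkage are all assumptions for illustration, not the settings used in the analysis.

```r
# Hypothetical numeric data with three well-separated groups
set.seed(42)
toy <- data.frame(
  spend = c(rnorm(10, 100, 10), rnorm(10, 400, 10), rnorm(10, 700, 10)),
  lor   = c(rnorm(10, 20, 3), rnorm(10, 80, 3), rnorm(10, 150, 3))
)

# Euclidean distances on standardised columns
d_toy <- dist(scale(toy))

# Build the tree; the linkage method here is an assumption
h_toy <- hclust(d_toy, method = "average")

# Cut the tree into three clusters, as cutree(h1, k = 3) does above
clusters_toy <- cutree(h_toy, k = 3)
table(clusters_toy)
```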
#Step 5: Assess the quality of the segmentation
sil1 <- silhouette(clusters1, d1)
summary(sil1)
Silhouette of 150 units in 3 clusters from silhouette.default(x = clusters1, dist = d1) :
Cluster sizes and average silhouette widths:
71 38 41
0.5052437 0.6476118 0.3736791
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.2161 0.4653 0.5661 0.5053 0.6338 0.7597
sil2 <- silhouette(clusters2, d2)
summary(sil2)
Silhouette of 850 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
Cluster sizes and average silhouette widths:
762 82 6
0.2702136 0.9033115 0.8217519
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.3632 0.1983 0.3282 0.3352 0.3836 0.9436
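The silhouette summaries above are for a single choice of k = 3. One way to check whether three clusters is a reasonable choice is to compare the mean silhouette width across several values of k. The sketch below does this on made-up data; the toy matrix, the "average" linkage, and the range 2 to 6 are assumptions for illustration only.

```r
library(cluster)

# Hypothetical two-dimensional data with three groups
set.seed(7)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2),
  matrix(rnorm(40, mean = 12), ncol = 2)
)
d_toy <- dist(toy)
h_toy <- hclust(d_toy, method = "average")

# Mean silhouette width for each candidate number of clusters
avg_sil <- sapply(2:6, function(k) {
  sil <- silhouette(cutree(h_toy, k = k), d_toy)
  mean(sil[, "sil_width"])
})
names(avg_sil) <- 2:6
avg_sil
```

A higher mean width indicates tighter, better separated clusters, so the k with the largest value is a natural candidate.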
#####Step 6.1: Create a table showing the size of each segment (i.e. the number of customers in the cluster)
# and the average revenue generated in the last 6 months per customer.
size_rev <- test_clus %>%
  group_by(clusters1) %>%
  summarise(id = n(), avg_rev = mean(spend))
size_rev2 <- train_clus %>%
  group_by(clusters2) %>%
  summarise(id = n(), avg_rev = mean(spend))
size_rev
# A tibble: 12 × 3
cluster Contact_Method Average_Value
<chr> <fct> <dbl>
1 C1 Spend 345.
2 C1 Lor 164.
3 C1 Contact 20.1
4 C1 Age 54.3
5 C2 Spend 5.09
6 C2 Lor 16.8
7 C2 Contact 22.2
8 C2 Age 47.2
9 C3 Spend 675.
10 C3 Lor 141.
11 C3 Contact 16.5
12 C3 Age 58.7
#Visualise the mean satisfaction score for each contact method by cluster.
ggplot(test_clus_tidy, mapping = aes(x = Contact_Method, y = Average_Value, group = cluster, colour = cluster)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_colour_manual(values = c("pink", "darkblue", "green")) +
  ylab("Mean Satisfaction Score") +
  xlab("Contact Method") +
  ggtitle("Mean Satisfaction Score for each Contact Method by Cluster")
Energy drinks dataset
#Step 1: Import the data
Energy <- read_csv('energy_drinks.csv')
Rows: 840 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): ID, Gender, Age
dbl (5): D1, D2, D3, D4, D5
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The dendrogram and the block layout of the heatmap highlight distinct clusters within the dataset. Taken together, the hierarchical clustering and the heatmap show that consumers can be grouped based on similarities within certain categories.
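The code that produced the heatmap is not shown in this report. As a rough sketch of how a clustered heatmap with dendrograms could be drawn, the example below uses base R's heatmap() on made-up ratings. The rating matrix, the assumed 1 to 9 scale, and the "ward.D2" linkage are hypothetical choices, not the ones used for the figure described above.

```r
# Hypothetical ratings of five drink versions (D1 to D5) by 30 consumers
set.seed(3)
ratings <- matrix(
  sample(1:9, 30 * 5, replace = TRUE),
  ncol = 5,
  dimnames = list(NULL, paste0("D", 1:5))
)

# heatmap() reorders rows and columns by hierarchical clustering and
# draws the dendrograms alongside the colour blocks
hm <- heatmap(
  ratings,
  distfun = dist,
  hclustfun = function(d) hclust(d, method = "ward.D2"),  # linkage is an assumption
  scale = "none"
)
```

The returned list holds the row and column orderings (hm$rowInd and hm$colInd), which can help when matching blocks in the plot back to individual consumers.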
Solution for the cluster
clusters2 <- cutree(h2, k = 3)
sil2 <- silhouette(clusters2, d2)
summary(sil2)
Silhouette of 840 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
Cluster sizes and average silhouette widths:
441 167 232
0.1589812 0.2412339 0.3048378
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.2863 0.1262 0.2338 0.2156 0.3280 0.5065
# Profiling the clusters by Age distribution
age_profile <- Energy %>%
  group_by(Cluster, Age) %>%
  summarise(count = n()) %>%
  group_by(Cluster) %>%
  mutate(percentage = count / sum(count) * 100)
`summarise()` has grouped output by 'Cluster'. You can override using the
`.groups` argument.
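The message above is informational: after summarise(), dplyr drops the last grouping variable by default and keeps the rest. Stating the `.groups` argument explicitly silences the message and makes the intended grouping visible. The sketch below shows this on made-up data; the toy data frame and its age bands are hypothetical.

```r
library(dplyr)

# Hypothetical cluster assignments and age bands
toy <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 2),
  Age = c("18-24", "18-24", "25-34", "18-24", "25-34", "25-34", "25-34")
)

age_profile_toy <- toy %>%
  group_by(Cluster, Age) %>%
  # "drop_last" keeps the result grouped by Cluster, matching the default
  summarise(count = n(), .groups = "drop_last") %>%
  # percentages are computed within each Cluster
  mutate(percentage = count / sum(count) * 100) %>%
  ungroup()
```

Because the summarised result is still grouped by Cluster, a second group_by(Cluster) is not needed before computing the percentages.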
# View the age profile by cluster
print(age_profile)
ggplot(age_profile, aes(x = factor(Cluster), y = percentage, fill = Age, width = 0.5)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Age Distribution by Cluster", x = "Cluster", y = "Percentage") +
  coord_flip()
`summarise()` has grouped output by 'Cluster'. You can override using the
`.groups` argument.
# View the gender profile by cluster
kable(gender_profile)
| Cluster|Gender | count| percentage|
|-------:|:------|-----:|----------:|
|       1|Female |   279|   63.26531|
|       1|Male   |   162|   36.73469|
|       2|Female |    86|   51.49701|
|       2|Male   |    81|   48.50299|
|       3|Female |   136|   58.62069|
|       3|Male   |    96|   41.37931|
ggplot(gender_profile, aes(x = factor(Cluster), y = percentage, fill = Gender, width = 0.5)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Gender Distribution by Cluster", x = "Cluster", y = "Percentage") +
  coord_flip()
cluster_profiles_long <- cluster_profiles %>%
  pivot_longer(cols = starts_with("avg_rating"), names_to = "Version", values_to = "Avg_Rating")

# Plot the bar chart with best practices
ggplot(cluster_profiles_long, aes(x = factor(Cluster), y = Avg_Rating, fill = Version)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7, alpha = 0.8) + # Adjust position and bar width
  scale_fill_manual(values = c("blue", "green", "yellow", "purple", "orange")) + # Custom colors
  labs(title = "Average Ratings by Cluster and Energy Drink Version",
       x = "Cluster",
       y = "Average Rating") +
  scale_y_continuous(limits = c(0, 10), expand = c(0, 0)) + # Adjust y-axis limits and remove padding
  theme_minimal() +
  theme(
    legend.title = element_blank(), # Remove legend title
    legend.position = "top", # Place the legend at the top
    axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels for readability
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"), # Center and style title
    axis.title = element_text(size = 12), # Style axis titles
    panel.grid.major = element_line(color = "gray80", linewidth = 0.5), # Lighter grid lines (linewidth replaces the deprecated size argument)
    panel.grid.minor = element_blank() # Remove minor grid lines
  ) +
  facet_wrap(~ Version, scales = "free_y") # Separate plots for each version with free y scales
Company Recommendations
Market-Focused Strategies
Cluster 1 (female-focused): Develop female-oriented campaigns that emphasize packaging, nutritional attributes, and even spicy taste. Focus primarily on D3, D4, and D5, since they appear to be the top-rated versions for this cluster.
Cluster 2 (balanced): Craft marketing with a neutral tone that appeals equally to men and women. Promote D3 heavily in this group, since it is the top-selling item, and consider moderate promotion of D4.
Cluster 3 (males problematic): Female-targeted marketing continues here, but men are also targeted by the campaigns to some extent. D1 has the highest relative preference in this cluster.
Product Alteration
Alteration by gender: The clusters differ in taste and in the primary drivers of purchase, which creates a need to tailor products by gender.
Remedial measures: Improve the lower-ranking versions (e.g. D1 for Cluster 2, D3-D5 for Cluster 3).
Need to work on customer relationships
Cluster profiling in purchasing pattern: P455 for D13 can be expected to be higher than for D2 or D3 since the segment is easily influenced. Marketing combinations will have to be changed.
Customer forecasting: Look for gaps in the sales of the poorly performing versions to determine promotion targets and gauge customer satisfaction.
Cluster 1: Create and market bundles that include D3, D4, and D5 to capitalize on their popularity. Consider offering loyalty programs or new flavors to keep engagement high.
Cluster 2: Run sales promotions on D3, including discounts, referral programs, and buy-one-get-one-free campaigns. The lower preference for D1 and D5 suggests de-emphasizing them in this cluster.
Cluster 3: Offer discounts and special packages to promote D1, or promote special editions of D1. Encourage sampling of the other products (D2, D3, D4, and D5) to build buyer appetite before launch.
Product Strategy by Cluster
Cluster 1: Develop and target D3, D4, and D5 in areas where there is demand for those products.
Cluster 2: Feature D3 more in advertising and marketing, and pair it with D4, which is rated fairly well, to make the offer more enticing. Rework or discontinue D1 and D5 for this group, as they may have little appeal.
Cluster 3: Position D1 as the most suitable product, and upgrade D3, D4, and D5 so that they meet the expectations of the target population.
Marketing and Communication
Cluster 1: Target women as the main potential buyers by marketing the premium and elegant features of D3, D4, and D5.
Cluster 2: Highlight D3's features, but take care that communications are gender-balanced and do not alienate either group.
Cluster 3: Lead with D1, the strongest product, but encourage trial of D2, D3, D4, and D5 and reward customers for trying them.
Further Segmentation Opportunities
Demographic insights: Segment further by age, purchase rate, or the localities in which each cluster is dominant to improve targeting.
Behavioral patterns: Use data on buying behavior to target specific products and promote them within the respective clusters.
With these strategies in place, the firm is in a position to meet customers' requirements, improve their satisfaction, and increase profits in a competitive environment.