MUSIC VISUALISATION AND SEGMENTATION

Author

Aakash Vashisth C00313452

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button, a document is generated that includes both the content and the output of the embedded code. You can embed code like this:

library(cluster)
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.2
Warning: package 'ggplot2' was built under R version 4.4.2
Warning: package 'tibble' was built under R version 4.4.2
Warning: package 'tidyr' was built under R version 4.4.2
Warning: package 'readr' was built under R version 4.4.2
Warning: package 'purrr' was built under R version 4.4.2
Warning: package 'dplyr' was built under R version 4.4.2
Warning: package 'stringr' was built under R version 4.4.2
Warning: package 'forcats' was built under R version 4.4.2
Warning: package 'lubridate' was built under R version 4.4.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(kableExtra)
Warning: package 'kableExtra' was built under R version 4.4.2

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Importing datasets for visualisation

subtest <- read_csv("sub_testing.csv")
Rows: 150 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): renewed, gender
dbl (7): id, num_contacts, contact_recency, num_complaints, spend, lor, age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(subtest)
subtrain <- read_csv("sub_training.csv")
Rows: 850 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): renewed, gender
dbl (7): id, num_contacts, contact_recency, num_complaints, spend, lor, age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(subtrain)


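# renewed and gender are character columns: dist() coerces them to numeric, their
# non-numeric values become NA (hence the warnings below), and dist() then leaves the
# NA columns out of each pairwise distance and rescales the sum. The distances are
# therefore most likely based on the four numeric columns only, unstandardised.
subtest_2 <- select(subtest, renewed, contact_recency, lor, spend, gender, age)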
d1 <- dist(subtest_2)
Warning in dist(subtest_2): NAs introduced by coercion
subtrain_2 <- select(subtrain, renewed, contact_recency, lor, spend, gender, age)
d2 <- dist(subtrain_2)
Warning in dist(subtrain_2): NAs introduced by coercion
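The sketch below shows a cleaner distance computation. It assumes that only the numeric columns are meant to drive the segmentation. The character columns are dropped before `dist()` is called, and the rest are standardised with `scale()` so that `spend` (hundreds) does not outweigh `contact_recency` (tens). The data frame `toy` and its values are hypothetical stand-ins for `subtest`.

```r
# Hypothetical stand-in for subtest: two character columns plus four numeric ones
toy <- data.frame(
  renewed         = c("yes", "no", "yes", "no", "yes"),
  contact_recency = c(10, 25, 18, 30, 12),
  lor             = c(50, 300, 120, 80, 200),
  spend           = c(150, 600, 420, 90, 380),
  gender          = c("F", "M", "M", "F", "F"),
  age             = c(35, 62, 51, 44, 58)
)

# Keep only the numeric columns so dist() has nothing to coerce
num_cols <- vapply(toy, is.numeric, logical(1))
toy_num  <- toy[, num_cols]

# Standardise each column (mean 0, sd 1) so that no single variable dominates
toy_scaled <- scale(toy_num)

d_toy <- dist(toy_scaled)   # Euclidean distances, no coercion warning
h_toy <- hclust(d_toy)      # complete linkage, hclust()'s default
cutree(h_toy, k = 2)
```

With the real data, the equivalent would be something like `d1 <- dist(scale(select(subtest, contact_recency, lor, spend, age)))`. This would likely change the clusters and silhouette values reported below.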

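Conversion of age to numeric

# age is already read as <dbl> (see the column specification above), and d1/d2 were
# computed before this step, so these conversions do not change the distances
subtest_2$age <- as.numeric(subtest_2$age)
subtrain_2$age <- as.numeric(subtrain_2$age)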
h1 <- hclust(d1)
plot(h1, hang = -1)

heatmap(as.matrix(d1), Rowv = as.dendrogram(h1), Colv = 'Rowv')

h2 <- hclust(d2)
plot(h2, hang = -1)

heatmap(as.matrix(d2), Rowv = as.dendrogram(h2), Colv = 'Rowv', labRow = FALSE, labCol = FALSE)

clusters1 <- cutree(h1, k = 3)
clusters2 <- cutree(h2, k = 3)
#Step 5: Assess the quality of the segmentation
sil1 <- silhouette(clusters1, d1)
summary(sil1)
Silhouette of 150 units in 3 clusters from silhouette.default(x = clusters1, dist = d1) :
 Cluster sizes and average silhouette widths:
       71        38        41 
0.5052437 0.6476118 0.3736791 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.2161  0.4653  0.5661  0.5053  0.6338  0.7597 
sil2 <- silhouette(clusters2, d2)
summary(sil2)
Silhouette of 850 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
 Cluster sizes and average silhouette widths:
      762        82         6 
0.2702136 0.9033115 0.8217519 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3632  0.1983  0.3282  0.3352  0.3836  0.9436 
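The segmentation above fixes k = 3. A common sanity check on that choice is to compare the average silhouette width across several values of k. The sketch below does this on hypothetical simulated data (`x`, three loose groups), using `cluster::silhouette()` as in Step 5 and the `avg.width` component returned by `summary()` on a silhouette object.

```r
library(cluster)  # silhouette()

set.seed(42)
# Hypothetical data: three loose groups of 20 points each in two dimensions
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2),
  matrix(rnorm(40, mean = 8), ncol = 2)
)
d <- dist(scale(x))
h <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters
ks <- 2:6
avg_sil <- sapply(ks, function(k) summary(silhouette(cutree(h, k = k), d))$avg.width)
names(avg_sil) <- ks
round(avg_sil, 3)

# The k with the widest average silhouette
best_k <- ks[which.max(avg_sil)]
best_k
```

With the document's objects, the same loop would use `h1`/`d1` (or `h2`/`d2`) in place of `h`/`d`. A higher average width indicates better-separated clusters, but it is one criterion among several.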
#Step 6: Profile the clusters. 
test_clus <- cbind(subtest, clusters1)
test_clus <- mutate(test_clus, cluster = case_when(clusters1 == 1 ~ 'C1',
                                                   clusters1 == 2 ~ 'C2',
                                                   clusters1 == 3 ~ 'C3'))
train_clus <- cbind(subtrain, clusters2)
train_clus <- mutate(train_clus, cluster = case_when(clusters2 == 1 ~ 'C1',
                                                     clusters2 == 2 ~ 'C2',
                                                     clusters2 == 3 ~ 'C3'))
#####Step 6.1: Create a table showing the size of each segment (i.e., the number of customers in the cluster)
#              and the average revenue generated in the last 6 months per customer.


# Note: the column named `id` below holds the segment size, n(), not a customer ID
size_rev <- test_clus %>%
  group_by(clusters1) %>%
  summarise(id = n(),
            avg_rev = mean(spend))
size_rev2 <- train_clus %>%
  group_by(clusters2) %>%
  summarise(id = n(),
            avg_rev = mean(spend))

size_rev
# A tibble: 3 × 3
  clusters1    id avg_rev
      <int> <int>   <dbl>
1         1    71    201.
2         2    38    439.
3         3    41    391.
ggplot(test_clus, aes(x = age, fill = factor(cluster))) + 
  geom_bar(aes(y = after_stat(count) / sum(after_stat(count))), stat = "count", show.legend = TRUE) +
  facet_grid(~ cluster) +
  scale_y_continuous(labels = scales::percent_format()) +  # Label as percentages
  scale_fill_brewer(palette = "Set3") +  # Use a color palette for distinction
  ylab("Percentage of People") + 
  xlab("Age Group") +
  ggtitle("Age Breakdown by Cluster") +
  theme_minimal() +  # Minimal theme for clarity
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels for better readability
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  # Title styling
    strip.text = element_text(size = 12, face = "bold"),  # Facet label styling
    axis.title = element_text(size = 12),  # Axis title styling
    legend.position = "top",  # Position legend at the top
    panel.grid.major = element_line(color = "lightblue", linewidth = 0.5),  # Light grid lines
    panel.grid.minor = element_blank()  # Remove minor grid lines
  ) +
  coord_flip()  

ggplot(train_clus, aes(x = age, group = cluster)) + 
  geom_bar(aes(y = after_stat(prop)), stat = "count", show.legend = FALSE) +
  facet_grid(~ cluster) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percentage of People") + 
  xlab("Age Group") +
  ggtitle("Age Breakdown by Cluster") +
  coord_flip() 
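Because `age` is stored as a number, `geom_bar()` in the plots above draws one bar per distinct age rather than per age group. The sketch below bins ages into bands and computes percentages within each cluster, so that each row sums to 100%. The `age` and `cluster` vectors are hypothetical, and the band edges are only an illustrative choice.

```r
set.seed(1)
# Hypothetical ages and cluster labels standing in for test_clus$age and test_clus$cluster
age     <- sample(18:80, 150, replace = TRUE)
cluster <- sample(c("C1", "C2", "C3"), 150, replace = TRUE)

# Bin the continuous age into bands (illustrative band edges)
age_band <- cut(age, breaks = c(17, 34, 49, 64, Inf),
                labels = c("18-34", "35-49", "50-64", "65+"))

# Row-wise proportions: each cluster's row sums to 1
tab <- prop.table(table(cluster, age_band), margin = 1)
round(100 * tab, 1)
```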

test_clus_means <- test_clus %>%
  group_by(cluster) %>%
  summarise(Spend = mean(spend),
            Lor = mean(lor),
            Contact = mean(contact_recency),
            Age = mean(age))

test_clus_means
# A tibble: 3 × 5
  cluster Spend   Lor Contact   Age
  <chr>   <dbl> <dbl>   <dbl> <dbl>
1 C1       201.  74.5    19.7  47.3
2 C2       439. 107.     21.2  53.9
3 C3       391. 310.     23.5  59.9
train_clus_means <- train_clus %>%
  group_by(cluster) %>%
  summarise(Spend = mean(spend),
            Lor = mean(lor),
            Contact = mean(contact_recency),
            Age = mean(age))

train_clus_means
# A tibble: 3 × 5
  cluster  Spend   Lor Contact   Age
  <chr>    <dbl> <dbl>   <dbl> <dbl>
1 C1      345.   164.     20.1  54.3
2 C2        5.09  16.8    22.2  47.2
3 C3      675.   141.     16.5  58.7
test_clus_tidy <- test_clus_means %>%
  pivot_longer(cols = c(Spend, Lor, Contact, Age), names_to = "Contact_Method", values_to = "Average_Value")

test_clus_tidy$Contact_Method <- factor(test_clus_tidy$Contact_Method, levels = c("Spend", "Lor", "Contact", "Age"))

test_clus_tidy
# A tibble: 12 × 3
   cluster Contact_Method Average_Value
   <chr>   <fct>                  <dbl>
 1 C1      Spend                  201. 
 2 C1      Lor                     74.5
 3 C1      Contact                 19.7
 4 C1      Age                     47.3
 5 C2      Spend                  439. 
 6 C2      Lor                    107. 
 7 C2      Contact                 21.2
 8 C2      Age                     53.9
 9 C3      Spend                  391. 
10 C3      Lor                    310. 
11 C3      Contact                 23.5
12 C3      Age                     59.9
train_clus_tidy <- train_clus_means %>%
  pivot_longer(cols = c(Spend, Lor, Contact, Age), names_to = "Contact_Method", values_to = "Average_Value")

train_clus_tidy$Contact_Method <- factor(train_clus_tidy$Contact_Method, levels = c("Spend", "Lor", "Contact", "Age"))

train_clus_tidy
# A tibble: 12 × 3
   cluster Contact_Method Average_Value
   <chr>   <fct>                  <dbl>
 1 C1      Spend                 345.  
 2 C1      Lor                   164.  
 3 C1      Contact                20.1 
 4 C1      Age                    54.3 
 5 C2      Spend                   5.09
 6 C2      Lor                    16.8 
 7 C2      Contact                22.2 
 8 C2      Age                    47.2 
 9 C3      Spend                 675.  
10 C3      Lor                   141.  
11 C3      Contact                16.5 
12 C3      Age                    58.7 
#Visualise the mean of each profile variable (spend, lor, contact recency, age) by cluster.
ggplot(test_clus_tidy, mapping = aes(x = Contact_Method, y = Average_Value, group = cluster, colour = cluster)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_colour_manual(values = c("pink", "darkblue", "green")) +
  ylab("Mean Value") + 
  xlab("Profile Variable") +
  ggtitle("Mean Profile Variables by Cluster (Test Data)")
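The profile plot above puts spend (hundreds) and contact recency (around 20) on one y-axis, which makes differences in the smaller variables hard to see. One option is to standardise each variable across the clusters before plotting. In the sketch below, the means are the rounded values from the `test_clus_means` output above, and `long` is a hypothetical long-format table that mirrors `test_clus_tidy`.

```r
# Cluster means copied (rounded) from the test_clus_means output above
means <- data.frame(
  cluster = c("C1", "C2", "C3"),
  Spend   = c(201, 439, 391),
  Lor     = c(74.5, 107, 310),
  Contact = c(19.7, 21.2, 23.5),
  Age     = c(47.3, 53.9, 59.9)
)

# Centre and scale each variable across the three clusters
z <- scale(means[, -1])
rownames(z) <- means$cluster
round(z, 2)

# Long format (one row per cluster and variable), column-major like as.vector(z)
long <- data.frame(
  cluster  = rep(rownames(z), times = ncol(z)),
  variable = rep(colnames(z), each = nrow(z)),
  z_value  = as.vector(z)
)
long
```

The result could then be drawn with the same `ggplot()` call as above, using `x = variable` and `y = z_value`. With only three clusters, these z-scores describe each cluster's position relative to the other clusters, not its distance from the customer-level average.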

Energy drinks dataset

#Step 1: Import the data
Energy <- read_csv('energy_drinks.csv')
Rows: 840 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): ID, Gender, Age
dbl (5): D1, D2, D3, D4, D5

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Energy)

Recode Gender as a numeric indicator (Female = 1, all other values = 0)

Energy$Gender <- ifelse(Energy$Gender == "Female", 1, 0)
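When a variable is recoded to 0/1 and later turned back into a labelled factor, the labels must follow the same mapping as the recoding. A quick cross-tabulation against the raw values will catch a swapped mapping. The sketch below uses hypothetical `gender_raw` values.

```r
# Hypothetical raw values standing in for Energy$Gender before recoding
gender_raw <- c("Male", "Male", "Female", "Female", "Male")

# Same recoding as above: Female -> 1, everything else -> 0
gender_num <- ifelse(gender_raw == "Female", 1, 0)

# Labels follow the same mapping: level 0 is Male, level 1 is Female
gender_fct <- factor(gender_num, levels = c(0, 1), labels = c("Male", "Female"))

# Cross-tabulating raw against relabelled values makes a swapped mapping obvious
table(gender_raw, gender_fct)
```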

Compute distances between each pair of respondents

Energy_2 <- select(Energy, D1:D5)
Energy_2_scale <- scale(Energy_2)
d2 <- dist(Energy_2_scale)
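The following short check shows what `scale()` does. It uses hypothetical scores from 1 to 9 in columns named like D1:D5. Each column is centred on its mean and divided by its standard deviation, and the constants used are kept as attributes.

```r
set.seed(7)
# Hypothetical 1-9 scores standing in for Energy_2 (columns D1:D5), 10 rows
scores <- as.data.frame(matrix(sample(1:9, 50, replace = TRUE), ncol = 5,
                               dimnames = list(NULL, paste0("D", 1:5))))

scores_scaled <- scale(scores)   # subtract column means, divide by column SDs

# Each standardised column now has mean ~0 and standard deviation 1
round(colMeans(scores_scaled), 10)
apply(scores_scaled, 2, sd)

# The centring and scaling constants are stored as attributes
attr(scores_scaled, "scaled:center")
attr(scores_scaled, "scaled:scale")
```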

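Variables should be on a comparable scale before a distance matrix is calculated, so the columns D1:D5 were standardised with scale() above. Hierarchical clustering with average linkage is then performed on the distance matrix: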

h2 <- hclust(d2, method = "average")
plot(h2, hang = -1)

heatmap(as.matrix(d2), Rowv = as.dendrogram(h2), Colv = 'Rowv', labRow = FALSE, labCol = FALSE)

The dendrogram and the block structure in the heatmap suggest that consumers can be grouped by the similarity of their values on D1 to D5.
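The energy drinks clustering uses average linkage. One way to compare linkage methods is the cophenetic correlation: the correlation between the original distances and the heights at which pairs of points merge in the dendrogram. The sketch below runs this comparison on hypothetical simulated data.

```r
set.seed(3)
# Hypothetical standardised data standing in for Energy_2_scale (40 rows, 5 columns)
x <- scale(matrix(rnorm(200), ncol = 5))
d <- dist(x)

# Cophenetic correlation for several linkage methods
methods <- c("average", "complete", "single", "ward.D2")
coph <- sapply(methods, function(m) {
  cor(as.vector(d), as.vector(cophenetic(hclust(d, method = m))))
})
round(coph, 3)
```

Higher values mean the dendrogram preserves the original distances more faithfully. On its own, this does not say which linkage gives the most useful segments.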

Three-cluster solution

clusters2 <- cutree(h2, k = 3)


sil2 <- silhouette(clusters2, d2)
summary(sil2)
Silhouette of 840 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
 Cluster sizes and average silhouette widths:
      441       167       232 
0.1589812 0.2412339 0.3048378 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.2863  0.1262  0.2338  0.2156  0.3280  0.5065 
#Step 6: Profile the clusters. 
Energy_clus <- cbind(Energy, clusters2)
Energy_clus <- mutate(Energy_clus, cluster = case_when(clusters2 == 1 ~ 'C1',
                                                 clusters2 == 2 ~ 'C2',
                                                 clusters2 == 3 ~ 'C3'))
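The clusters are formed from D1 to D5 only, so the mean value of each of these variables per cluster shows directly what separates them, alongside the age and gender profiles below. The sketch below uses base R's `aggregate()` on a hypothetical data frame `prof` with three score columns and three clusters.

```r
set.seed(11)
# Hypothetical scores and cluster labels standing in for Energy_clus
prof <- data.frame(
  D1 = sample(1:9, 30, replace = TRUE),
  D2 = sample(1:9, 30, replace = TRUE),
  D3 = sample(1:9, 30, replace = TRUE),
  cluster = rep(c("C1", "C2", "C3"), each = 10)
)

# Mean of every score column within each cluster (rows sorted by cluster)
cluster_means <- aggregate(cbind(D1, D2, D3) ~ cluster, data = prof, FUN = mean)
cluster_means
```

With the document's data, the call would be along the lines of `aggregate(cbind(D1, D2, D3, D4, D5) ~ cluster, data = Energy_clus, FUN = mean)`.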
Energy$Cluster <- clusters2
str(Energy)
spc_tbl_ [840 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ID     : chr [1:840] "ID_1" "ID_2" "ID_3" "ID_4" ...
 $ D1     : num [1:840] 2 4 2 1 1 2 1 1 2 5 ...
 $ D2     : num [1:840] 3 4 3 6 3 3 5 3 3 5 ...
 $ D3     : num [1:840] 7 5 8 5 7 8 6 7 6 6 ...
 $ D4     : num [1:840] 7 6 8 8 7 7 5 9 7 7 ...
 $ D5     : num [1:840] 7 9 5 6 7 5 5 7 5 7 ...
 $ Gender : num [1:840] 0 0 1 1 0 0 1 0 1 1 ...
 $ Age    : chr [1:840] "Under_25" "Under_25" "Under_25" "Under_25" ...
 $ Cluster: int [1:840] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   ID = col_character(),
  ..   D1 = col_double(),
  ..   D2 = col_double(),
  ..   D3 = col_double(),
  ..   D4 = col_double(),
  ..   D5 = col_double(),
  ..   Gender = col_character(),
  ..   Age = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
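`factor()` turns any value that is not listed in `levels` into NA without a warning. Checking for unmatched labels before the conversion avoids this. The sketch below uses hypothetical labels, because the actual Age labels in energy_drinks.csv beyond "Under_25" and "25_34" are not shown in this document. The names `age_raw` and `age_levels` are also illustrative.

```r
# Hypothetical Age labels; "35_Above" is deliberately missing from the level list
age_raw    <- c("Under_25", "25_34", "35_Above", "25_34", "Under_25")
age_levels <- c("Under_25", "25_34", "35_44", "45_54", "55_64", "65_Above")

# Labels present in the data but absent from the chosen levels
setdiff(unique(age_raw), age_levels)

# factor() silently turns those unmatched labels into NA
age_fct <- factor(age_raw, levels = age_levels)
sum(is.na(age_fct))
```

With the real data, the check would be `setdiff(unique(Energy$Age), age_levels)`, run before the conversion.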
# Gender was recoded above as Female = 1 and all other values = 0, so level 0 is Male
Energy$Gender <- factor(Energy$Gender, levels = c(0, 1), labels = c("Male", "Female"))

# Convert Age column to a factor (e.g., "Under_25", "25_34")
# Caution: any Age value not listed in `levels` becomes NA; the <NA> rows in the
# age profile below indicate labels in the data that are missing from this list
Energy$Age <- factor(Energy$Age, levels = c("Under_25", "25_34", "35_44", "45_54", "55_64", "65_Above"))

# Check the structure after conversion
str(Energy)
spc_tbl_ [840 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ID     : chr [1:840] "ID_1" "ID_2" "ID_3" "ID_4" ...
 $ D1     : num [1:840] 2 4 2 1 1 2 1 1 2 5 ...
 $ D2     : num [1:840] 3 4 3 6 3 3 5 3 3 5 ...
 $ D3     : num [1:840] 7 5 8 5 7 8 6 7 6 6 ...
 $ D4     : num [1:840] 7 6 8 8 7 7 5 9 7 7 ...
 $ D5     : num [1:840] 7 9 5 6 7 5 5 7 5 7 ...
 $ Gender : Factor w/ 2 levels "Male","Female": 1 1 2 2 1 1 2 1 2 2 ...
 $ Age    : Factor w/ 6 levels "Under_25","25_34",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cluster: int [1:840] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   ID = col_character(),
  ..   D1 = col_double(),
  ..   D2 = col_double(),
  ..   D3 = col_double(),
  ..   D4 = col_double(),
  ..   D5 = col_double(),
  ..   Gender = col_character(),
  ..   Age = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
# Profiling the clusters by Age distribution
age_profile <- Energy %>%
  group_by(Cluster, Age) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(Cluster) %>%
  mutate(percentage = count / sum(count) * 100)
# View the age profile by cluster
print(age_profile)
# A tibble: 9 × 4
# Groups:   Cluster [3]
  Cluster Age      count percentage
    <int> <fct>    <int>      <dbl>
1       1 Under_25    87       19.7
2       1 25_34      192       43.5
3       1 <NA>       162       36.7
4       2 Under_25    28       16.8
5       2 25_34       58       34.7
6       2 <NA>        81       48.5
7       3 Under_25    41       17.7
8       3 25_34       88       37.9
9       3 <NA>       103       44.4

Plotting age distribution by cluster

ggplot(age_profile, aes(x = factor(Cluster), y = percentage, fill = Age)) +
  geom_bar(stat = "identity", position = "stack", width = 0.5) +
  labs(title = "Age Distribution by Cluster", x = "Cluster", y = "Percentage") +
  coord_flip() +
  theme_minimal()

Clusters by Gender distribution

gender_profile <- Energy %>%
  group_by(Cluster, Gender) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(Cluster) %>%
  mutate(percentage = count / sum(count) * 100)
# View the gender profile by cluster
kable(gender_profile)
Cluster Gender count percentage
1 Male 279 63.26531
1 Female 162 36.73469
2 Male 86 51.49701
2 Female 81 48.50299
3 Male 136 58.62069
3 Female 96 41.37931
ggplot(gender_profile, aes(x = factor(Cluster), y = percentage, fill = Gender)) +
  geom_bar(stat = "identity", position = "stack", width = 0.5) +
  labs(title = "Gender Distribution by Cluster", x = "Cluster", y = "Percentage") +
  coord_flip() +
  theme_minimal()
 $ axis.minor.ticks.length.x.top   : NULL
 $ axis.minor.ticks.length.x.bottom: NULL
 $ axis.minor.ticks.length.y       : NULL
 $ axis.minor.ticks.length.y.left  : NULL
 $ axis.minor.ticks.length.y.right : NULL
 $ axis.minor.ticks.length.theta   : NULL
 $ axis.minor.ticks.length.r       : NULL
 $ axis.line                       : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.line.x                     : NULL
 $ axis.line.x.top                 : NULL
 $ axis.line.x.bottom              : NULL
 $ axis.line.y                     : NULL
 $ axis.line.y.left                : NULL
 $ axis.line.y.right               : NULL
 $ axis.line.theta                 : NULL
 $ axis.line.r                     : NULL
 $ legend.background               : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.margin                   : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.spacing                  : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
 $ legend.spacing.x                : NULL
 $ legend.spacing.y                : NULL
 $ legend.key                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.key.size                 : 'simpleUnit' num 1.2lines
  ..- attr(*, "unit")= int 3
 $ legend.key.height               : NULL
 $ legend.key.width                : NULL
 $ legend.key.spacing              : 'simpleUnit' num 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.key.spacing.x            : NULL
 $ legend.key.spacing.y            : NULL
 $ legend.frame                    : NULL
 $ legend.ticks                    : NULL
 $ legend.ticks.length             : 'rel' num 0.2
 $ legend.axis.line                : NULL
 $ legend.text                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.text.position            : NULL
 $ legend.title                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.title.position           : NULL
 $ legend.position                 : chr "right"
 $ legend.position.inside          : NULL
 $ legend.direction                : NULL
 $ legend.byrow                    : NULL
 $ legend.justification            : chr "center"
 $ legend.justification.top        : NULL
 $ legend.justification.bottom     : NULL
 $ legend.justification.left       : NULL
 $ legend.justification.right      : NULL
 $ legend.justification.inside     : NULL
 $ legend.location                 : NULL
 $ legend.box                      : NULL
 $ legend.box.just                 : NULL
 $ legend.box.margin               : 'margin' num [1:4] 0cm 0cm 0cm 0cm
  ..- attr(*, "unit")= int 1
 $ legend.box.background           : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.box.spacing              : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
  [list output truncated]
 - attr(*, "class")= chr [1:2] "theme" "gg"
 - attr(*, "complete")= logi TRUE
 - attr(*, "validate")= logi TRUE
cluster_profiles <- Energy %>%
  group_by(Cluster) %>%
  summarise(
    avg_rating_D1 = mean(D1, na.rm = TRUE),
    avg_rating_D2 = mean(D2, na.rm = TRUE),
    avg_rating_D3 = mean(D3, na.rm = TRUE),
    avg_rating_D4 = mean(D4, na.rm = TRUE),
    avg_rating_D5 = mean(D5, na.rm = TRUE)
  )

# View the cluster profiling table
print(cluster_profiles)
# A tibble: 3 × 6
  Cluster avg_rating_D1 avg_rating_D2 avg_rating_D3 avg_rating_D4 avg_rating_D5
    <int>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
1       1          2.95          4.81          6.27          6.65          6.60
2       2          2.51          4.61          7.32          5.09          2.72
3       3          6.64          5.08          3.44          3.16          2.96
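The recommendations that follow depend on which versions each cluster rates highest. That ranking can be computed from the profile table rather than read off by eye. The sketch below re-types the rounded means printed above so that it runs on its own. In the analysis itself, `cluster_profiles` would be used directly. The object names `profiles`, `ranked` and `top3` are illustrative only.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Rounded cluster means as printed above (re-typed so this block is self-contained)
profiles <- tibble(
  Cluster       = 1:3,
  avg_rating_D1 = c(2.95, 2.51, 6.64),
  avg_rating_D2 = c(4.81, 4.61, 5.08),
  avg_rating_D3 = c(6.27, 7.32, 3.44),
  avg_rating_D4 = c(6.65, 5.09, 3.16),
  avg_rating_D5 = c(6.60, 2.72, 2.96)
)

# One row per cluster/version pair, with versions labelled "D1".."D5"
ranked <- profiles %>%
  pivot_longer(starts_with("avg_rating_"),
               names_to = "Version",
               names_prefix = "avg_rating_",
               values_to = "Avg_Rating") %>%
  group_by(Cluster) %>%
  mutate(Rank = min_rank(desc(Avg_Rating))) %>%  # 1 = highest-rated version in the cluster
  arrange(Cluster, Rank) %>%
  ungroup()

# Top three versions per cluster, listed from best to third best
top3 <- ranked %>%
  filter(Rank <= 3) %>%
  group_by(Cluster) %>%
  summarise(Top_Versions = paste(Version, collapse = ", "))

print(top3)
```

With these values, the top three versions are D4, D5 and D3 for Cluster 1, D3, D4 and D2 for Cluster 2, and D1, D2 and D3 for Cluster 3. This agrees with the versions singled out in the recommendations.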
# Reshape to long format: one row per Cluster x Version, with the
# "avg_rating_" prefix stripped so versions are labelled D1-D5
cluster_profiles_long <- cluster_profiles %>%
  pivot_longer(cols = starts_with("avg_rating"),
               names_to = "Version",
               names_prefix = "avg_rating_",
               values_to = "Avg_Rating")

# Average rating per cluster, one panel per energy drink version
ggplot(cluster_profiles_long, aes(x = factor(Cluster), y = Avg_Rating, fill = Version)) +
  geom_col(width = 0.7, alpha = 0.8) +  # geom_col() plots the pre-computed means directly
  scale_fill_manual(values = c("blue", "green", "yellow", "purple", "orange")) +  # One colour per version
  labs(title = "Average Ratings by Cluster and Energy Drink Version",
       x = "Cluster",
       y = "Average Rating") +
  scale_y_continuous(limits = c(0, 10), expand = c(0, 0)) +  # Common 0-10 axis, no padding
  theme_minimal() +
  theme(
    legend.title = element_blank(),  # Remove legend title
    legend.position = "top",         # Place the legend at the top
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  # Centre and style title
    axis.title = element_text(size = 12),  # Style axis titles
    panel.grid.major = element_line(color = "gray80", linewidth = 0.5),  # linewidth replaces the deprecated size argument
    panel.grid.minor = element_blank()  # Remove minor grid lines
  ) +
  facet_wrap(~ Version)  # Fixed y scale (set by limits above) keeps panels comparable

Company Recommendations

Market-Focused Strategies

Cluster 1 (Female focused): Develop female-oriented campaigns that emphasize packaging, nutritional attributes, and even spicy taste. Target D3, D4, and D5 primarily, since they are the top-rated versions in this cluster.

Cluster 2 (Balanced cluster): Craft marketing with a neutral tone that appeals equally to male and female customers. Promote D3 heavily in this group, since it is the top-rated version, and give D4 moderate promotion.

Cluster 3 (Males problematic): Female-targeted marketing continues here, but the campaigns should also reach male customers to some extent. D1 is the most preferred version in this cluster.

Product Alteration

Alteration by gender: The clusters differ in taste and in their main purchase drivers, so products may need to be tailored by gender.

Remedial measures: Improve the lower-ranking versions (e.g. D1 for Cluster 2, D3-D5 for Cluster 3).

Customer Relationships

Purchasing patterns by cluster: P455 for D13 can be expected to be higher than for D2 or D3 since the segment is easily influenced, so the marketing mix will have to change.

Customer forecasting: Look for gaps in sales of the lower-rated versions to identify promotion targets and gauge customer satisfaction.

Cluster 1: Create and market bundles that include D3, D4, and D5 to capitalize on their popularity. Consider loyalty programs or new flavors to keep engagement high.

Cluster 2: Run promotions on D3, including discounts, referral programs, and bring-a-friend offers in which the second drink is free. The low ratings for D1 and D5 suggest de-emphasizing them in this cluster.

Cluster 3: Offer discounts and special packages to promote D1, or release special editions of D1. Encourage sampling of the other versions (D2, D3, D4, and D5) to build buyer interest before launch.
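The remedial measures above call for improving the lower-ranking versions in each cluster. One simple, assumed rule for "lower-ranking" is a rating below that cluster's own mean across D1-D5. The sketch below applies that rule to the rounded cluster means printed earlier. The threshold rule and the names `profiles` and `weak_versions` are illustrative assumptions, not part of the original analysis.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Rounded cluster means from the cluster profiling table (re-typed for self-containment)
profiles <- tibble(
  Cluster       = 1:3,
  avg_rating_D1 = c(2.95, 2.51, 6.64),
  avg_rating_D2 = c(4.81, 4.61, 5.08),
  avg_rating_D3 = c(6.27, 7.32, 3.44),
  avg_rating_D4 = c(6.65, 5.09, 3.16),
  avg_rating_D5 = c(6.60, 2.72, 2.96)
)

# Assumed rule: a version is "lower-ranking" if it is rated below its cluster's mean
weak_versions <- profiles %>%
  pivot_longer(starts_with("avg_rating_"),
               names_to = "Version",
               names_prefix = "avg_rating_",
               values_to = "Avg_Rating") %>%
  group_by(Cluster) %>%
  mutate(Cluster_Mean = mean(Avg_Rating)) %>%  # mean rating across the five versions
  filter(Avg_Rating < Cluster_Mean) %>%
  summarise(Weak_Versions = paste(Version, collapse = ", "))

print(weak_versions)
```

Under this rule, the flagged versions are D1 and D5 for Cluster 2 and D3, D4 and D5 for Cluster 3, consistent with the examples given. Cluster 1 flags D1 and D2. A stricter cut-off, such as a fixed rating threshold, would flag fewer versions.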
Product Strategy by Cluster

Cluster 1: Develop D3, D4, and D5 and target them at the areas where there is demand for them.

Cluster 2: Feature D3 more prominently in advertising and marketing, and pair it with D4, which is rated fairly well, to make the offer more enticing. Reformulate or withdraw D1 and D5 for this group, since there appears to be little demand for them.

Cluster 3: Position D1 as the lead product, and upgrade D3, D4, and D5 so that they meet the expectations of this segment.

Marketing and Communication

Cluster 1: Target women as the main buyers by marketing the premium, elegant features of D3, D4, and D5.

Cluster 2: Communicate D3's features clearly, keeping the messaging gender-balanced so that neither group is put off by the campaigns.

Cluster 3: Lead with D1, which this cluster rates strongly, but encourage trial of D2, D3, D4, and D5 and reward customers for doing so.

Further Segmentation Opportunities

Demographic insights: Segment customers further by age, purchase rate, or the localities in which they are concentrated to improve targeting.

Behavioral patterns: Use buying-behavior data to target specific products and promote them within the relevant clusters.

With these strategies in place, the firm will be better placed to meet customer requirements, improve satisfaction, and increase profits in a competitive market.