MUSIC VISUALISATION AND SEGMENTATION

Author

E.B

Analysis for Music company

Analysis for a music downloading company collected data about its customers’ subscription base for the last four years. The company considers that a customer has churned when his/her subscription is not renewed within one week after the expiry date. The independent variables were measured over the 36-month period prior to the date on which the customer either renewed or churned. The independent variables contain information about interactions between the customers and the company, socio-demographic information and subscription-describing information.

Loading Packages for the visualization

library(cluster)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Importing datasets for visualisation

subtest <- read_csv("sub_testing.csv")

Rows: 150 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): renewed, gender
dbl (7): id, num_contacts, contact_recency, num_complaints, spend, lor, age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(subtest)
subtrain <- read_csv("sub_training.csv")

Rows: 850 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): renewed, gender
dbl (7): id, num_contacts, contact_recency, num_complaints, spend, lor, age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(subtrain)

The echo: false option disables the printing of code (only output is displayed).

subtest_2 <- select(subtest, renewed, contact_recency, lor, spend, gender, age)
d1 <- dist(subtest_2)

Warning in dist(subtest_2): NAs introduced by coercion

subtrain_2 <- select(subtrain, renewed, contact_recency, lor, spend, gender, age)
d2 <- dist(subtrain_2)

Warning in dist(subtrain_2): NAs introduced by coercion

subtest_2_clean <- subtest %>%
  select(renewed, contact_recency, lor, spend, gender, age) %>%
  mutate(
    gender = as.factor(gender),  # Convert 'gender' to factor
    age = as.factor(age)         # Convert 'age' to factor
  ) %>%
  drop_na()  # Remove rows with missing values

subtrain_2_clean <- subtrain %>%
  select(renewed, contact_recency, lor, spend, gender, age) %>%
  mutate(
    gender = as.factor(gender),  # Convert 'gender' to factor
    age = as.factor(age)         # Convert 'age' to factor
  ) %>%
  drop_na()  # Remove rows with missing values

# Convert factors into numeric representations (if necessary)
subtest_2_clean <- subtest_2_clean %>%
  mutate(
    gender = as.numeric(gender),  # Convert gender to numeric (1 = Female, 2 = Male)
    age = as.numeric(age)         # Convert age to numeric (e.g., 1 = Under_25, 2 = 25_34, etc.)
  )

subtrain_2_clean <- subtrain_2_clean %>%
  mutate(
    gender = as.numeric(gender),
    age = as.numeric(age)
  )

# Compute Euclidean distance matrices
d1 <- dist(subtest_2_clean, method = "euclidean")

Warning in dist(subtest_2_clean, method = "euclidean"): NAs introduced by
coercion

d2 <- dist(subtrain_2_clean, method = "euclidean")

Warning in dist(subtrain_2_clean, method = "euclidean"): NAs introduced by
coercion

#summary of distances (optional)
summary(d1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0   129.1   240.4   237.6   308.7   732.7

summary(d2)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0   135.3   243.3   247.8   316.7   817.9

Converting age to numeric value

subtest_2$age <- as.numeric(subtest_2$age)
subtrain_2$age <- as.numeric(subtrain_2$age)

Creating a heatmap

h1 <- hclust(d1)
plot(h1, hang = -1)

heatmap(as.matrix(d1), Rowv = as.dendrogram(h1), Colv = 'Rowv')

h2 <- hclust(d2)
plot(h2, hang = -1)

heatmap(as.matrix(d1), Rowv = as.dendrogram(h1), Colv = 'Rowv')

According to this dendogram, the further you go down, the similar the nodes are.

clusters1 <- cutree(h1, k = 3)
clusters2 <- cutree(h2, k = 3)

Assess the quality of the segmentation

sil1 <- silhouette(clusters1, d1)
summary(sil1)

Silhouette of 150 units in 3 clusters from silhouette.default(x = clusters1, dist = d1) :
 Cluster sizes and average silhouette widths:
       71        64        15 
0.5627435 0.4217442 0.3025624 
Individual silhouette widths:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.002092  0.354104  0.526981  0.476566  0.620396  0.698237

sil2 <- silhouette(clusters2, d2)
summary(sil2)

Silhouette of 850 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
 Cluster sizes and average silhouette widths:
      387       276       187 
0.4591246 0.5782505 0.4968392 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.4597  0.4736  0.5515  0.5061  0.6217  0.7286

Profile the clusters.

#Step 6:  
test_clus <- cbind(subtest, clusters1)
test_clus <- mutate(test_clus, cluster = case_when(clusters1 == 1 ~ 'C1',
                                                   clusters1 == 2 ~ 'C2',
                                                   clusters1 == 3 ~ 'C3'))
train_clus <- cbind(subtrain, clusters2)
train_clus <- mutate(train_clus, cluster = case_when(clusters2 == 1 ~ 'C1',
                                                   clusters2 == 2 ~ 'C2',
                                                   clusters2 == 3 ~ 'C3'))

#####Step 6.1: Create a table showing the size of each segment (i.e the number of customers in the cluster)
#              and the average revenue generated in the last 6 months per customer.


size_rev <- test_clus %>%
  group_by(clusters1) %>%
  summarise(id = n(),
            avg_rev = mean(spend))
size_rev2 <- train_clus %>%
  group_by(clusters2) %>%
  summarise(id = n(),
            avg_rev = mean(spend))

size_rev

# A tibble: 3 × 3
  clusters1    id avg_rev
      <int> <int>   <dbl>
1         1    71    201.
2         2    64    439.
3         3    15    310.

ggplot(test_clus, aes(x = age, fill = factor(cluster))) + 
  geom_bar(aes(y = after_stat(count) / sum(after_stat(count))), stat = "count", show.legend = TRUE) +
  facet_grid(~ cluster) +
  scale_y_continuous(labels = scales::percent_format()) +  # Label as percentages
  scale_fill_brewer(palette = "Set2") +  # Use a color palette for distinction
  ylab("Percentage of People") + 
  xlab("Age Group") +
  ggtitle("Age Breakdown by Cluster") +
  theme_minimal() +  # Minimal theme for clarity
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels for better readability
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  # Title styling
    strip.text = element_text(size = 12, face = "bold"),  # Facet label styling
    axis.title = element_text(size = 12),  # Axis title styling
    legend.position = "top",  # Position legend at the top
    panel.grid.major = element_line(color = "gray80", size = 0.5),  # dark grid lines
    panel.grid.minor = element_blank()  # Remove minor grid lines
  ) +
  coord_flip()

Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.

test_clus_means <- test_clus %>%
  group_by(cluster) %>%
  summarise(Spend = mean(spend),
            Lor = mean(lor),
            Contact = mean(contact_recency),
            Age = mean(age))

test_clus_means

# A tibble: 3 × 5
  cluster Spend   Lor Contact   Age
  <chr>   <dbl> <dbl>   <dbl> <dbl>
1 C1       201.  74.5    19.7  47.3
2 C2       439. 179.     22.0  56.5
3 C3       310. 354.     23.9  59.5

train_clus_means <- train_clus %>%
  group_by(cluster) %>%
  summarise(Spend = mean(spend),
            Lor = mean(lor),
            Contact = mean(contact_recency),
            Age = mean(age))

train_clus_means

# A tibble: 3 × 5
  cluster Spend   Lor Contact   Age
  <chr>   <dbl> <dbl>   <dbl> <dbl>
1 C1       182.  84.3    20.1  47.2
2 C2       440. 123.     21.0  53.9
3 C3       403. 325.     19.4  66.4

test_clus_tidy <- test_clus_means %>%
  pivot_longer(cols = c(Spend, Lor, Contact, Age), names_to = "Contact_Method", values_to = "Average_Value")

test_clus_tidy$Contact_Method <- factor(test_clus_tidy$Contact_Method, levels = c("Spend", "Lor", "Contact", "Age"))

test_clus_tidy

# A tibble: 12 × 3
   cluster Contact_Method Average_Value
   <chr>   <fct>                  <dbl>
 1 C1      Spend                  201. 
 2 C1      Lor                     74.5
 3 C1      Contact                 19.7
 4 C1      Age                     47.3
 5 C2      Spend                  439. 
 6 C2      Lor                    179. 
 7 C2      Contact                 22.0
 8 C2      Age                     56.5
 9 C3      Spend                  310. 
10 C3      Lor                    354. 
11 C3      Contact                 23.9
12 C3      Age                     59.5

train_clus_tidy <- train_clus_means %>%
  pivot_longer(cols = c(Spend, Lor, Contact, Age), names_to = "Contact_Method", values_to = "Average_Value")

train_clus_tidy$Contact_Method <- factor(test_clus_tidy$Contact_Method, levels = c("Spend", "Lor", "Contact", "Age"))

train_clus_tidy

# A tibble: 12 × 3
   cluster Contact_Method Average_Value
   <chr>   <fct>                  <dbl>
 1 C1      Spend                  182. 
 2 C1      Lor                     84.3
 3 C1      Contact                 20.1
 4 C1      Age                     47.2
 5 C2      Spend                  440. 
 6 C2      Lor                    123. 
 7 C2      Contact                 21.0
 8 C2      Age                     53.9
 9 C3      Spend                  403. 
10 C3      Lor                    325. 
11 C3      Contact                 19.4
12 C3      Age                     66.4

#Visualise the mean satisfaction score for each contact method by cluster.
ggplot(test_clus_tidy, mapping = aes(x = Contact_Method, y = Average_Value, group = cluster, colour = cluster)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_colour_manual(values = c("#0e0f0f", "#8AB9F1", "#FFC40A")) +
  ylab("Mean Satisfaction Score") + 
  xlab("Contact Method") +
  ggtitle("Mean Satisfaction Score for each Contact Method by Cluster")

Question 2 Inmporting the energy drinks dataset

#Step 1: Import the data
Energy <- read_csv('energy_drinks.csv')

Rows: 840 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): ID, Gender, Age
dbl (5): D1, D2, D3, D4, D5

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(Energy)

Change gender to numerical value

Energy$Gender <- ifelse(Energy$Gender == "Female", 1, 0)

Compute distances between each pair of players

Energy_2 <- select(Energy, D1:D5)
Energy_2_scale <- scale(Energy_2)
d2 <- dist(Energy_2_scale)

Yes, data needs to be scaled before matrix of the distance can be measured or created.

Carry out the hierarchical clustering

h2 <- hclust(d2, method = "average")
plot(h2, hang = -1)

heatmap(as.matrix(d2), Rowv = as.dendrogram(h2), Colv = 'Rowv', labRow = F, labCol = F)

4a.) The dendogram and block layout in the heatmap shows the datasets have different clusters. However the hierarchical cluster and heatmeap show that consumers can be grouped according to their similiarity in certain groupings.

Creating a 3-cluster solution

clusters2 <- cutree(h2, k = 3)


sil2 <- silhouette(clusters2, d2)
summary(sil2)

Silhouette of 840 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
 Cluster sizes and average silhouette widths:
      441       167       232 
0.1589812 0.2412339 0.3048378 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.2863  0.1262  0.2338  0.2156  0.3280  0.5065

#Step 6: Profile the clusters. 
Energy_clus <- cbind(Energy, clusters2)
Energy_clus <- mutate(Energy_clus, cluster = case_when(clusters2 == 1 ~ 'C1',
                                                 clusters2 == 2 ~ 'C2',
                                                 clusters2 == 3 ~ 'C3'))

Energy$Cluster <- clusters2

str(Energy)

spc_tbl_ [840 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ID     : chr [1:840] "ID_1" "ID_2" "ID_3" "ID_4" ...
 $ D1     : num [1:840] 2 4 2 1 1 2 1 1 2 5 ...
 $ D2     : num [1:840] 3 4 3 6 3 3 5 3 3 5 ...
 $ D3     : num [1:840] 7 5 8 5 7 8 6 7 6 6 ...
 $ D4     : num [1:840] 7 6 8 8 7 7 5 9 7 7 ...
 $ D5     : num [1:840] 7 9 5 6 7 5 5 7 5 7 ...
 $ Gender : num [1:840] 0 0 1 1 0 0 1 0 1 1 ...
 $ Age    : chr [1:840] "Under_25" "Under_25" "Under_25" "Under_25" ...
 $ Cluster: int [1:840] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   ID = col_character(),
  ..   D1 = col_double(),
  ..   D2 = col_double(),
  ..   D3 = col_double(),
  ..   D4 = col_double(),
  ..   D5 = col_double(),
  ..   Gender = col_character(),
  ..   Age = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

Energy$Gender <- factor(Energy$Gender, levels = c(0, 1), labels = c("Female", "Male"))

# Convert Age column to a factor (e.g., "Under_25", "25_34")
Energy$Age <- factor(Energy$Age, levels = c("Under_25", "25_34", "35_44", "45_54", "55_64", "65_Above"))

# Check the structure after conversion
str(Energy)

spc_tbl_ [840 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ID     : chr [1:840] "ID_1" "ID_2" "ID_3" "ID_4" ...
 $ D1     : num [1:840] 2 4 2 1 1 2 1 1 2 5 ...
 $ D2     : num [1:840] 3 4 3 6 3 3 5 3 3 5 ...
 $ D3     : num [1:840] 7 5 8 5 7 8 6 7 6 6 ...
 $ D4     : num [1:840] 7 6 8 8 7 7 5 9 7 7 ...
 $ D5     : num [1:840] 7 9 5 6 7 5 5 7 5 7 ...
 $ Gender : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 1 2 1 2 2 ...
 $ Age    : Factor w/ 6 levels "Under_25","25_34",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cluster: int [1:840] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   ID = col_character(),
  ..   D1 = col_double(),
  ..   D2 = col_double(),
  ..   D3 = col_double(),
  ..   D4 = col_double(),
  ..   D5 = col_double(),
  ..   Gender = col_character(),
  ..   Age = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

# Profiling the clusters by Age distribution
age_profile <- Energy %>%
  group_by(Cluster, Age) %>%
  summarise(count = n()) %>%
  group_by(Cluster) %>%
  mutate(percentage = count / sum(count) * 100)

`summarise()` has grouped output by 'Cluster'. You can override using the
`.groups` argument.

# View the age profile by cluster
kable(age_profile)

Cluster	Age	count	percentage
1	Under_25	87	19.72789
1	25_34	192	43.53742
1	NA	162	36.73469
2	Under_25	28	16.76647
2	25_34	58	34.73054
2	NA	81	48.50299
3	Under_25	41	17.67241
3	25_34	88	37.93103
3	NA	103	44.39655

Plotting age distribution by cluster

ggplot(age_profile, aes(x = factor(Cluster), y = percentage, fill = Age, width = 0.5)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Age Distribution by Cluster", x = "Cluster", y = "Percentage") +
  coord_flip()

  theme_minimal()

List of 136
 $ line                            :List of 6
  ..$ colour       : chr "black"
  ..$ linewidth    : num 0.5
  ..$ linetype     : num 1
  ..$ lineend      : chr "butt"
  ..$ arrow        : logi FALSE
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_line" "element"
 $ rect                            :List of 5
  ..$ fill         : chr "white"
  ..$ colour       : chr "black"
  ..$ linewidth    : num 0.5
  ..$ linetype     : num 1
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_rect" "element"
 $ text                            :List of 11
  ..$ family       : chr ""
  ..$ face         : chr "plain"
  ..$ colour       : chr "black"
  ..$ size         : num 11
  ..$ hjust        : num 0.5
  ..$ vjust        : num 0.5
  ..$ angle        : num 0
  ..$ lineheight   : num 0.9
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : logi FALSE
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ title                           : NULL
 $ aspect.ratio                    : NULL
 $ axis.title                      : NULL
 $ axis.title.x                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 2.75points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.x.top                :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 0
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 2.75points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.x.bottom             : NULL
 $ axis.title.y                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : num 90
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.75points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.y.left               : NULL
 $ axis.title.y.right              :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : num -90
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.75points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text                       :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : chr "grey30"
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 2.2points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x.top                 :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 0
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 2.2points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x.bottom              : NULL
 $ axis.text.y                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 1
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.y.left                : NULL
 $ axis.text.y.right               :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.2points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.theta                 : NULL
 $ axis.text.r                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0.5
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 2.2points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.ticks                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.ticks.x                    : NULL
 $ axis.ticks.x.top                : NULL
 $ axis.ticks.x.bottom             : NULL
 $ axis.ticks.y                    : NULL
 $ axis.ticks.y.left               : NULL
 $ axis.ticks.y.right              : NULL
 $ axis.ticks.theta                : NULL
 $ axis.ticks.r                    : NULL
 $ axis.minor.ticks.x.top          : NULL
 $ axis.minor.ticks.x.bottom       : NULL
 $ axis.minor.ticks.y.left         : NULL
 $ axis.minor.ticks.y.right        : NULL
 $ axis.minor.ticks.theta          : NULL
 $ axis.minor.ticks.r              : NULL
 $ axis.ticks.length               : 'simpleUnit' num 2.75points
  ..- attr(*, "unit")= int 8
 $ axis.ticks.length.x             : NULL
 $ axis.ticks.length.x.top         : NULL
 $ axis.ticks.length.x.bottom      : NULL
 $ axis.ticks.length.y             : NULL
 $ axis.ticks.length.y.left        : NULL
 $ axis.ticks.length.y.right       : NULL
 $ axis.ticks.length.theta         : NULL
 $ axis.ticks.length.r             : NULL
 $ axis.minor.ticks.length         : 'rel' num 0.75
 $ axis.minor.ticks.length.x       : NULL
 $ axis.minor.ticks.length.x.top   : NULL
 $ axis.minor.ticks.length.x.bottom: NULL
 $ axis.minor.ticks.length.y       : NULL
 $ axis.minor.ticks.length.y.left  : NULL
 $ axis.minor.ticks.length.y.right : NULL
 $ axis.minor.ticks.length.theta   : NULL
 $ axis.minor.ticks.length.r       : NULL
 $ axis.line                       : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.line.x                     : NULL
 $ axis.line.x.top                 : NULL
 $ axis.line.x.bottom              : NULL
 $ axis.line.y                     : NULL
 $ axis.line.y.left                : NULL
 $ axis.line.y.right               : NULL
 $ axis.line.theta                 : NULL
 $ axis.line.r                     : NULL
 $ legend.background               : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.margin                   : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.spacing                  : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
 $ legend.spacing.x                : NULL
 $ legend.spacing.y                : NULL
 $ legend.key                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.key.size                 : 'simpleUnit' num 1.2lines
  ..- attr(*, "unit")= int 3
 $ legend.key.height               : NULL
 $ legend.key.width                : NULL
 $ legend.key.spacing              : 'simpleUnit' num 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.key.spacing.x            : NULL
 $ legend.key.spacing.y            : NULL
 $ legend.frame                    : NULL
 $ legend.ticks                    : NULL
 $ legend.ticks.length             : 'rel' num 0.2
 $ legend.axis.line                : NULL
 $ legend.text                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.text.position            : NULL
 $ legend.title                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.title.position           : NULL
 $ legend.position                 : chr "right"
 $ legend.position.inside          : NULL
 $ legend.direction                : NULL
 $ legend.byrow                    : NULL
 $ legend.justification            : chr "center"
 $ legend.justification.top        : NULL
 $ legend.justification.bottom     : NULL
 $ legend.justification.left       : NULL
 $ legend.justification.right      : NULL
 $ legend.justification.inside     : NULL
 $ legend.location                 : NULL
 $ legend.box                      : NULL
 $ legend.box.just                 : NULL
 $ legend.box.margin               : 'margin' num [1:4] 0cm 0cm 0cm 0cm
  ..- attr(*, "unit")= int 1
 $ legend.box.background           : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.box.spacing              : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
  [list output truncated]
 - attr(*, "class")= chr [1:2] "theme" "gg"
 - attr(*, "complete")= logi TRUE
 - attr(*, "validate")= logi TRUE

Clusters by Gender distribution

gender_profile <- Energy %>%
  group_by(Cluster, Gender) %>%
  summarise(count = n()) %>%
  group_by(Cluster) %>%
  mutate(percentage = count / sum(count) * 100)

`summarise()` has grouped output by 'Cluster'. You can override using the
`.groups` argument.

# View the gender profile by cluster
kable(gender_profile)

Cluster	Gender	count	percentage
1	Female	279	63.26531
1	Male	162	36.73469
2	Female	86	51.49701
2	Male	81	48.50299
3	Female	136	58.62069
3	Male	96	41.37931

In the cluster represented above, we see that cluster 1 has a higher percentage of females streaming than male, in cluster 2, there is a balance in the gender per percentage points and in cluster 3, there is not a wide gap in the gender because even though the females have a higher streaming percentage, there is not much difference with the males.

ggplot(gender_profile, aes(x = factor(Cluster), y = percentage, fill = Gender, width = 0.5)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Gender Distribution by Cluster", x = "Cluster", y = "Percentage") +
  coord_flip()

  theme_minimal()

List of 136
 $ line                            :List of 6
  ..$ colour       : chr "black"
  ..$ linewidth    : num 0.5
  ..$ linetype     : num 1
  ..$ lineend      : chr "butt"
  ..$ arrow        : logi FALSE
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_line" "element"
 $ rect                            :List of 5
  ..$ fill         : chr "white"
  ..$ colour       : chr "black"
  ..$ linewidth    : num 0.5
  ..$ linetype     : num 1
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_rect" "element"
 $ text                            :List of 11
  ..$ family       : chr ""
  ..$ face         : chr "plain"
  ..$ colour       : chr "black"
  ..$ size         : num 11
  ..$ hjust        : num 0.5
  ..$ vjust        : num 0.5
  ..$ angle        : num 0
  ..$ lineheight   : num 0.9
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : logi FALSE
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ title                           : NULL
 $ aspect.ratio                    : NULL
 $ axis.title                      : NULL
 $ axis.title.x                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 2.75points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.x.top                :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 0
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 2.75points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.x.bottom             : NULL
 $ axis.title.y                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : num 90
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.75points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.title.y.left               : NULL
 $ axis.title.y.right              :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : num -90
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.75points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text                       :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : chr "grey30"
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 1
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 2.2points 0points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x.top                 :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : num 0
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 2.2points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.x.bottom              : NULL
 $ axis.text.y                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 1
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 0points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.y.left                : NULL
 $ axis.text.y.right               :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 0points 0points 2.2points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.text.theta                 : NULL
 $ axis.text.r                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0.5
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 0points 2.2points 0points 2.2points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ axis.ticks                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.ticks.x                    : NULL
 $ axis.ticks.x.top                : NULL
 $ axis.ticks.x.bottom             : NULL
 $ axis.ticks.y                    : NULL
 $ axis.ticks.y.left               : NULL
 $ axis.ticks.y.right              : NULL
 $ axis.ticks.theta                : NULL
 $ axis.ticks.r                    : NULL
 $ axis.minor.ticks.x.top          : NULL
 $ axis.minor.ticks.x.bottom       : NULL
 $ axis.minor.ticks.y.left         : NULL
 $ axis.minor.ticks.y.right        : NULL
 $ axis.minor.ticks.theta          : NULL
 $ axis.minor.ticks.r              : NULL
 $ axis.ticks.length               : 'simpleUnit' num 2.75points
  ..- attr(*, "unit")= int 8
 $ axis.ticks.length.x             : NULL
 $ axis.ticks.length.x.top         : NULL
 $ axis.ticks.length.x.bottom      : NULL
 $ axis.ticks.length.y             : NULL
 $ axis.ticks.length.y.left        : NULL
 $ axis.ticks.length.y.right       : NULL
 $ axis.ticks.length.theta         : NULL
 $ axis.ticks.length.r             : NULL
 $ axis.minor.ticks.length         : 'rel' num 0.75
 $ axis.minor.ticks.length.x       : NULL
 $ axis.minor.ticks.length.x.top   : NULL
 $ axis.minor.ticks.length.x.bottom: NULL
 $ axis.minor.ticks.length.y       : NULL
 $ axis.minor.ticks.length.y.left  : NULL
 $ axis.minor.ticks.length.y.right : NULL
 $ axis.minor.ticks.length.theta   : NULL
 $ axis.minor.ticks.length.r       : NULL
 $ axis.line                       : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ axis.line.x                     : NULL
 $ axis.line.x.top                 : NULL
 $ axis.line.x.bottom              : NULL
 $ axis.line.y                     : NULL
 $ axis.line.y.left                : NULL
 $ axis.line.y.right               : NULL
 $ axis.line.theta                 : NULL
 $ axis.line.r                     : NULL
 $ legend.background               : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.margin                   : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.spacing                  : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
 $ legend.spacing.x                : NULL
 $ legend.spacing.y                : NULL
 $ legend.key                      : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.key.size                 : 'simpleUnit' num 1.2lines
  ..- attr(*, "unit")= int 3
 $ legend.key.height               : NULL
 $ legend.key.width                : NULL
 $ legend.key.spacing              : 'simpleUnit' num 5.5points
  ..- attr(*, "unit")= int 8
 $ legend.key.spacing.x            : NULL
 $ legend.key.spacing.y            : NULL
 $ legend.frame                    : NULL
 $ legend.ticks                    : NULL
 $ legend.ticks.length             : 'rel' num 0.2
 $ legend.axis.line                : NULL
 $ legend.text                     :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.text.position            : NULL
 $ legend.title                    :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : num 0
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ legend.title.position           : NULL
 $ legend.position                 : chr "right"
 $ legend.position.inside          : NULL
 $ legend.direction                : NULL
 $ legend.byrow                    : NULL
 $ legend.justification            : chr "center"
 $ legend.justification.top        : NULL
 $ legend.justification.bottom     : NULL
 $ legend.justification.left       : NULL
 $ legend.justification.right      : NULL
 $ legend.justification.inside     : NULL
 $ legend.location                 : NULL
 $ legend.box                      : NULL
 $ legend.box.just                 : NULL
 $ legend.box.margin               : 'margin' num [1:4] 0cm 0cm 0cm 0cm
  ..- attr(*, "unit")= int 1
 $ legend.box.background           : list()
  ..- attr(*, "class")= chr [1:2] "element_blank" "element"
 $ legend.box.spacing              : 'simpleUnit' num 11points
  ..- attr(*, "unit")= int 8
  [list output truncated]
 - attr(*, "class")= chr [1:2] "theme" "gg"
 - attr(*, "complete")= logi TRUE
 - attr(*, "validate")= logi TRUE

Recommendations I would recommend that for the cluster 1 and 2, targeted promotions are run as there seems to be an existing balance between the gender percentage in cluster 3.

cluster_profiles <- Energy %>%
  group_by(Cluster) %>%
  summarise(
    avg_rating_D1 = mean(D1, na.rm = TRUE),
    avg_rating_D2 = mean(D2, na.rm = TRUE),
    avg_rating_D3 = mean(D3, na.rm = TRUE),
    avg_rating_D4 = mean(D4, na.rm = TRUE),
    avg_rating_D5 = mean(D5, na.rm = TRUE)
  )

# View the cluster profiling table
kable(cluster_profiles)

Cluster	avg_rating_D1	avg_rating_D2	avg_rating_D3	avg_rating_D4	avg_rating_D5
1	2.945578	4.811791	6.274376	6.646259	6.603175
2	2.508982	4.610778	7.323353	5.089820	2.718563
3	6.642241	5.081897	3.439655	3.159483	2.956897

The recommendation from this visualisation is that the company works on the version D1, D4, and D5 because the ratings show that the customers are not exactly satisfied.

cluster_profiles_long <- cluster_profiles %>%
  pivot_longer(cols = starts_with("avg_rating"),
               names_to = "Version", 
               values_to = "Avg_Rating")

# Plot the bar chart with best practices
ggplot(cluster_profiles_long, aes(x = factor(Cluster), y = Avg_Rating, fill = Version)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7, alpha = 0.8) +  # Adjust position and bar width
  scale_fill_manual(values = c("grey", "brown", "red", "black", "yellow")) +  # Custom colors
  labs(title = "Average Ratings by Cluster and Energy Drink Version",
       x = "Cluster",
       y = "Average Rating") +
  scale_y_continuous(limits = c(0, 10), expand = c(0, 0)) +  # Adjust y-axis limits and remove padding
  theme_minimal() + 
  theme(
    legend.title = element_blank(),  # Remove legend title
    legend.position = "top",         # Place the legend at the top
    axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels for readability
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  # Center and style title
    axis.title = element_text(size = 12),  # Style axis titles
    panel.grid.major = element_line(color = "gray80", size = 0.5),  # Lighter grid lines for better readability
    panel.grid.minor = element_blank()  # Remove minor grid lines
  ) +
  facet_wrap(~ Version, scales = "free_y")  # Separate plots for each version with free y scales

My recommendation is that discounts and deals could be offered to D3, D4, and D5. Also, I advise that that marketing strategy should be different for the different groups and the targeted messages should reflect their dissatisfaction. Diversity in the products offering could also boost brand image.