This report discusses two distinct data analysis tasks, one involving consumer insights and the other using data analytics. The initial task involves creating a classification model to determine if customers of a Portuguese bank would want to invest in a term deposit marketing campaign based on some relevant features of theirs. Using the data stemming from the marketing campaign, we will develop and present a classification tree model to help the bank better reach potential subscribers. The focus of the second task is K-means clustering, which examines consumers’ attitudes and behaviors surrounding recycling, in conjunction with segmentation. The goal here is to appreciate whether Carlow County Council can identify clusters in the responses from residents of Carlow about their behavioral pattern towards recycling, so as to better shape the recycling campaigns including the visions from more segments of people. Both tasks serve to illustrate the use of predictive modeling alongside consumer segmentation in marketing and other activities enhancing consumer awareness of the issues at hand.
Running Code
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.2
Warning: package 'ggplot2' was built under R version 4.4.2
Warning: package 'tibble' was built under R version 4.4.2
Warning: package 'tidyr' was built under R version 4.4.2
Warning: package 'readr' was built under R version 4.4.2
Warning: package 'purrr' was built under R version 4.4.2
Warning: package 'dplyr' was built under R version 4.4.2
Warning: package 'stringr' was built under R version 4.4.2
Warning: package 'forcats' was built under R version 4.4.2
Warning: package 'lubridate' was built under R version 4.4.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
From these dendograms, the model is not overfitting but the nodes are impure.
clusters1 <-cutree(h1, k =3)clusters2 <-cutree(h2, k =3)
sil1 <-silhouette(clusters1, d1)summary(sil1)
Silhouette of 1334 units in 3 clusters from silhouette.default(x = clusters1, dist = d1) :
Cluster sizes and average silhouette widths:
1254 76 4
0.7433888 0.5597983 0.7735888
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.4641 0.7555 0.8301 0.7330 0.8499 0.8579
sil2 <-silhouette(clusters2, d2)summary(sil2)
Silhouette of 1334 units in 3 clusters from silhouette.default(x = clusters2, dist = d2) :
Cluster sizes and average silhouette widths:
1274 56 4
0.7476409 0.5772877 0.7516790
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.5404 0.7751 0.8381 0.7405 0.8569 0.8629
Warning: There were 9 warnings in `summarise()`.
The first warning was:
ℹ In argument: `Loan = mean(loan)`.
ℹ In group 1: `cluster = "C1"`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
ℹ Run `dplyr::last_dplyr_warnings()` to see the 8 remaining warnings.
comm_clus_means
# A tibble: 3 × 5
cluster Loan Subscribed Poutcome Duration
<chr> <dbl> <dbl> <dbl> <dbl>
1 C1 NA NA NA 211.
2 C2 NA NA NA 963.
3 C3 NA NA NA 1779
Warning: There were 9 warnings in `summarise()`.
The first warning was:
ℹ In argument: `Loan = mean(loan)`.
ℹ In group 1: `cluster = "C1"`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
ℹ Run `dplyr::last_dplyr_warnings()` to see the 8 remaining warnings.
dom_clus_means
# A tibble: 3 × 5
cluster Loan Subscribed Poutcome Duration
<chr> <dbl> <dbl> <dbl> <dbl>
1 C1 NA NA NA 211.
2 C2 NA NA NA 963.
3 C3 NA NA NA 1779
Rows: 366 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): age
dbl (7): id, pos_impact, environ, money, bins, local, avoid_waste
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: There were 4 warnings in `summarise()`.
The first warning was:
ℹ In argument: `Age = mean(age)`.
ℹ In group 1: `cluster = "C1"`.
Caused by warning in `mean.default()`:
! argument is not numeric or logical: returning NA
ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.
tab_clus_means
# A tibble: 4 × 8
cluster Age Pos_Impact Environ Money Bins Local Avoid_Waste
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 C1 NA 3.86 3.63 3.97 3.18 3.56 1.34
2 C2 NA 3.91 3.67 3.95 2.83 3.49 3.18
3 C3 NA 3.43 3.36 3.72 2.54 2.08 2.36
4 C4 NA 1.48 1.15 1.19 1.07 2.48 2.81
# A tibble: 24 × 4
cluster Age Factors Average_Value
<chr> <dbl> <chr> <dbl>
1 C1 NA Pos_Impact 3.86
2 C1 NA Environ 3.63
3 C1 NA Money 3.97
4 C1 NA Bins 3.18
5 C1 NA Local 3.56
6 C1 NA Avoid_Waste 1.34
7 C2 NA Pos_Impact 3.91
8 C2 NA Environ 3.67
9 C2 NA Money 3.95
10 C2 NA Bins 2.83
# ℹ 14 more rows
#Visualise the mean satisfaction score for each contact method by cluster.
ggplot(tab_clus_tidy, mapping =aes(x = Factors, y = Average_Value, group = cluster, colour = cluster)) +geom_line(linewidth =1) +geom_point(size =2) +ylab("Mean Score") +xlab("Non-Financial Factors") +theme(axis.text.x =element_text(angle =45, hjust =1)) +#Rotates the angle of the labels and moves them down so they don't overlap the bottom of the graphggtitle("Average Rating of Non-Financial Variables by Cluster")
#Cluster Analysis and Strategic Recommendations for Carlow County Council:
The evaluation of the four clusters leads us to the conclusion that each cluster contains subjects with distinctly different views and approaches to recycling. It is noted that Cluster 1 encounters and agrees with most of the statements made about recycling such as ‘this creates a positive impact on the environment’ and ‘it is cost-effective’ therefore all of them are relatively knowledgeable about the entire process and do it actively. This group can be called “Engaged Recyclers”.
On the other hand, Cluster 2 feels more skeptical about their recycling contribution and also feels less knowledgeable regarding the details of recycling their plastic materials such as what what bins to use. Yet, this group shares the belief that recycling is indispensable to the solution of environmental problems. This group can be categorized as the “Environmentally Conscious, But Uncertain” cluster.
Cluster 3 describes people who are challenged with recycling and at times find ways to avoid waste but packaging seems difficult – they are not too active in recycling, however, they get its importance for the environment. For this group the term “Reluctant Recyclers” seems to fit well. Cluster 4, however, reported the least agreeable responses in particular the expected benefits of recycling itself and the reduced cost benefits. They are more inclined to complain that’s it’s hard not to waste and find it hard to get the thought of recycling. This group can be referred to as “Recycling Skeptics”. Strategies for Carlow County Council: Engaged Recyclers (Cluster 1). Strategy: Carlow County Council should this group actively engage by routinely informing them of the good benefits achieved as a result of their recycling efforts. They can be invited to join ecological leadership projects or act as ambassadors of the council’s campaigns. It might even encourage them to talk about their sharing to promote other people.
Conflicted Yet Environmentally Aware (Cluster 2):
Approach: An implementation of educational campaigns has also emerged as a stronger need among this group in terms of what can be recycled and how recycling and waste separation can be done. Tactics such as group sessions, laminated cards, or mobile applications that inform them about what to do with garbage would help boost their confidence. In addition, trying to demonstrate how doing it right helps communities in everyday life could be beneficial.
Cluster 3: The Defeated Recyclers
Approach: For people in this group, solutions go beyond just the infrastructure that supports recycling which includes the use of waste separation facilities. Providing easier to change solutions, such as more clearly stated instructions on recycling and what can be single use can assist to lessen confusion. Explaining that recycling does not require any special efforts and that there are regular neighborhood collections of recyclable materials may lead to greater participation.
Reserve Recyclers (Cluster 4):
Approach: The foremost recommendation for landfill diversion is to reduce the number of ‘Recycling Skeptics’ prominently. Reinventing the understanding of recycling specifically for Carlow County Council members to advocate the idea of long term better impact on the environment and cost/revenue aspects when the recycling is being performed. Custom messages that emphasize the aspects of their community and daily life might be more effective in reducing the number of “recycling skeptics.” In addition, seeing the shift in attitude toward the idea of recycling along discussing misconceptions with clarity and frequency may assist.
Having introduced such focused policies, Carlow County Council will be in a position to satisfy the needs and worries of every single person in the community and as a result enhance the level of engagement in recycling initiatives throughout the society.