Improving the customer journey and providing a positive customer experience (CX) was ranked as the number one trend, as well as the top strategic priority, in the survey of global banking leaders for the 2017 Retail Banking Trends and Predictions report. “The findings show that most organizations are not prepared for the future of increased consumer expectations”. Consequently, this article focuses on customer experience in retail banking in the UK, because banking definitely needs a customer experience wake-up call. To do so, we unlock the power of data to drive innovation and empower customers.

Ron Shevlin from Cornerstone Advisors probably summed it up best as an intro to the report, “Too much of the discussion around the ‘customer experience’ reflects a desire to simplify the complex, and find the silver bullet that fixes business problems and engenders customer loyalty. That’s too bad, because organizations that take a data-driven, process-oriented approach to improving customer experiences can achieve competitive advantages.”

Customers no longer view their experiences within industry silos; instead, they compare their experience to leading firms such as Google, Amazon, Uber and Apple. Consumers want organizations to simplify engagement and make their lives easier. Companies that fail to embrace CX as a strategic path to growth won’t just be lagging, they’ll get left behind.

1 “Who Is the Fairest of Them All?”

“Mirror, mirror on the wall, who is the fairest of them all?” That is the big question many are asking in the field of banking. The first thing that crosses any business person’s mind is strongly related to “market share”. In the old days, a strong market share paved the way to a thriving business. But welcome to the 21st century, in which high brand loyalty and/or high brand awareness become more important than ever (market share vs. mindshare).

Mindshare increasingly carries more weight than market share. In short, mindshare accounts for consumer awareness of, or the popularity of, a given product. So how do you improve the mindshare of a brand? How do you foster the amount of talk, mention, or reference a firm gets when it comes to a given product? Improving mindshare largely boils down to maximizing the customer experience (CX). Indeed, which company would not like to step into its customers’ minds and give them the best customized brand experience?

2 The Trojan War Will Take Place

For the purpose of this article, let’s consider three major retail banks in the United Kingdom to analyze, compare and draw some conclusions about different customer journeys, based on thousands (if not millions) of opinions and experiences gathered from the web. The underlying idea is to collect the wealth of data available online to bring to light the frustrations and experiences of many customers, both positive and negative, in order to re-map the customer experience (CX). The data lift the lid on online opinions seen through the customer’s eyes.

Within the framework of this paper, we chose three main players in the UK retail banking industry: HSBC, Barclays and Santander.

2.1 Strategy

  • Scraping comments left by customers on Trustpilot, via the rvest package in R or BeautifulSoup in Python

  • Comparing the data collected from different banks (HSBC vs. Santander vs. Barclays) and from different industries according to word frequencies, then bringing to light how similar or different these sets of word frequencies are, using a correlation test

  • Undertaking a sentiment analysis to highlight the emotional intent of words and infer whether a section of comments is mostly positive or negative [word clouds]

  • Analyzing word and document frequency [term frequency-inverse document frequency (tf-idf)] to reflect how important (or not) a word is to a document or a corpus (i.e. a collection of documents). The importance increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus.

  • Having a look at relationships between words: n-grams and correlations, to find out which words tend to follow others immediately, or tend to co-occur within the same documents. Very useful for providing context in sentiment analysis and handling cases like “I’m not happy”

  • Visualizing how words change in frequency over time [regression]

  • Discovering topic modeling via an unsupervised approach, thanks to Latent Dirichlet Allocation (LDA), and then finding the terms that are most common within each topic [k-means]: document-topic probabilities + word-topic probabilities

  • Exploring Word2Vec when it comes to giving linguistic context to a word. Indeed, by producing word-embedding vectors, words that share common contexts in the corpus are located in close proximity to one another in the vector space. GloVe is used to obtain vector representations for words (see the sketch at the end of section 7.1.1).

3 Data collection via Web Scraping

This part aims at collecting enough data to understand, analyze and compare customer experiences across different banks, in order to re-map the CX according to what works and what fails. To carry out this project, we scraped Trustpilot, a famous online review community containing millions of relevant pieces of information when it comes to customer opinions.

3.1 Data scraping

The first step involves pulling as many comments as possible straight out of Trustpilot for a given retail bank. In the specific case of Santander, we harvest hundreds of comments spread over dozens of pages.

Now that you grasp the main point, let’s repeat the experiment for all the other retail banks under consideration. To do so, we use the href attributes exposed in the HTML pages to reach all the comments linked from a given overview page, together with a loop that repeats the process over as many pages as necessary.
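
As an illustration, below is a minimal rvest sketch of that scraping loop for Santander. The URL pattern, the CSS selector and the ten-page range are assumptions for illustration only; the actual values depend on Trustpilot’s page structure at the time of scraping.

library(rvest)

# Hypothetical review-page URL pattern; adjust to the live Trustpilot structure
base_url <- "https://uk.trustpilot.com/review/www.santander.co.uk?page="

scrape_page <- function(page_number) {
  page <- read_html(paste0(base_url, page_number))
  page %>%
    html_nodes(".review-content") %>%   # assumed CSS selector for the review body
    html_text(trim = TRUE)
}

# Loop over the overview pages and stack the comments into one data frame
santander_comments <- do.call(
  rbind,
  lapply(1:10, function(p) data.frame(review = scrape_page(p),
                                      bank = "santander",
                                      stringsAsFactors = FALSE))
)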

3.2 How close are customer complaints and concerns within the same industry?

Thousands of lines of comments were collected, put in a data frame, and then turned into a table with one token per row.
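
As a sketch of that step (the freq_fac object loaded below is precomputed), tidytext’s unnest_tokens performs the one-token-per-row transformation, assuming the scraped comments sit in a data frame all_comments with review and bank columns, as in section 7:

library(dplyr)
library(tidytext)

tidy_comments <- all_comments %>%
  unnest_tokens(word, review) %>%       # one token (word) per row
  anti_join(stop_words, by = "word")    # drop common English stop words

freq_fac <- tidy_comments %>%
  count(bank, word, sort = TRUE)        # word frequencies per bank
# the object plotted below is presumably restricted to the most frequent words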

freq_fac<-readRDS("~/Desktop/customer_experiences/data_transformation/freq_fac.rds")

ggplot(freq_fac, aes(x=word, y=n, fill=bank))+
  geom_bar(stat="identity", color="black")+
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
  theme_minimal()

What is really at stake, and what most of these customer experiences relate to:

  • Regardless of the bank, customers complain about issues regarding their account or their card. The customer is at the heart of all complaints [customer-centric approach]

  • More specifically, HSBC encounters issues regarding its branches (the retail locations where it offers a wide array of face-to-face and automated services to its customers), its call centres and the services it offers. For the time being, we can guess that these are not very efficient: requests made to HSBC appear to be time-consuming.

frequency_banks<-readRDS("~/Desktop/customer_experiences/data_transformation/frequency_banks.rds")

ggplot(frequency_banks, aes(x = proportion, y = `santander`, color = abs(`santander` - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  facet_wrap(~bank, ncol = 2) +
  theme(legend.position="none") +
  labs(y = "santander", x = NULL)
## Warning: Removed 2 rows containing missing values (geom_text).

When words are close to the line, a similarity can be inferred between the two sets of texts. That is to say, the words used by customers of two different banks are the same: they encounter the same issues even though they are clients of two banks that compete with each other.

To measure how similar or different these three sets of word frequencies are, let’s use a correlation coefficient. First, let’s consider how correlated the concerns of HSBC and Santander customers are, and then let’s do the same for Barclays and Santander.

cor.test(data = frequency_banks[frequency_banks$bank == "hsbc",],
         ~ proportion + `santander`)
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and santander
## t = Inf, df = 10280, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  1 1
## sample estimates:
## cor 
##   1
cor.test(data = frequency_banks[frequency_banks$bank == "barclays",],
         ~ proportion + `santander`)
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and santander
## t = Inf, df = 10280, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  1 1
## sample estimates:
## cor 
##   1

As shown in the previous graph, and as expected, the issues experienced by customers of two different banks are expressed in the same way, with the same words. Statistically speaking, the words used by clients of two different banks are fully correlated; the words used to bring a problem to light are the same regardless of the bank under consideration. But are these issues the same in other industries?

3.3 How far apart are customer complaints and concerns across different industries?

frequency_others<-readRDS("~/Desktop/customer_experiences/data_transformation/frequency_others.rds")

ggplot(frequency_others, aes(x = proportion, y = `santander`, color = abs(`santander` - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  facet_wrap(~bank, ncol = 2) +
  theme(legend.position="none") +
  labs(y = "santander", x = NULL)
## Warning: Removed 2 rows containing missing values (geom_text).

4 Sentiment Analysis
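
Before loading the precomputed objects, here is a sketch of how bing_word_counts could be derived: the tokens (as in the tidy_comments sketch of section 3.2) are joined with the Bing sentiment lexicon bundled with tidytext and counted per word and sentiment:

library(dplyr)
library(tidytext)

bing_word_counts <- tidy_comments %>%
  inner_join(get_sentiments("bing"), by = "word") %>%   # attach positive/negative labels
  count(word, sentiment, sort = TRUE)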

bing_word_counts<-readRDS("~/Desktop/customer_experiences/data_transformation/bing_word_counts.rds")

bing_word_counts %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  coord_flip()

data_santander_sentiments<-readRDS("~/Desktop/customer_experiences/data_transformation/data_santander_sentiments.RDS")

ggplot(data_santander_sentiments, aes(x = group, y = value ,fill = group)) +
  geom_bar(width = 0.85, stat="identity") +    
  coord_polar(theta = "y") +    
  xlab("") + ylab("") +
  ylim(c(0,75)) + 
  geom_text(data = data_santander_sentiments, hjust = 1, size = 3, aes(x = group, y = 0, label = group)) +
  theme_bw() + theme(legend.position = "none", axis.text.y = element_blank(), axis.ticks = element_blank())

data_barclays_sentiments<-readRDS("~/Desktop/customer_experiences/data_transformation/data_barclays_sentiments.RDS")

ggplot(data_barclays_sentiments, aes(x = group, y = value ,fill = group)) +
  geom_bar(width = 0.85, stat="identity") +    
  coord_polar(theta = "y") +    
  xlab("") + ylab("") +
  ylim(c(0,75)) + 
  geom_text(data = data_barclays_sentiments, hjust = 1, size = 3, aes(x = group, y = 0, label = group)) +
  theme_bw() + theme(legend.position = "none", axis.text.y = element_blank(), axis.ticks = element_blank())

data_hsbc_sentiments<-readRDS("~/Desktop/customer_experiences/data_transformation/data_hsbc_sentiments.RDS")

ggplot(data_hsbc_sentiments, aes(x = group, y = value ,fill = group)) +
  geom_bar(width = 0.85, stat="identity") +    
  coord_polar(theta = "y") +    
  xlab("") + ylab("") +
  ylim(c(0,75)) + 
  geom_text(data = data_hsbc_sentiments, hjust = 1, size = 3, aes(x = group, y = 0, label = group)) +
  theme_bw() + theme(legend.position = "none", axis.text.y = element_blank(), axis.ticks = element_blank())

bing_word_counts %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"))
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): secure
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): recommend
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): support
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): rude could
## not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): complaints
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): useless
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): friendly
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): terrible
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): refund
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): premier
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): issues
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): unable
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): worse could
## not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): excellent
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): promised
## could not be fit on page. It will not be plotted.
## Warning in comparison.cloud(., colors = c("gray20", "gray80")): refused
## could not be fit on page. It will not be plotted.

5 Analyzing word and document frequency

5.1 TF-IDF

How do we quantify what a document is about? Can we do this by looking at the words that make up the document? One measure of how important a word may be is its term frequency (tf): how frequently a word occurs in a document.

bank_words<-readRDS("~/Desktop/customer_experiences/data_transformation/bank_words.rds")

ggplot(bank_words, aes(n/total, fill = bank)) +
  geom_histogram(show.legend = FALSE) +
  xlim(NA, 0.0009) +
  facet_wrap(~bank, ncol = 2, scales = "free_y")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 514 rows containing non-finite values (stat_bin).
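
Raw term frequency over-weights words that are common everywhere, which is what the inverse document frequency corrects for. As a sketch, tidytext’s bind_tf_idf computes tf, idf and tf-idf directly from the bank_words counts (columns bank, word, n) loaded above:

library(dplyr)
library(tidytext)

bank_tf_idf <- bank_words %>%
  bind_tf_idf(word, bank, n) %>%   # add tf, idf and tf_idf columns
  arrange(desc(tf_idf))

head(bank_tf_idf)                  # words most characteristic of each bank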

5.2 Zipf’s Law

freq_by_rank<-readRDS("~/Desktop/customer_experiences/data_transformation/freq_by_rank.rds")

freq_by_rank %>% 
  ggplot(aes(rank, `term frequency`, color = bank)) + 
  geom_line(size = 1.1, alpha = 0.8, show.legend = FALSE) + 
  scale_x_log10() +
  scale_y_log10()
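
Zipf’s law states that a word’s frequency is roughly inversely proportional to its rank, which is why the curves above look close to linear on the log-log scale. A quick check, in the spirit of the usual tidytext approach, is to fit a line over the middle portion of the rank range (the cut-offs below are illustrative):

rank_subset <- freq_by_rank %>%
  filter(rank < 500, rank > 10)    # drop the extreme ranks; the three banks are pooled here

lm(log10(`term frequency`) ~ log10(rank), data = rank_subset)
# a slope close to -1 would indicate a classic Zipf-like distribution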

6 N-grams and relationships between words
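
The bigram_graph object loaded below is precomputed. A minimal sketch of how the underlying bigram counts might be derived with tidytext (again assuming the all_comments data frame of section 7) is:

library(dplyr)
library(tidyr)
library(tidytext)

bigram_counts <- all_comments %>%
  unnest_tokens(bigram, review, token = "ngrams", n = 2) %>%   # split reviews into word pairs
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE)                             # input for the graph below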

bigram_graph<-readRDS("~/Desktop/customer_experiences/data_transformation/bigram_graph.rds")

bigram_graph <- bigram_graph %>%
  graph_from_data_frame()

ggraph(bigram_graph, layout = "fr") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1)

a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

7 Topic Modeling

7.1 t-distributed stochastic neighbor embedding

all_comments<-readRDS("~/Desktop/customer_experiences/all_comments.rds")
all_comments$id<-rep(paste0("comments_",c(1:NROW(all_comments))))
colnames(all_comments)<-c("review","bank","id")
data_vectorization<-all_comments[sample(1:NROW(all_comments),100),]

7.1.1 Vectorization to transform comments into a vector space model

prep_fun = tolower
tok_fun = word_tokenizer

it_data_vectorization = itoken(data_vectorization$review, 
                  preprocessor = prep_fun, 
                  tokenizer = tok_fun, 
                  ids = data_vectorization$id, 
                  progressbar = FALSE)
vocab = create_vocabulary(it_data_vectorization)

data_vectorization_tokens = data_vectorization$review %>% 
  prep_fun %>% 
  tok_fun
it_data_vectorization = itoken(data_vectorization_tokens, 
                  ids = data_vectorization$id,
                  progressbar = FALSE)

vocab = create_vocabulary(it_data_vectorization)
vocab
## Number of docs: 100 
## 0 stopwords:  ... 
## ngram_min = 1; ngram_max = 1 
## Vocabulary: 
##            term term_count doc_count
##    1:   attempt          1         1
##    2:     pious          1         1
##    3: empathise          1         1
##    4:  amything          1         1
##    5:        tv          1         1
##   ---                               
## 1950:         a        253        83
## 1951:       and        257        79
## 1952:       the        373        80
## 1953:         i        421        78
## 1954:        to        461        87
vectorizer = vocab_vectorizer(vocab)
dtm_data_vectorization = create_dtm(it_data_vectorization, vectorizer)

dim(dtm_data_vectorization)
## [1]  100 1954
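
As mentioned in the strategy, the same iterator and vectorizer can also feed a GloVe model to obtain word vectors. A sketch with a recent version of text2vec follows (older versions use word_vectors_size and a vocabulary argument instead of rank):

# term-co-occurrence matrix built from the same tokens as the dtm
tcm <- create_tcm(it_data_vectorization, vectorizer, skip_grams_window = 5L)

glove <- GlobalVectors$new(rank = 50, x_max = 10)
word_vectors <- glove$fit_transform(tcm, n_iter = 20)

# summing main and context vectors is common practice with GloVe
word_vectors <- word_vectors + t(glove$components)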

7.1.2 t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a technique for dimensionality reduction that is particularly well suited to the visualization of high-dimensional datasets. The technique can be implemented via Barnes-Hut approximations. The underlying idea is to project the high-dimensional dtm_data_vectorization matrix into a lower-dimensional space.

Labels<-data_vectorization$bank
data_vectorization$bank<-as.factor(data_vectorization$bank)

colors = rainbow(length(unique(data_vectorization$bank)))
names(colors) = unique(data_vectorization$bank)

Perplexity is a value which effectively controls how many nearest neighbours are taken into account when constructing the embedding in the low-dimensional space. For the low-dimensional space, the Cauchy distribution (a t-distribution with one degree of freedom) is used as the distribution of the distances to neighbouring objects.

tsne <- Rtsne(as.matrix(dtm_data_vectorization), dims = 2, perplexity=5, verbose=TRUE, max_iter = 500)
## Read the 100 x 50 data matrix successfully!
## Using no_dims = 2, perplexity = 5.000000, and theta = 0.500000
## Computing input similarities...
## Normalizing input...
## Building tree...
##  - point 0 of 100
## Done in 0.00 seconds (sparsity = 0.266800)!
## Learning embedding...
## Iteration 50: error is 77.509167 (50 iterations in 0.03 seconds)
## Iteration 100: error is 76.303222 (50 iterations in 0.04 seconds)
## Iteration 150: error is 76.707251 (50 iterations in 0.03 seconds)
## Iteration 200: error is 72.380825 (50 iterations in 0.03 seconds)
## Iteration 250: error is 75.167491 (50 iterations in 0.03 seconds)
## Iteration 300: error is 2.623339 (50 iterations in 0.02 seconds)
## Iteration 350: error is 2.220866 (50 iterations in 0.02 seconds)
## Iteration 400: error is 1.820352 (50 iterations in 0.02 seconds)
## Iteration 450: error is 1.526139 (50 iterations in 0.02 seconds)
## Iteration 500: error is 1.300042 (50 iterations in 0.02 seconds)
## Fitting performed in 0.27 seconds.
data_ploting_tsne<-as.data.frame(tsne$Y)
ggplot(data_ploting_tsne, aes(V1, V2, label = rownames(dtm_data_vectorization)))+
  geom_point() + geom_text() + theme_bw()

ggplot(data_ploting_tsne, aes(V1, V2)) +
  geom_point(color = 'red') +
  theme_classic(base_size = 10) + 
  geom_label_repel(aes(label = data_vectorization$bank,
                    fill = colors[data_vectorization$bank]), color = 'white',
                    size = 3.5) +
   theme(legend.position = "bottom")

7.1.3 K-means to identify groups of similar objects in a multivariate data set

7.1.3.1 Distance measure

res.dist <- get_dist(data_ploting_tsne, stand = TRUE, method = "pearson")
fviz_dist(res.dist, 
   gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

7.1.3.2 Clustering validation: does the data contain any inherent grouping structure?

res.nbclust <- data_ploting_tsne %>%
  scale() %>%
  NbClust(distance = "euclidean",
          min.nc = 2, max.nc = 10, 
          method = "complete", index ="all") 

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 8 proposed 2 as the best number of clusters 
## * 3 proposed 3 as the best number of clusters 
## * 1 proposed 4 as the best number of clusters 
## * 6 proposed 6 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 1 proposed 9 as the best number of clusters 
## * 3 proposed 10 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  2 
##  
##  
## *******************************************************************
fviz_nbclust(res.nbclust, ggtheme = theme_minimal())
## Among all indices: 
## ===================
## * 2 proposed  0 as the best number of clusters
## * 1 proposed  1 as the best number of clusters
## * 8 proposed  2 as the best number of clusters
## * 3 proposed  3 as the best number of clusters
## * 1 proposed  4 as the best number of clusters
## * 6 proposed  6 as the best number of clusters
## * 1 proposed  7 as the best number of clusters
## * 1 proposed  9 as the best number of clusters
## * 3 proposed  10 as the best number of clusters
## 
## Conclusion
## =========================
## * According to the majority rule, the best number of clusters is  2 .

7.1.3.3 Clustering Partitioning

km.res <- kmeans(data_ploting_tsne, 3, nstart = 25)
fviz_cluster(km.res, data = data_ploting_tsne,
             ellipse.type = "convex",
             palette = "jco",
             ggtheme = theme_minimal())

res.hc <- data_ploting_tsne %>%
  scale() %>%                    
  dist(method = "euclidean") %>% 
  hclust(method = "ward.D2")   
fviz_dend(res.hc, k = 4, # Cut in four groups
          cex = 0.5, # label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          color_labels_by_k = TRUE, # color labels by groups
          rect = TRUE # Add rectangle around groups
)

7.1.3.4 Clustering Evaluation

res.hc <- data_ploting_tsne %>%
  scale() %>%
  eclust("hclust", k = 3, graph = FALSE)
fviz_silhouette(res.hc)
##   cluster size ave.sil.width
## 1       1   46          0.26
## 2       2   26          0.60
## 3       3   28          0.35

7.2 Latent Dirichlet allocation

The basic assumption behind LDA is that each document in a collection consists of a mixture of collection-wide topics. However, in reality we observe only documents and words, not topics; the latter are part of the hidden (or latent) structure of the documents. The aim is to infer the latent topic structure given the words and documents. LDA does this by recreating the documents in the corpus, iteratively adjusting the relative importance of topics in documents and of words in topics.

Topic modelling provides a quick and convenient way to perform unsupervised classification of a corpus of documents.
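
As a sketch of how this could be run on the review corpus with the topicmodels package, the comments are cast into a document-term matrix and the word-topic (beta) and document-topic (gamma) probabilities are extracted with tidytext. The two-topic choice (k = 2) is purely illustrative:

library(dplyr)
library(tidytext)
library(topicmodels)

# one row per (comment, word) with counts, then cast to a DocumentTermMatrix
comment_dtm <- all_comments %>%
  unnest_tokens(word, review) %>%
  anti_join(stop_words, by = "word") %>%
  count(id, word) %>%
  cast_dtm(id, word, n)

comments_lda <- LDA(comment_dtm, k = 2, control = list(seed = 1234))

# word-topic probabilities (beta) and document-topic probabilities (gamma)
word_topics <- tidy(comments_lda, matrix = "beta")
doc_topics  <- tidy(comments_lda, matrix = "gamma")

# terms that are most common within each topic
word_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)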