R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Notice:

You should not focus on memorizing syntax.

Segmentation

Objective - Divide the target market or customers on the basis of significant features, which could help a company sell more products with lower marketing expenses.

A potentially interesting question is whether some products (or customers) are more alike than others.

The goal of this tutorial is to show you how market segmentation works using a 2-variable dataset. For a more complex example that involves more than 2 variables, you may want to visit the following tutorial: https://rpubs.com/utjimmyx/pcacluster.

Market segmentation

Market segmentation is a strategy that divides a broad target market of customers into smaller, more similar groups, and then designs a marketing strategy specifically for each group. Clustering is a common technique for market segmentation since it automatically finds similar groups given a data set.
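
To make the idea concrete, here is a minimal sketch of k-means finding groups in a two-variable dataset. The data frame `toy` and its columns `rating_a` and `rating_b` are made up for illustration and are not part of the tutorial's data; the seed and `nstart = 25` are assumptions chosen so the result is stable.

```r
# Toy illustration with made-up data: two ratings per customer.
set.seed(42)  # k-means starts from random centers, so fix the seed

# Three hypothetical groups of 10 customers, centered at different rating levels
toy <- data.frame(
  rating_a = c(rnorm(10, mean = 2, sd = 0.3),
               rnorm(10, mean = 5, sd = 0.3),
               rnorm(10, mean = 8, sd = 0.3)),
  rating_b = c(rnorm(10, mean = 8, sd = 0.3),
               rnorm(10, mean = 2, sd = 0.3),
               rnorm(10, mean = 6, sd = 0.3))
)

# Ask for 3 clusters; nstart = 25 tries 25 random starts and keeps the best
toy_fit <- kmeans(toy, centers = 3, nstart = 25)

table(toy_fit$cluster)  # number of customers in each cluster
toy_fit$centers         # average ratings in each cluster
```

Each row of `toy_fit$centers` describes one segment by its average ratings, which is the kind of summary a marketer could use to design a strategy for each group.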

Example

The file segmentation_analysis.csv is automatically generated and contains information on consumers’ perceptions of two sporting events (NASCAR and NCAA college football). The purpose of the case analysis is to better understand the consumer segments for the brands, in the hope that this understanding will allow each brand to develop effective segment- or product-specific advertising campaigns.

library(cluster) # provides clusplot()
library(fpc)     # provides plotcluster()
library(readr)   # provides read_csv()
mydata <- read_csv('segmentation_analysis.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Consumer = col_double(),
##   NASCAR = col_double(),
##   NCAA_College_Football = col_double()
## )
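# K-means cluster analysis on the two rating columns
# (drop column 1, Consumer, because it is an ID rather than a variable)
clus <- kmeans(mydata[, -1], centers=3)
# Fig 01: discriminant projection plot of the clusters
plotcluster(mydata[, -1], clus$cluster)

# More complex: clusplot with shaded ellipses and labeled points
clusplot(mydata[, -1], clus$cluster, color=TRUE, shade=TRUE,
         labels=2, lines=0)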
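
The call to kmeans() above fixes the number of clusters at 3. One common way to sanity-check such a choice is an elbow plot of the total within-cluster sum of squares for several values of k. The sketch below is only an illustration: the data frame `ratings` holds made-up values that stand in for the two rating columns, and the range `1:5` and `nstart = 25` are assumptions.

```r
set.seed(123)  # k-means uses random starting centers

# Hypothetical stand-in for the two rating columns (not the tutorial's data)
ratings <- data.frame(
  NASCAR                = c(1, 2, 2, 1, 6, 7, 6, 7, 4, 9),
  NCAA_College_Football = c(8, 9, 7, 8, 2, 1, 3, 2, 5, 9)
)

# Total within-cluster sum of squares for k = 1, ..., 5
wss <- sapply(1:5, function(k) {
  kmeans(ratings, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow": the k after which the curve flattens out
plot(1:5, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

A sharp bend in the curve suggests a reasonable number of clusters; with real survey data the bend is often less clear, so treat it as a guide rather than a rule.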

References

https://cran.r-project.org/web/packages/fpc/fpc.pdf

#install.packages('dplyr')
library(dplyr) # sane data manipulation
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr) # sane data munging
library(ggplot2) # needs no introduction
library(ggfortify) # super-helpful for plotting non-"standard" stats objects

#identifying your working directory
getwd() #confirm your working directory is accurate
## [1] "/cloud/project"
library(readr)

mydata <-read_csv('segmentation_analysis.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Consumer = col_double(),
##   NASCAR = col_double(),
##   NCAA_College_Football = col_double()
## )
# Read the csv file. Here it is loaded from the working directory confirmed above.

#Open the data. Note that some students will see an Excel option in "Import Dataset";
#those that do not will need to save the original data as a csv and import that as a text file.
#rm(list = ls()) #used to clean your working environment
fit <- kmeans(mydata[,-1], 3, iter.max=1000)
# Exclude the first column since it is an "id" rather than a variable.
# 3 means you want to have 3 clusters.
table(fit$cluster)
## 
## 1 2 3 
## 5 4 1
barplot(table(fit$cluster), col="#336699") #plot

pca <- prcomp(mydata[,-1]) # principal component analysis
pca_data <- mutate(fortify(pca), col=fit$cluster)
# We want to examine the cluster membership of each observation - see the last column (col)
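
As a rough illustration of what prcomp() returns, the sketch below runs it on a small made-up data frame (`ratings` is a hypothetical stand-in, not the tutorial's data) and computes the share of variance captured by each component.

```r
# Hypothetical stand-in for the two rating columns (not the tutorial's data)
ratings <- data.frame(
  NASCAR                = c(1, 2, 2, 1, 6, 7, 6, 7, 4, 9),
  NCAA_College_Football = c(8, 9, 7, 8, 2, 1, 3, 2, 5, 9)
)

pca_demo <- prcomp(ratings)  # centers the data by default, does not rescale it

# Proportion of variance captured by each principal component
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
round(var_explained, 3)

# With two input variables there are exactly two components,
# so together they account for all of the variance.
sum(var_explained)

head(pca_demo$x)  # PC1 and PC2 scores, one row per observation
```

With only two variables, PCA rotates the data rather than reducing it, so the PC1/PC2 plot shows the same points from a different angle. The reduction becomes useful once there are more than two variables.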

ggplot(pca_data) + geom_point(aes(x=PC1, y=PC2, fill=factor(col)),
size=3, col="#7f7f7f", shape=21) + theme_bw(base_family="Helvetica")

autoplot(fit, data=mydata[,-1], frame=TRUE, frame.type='norm')
## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Too few points to calculate an ellipse

write.csv(pca_data, "pca_data.csv")
# Save your cluster solutions in the working directory.
# To examine the cluster membership of each observation, see the last column of pca_data.
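
Once each observation has a cluster label, a natural next step is to profile the segments by their average ratings. The sketch below is one possible way to do that with base R. The data frame `mydata_demo` is made up, reusing only the column names from the file, and the names `fit_demo` and `profile` are hypothetical.

```r
set.seed(1)  # k-means uses random starting centers

# Hypothetical stand-in with the same column names as the csv file
mydata_demo <- data.frame(
  Consumer              = 1:10,
  NASCAR                = c(1, 2, 2, 1, 6, 7, 6, 7, 4, 9),
  NCAA_College_Football = c(8, 9, 7, 8, 2, 1, 3, 2, 5, 9)
)

# Cluster on the two rating columns only (column 1 is an ID)
fit_demo <- kmeans(mydata_demo[, -1], centers = 3, nstart = 25)

# Attach the cluster label to each consumer
mydata_demo$cluster <- fit_demo$cluster

# Average rating of each event within each cluster
profile <- aggregate(cbind(NASCAR, NCAA_College_Football) ~ cluster,
                     data = mydata_demo, FUN = mean)
profile
```

Each row of `profile` summarizes one segment, for example a group that rates one event high and the other low, which is the information needed to tailor a campaign to that segment.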

References

Cluster analysis - reading (p.385-p.399) https://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf

Introduction to k-Means clustering in R https://www.r-bloggers.com/introduction-to-k-means-clustering-in-r/

Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L) https://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000100014&lng=en&nrm=iso

Principal Component Methods in R: Practical Guide http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/118-principal-component-analysis-in-r-prcomp-vs-princomp/