qualtrics

##Our Client’s Objectives Our client has primarily focused on auto insurance sales He has established good relations with local car dealers. Whenever someone buys a new car, the car salesman points to Manny for insurance plans Manny wants to gain insightful information to attract more customers Determine which ad do customers prefer Do customers prefer pricing or coverage? Which social media do customers prefer?

##How We Will Conduct Our Research What will research: Modern Edge’s competitors Is there a seasonality issue? Is there a peak time to car insurance? Google Trends Income Statements We will then conduct primary research through surveys Regression Analysis Cluster Analysis

##Our Hypothesis Our first hypothesis was that younger ages preferred lower deductibles over better coverage. However, after our survey concluded the results showed we didn’t get enough diversity in age. 22 of the the 23 responses fell into the 18-24 category. Instead, we chose to go a different route. Our new hypothesis is that women are more likely to prefer a higher deductible from a national insurance company than men.

library(readr)
mydata <-read_csv('MKTG4000survey.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   ID = col_double(),
##   Gender = col_double(),
##   `Age:` = col_double(),
##   SocialMedia = col_double(),
##   PriceorCoverage = col_double(),
##   `ProgressiveCustomer Service` = col_double(),
##   ProgressivePrice = col_double(),
##   ProgressiveCoverage = col_double(),
##   `State FarmCustomer Service` = col_double(),
##   `State FarmPrice` = col_double(),
##   `State FarmCoverage` = col_double(),
##   `All StateCustomer Service` = col_double(),
##   `All StatePrice` = col_double(),
##   `All StateCoverage` = col_double(),
##   localService = col_double(),
##   localinsurancebrokersPrice = col_double(),
##   localinsurancebrokersCoverage = col_double(),
##   Whichoptionbetter = col_double(),
##   Whichadisbetter = col_double()
## )

Building distance function and ploting the trees (dendrograms)

Hierarchical clustering (using the function hclust) is an informative way to visualize the data.

We will see if we could discover subgroups among the variables or among the observations.

use = scale(mydata[,-c(1)], center = TRUE, scale = TRUE)
dist = dist(use)  
d <- dist(as.matrix(dist))   # find distance matrix
seg.hclust <- hclust(d)                # apply hirarchical clustering
library(ggplot2) # needs no introduction
plot(seg.hclust)

Identifying clustering memberships for each cluster

Imagine if your goal is to find some profitable customers to target. Now you will be able to see the number of customers using this algorithm.

groups.3 = cutree(seg.hclust,3)
table(groups.3)  #A good first step is to use the table function to see how # many observations are in each cluster

## groups.3
##  1  2  3 
## 16  4  3

#In the following step, we will find the members in each cluster or group.
mydata$ID[groups.3 == 1]

##  [1]  1  2  6  8  9 10 11 12 13 14 15 18 19 21 22 23

mydata$ID[groups.3 == 2]

## [1]  3  5  7 16

mydata$ID[groups.3 == 3]

## [1]  4 17 20

Identifying common features of each cluster using the aggregate function

#?aggregate
aggregate(mydata,list(groups.3),median)

##   Group.1   ID Gender Age: SocialMedia PriceorCoverage
## 1       1 12.5    1.0    1           1             1.5
## 2       2  6.0    0.5    1           2             2.0
## 3       3 17.0    1.0    1           2             1.0
##   ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1                           4               NA                   4
## 2                          NA               NA                  NA
## 3                          NA               NA                  NA
##   State FarmCustomer Service State FarmPrice State FarmCoverage
## 1                          4               4                5.5
## 2                         NA              NA                 NA
## 3                         NA              NA                 NA
##   All StateCustomer Service All StatePrice All StateCoverage localService
## 1                         4              4                 4            4
## 2                        NA             NA                NA           NA
## 3                        NA             NA                NA           NA
##   localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1                          4                             4                25
## 2                         NA                            NA                NA
## 3                         NA                            NA                25
##   Whichadisbetter
## 1               2
## 2               1
## 3               1

aggregate(mydata,list(groups.3),mean)

##   Group.1       ID    Gender     Age: SocialMedia PriceorCoverage
## 1       1 12.75000 0.5625000 1.000000      2.0625        1.500000
## 2       2  7.75000 0.5000000 1.000000      2.0000        2.000000
## 3       3 13.66667 0.6666667 1.333333      3.0000        1.333333
##   ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1                      4.0625               NA              4.0625
## 2                          NA               NA                  NA
## 3                          NA               NA                  NA
##   State FarmCustomer Service State FarmPrice State FarmCoverage
## 1                     4.8125           4.625             5.4375
## 2                         NA              NA                 NA
## 3                         NA              NA                 NA
##   All StateCustomer Service All StatePrice All StateCoverage localService
## 1                    4.0625          4.125            4.4375        4.125
## 2                        NA             NA                NA           NA
## 3                        NA             NA                NA           NA
##   localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1                     3.9375                             4           24.0625
## 2                         NA                            NA                NA
## 3                         NA                            NA           25.0000
##   Whichadisbetter
## 1           1.625
## 2           1.000
## 3           1.000

aggregate(mydata[,-1],list(groups.3),median)

##   Group.1 Gender Age: SocialMedia PriceorCoverage ProgressiveCustomer Service
## 1       1    1.0    1           1             1.5                           4
## 2       2    0.5    1           2             2.0                          NA
## 3       3    1.0    1           2             1.0                          NA
##   ProgressivePrice ProgressiveCoverage State FarmCustomer Service
## 1               NA                   4                          4
## 2               NA                  NA                         NA
## 3               NA                  NA                         NA
##   State FarmPrice State FarmCoverage All StateCustomer Service All StatePrice
## 1               4                5.5                         4              4
## 2              NA                 NA                        NA             NA
## 3              NA                 NA                        NA             NA
##   All StateCoverage localService localinsurancebrokersPrice
## 1                 4            4                          4
## 2                NA           NA                         NA
## 3                NA           NA                         NA
##   localinsurancebrokersCoverage Whichoptionbetter Whichadisbetter
## 1                             4                25               2
## 2                            NA                NA               1
## 3                            NA                25               1

aggregate(mydata[,-1],list(groups.3),mean)

##   Group.1    Gender     Age: SocialMedia PriceorCoverage
## 1       1 0.5625000 1.000000      2.0625        1.500000
## 2       2 0.5000000 1.000000      2.0000        2.000000
## 3       3 0.6666667 1.333333      3.0000        1.333333
##   ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1                      4.0625               NA              4.0625
## 2                          NA               NA                  NA
## 3                          NA               NA                  NA
##   State FarmCustomer Service State FarmPrice State FarmCoverage
## 1                     4.8125           4.625             5.4375
## 2                         NA              NA                 NA
## 3                         NA              NA                 NA
##   All StateCustomer Service All StatePrice All StateCoverage localService
## 1                    4.0625          4.125            4.4375        4.125
## 2                        NA             NA                NA           NA
## 3                        NA             NA                NA           NA
##   localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1                     3.9375                             4           24.0625
## 2                         NA                            NA                NA
## 3                         NA                            NA           25.0000
##   Whichadisbetter
## 1           1.625
## 2           1.000
## 3           1.000

cluster_means <- aggregate(mydata[,-1],list(groups.3),mean)

data <- read.csv("MKTG4000survey.csv")
head(data)

##   ID Gender Age. SocialMedia PriceorCoverage ProgressiveCustomer.Service
## 1  1      0    1           5               2                           4
## 2  2      1    1           1               2                           3
## 3  3      1    1           2               2                           7
## 4  4      0    2           2               1                          NA
## 5  5      0    1           2               2                          NA
## 6  6      1    1           1               2                           4
##   ProgressivePrice ProgressiveCoverage State.FarmCustomer.Service
## 1                4                   4                          7
## 2                4                   2                          6
## 3               NA                  NA                          4
## 4               NA                  NA                          7
## 5               NA                  NA                         NA
## 6               NA                   4                          4
##   State.FarmPrice State.FarmCoverage All.StateCustomer.Service All.StatePrice
## 1               7                  7                         4              4
## 2               5                  5                         6              6
## 3              NA                 NA                         4             NA
## 4               4                  6                        NA             NA
## 5              NA                 NA                        NA             NA
## 6               4                  4                         4              4
##   All.StateCoverage localService localinsurancebrokersPrice
## 1                 4            4                          4
## 2                 6            5                          4
## 3                NA            2                         NA
## 4                NA           NA                         NA
## 5                NA           NA                         NA
## 6                 4            4                          4
##   localinsurancebrokersCoverage Whichoptionbetter Whichadisbetter
## 1                             4                25               2
## 2                             4                25               2
## 3                            NA                22               1
## 4                            NA                25               1
## 5                            NA                NA               1
## 6                             4                25               2

summary(lm(localinsurancebrokersPrice~ factor(Gender) + factor(SocialMedia), data = data))

## 
## Call:
## lm(formula = localinsurancebrokersPrice ~ factor(Gender) + factor(SocialMedia), 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5000 -0.3158 -0.2463  0.4626  1.5726 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.3432     0.4045   8.266 5.74e-07 ***
## factor(Gender)1        0.9032     0.4371   2.066   0.0565 .  
## factor(SocialMedia)2   1.2537     0.6485   1.933   0.0723 .  
## factor(SocialMedia)3  -0.6442     0.5669  -1.136   0.2736    
## factor(SocialMedia)5   1.1811     0.5272   2.240   0.0406 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8292 on 15 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.5426, Adjusted R-squared:  0.4206 
## F-statistic: 4.448 on 4 and 15 DF,  p-value: 0.01436

qualtrics

yvette federspiel & Raul Mancilla

12/3/2020

Building distance function and ploting the trees (dendrograms)

Identifying clustering memberships for each cluster

Identifying common features of each cluster using the aggregate function