Introduction

For our final project we chose to conduct marketing research for a local insurance company, Modern Edge Insurance Brokers, Inc. We created and distributed a survey, and performed regression and cluster analysis to intrepret the results.

Our Client’s Objectives

Our client Manny, a broker at Modern Edge, has primarily focused on auto insurance sales. He has established good relations with local car dealers. Whenever someone buys a new car, the car salesman points to Manny for insurance plans. Manny wants to gain insightful information to attract more customers. Some questions we hope to answer are which ad do customers prefer, do customers prefer pricing or coverage and which social media do customers prefer?

Our Hypothesis

Our first hypothesis was that younger ages preferred lower deductibles over better coverage. However, after our survey concluded the results showed we didn’t get enough diversity in age. 22 of the the 23 responses fell into the 18-24 category. Instead, we chose to go a different route. Our new hypothesis is that women are more likely to prefer a higher deductible from a national insurance company than a lower deductible from a local company.

Regression Analysis

data <- read.csv("MKTG4000survey.csv")
head(data)
##   ID Gender Age. SocialMedia PriceorCoverage ProgressiveCustomer.Service
## 1  1      0    1           5               2                           4
## 2  2      1    1           1               2                           3
## 3  3      1    1           2               2                           7
## 4  4      0    2           2               1                          NA
## 5  5      0    1           2               2                          NA
## 6  6      1    1           1               2                           4
##   ProgressivePrice ProgressiveCoverage State.FarmCustomer.Service
## 1                4                   4                          7
## 2                4                   2                          6
## 3               NA                  NA                          4
## 4               NA                  NA                          7
## 5               NA                  NA                         NA
## 6               NA                   4                          4
##   State.FarmPrice State.FarmCoverage All.StateCustomer.Service All.StatePrice
## 1               7                  7                         4              4
## 2               5                  5                         6              6
## 3              NA                 NA                         4             NA
## 4               4                  6                        NA             NA
## 5              NA                 NA                        NA             NA
## 6               4                  4                         4              4
##   All.StateCoverage localService localinsurancebrokersPrice
## 1                 4            4                          4
## 2                 6            5                          4
## 3                NA            2                         NA
## 4                NA           NA                         NA
## 5                NA           NA                         NA
## 6                 4            4                          4
##   localinsurancebrokersCoverage Whichoptionbetter Whichadisbetter
## 1                             4                25               2
## 2                             4                25               2
## 3                            NA                22               1
## 4                            NA                25               1
## 5                            NA                NA               1
## 6                             4                25               2
summary(lm(localinsurancebrokersPrice~ factor(Gender) + factor(SocialMedia), data = data))
## 
## Call:
## lm(formula = localinsurancebrokersPrice ~ factor(Gender) + factor(SocialMedia), 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5000 -0.3158 -0.2463  0.4626  1.5726 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.3432     0.4045   8.266 5.74e-07 ***
## factor(Gender)1        0.9032     0.4371   2.066   0.0565 .  
## factor(SocialMedia)2   1.2537     0.6485   1.933   0.0723 .  
## factor(SocialMedia)3  -0.6442     0.5669  -1.136   0.2736    
## factor(SocialMedia)5   1.1811     0.5272   2.240   0.0406 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8292 on 15 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.5426, Adjusted R-squared:  0.4206 
## F-statistic: 4.448 on 4 and 15 DF,  p-value: 0.01436

Cluster Analysis

library(readr)
mydata <-read_csv('MKTG4000survey.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   ID = col_double(),
##   Gender = col_double(),
##   `Age:` = col_double(),
##   SocialMedia = col_double(),
##   PriceorCoverage = col_double(),
##   `ProgressiveCustomer Service` = col_double(),
##   ProgressivePrice = col_double(),
##   ProgressiveCoverage = col_double(),
##   `State FarmCustomer Service` = col_double(),
##   `State FarmPrice` = col_double(),
##   `State FarmCoverage` = col_double(),
##   `All StateCustomer Service` = col_double(),
##   `All StatePrice` = col_double(),
##   `All StateCoverage` = col_double(),
##   localService = col_double(),
##   localinsurancebrokersPrice = col_double(),
##   localinsurancebrokersCoverage = col_double(),
##   Whichoptionbetter = col_double(),
##   Whichadisbetter = col_double()
## )

Building distance function and ploting the trees (dendrograms)

Hierarchical clustering (using the function hclust) is an informative way to visualize the data.

We will see if we could discover subgroups among the variables or among the observations.

use = scale(mydata[,-c(1)], center = TRUE, scale = TRUE)
dist = dist(use)  
d <- dist(as.matrix(dist))   # find distance matrix
seg.hclust <- hclust(d)                # apply hirarchical clustering
library(ggplot2) # needs no introduction
plot(seg.hclust)

Identifying clustering memberships for each cluster

Imagine if your goal is to find some profitable customers to target. Now you will be able to see the number of customers using this algorithm.

groups.3 = cutree(seg.hclust,3)
table(groups.3)  #A good first step is to use the table function to see how # many observations are in each cluster
## groups.3
##  1  2  3 
## 16  4  3
#In the following step, we will find the members in each cluster or group.
mydata$ID[groups.3 == 1]
##  [1]  1  2  6  8  9 10 11 12 13 14 15 18 19 21 22 23
mydata$ID[groups.3 == 2]
## [1]  3  5  7 16
mydata$ID[groups.3 == 3]
## [1]  4 17 20

Identifying common features of each cluster using the aggregate function

#?aggregate
aggregate(mydata,list(groups.3),median)
##   Group.1   ID Gender Age: SocialMedia PriceorCoverage
## 1       1 12.5    1.0    1           1             1.5
## 2       2  6.0    0.5    1           2             2.0
## 3       3 17.0    1.0    1           2             1.0
##   ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1                           4               NA                   4
## 2                          NA               NA                  NA
## 3                          NA               NA                  NA
##   State FarmCustomer Service State FarmPrice State FarmCoverage
## 1                          4               4                5.5
## 2                         NA              NA                 NA
## 3                         NA              NA                 NA
##   All StateCustomer Service All StatePrice All StateCoverage localService
## 1                         4              4                 4            4
## 2                        NA             NA                NA           NA
## 3                        NA             NA                NA           NA
##   localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1                          4                             4                25
## 2                         NA                            NA                NA
## 3                         NA                            NA                25
##   Whichadisbetter
## 1               2
## 2               1
## 3               1
aggregate(mydata,list(groups.3),mean)
##   Group.1       ID    Gender     Age: SocialMedia PriceorCoverage
## 1       1 12.75000 0.5625000 1.000000      2.0625        1.500000
## 2       2  7.75000 0.5000000 1.000000      2.0000        2.000000
## 3       3 13.66667 0.6666667 1.333333      3.0000        1.333333
##   ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1                      4.0625               NA              4.0625
## 2                          NA               NA                  NA
## 3                          NA               NA                  NA
##   State FarmCustomer Service State FarmPrice State FarmCoverage
## 1                     4.8125           4.625             5.4375
## 2                         NA              NA                 NA
## 3                         NA              NA                 NA
##   All StateCustomer Service All StatePrice All StateCoverage localService
## 1                    4.0625          4.125            4.4375        4.125
## 2                        NA             NA                NA           NA
## 3                        NA             NA                NA           NA
##   localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1                     3.9375                             4           24.0625
## 2                         NA                            NA                NA
## 3                         NA                            NA           25.0000
##   Whichadisbetter
## 1           1.625
## 2           1.000
## 3           1.000
aggregate(mydata[,-1],list(groups.3),median)
##   Group.1 Gender Age: SocialMedia PriceorCoverage ProgressiveCustomer Service
## 1       1    1.0    1           1             1.5                           4
## 2       2    0.5    1           2             2.0                          NA
## 3       3    1.0    1           2             1.0                          NA
##   ProgressivePrice ProgressiveCoverage State FarmCustomer Service
## 1               NA                   4                          4
## 2               NA                  NA                         NA
## 3               NA                  NA                         NA
##   State FarmPrice State FarmCoverage All StateCustomer Service All StatePrice
## 1               4                5.5                         4              4
## 2              NA                 NA                        NA             NA
## 3              NA                 NA                        NA             NA
##   All StateCoverage localService localinsurancebrokersPrice
## 1                 4            4                          4
## 2                NA           NA                         NA
## 3                NA           NA                         NA
##   localinsurancebrokersCoverage Whichoptionbetter Whichadisbetter
## 1                             4                25               2
## 2                            NA                NA               1
## 3                            NA                25               1
aggregate(mydata[,-1],list(groups.3),mean)
##   Group.1    Gender     Age: SocialMedia PriceorCoverage
## 1       1 0.5625000 1.000000      2.0625        1.500000
## 2       2 0.5000000 1.000000      2.0000        2.000000
## 3       3 0.6666667 1.333333      3.0000        1.333333
##   ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1                      4.0625               NA              4.0625
## 2                          NA               NA                  NA
## 3                          NA               NA                  NA
##   State FarmCustomer Service State FarmPrice State FarmCoverage
## 1                     4.8125           4.625             5.4375
## 2                         NA              NA                 NA
## 3                         NA              NA                 NA
##   All StateCustomer Service All StatePrice All StateCoverage localService
## 1                    4.0625          4.125            4.4375        4.125
## 2                        NA             NA                NA           NA
## 3                        NA             NA                NA           NA
##   localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1                     3.9375                             4           24.0625
## 2                         NA                            NA                NA
## 3                         NA                            NA           25.0000
##   Whichadisbetter
## 1           1.625
## 2           1.000
## 3           1.000
cluster_means <- aggregate(mydata[,-1],list(groups.3),mean)

Conclusion

In conclusion, customers preferred better coverage to price. This was to our surprise because we initially believed younger customers would prefer lower premium costs. With their low insurance prices, full coverage, and cheaper deductibles Modern Edge can compete against nationwide insurance companies like Progressive. From our survey results, we found out Twitter was the preferred social media site, it was biased because I distributed the survey on only that platform. Through cluster analysis we identified 3 clusters that can help with market segmentation. Through the regression analysis, we sought out correlation between women, social media, and choosing a local insurance company. Our hypothesis was disapproved and the results showed women tend to be more likely to choose a cheaper option than men.