For our final project we chose to conduct marketing research for a local insurance company, Modern Edge Insurance Brokers, Inc. We created and distributed a survey, and performed regression and cluster analysis to intrepret the results.
Our client Manny, a broker at Modern Edge, has primarily focused on auto insurance sales. He has established good relations with local car dealers. Whenever someone buys a new car, the car salesman points to Manny for insurance plans. Manny wants to gain insightful information to attract more customers. Some questions we hope to answer are which ad do customers prefer, do customers prefer pricing or coverage and which social media do customers prefer?
Our first hypothesis was that younger ages preferred lower deductibles over better coverage. However, after our survey concluded the results showed we didn’t get enough diversity in age. 22 of the the 23 responses fell into the 18-24 category. Instead, we chose to go a different route. Our new hypothesis is that women are more likely to prefer a higher deductible from a national insurance company than a lower deductible from a local company.
data <- read.csv("MKTG4000survey.csv")
head(data)
## ID Gender Age. SocialMedia PriceorCoverage ProgressiveCustomer.Service
## 1 1 0 1 5 2 4
## 2 2 1 1 1 2 3
## 3 3 1 1 2 2 7
## 4 4 0 2 2 1 NA
## 5 5 0 1 2 2 NA
## 6 6 1 1 1 2 4
## ProgressivePrice ProgressiveCoverage State.FarmCustomer.Service
## 1 4 4 7
## 2 4 2 6
## 3 NA NA 4
## 4 NA NA 7
## 5 NA NA NA
## 6 NA 4 4
## State.FarmPrice State.FarmCoverage All.StateCustomer.Service All.StatePrice
## 1 7 7 4 4
## 2 5 5 6 6
## 3 NA NA 4 NA
## 4 4 6 NA NA
## 5 NA NA NA NA
## 6 4 4 4 4
## All.StateCoverage localService localinsurancebrokersPrice
## 1 4 4 4
## 2 6 5 4
## 3 NA 2 NA
## 4 NA NA NA
## 5 NA NA NA
## 6 4 4 4
## localinsurancebrokersCoverage Whichoptionbetter Whichadisbetter
## 1 4 25 2
## 2 4 25 2
## 3 NA 22 1
## 4 NA 25 1
## 5 NA NA 1
## 6 4 25 2
summary(lm(localinsurancebrokersPrice~ factor(Gender) + factor(SocialMedia), data = data))
##
## Call:
## lm(formula = localinsurancebrokersPrice ~ factor(Gender) + factor(SocialMedia),
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5000 -0.3158 -0.2463 0.4626 1.5726
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.3432 0.4045 8.266 5.74e-07 ***
## factor(Gender)1 0.9032 0.4371 2.066 0.0565 .
## factor(SocialMedia)2 1.2537 0.6485 1.933 0.0723 .
## factor(SocialMedia)3 -0.6442 0.5669 -1.136 0.2736
## factor(SocialMedia)5 1.1811 0.5272 2.240 0.0406 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8292 on 15 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.5426, Adjusted R-squared: 0.4206
## F-statistic: 4.448 on 4 and 15 DF, p-value: 0.01436
library(readr)
mydata <-read_csv('MKTG4000survey.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## ID = col_double(),
## Gender = col_double(),
## `Age:` = col_double(),
## SocialMedia = col_double(),
## PriceorCoverage = col_double(),
## `ProgressiveCustomer Service` = col_double(),
## ProgressivePrice = col_double(),
## ProgressiveCoverage = col_double(),
## `State FarmCustomer Service` = col_double(),
## `State FarmPrice` = col_double(),
## `State FarmCoverage` = col_double(),
## `All StateCustomer Service` = col_double(),
## `All StatePrice` = col_double(),
## `All StateCoverage` = col_double(),
## localService = col_double(),
## localinsurancebrokersPrice = col_double(),
## localinsurancebrokersCoverage = col_double(),
## Whichoptionbetter = col_double(),
## Whichadisbetter = col_double()
## )
Hierarchical clustering (using the function hclust) is an informative way to visualize the data.
We will see if we could discover subgroups among the variables or among the observations.
use = scale(mydata[,-c(1)], center = TRUE, scale = TRUE)
dist = dist(use)
d <- dist(as.matrix(dist)) # find distance matrix
seg.hclust <- hclust(d) # apply hirarchical clustering
library(ggplot2) # needs no introduction
plot(seg.hclust)
Imagine if your goal is to find some profitable customers to target. Now you will be able to see the number of customers using this algorithm.
groups.3 = cutree(seg.hclust,3)
table(groups.3) #A good first step is to use the table function to see how # many observations are in each cluster
## groups.3
## 1 2 3
## 16 4 3
#In the following step, we will find the members in each cluster or group.
mydata$ID[groups.3 == 1]
## [1] 1 2 6 8 9 10 11 12 13 14 15 18 19 21 22 23
mydata$ID[groups.3 == 2]
## [1] 3 5 7 16
mydata$ID[groups.3 == 3]
## [1] 4 17 20
#?aggregate
aggregate(mydata,list(groups.3),median)
## Group.1 ID Gender Age: SocialMedia PriceorCoverage
## 1 1 12.5 1.0 1 1 1.5
## 2 2 6.0 0.5 1 2 2.0
## 3 3 17.0 1.0 1 2 1.0
## ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1 4 NA 4
## 2 NA NA NA
## 3 NA NA NA
## State FarmCustomer Service State FarmPrice State FarmCoverage
## 1 4 4 5.5
## 2 NA NA NA
## 3 NA NA NA
## All StateCustomer Service All StatePrice All StateCoverage localService
## 1 4 4 4 4
## 2 NA NA NA NA
## 3 NA NA NA NA
## localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1 4 4 25
## 2 NA NA NA
## 3 NA NA 25
## Whichadisbetter
## 1 2
## 2 1
## 3 1
aggregate(mydata,list(groups.3),mean)
## Group.1 ID Gender Age: SocialMedia PriceorCoverage
## 1 1 12.75000 0.5625000 1.000000 2.0625 1.500000
## 2 2 7.75000 0.5000000 1.000000 2.0000 2.000000
## 3 3 13.66667 0.6666667 1.333333 3.0000 1.333333
## ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1 4.0625 NA 4.0625
## 2 NA NA NA
## 3 NA NA NA
## State FarmCustomer Service State FarmPrice State FarmCoverage
## 1 4.8125 4.625 5.4375
## 2 NA NA NA
## 3 NA NA NA
## All StateCustomer Service All StatePrice All StateCoverage localService
## 1 4.0625 4.125 4.4375 4.125
## 2 NA NA NA NA
## 3 NA NA NA NA
## localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1 3.9375 4 24.0625
## 2 NA NA NA
## 3 NA NA 25.0000
## Whichadisbetter
## 1 1.625
## 2 1.000
## 3 1.000
aggregate(mydata[,-1],list(groups.3),median)
## Group.1 Gender Age: SocialMedia PriceorCoverage ProgressiveCustomer Service
## 1 1 1.0 1 1 1.5 4
## 2 2 0.5 1 2 2.0 NA
## 3 3 1.0 1 2 1.0 NA
## ProgressivePrice ProgressiveCoverage State FarmCustomer Service
## 1 NA 4 4
## 2 NA NA NA
## 3 NA NA NA
## State FarmPrice State FarmCoverage All StateCustomer Service All StatePrice
## 1 4 5.5 4 4
## 2 NA NA NA NA
## 3 NA NA NA NA
## All StateCoverage localService localinsurancebrokersPrice
## 1 4 4 4
## 2 NA NA NA
## 3 NA NA NA
## localinsurancebrokersCoverage Whichoptionbetter Whichadisbetter
## 1 4 25 2
## 2 NA NA 1
## 3 NA 25 1
aggregate(mydata[,-1],list(groups.3),mean)
## Group.1 Gender Age: SocialMedia PriceorCoverage
## 1 1 0.5625000 1.000000 2.0625 1.500000
## 2 2 0.5000000 1.000000 2.0000 2.000000
## 3 3 0.6666667 1.333333 3.0000 1.333333
## ProgressiveCustomer Service ProgressivePrice ProgressiveCoverage
## 1 4.0625 NA 4.0625
## 2 NA NA NA
## 3 NA NA NA
## State FarmCustomer Service State FarmPrice State FarmCoverage
## 1 4.8125 4.625 5.4375
## 2 NA NA NA
## 3 NA NA NA
## All StateCustomer Service All StatePrice All StateCoverage localService
## 1 4.0625 4.125 4.4375 4.125
## 2 NA NA NA NA
## 3 NA NA NA NA
## localinsurancebrokersPrice localinsurancebrokersCoverage Whichoptionbetter
## 1 3.9375 4 24.0625
## 2 NA NA NA
## 3 NA NA 25.0000
## Whichadisbetter
## 1 1.625
## 2 1.000
## 3 1.000
cluster_means <- aggregate(mydata[,-1],list(groups.3),mean)
In conclusion, customers preferred better coverage to price. This was to our surprise because we initially believed younger customers would prefer lower premium costs. With their low insurance prices, full coverage, and cheaper deductibles Modern Edge can compete against nationwide insurance companies like Progressive. From our survey results, we found out Twitter was the preferred social media site, it was biased because I distributed the survey on only that platform. Through cluster analysis we identified 3 clusters that can help with market segmentation. Through the regression analysis, we sought out correlation between women, social media, and choosing a local insurance company. Our hypothesis was disapproved and the results showed women tend to be more likely to choose a cheaper option than men.