For this assignment, I have been asked to answer the business question, “What factors drive seasonality for homeowners quotes and production (conversion)?” Using auto and homeowners quotes received in 2017 and 2018 and sourced from State Farm (the data has been modified so that it is not actual customer data), I examine this question in depth with RStudio.
The analysis has three parts. First, I address the factors behind seasonality in homeowners’ quotes. Second, I investigate the influences behind production, or conversion. Lastly, I present my conclusions and offer findings that may inform company strategy.
I begin with summary statistics for the entire Interview dataset. The output is tedious to read, and some variables are non-numeric, so no numeric summary is produced for them. I have converted some of these into dummy variables: for example, First Point of Contact becomes Agent, which equals 1 if the First Point of Contact is ‘Agent’. I created other variables for my own exploration but do not use them in my regressions or written analysis.
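As a minimal sketch of how such a dummy can be built, consider the toy example below. The data frame and the contact categories other than ‘Agent’ are made up for illustration and are not taken from the Interview data.

```r
# Toy data standing in for the Interview data set (values are hypothetical).
toy <- data.frame(
  `First Point of Contact` = c("Agent", "Internet", "Agent", "Call Center"),
  check.names = FALSE
)

# Agent is 1 when the first point of contact is an agent, 0 otherwise.
toy$Agent <- ifelse(toy$`First Point of Contact` == "Agent", 1, 0)

print(toy)
```

The mean of a dummy built this way is the share of rows in that category, which is how the Agent mean in the summary output can be read.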
summary(Interview)
## Month.x Year Home Purchase Year Home/Auto Discount
## Min. : 1.000 Min. :2017 Min. :1007 Min. :0.0000
## 1st Qu.: 4.000 1st Qu.:2017 1st Qu.:2006 1st Qu.:1.0000
## Median : 6.000 Median :2017 Median :2016 Median :1.0000
## Mean : 6.321 Mean :2017 Mean :2010 Mean :0.8627
## 3rd Qu.: 9.000 3rd Qu.:2018 3rd Qu.:2017 3rd Qu.:1.0000
## Max. :12.000 Max. :2018 Max. :2019 Max. :1.0000
##
## Amount of Insurance First Point of Contact Written Indicator Score
## Min. : 950 Length:173501 Min. :0.0000 Min. : -1.0
## 1st Qu.: 170050 Class :character 1st Qu.:0.0000 1st Qu.:647.0
## Median : 228000 Mode :character Median :0.0000 Median :731.0
## Mean : 269934 Mean :0.4868 Mean :713.6
## 3rd Qu.: 314450 3rd Qu.:1.0000 3rd Qu.:796.0
## Max. :7046150 Max. :1.0000 Max. :884.0
## NA's :7335
## Homeower Age Aggregator Month.y Quarter.x
## Min. : 20.00 Length:173501 Min. : 1.000 Min. :1.000
## 1st Qu.: 38.00 Class :character 1st Qu.: 4.000 1st Qu.:2.000
## Median : 48.00 Mode :character Median : 6.000 Median :2.000
## Mean : 49.46 Mean : 6.321 Mean :2.449
## 3rd Qu.: 60.00 3rd Qu.: 9.000 3rd Qu.:3.000
## Max. :131.00 Max. :12.000 Max. :4.000
##
## Quarter.y Q1 Q2 Q3
## Min. :1.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:2.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :2.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :2.449 Mean :0.2469 Mean :0.2781 Mean :0.2545
## 3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :4.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
##
## Q4 Q23 Q123 PoorCredit
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.000
## Median :0.0000 Median :1.0000 Median :1.0000 Median :0.000
## Mean :0.2206 Mean :0.5326 Mean :0.7794 Mean :0.133
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## NA's :7335
## VGExCredit Agent HomeAge
## Min. :0.000 Min. :0.0000 Min. : -2.000
## 1st Qu.:0.000 1st Qu.:1.0000 1st Qu.: 0.000
## Median :0.000 Median :1.0000 Median : 1.000
## Mean :0.466 Mean :0.8842 Mean : 7.038
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.: 11.000
## Max. :1.000 Max. :1.0000 Max. :1010.000
## NA's :7335
I examine seasonality through the Quarter variables that I created. To quantify the factors behind seasonality, I treat these time variables as the outcome variable. Looking at homeowners’ quotes over 2017 and 2018, aggregated below as the total Amount of Insurance quoted per month, I observe that quote activity appears to be greater in Q2 and Q3. I also note identifiable trends, such as a general increase over Q1 and a decrease over Q4. In simpler terms, quotes start lower in January but increase by March. They peak somewhere in the middle of the year, between April and September. From October onward, homeowners’ quotes cool down with a rapid decrease.
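The quarter variables can be derived from the month of the quote. The sketch below is an assumption about how they could be constructed, not a record of how they were actually built; the month values are made up, and only the variable names follow the summary output above.

```r
# Hypothetical month values; the real data uses the Month.x column.
month <- c(1, 3, 4, 6, 7, 9, 10, 12)

# Map months 1-12 to quarters 1-4.
quarter <- (month - 1) %/% 3 + 1

# Indicator variables named as in the summary output.
Q2   <- as.integer(quarter == 2)
Q3   <- as.integer(quarter == 3)
Q23  <- as.integer(quarter %in% c(2, 3))  # mid-year quarters grouped
Q123 <- as.integer(quarter != 4)          # everything except Q4

print(data.frame(month, quarter, Q2, Q3, Q23, Q123))
```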
Interview_Agg1 <- aggregate(`Amount of Insurance` ~ Month.x + Year,
Interview,
FUN = sum)
head(Interview_Agg1)
## Month.x Year Amount of Insurance
## 1 1 2017 1828238209
## 2 2 2017 1714349272
## 3 3 2017 2076891081
## 4 4 2017 2049191708
## 5 5 2017 2306468947
## 6 6 2017 2204912979
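library(ggplot2)  # needed for ggplot(); scales:: is called by namespace below
## Monthly total Amount of Insurance, one panel per year
Graph1 <- ggplot(Interview_Agg1, aes(x = Month.x, y = `Amount of Insurance`, group = Year)) +
  geom_line() +
  geom_point() +
  facet_wrap(vars(Year)) +
  scale_x_continuous('Month', breaks = scales::breaks_extended(n = 12), limits = c(1, 12)) +
  theme_light()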
Graph1
Why do homeowners’ quotes follow this trend? To answer this question, we might look into the factors that lead one to purchase an insurance policy. According to my research, there is no current federal law requiring homeowners to insure their homes. Homeowners insurance largely covers damage from external events, such as burglary or robbery and natural disasters like floods, fires, and earthquakes, to name a few. However, weather, natural disasters, and crime may not explain the trend entirely.
An alternative approach is to think about when the opportunity to purchase homeowners’ insurance arises for the average person; many people would likely consider it during, or not long after, the home purchasing process. Therefore, I hypothesize that most homeowners’ insurance quotes come from new homeowners.
To test this, I’ve created a variable called HomeAge, which measures the number of years since the home was purchased. HomeAge equals the Year of the quote minus the Home Purchase Year. I then produce summary statistics and the distribution of quotes by HomeAge.
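A minimal sketch of this calculation on made-up rows follows. The column names match the Interview data set, and the direction of the subtraction (quote year minus purchase year) is inferred from the summary statistics, where the minimum HomeAge is -2 and the maximum is 1010.

```r
# Made-up rows; column names follow the Interview data set.
toy <- data.frame(
  Year = c(2017, 2018, 2018, 2017),
  `Home Purchase Year` = c(2017, 2016, 2019, 1995),
  check.names = FALSE
)

# Years between the home purchase and the year of the quote.
toy$HomeAge <- toy$Year - toy$`Home Purchase Year`

# Share of quotes on homes bought in the same year as the quote.
share_new <- mean(toy$HomeAge == 0)

print(toy)
print(share_new)
```

A negative HomeAge means the quote precedes the recorded purchase year, and very large values point to implausible purchase years in the data.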
print(summary(Interview$HomeAge))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.000 0.000 1.000 7.038 11.000 1010.000
histHomeAge <- hist(Interview$HomeAge, breaks = 'FD', xlim = c(-2, 52), col = "black")
plot(histHomeAge, xlab = "Years Since Purchasing Home", ylab = "Counts",
     main = "Histogram of Homeownership Year Distribution", xlim = c(-2, 52), col = "black")
Note that the majority of observations fall within roughly the first two years, so I take a closer look.
## Number of obs for which HomeAge==0 (or just purchased)
print(summary(Interview$HomeAge==0))
## Mode FALSE TRUE
## logical 93471 80030
## Number of obs for which HomeAge==1 (or purchased within the last year)
print(summary(Interview$HomeAge==1))
## Mode FALSE TRUE
## logical 162986 10515
## Number of obs for which HomeAge==2 (or purchased within last 2 years)
print(summary(Interview$HomeAge==2))
## Mode FALSE TRUE
## logical 167200 6301
The summary statistics and distribution show that HomeAge is right-skewed. How many quotes are within the first year of homeownership or second year of homeownership?
NewHome01 <- Interview[ which(Interview$HomeAge>=0
& Interview$HomeAge < 1), ]
NewHome02 <- Interview[ which(Interview$HomeAge>=0
& Interview$HomeAge < 2), ]
print(nrow(NewHome01)/nrow(Interview))
## [1] 0.4612654
print(nrow(NewHome02)/nrow(Interview))
## [1] 0.5218702
When I focus specifically on the first year of homeownership, I find that approximately 46.1 percent of quotes take place within the first year of purchasing. This fits the seasonality of the housing market, where more houses are sold in spring and summer and sales tail off in winter.
Expanding this a little further, I find that over half of quotes, approximately 52.2 percent, take place within the first two years of purchasing. Note that this range starts at 0 and excludes negative years, as those circumstances (property development, for example) are likely to involve exogenous variation or differences that would be unaccounted for.
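As a quick arithmetic check, the two shares can be reproduced directly from the counts reported in the logical summaries above (80030 quotes with HomeAge equal to 0, 10515 with HomeAge equal to 1, out of 173501 quotes). The variable names here are only for illustration.

```r
# Counts copied from the summary output above.
n_total <- 173501
n_age0  <- 80030   # HomeAge == 0
n_age1  <- 10515   # HomeAge == 1

# Share of quotes in the first year, and in the first two years, of ownership.
share_first_year <- n_age0 / n_total
share_first_two  <- (n_age0 + n_age1) / n_total

print(round(c(share_first_year, share_first_two), 4))
```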
I can also test possible causes of seasonality through a regression model. Put simply, I want to see the effect of the year of homeownership on the month or quarter in which the policy is quoted. I expect that new owners, who most frequently buy in spring and summer, will be quoted policies in those seasons. I also suspect that the effect of homeownership year will be stronger for quarters 2 and 3 (where Q2==1, Q3==1). For the outcome variable, the time of year when the policy is quoted, I can use a variety of time variables, including dummy variables for the specific months or quarters that I am interested in.
The models below omit other covariates; I add relevant controls afterward.
## Using Months (1-12)
regM_1 = glm(formula = `Month.x` ~ HomeAge, data = Interview)
print(summary(regM_1))
##
## Call:
## glm(formula = Month.x ~ HomeAge, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.3516 -2.3516 -0.3172 2.6656 8.1677
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.3516367 0.0088997 713.693 < 2e-16 ***
## HomeAge -0.0043076 0.0005749 -7.493 6.79e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 10.90105)
##
## Null deviance: 1891934 on 173500 degrees of freedom
## Residual deviance: 1891322 on 173499 degrees of freedom
## AIC: 906848
##
## Number of Fisher Scoring iterations: 2
## Using Quarters (Categorical, 1-4)
regQ_1 = glm(formula = `Quarter.x` ~ HomeAge, data = Interview)
print(summary(regQ_1))
##
## Call:
## glm(formula = Quarter.x ~ HomeAge, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4583 -0.4571 -0.4323 0.5642 2.5124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4571186 0.0029305 838.463 < 2e-16 ***
## HomeAge -0.0011837 0.0001893 -6.253 4.04e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.181966)
##
## Null deviance: 205116 on 173500 degrees of freedom
## Residual deviance: 205070 on 173499 degrees of freedom
## AIC: 521384
##
## Number of Fisher Scoring iterations: 2
## By Quarter
## Individual Quarter Dummies
# Quarter 1
regQ1_1 = glm(formula = Q1 ~ HomeAge, data = Interview, family = binomial)
print(summary(regQ1_1))
##
## Call:
## glm(formula = Q1 ~ HomeAge, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9489 -0.7508 -0.7401 -0.7383 1.6929
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1549745 0.0065030 -177.61 <2e-16 ***
## HomeAge 0.0054901 0.0004584 11.97 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 193924 on 173500 degrees of freedom
## Residual deviance: 193756 on 173499 degrees of freedom
## AIC: 193760
##
## Number of Fisher Scoring iterations: 4
# Quarter 2
regQ2_1 = glm(formula = Q2 ~ HomeAge, data = Interview, family = binomial)
print(summary(regQ2_1))
##
## Call:
## glm(formula = Q2 ~ HomeAge, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8229 -0.8193 -0.8052 1.5842 3.3919
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9192408 0.0062801 -146.37 <2e-16 ***
## HomeAge -0.0050950 0.0004906 -10.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 205118 on 173500 degrees of freedom
## Residual deviance: 204999 on 173499 degrees of freedom
## AIC: 205003
##
## Number of Fisher Scoring iterations: 4
# Quarter 3
regQ3_1 = glm(formula = Q3 ~ HomeAge, data = Interview, family = binomial)
print(summary(regQ3_1))
##
## Call:
## glm(formula = Q3 ~ HomeAge, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7712 -0.7701 -0.7664 1.6493 2.3332
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0635882 0.0063312 -167.993 < 2e-16 ***
## HomeAge -0.0015902 0.0004519 -3.519 0.000433 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 196835 on 173500 degrees of freedom
## Residual deviance: 196821 on 173499 degrees of freedom
## AIC: 196825
##
## Number of Fisher Scoring iterations: 4
# Quarter 4
regQ4_1 = glm(formula = Q4 ~ HomeAge, data = Interview, family = binomial)
print(summary(regQ4_1))
##
## Call:
## glm(formula = Q4 ~ HomeAge, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7774 -0.7059 -0.7055 -0.7055 1.7396
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2639358 0.0064814 -195.011 <2e-16 ***
## HomeAge 0.0002199 0.0004124 0.533 0.594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 183085 on 173500 degrees of freedom
## Residual deviance: 183085 on 173499 degrees of freedom
## AIC: 183089
##
## Number of Fisher Scoring iterations: 4
## Quarters Grouped (2 & 3)
regQ23_1 = glm(formula = Q23 ~ HomeAge, data = Interview, family = binomial)
print(summary(regQ23_1))
##
## Call:
## glm(formula = Q23 ~ HomeAge, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.253 -1.244 1.108 1.110 3.107
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1654571 0.0056380 29.35 <2e-16 ***
## HomeAge -0.0049842 0.0004201 -11.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 239786 on 173500 degrees of freedom
## Residual deviance: 239629 on 173499 degrees of freedom
## AIC: 239633
##
## Number of Fisher Scoring iterations: 3
## Quarters Grouped (1, 2 & 3)
regQ123_1 = glm(formula = Q123 ~ HomeAge, data = Interview, family = binomial)
print(summary(regQ123_1))
##
## Call:
## glm(formula = Q123 ~ HomeAge, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7396 0.7055 0.7055 0.7059 0.7774
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.2639358 0.0064814 195.011 <2e-16 ***
## HomeAge -0.0002199 0.0004124 -0.533 0.594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 183085 on 173500 degrees of freedom
## Residual deviance: 183085 on 173499 degrees of freedom
## AIC: 183089
##
## Number of Fisher Scoring iterations: 4
Adding relevant controls to this model lets me account for factors that influence both the season of the policy quote and home purchasing behavior. Two relevant controls available to me are the age and the credit score of the homeowner.
I create age dummies for the 30s and 40s, as homeowners’ quotes are frequently observed among these age groups. I do not have a strong prior about how being in these age groups affects the season in which a quote is sought.
I also create a dummy for Poor Credit, which is equal to 1 if the given Score is less than or equal to 579. I hypothesize that poor credit is negatively correlated with insurance quotes, particularly in this case, as the sample seems to be skewed towards homeownership, which typically depends on meeting a credit score threshold.
## Age
Interview$Age30s <- ifelse(Interview$`Homeower Age`>=30 & Interview$`Homeower Age`<40,1,0)
Interview$Age40s <- ifelse(Interview$`Homeower Age`>=40 & Interview$`Homeower Age`<50,1,0)
## Poor Credit
Interview$PoorCredit <- ifelse(Interview$Score<=579, 1, 0)
## Quarter Categorical
regQ_2 = glm(formula = Quarter.x ~ HomeAge + Age30s + Age40s + PoorCredit, data = Interview)
print(summary(regQ_2))
##
## Call:
## glm(formula = Quarter.x ~ HomeAge + Age30s + Age40s + PoorCredit,
## data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4798 -0.4798 -0.4210 0.5745 2.7340
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4797701 0.0043230 573.617 < 2e-16 ***
## HomeAge -0.0014820 0.0002001 -7.407 1.30e-13 ***
## Age30s -0.0528263 0.0068607 -7.700 1.37e-14 ***
## Age40s -0.0211524 0.0066851 -3.164 0.001556 **
## PoorCredit -0.0283520 0.0078462 -3.613 0.000302 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.17692)
##
## Null deviance: 195680 on 166165 degrees of freedom
## Residual deviance: 195558 on 166161 degrees of freedom
## (7335 observations deleted due to missingness)
## AIC: 498634
##
## Number of Fisher Scoring iterations: 2
## Quarters Grouped/Dummy (2 & 3)
regQ23_2 = glm(formula = Q23 ~ HomeAge + Age30s + Age40s + PoorCredit, data = Interview, family = binomial)
print(summary(regQ23_2))
##
## Call:
## glm(formula = Q23 ~ HomeAge + Age30s + Age40s + PoorCredit, family = binomial,
## data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.263 -1.244 1.096 1.109 2.980
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1636300 0.0084139 19.448 < 2e-16 ***
## HomeAge -0.0046178 0.0004538 -10.176 < 2e-16 ***
## Age30s 0.0302781 0.0128866 2.350 0.01879 *
## Age40s 0.0265197 0.0124665 2.127 0.03340 *
## PoorCredit -0.0403949 0.0144937 -2.787 0.00532 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 229546 on 166165 degrees of freedom
## Residual deviance: 229377 on 166161 degrees of freedom
## (7335 observations deleted due to missingness)
## AIC: 229387
##
## Number of Fisher Scoring iterations: 3
Conversion is dependent on whether the policy is filed. The outcome variable is the Written Indicator, where a value of ‘1’ indicates that State Farm wrote (converted) the policy. I capture seasonality through the Quarter variable. The categorical Quarter variable yields an insignificant coefficient, while the dummy for Quarters 2 & 3 yields a strongly significant effect. Based on this and other statistics, I proceed with the Quarters 2 & 3 dummy only.
## Quarter Categorical
policy_reg = glm(formula = `Written Indicator` ~ Quarter.x, data = Interview, family = binomial)
print(summary(policy_reg))
##
## Call:
## glm(formula = `Written Indicator` ~ Quarter.x, family = binomial,
## data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.157 -1.155 -1.154 1.200 1.201
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.058069 0.011836 -4.906 9.3e-07 ***
## Quarter.x 0.002200 0.004418 0.498 0.618
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 240403 on 173500 degrees of freedom
## Residual deviance: 240403 on 173499 degrees of freedom
## AIC: 240407
##
## Number of Fisher Scoring iterations: 3
## Quarters 2 & 3
policy_regQ23_1 = glm(formula = `Written Indicator` ~ Q23, data = Interview, family = binomial)
print(summary(policy_regQ23_1))
##
## Call:
## glm(formula = `Written Indicator` ~ Q23, family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.192 -1.192 -1.114 1.163 1.243
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.151910 0.007043 -21.57 <2e-16 ***
## Q23 0.186090 0.009639 19.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 240403 on 173500 degrees of freedom
## Residual deviance: 240030 on 173499 degrees of freedom
## AIC: 240034
##
## Number of Fisher Scoring iterations: 3
Next, I explore two control variables that I believe to be relevant.
The first is Agent, which is equal to ‘1’ if a State Farm Agent is the ‘First Point of Contact’ (FPOC). I expect that if an Agent is the FPOC, the policy is more likely to be converted by State Farm (a positive coefficient).
The second is Home/Auto Discount, which is equal to ‘1’ if the quote received a discount for also quoting auto insurance. I expect that if auto insurance is also quoted, the policy is more likely to be converted by State Farm (a positive coefficient).
## Agent
policy_regQ23_2 = glm(formula = `Written Indicator` ~ Q23 + Agent, data = Interview, family = binomial)
print(summary(policy_regQ23_2))
##
## Call:
## glm(formula = `Written Indicator` ~ Q23 + Agent, family = binomial,
## data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.223 -1.145 -0.886 1.132 1.500
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.732533 0.015721 -46.60 <2e-16 ***
## Q23 0.184185 0.009689 19.01 <2e-16 ***
## Agent 0.655101 0.015701 41.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 240403 on 173500 degrees of freedom
## Residual deviance: 238215 on 173498 degrees of freedom
## AIC: 238221
##
## Number of Fisher Scoring iterations: 4
## Discount
policy_regQ23_3 = glm(formula = `Written Indicator` ~ Q23 + `Home/Auto Discount`, data = Interview, family = binomial)
print(summary(policy_regQ23_3))
##
## Call:
## glm(formula = `Written Indicator` ~ Q23 + `Home/Auto Discount`,
## family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2484 -1.1703 -0.7742 1.1081 1.6438
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.051290 0.015393 -68.30 <2e-16 ***
## Q23 0.182035 0.009775 18.62 <2e-16 ***
## `Home/Auto Discount` 1.034531 0.015374 67.29 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 240403 on 173500 degrees of freedom
## Residual deviance: 235057 on 173498 degrees of freedom
## AIC: 235063
##
## Number of Fisher Scoring iterations: 4
## Agent and Discount
policy_regQ23_4 = glm(formula = `Written Indicator` ~ Q23 + Agent + `Home/Auto Discount`, data = Interview, family = binomial)
print(summary(policy_regQ23_4))
##
## Call:
## glm(formula = `Written Indicator` ~ Q23 + Agent + `Home/Auto Discount`,
## family = binomial, data = Interview)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2617 -1.1835 -0.6894 1.1713 1.7627
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.315931 0.019225 -68.45 <2e-16 ***
## Q23 0.181597 0.009791 18.55 <2e-16 ***
## Agent 0.393929 0.016513 23.86 <2e-16 ***
## `Home/Auto Discount` 0.936402 0.015882 58.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 240403 on 173500 degrees of freedom
## Residual deviance: 234479 on 173497 degrees of freedom
## AIC: 234487
##
## Number of Fisher Scoring iterations: 4
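When these two controls are included in the estimation of policy conversion, I find that the Quarters 2 & 3, Agent, and Home/Auto Discount dummies each have a positive, significant effect on the policy conversion outcome. In the combined model, the estimated log-odds coefficients are approximately 0.18 for Quarters 2 & 3, 0.39 for Agent, and 0.94 for Home/Auto Discount, so the discount shows the largest association with conversion.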
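Because these are logistic regressions, the coefficients are on the log-odds scale. A short sketch of converting the combined model’s estimates (copied from the policy_regQ23_4 output above) into odds ratios follows; the vector names are only for illustration.

```r
# Coefficients copied from the summary of policy_regQ23_4 (log-odds scale).
log_odds <- c(Q23 = 0.181597, Agent = 0.393929, `Home/Auto Discount` = 0.936402)

# Exponentiating gives odds ratios: the multiplicative change in the odds
# of conversion when a dummy switches from 0 to 1, other terms held fixed.
odds_ratios <- exp(log_odds)

print(round(odds_ratios, 2))
```

Read this way, a quote with the Home/Auto Discount has roughly 2.5 times the odds of conversion of one without it, holding the other two dummies fixed. With the fitted model object available, `exp(coef(policy_regQ23_4))` would give the same quantities directly.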
My findings in RStudio suggest that Q2 and Q3 are the most profitable quarters for State Farm due to the increased quote volume. The month of October appears to be an outlier, while Q4 shows a decrease overall. These patterns are consistent with housing market seasonality, as suggested by the statistically significant association with recent homeownership and by general research on seasonal patterns in the housing market.
The housing market is exogenous to State Farm, but company strategy could focus on controllable variables that would return greater profits. One suggestion is to increase the number of home and auto discounts, as these have a very strong, significant effect on the policy being converted by State Farm. This would, in turn, boost auto quote volume in Q2 and Q3, which shows room for improvement, as auto quotes are visually ‘deflated’ compared to homeowners’ quotes.
In conclusion, I find that:
Further research could employ other models in this context, particularly a two-stage approach, since conversion is dependent on whether the policy is filed. I could predict whether a policy is filed using an Instrumental Variable approach and then see how this influences conversion. In this two-stage approach, Written Indicator would be a function of the Quarter, which is in turn a function of the Year of Homeownership, the Age Dummies, and the Poor Credit Dummy. Another alternative would be a Mixed Effects model.