What factors drive seasonality for homeowners quotes and production (conversion)?

Carly Blaier

2023-04-19


Introduction

For this assignment, I have been asked to answer the business question, “What factors drive seasonality for homeowners quotes and production (conversion)?” Using auto and homeowner quotes received in 2017 and 2018 and sourced from State Farm (the data has been modified so that it is not actual customer data), I examine this question in depth with RStudio.

The analysis has three parts. First, I address the factors behind seasonality in homeowners’ quotes. Second, I investigate the influences on production (conversion). Lastly, I present my conclusions and offer findings that may inform company strategy.

Dataset

I begin by showing summary statistics for the entire Interview dataset. The output is long, and some variables are non-numeric, so no summary statistics are reported for them. I have converted some of these into dummy variables. For example, the First Point of Contact variable becomes the Agent dummy, which equals 1 if the First Point of Contact is ‘Agent’ and 0 otherwise. I created other variables for my own exploration but do not use them in the regressions or written analysis.
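As a minimal sketch of how such dummies can be built, the snippet below uses a small made-up data frame rather than the Interview data. The contact values other than ‘Agent’ and the `Month` column name are assumptions for illustration only.

```r
# Toy stand-in for the Interview data; all values are invented for illustration.
toy <- data.frame(
  Month = c(1, 4, 8, 11),
  `First Point of Contact` = c("Agent", "Internet", "Agent", "Call Center"),
  check.names = FALSE  # keep the column name with spaces, as in the report
)

# Quarter from month: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
toy$Quarter <- ceiling(toy$Month / 3)

# 0/1 indicator variables
toy$Agent <- as.integer(toy$`First Point of Contact` == "Agent")
toy$Q23   <- as.integer(toy$Quarter %in% c(2, 3))

print(toy)
```

Storing the dummies as 0/1 integers lets them serve either as a binary outcome or as a regressor in `glm()`.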

summary(Interview) 
##     Month.x            Year      Home Purchase Year Home/Auto Discount
##  Min.   : 1.000   Min.   :2017   Min.   :1007       Min.   :0.0000    
##  1st Qu.: 4.000   1st Qu.:2017   1st Qu.:2006       1st Qu.:1.0000    
##  Median : 6.000   Median :2017   Median :2016       Median :1.0000    
##  Mean   : 6.321   Mean   :2017   Mean   :2010       Mean   :0.8627    
##  3rd Qu.: 9.000   3rd Qu.:2018   3rd Qu.:2017       3rd Qu.:1.0000    
##  Max.   :12.000   Max.   :2018   Max.   :2019       Max.   :1.0000    
##                                                                       
##  Amount of Insurance First Point of Contact Written Indicator     Score      
##  Min.   :    950     Length:173501          Min.   :0.0000    Min.   : -1.0  
##  1st Qu.: 170050     Class :character       1st Qu.:0.0000    1st Qu.:647.0  
##  Median : 228000     Mode  :character       Median :0.0000    Median :731.0  
##  Mean   : 269934                            Mean   :0.4868    Mean   :713.6  
##  3rd Qu.: 314450                            3rd Qu.:1.0000    3rd Qu.:796.0  
##  Max.   :7046150                            Max.   :1.0000    Max.   :884.0  
##                                                               NA's   :7335   
##   Homeower Age     Aggregator           Month.y         Quarter.x    
##  Min.   : 20.00   Length:173501      Min.   : 1.000   Min.   :1.000  
##  1st Qu.: 38.00   Class :character   1st Qu.: 4.000   1st Qu.:2.000  
##  Median : 48.00   Mode  :character   Median : 6.000   Median :2.000  
##  Mean   : 49.46                      Mean   : 6.321   Mean   :2.449  
##  3rd Qu.: 60.00                      3rd Qu.: 9.000   3rd Qu.:3.000  
##  Max.   :131.00                      Max.   :12.000   Max.   :4.000  
##                                                                      
##    Quarter.y           Q1               Q2               Q3        
##  Min.   :1.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :2.000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :2.449   Mean   :0.2469   Mean   :0.2781   Mean   :0.2545  
##  3rd Qu.:3.000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :4.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                    
##        Q4              Q23              Q123          PoorCredit   
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.000  
##  Median :0.0000   Median :1.0000   Median :1.0000   Median :0.000  
##  Mean   :0.2206   Mean   :0.5326   Mean   :0.7794   Mean   :0.133  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##                                                     NA's   :7335   
##    VGExCredit        Agent           HomeAge        
##  Min.   :0.000   Min.   :0.0000   Min.   :  -2.000  
##  1st Qu.:0.000   1st Qu.:1.0000   1st Qu.:   0.000  
##  Median :0.000   Median :1.0000   Median :   1.000  
##  Mean   :0.466   Mean   :0.8842   Mean   :   7.038  
##  3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:  11.000  
##  Max.   :1.000   Max.   :1.0000   Max.   :1010.000  
##  NA's   :7335

Seasonality in Homeowners’ Quotes

I examine seasonality by creating Quarter variables, and I attempt to quantify the factors behind it by treating these time variables as the outcome variable. Looking more closely at homeowners’ quotes over 2017 and 2018, with the total Amount of Insurance quoted in each month serving as a proxy for quote volume, I observe that volume appears to be greater in Q2 and Q3. I also note identifiable trends, such as a general increase over Q1 and a decrease over Q4. In simpler terms, quotes start lower in January but increase by March. They peak somewhere in the middle of the year, between April and September. From October onward, homeowners’ quotes cool down with a rapid decrease.

Interview_Agg1 <- aggregate(`Amount of Insurance` ~ Month.x + Year, 
                        Interview,
                        FUN = sum)
head(Interview_Agg1)
##   Month.x Year Amount of Insurance
## 1       1 2017          1828238209
## 2       2 2017          1714349272
## 3       3 2017          2076891081
## 4       4 2017          2049191708
## 5       5 2017          2306468947
## 6       6 2017          2204912979
library(ggplot2)  # provides ggplot(); load once per session
Graph1 <- ggplot(Interview_Agg1, aes(x=Month.x, y=`Amount of Insurance`, group = Year)) +
  geom_line() + 
  geom_point() +
  facet_wrap(vars(Year)) +
  scale_x_continuous('Month',breaks = scales::breaks_extended(n = 12), limits = c(1, 12)) +
  theme_light()



Graph1 
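The aggregation above sums the Amount of Insurance per month, which tracks dollar volume rather than the number of quotes. A minimal sketch of counting quotes directly is shown below on a made-up data frame. The `Quotes` helper column is an assumption introduced for illustration and is not part of the Interview data.

```r
# Toy data: one row per quote (values invented for illustration)
toy <- data.frame(
  Month.x = c(1, 1, 2, 2, 2, 1),
  Year    = c(2017, 2017, 2017, 2017, 2017, 2018)
)

# Hypothetical helper column: each row counts as one quote
toy$Quotes <- 1

# Number of quotes per month and year
quote_counts <- aggregate(Quotes ~ Month.x + Year, data = toy, FUN = sum)
print(quote_counts)
```

Plotting `Quotes` in place of the summed Amount of Insurance would show whether the seasonal pattern holds for counts as well as for dollar volume.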

Why do homeowners’ quotes follow this trend? To answer this question, we might look into the factors that lead one to purchase an insurance policy. According to my research, there is no current federal law requiring homeowners to have insurance on their homes. Homeowners’ insurance largely covers losses from external events, such as theft and damage from fire or severe weather, to name a few. However, weather, natural disasters, and crime may not explain the pattern entirely.

An alternative approach is to think about when the opportunity to purchase homeowners’ insurance arises for the average person; many people would likely consider it during, or not long after, the home purchasing process. Therefore, I hypothesize that most home insurance quotes come from new homeowners.

To test this, I’ve created a variable called HomeAge, which measures the number of years since the home was purchased. HomeAge is equal to the Year of the quote minus the Home Purchase Year. I then produce summary statistics and the distribution of quotes by HomeAge.
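A minimal sketch of this calculation on made-up rows is shown below. The `Plausible` flag and its 150-year cutoff are assumptions added for illustration; they are one possible way to mark values such as a purchase year of 1007, which appears as the minimum in the summary above.

```r
# Toy rows (invented values); column names follow the summary output above
toy <- data.frame(
  Year = c(2017, 2018, 2018, 2017),
  `Home Purchase Year` = c(2017, 2016, 2019, 1007),
  check.names = FALSE
)

# Years since purchase: quote year minus purchase year
toy$HomeAge <- toy$Year - toy$`Home Purchase Year`

# Hypothetical sanity flag: non-negative and at most 150 years (assumed cutoff)
toy$Plausible <- toy$HomeAge >= 0 & toy$HomeAge <= 150

print(toy)
```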

print(summary(Interview$HomeAge))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   -2.000    0.000    1.000    7.038   11.000 1010.000
histHomeAge<- hist(Interview$HomeAge, breaks='FD', xlim=c(-2,52), col="black")

plot(histHomeAge, xlab = "Years Since Purchasing Home", ylab = "Counts",
     main = "Histogram of Homeownership Year Distribution", xlim=c(-2,52), col = "black")

Note that the majority of observations fall within roughly the first two years after purchase (HomeAge of 0 to 2), so I will take a closer look.

## Number of obs for which HomeAge==0 (or just purchased)
print(summary(Interview$HomeAge==0))
##    Mode   FALSE    TRUE 
## logical   93471   80030
## Number of obs for which HomeAge==1 (or purchased within the last year)
print(summary(Interview$HomeAge==1))
##    Mode   FALSE    TRUE 
## logical  162986   10515
## Number of obs for which HomeAge==2 (or purchased within last 2 years)
print(summary(Interview$HomeAge==2))
##    Mode   FALSE    TRUE 
## logical  167200    6301

The summary statistics and distribution show that HomeAge is right-skewed. How many quotes are within the first year of homeownership or second year of homeownership?

NewHome01 <- Interview[ which(Interview$HomeAge>=0
& Interview$HomeAge < 1), ]

NewHome02 <- Interview[ which(Interview$HomeAge>=0
& Interview$HomeAge < 2), ]


print(nrow(NewHome01)/nrow(Interview))
## [1] 0.4612654
print(nrow(NewHome02)/nrow(Interview))
## [1] 0.5218702
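As a quick cross-check, the same shares can be recomputed from the counts printed earlier (80,030 quotes with HomeAge equal to 0, 10,515 with HomeAge equal to 1, out of 173,501 in total). The variable names below are introduced only for this sketch.

```r
# Counts copied from the summary() output above
n_total <- 173501
n_age0  <- 80030   # HomeAge == 0
n_age1  <- 10515   # HomeAge == 1

share_first_year <- n_age0 / n_total
share_first_two  <- (n_age0 + n_age1) / n_total

# Shares in percent, rounded to one decimal place
print(round(100 * c(share_first_year, share_first_two), 1))
```

Because the filter `HomeAge < 2` keeps only HomeAge values of 0 and 1, the second share covers the purchase year and the year after it.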

When I focus specifically on the first year of home ownership (HomeAge equal to 0), I find that approximately 46.1 percent of quotes take place in the same year the home was purchased. This is consistent with seasonality in the housing market, where more houses are sold in spring and summer and sales tail off in winter.

Expanding this a little further, I find that over half of quotes, approximately 52.2 percent, take place within the first two years of purchasing (HomeAge of 0 or 1). Note that this range starts at 0 and does not include negative years, as those circumstances (presumably property development, etc.) are likely to involve exogenous variation or differences that would be unaccounted for.

Possible Causes of Homeowners’ Seasonality

I can also test possible causes of seasonality through a regression model. Put simply, I want to see the effect of years of homeownership on the month or quarter in which the policy is quoted. I expect that new owners, who most frequently buy in spring and summer, will be quoted policies in those same seasons. I also suspect that the association with newer homeownership will be stronger in quarters 2 and 3 (where Q2==1 or Q3==1), which would show up as a negative HomeAge coefficient in those models. For the outcome variable, the time of year when the policy is quoted, I can use a variety of time variables, including dummy variables for the specific months or quarters that I am interested in.

The models below initially exclude covariates; relevant controls are added afterwards.

## Using Months (1-12)
regM_1 = glm(formula = `Month.x` ~ HomeAge, data = Interview)
print(summary(regM_1))
## 
## Call:
## glm(formula = Month.x ~ HomeAge, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.3516  -2.3516  -0.3172   2.6656   8.1677  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.3516367  0.0088997 713.693  < 2e-16 ***
## HomeAge     -0.0043076  0.0005749  -7.493 6.79e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 10.90105)
## 
##     Null deviance: 1891934  on 173500  degrees of freedom
## Residual deviance: 1891322  on 173499  degrees of freedom
## AIC: 906848
## 
## Number of Fisher Scoring iterations: 2
## Using Quarters (1-4, entered as a single numeric term)
regQ_1 = glm(formula = `Quarter.x` ~ HomeAge, data = Interview)
print(summary(regQ_1))
## 
## Call:
## glm(formula = Quarter.x ~ HomeAge, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4583  -0.4571  -0.4323   0.5642   2.5124  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.4571186  0.0029305 838.463  < 2e-16 ***
## HomeAge     -0.0011837  0.0001893  -6.253 4.04e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.181966)
## 
##     Null deviance: 205116  on 173500  degrees of freedom
## Residual deviance: 205070  on 173499  degrees of freedom
## AIC: 521384
## 
## Number of Fisher Scoring iterations: 2
## By Quarter

    ## Individual Quarter Dummies
      # Quarter 1
        regQ1_1 = glm(formula = Q1 ~ HomeAge, data = Interview, family = binomial)
        print(summary(regQ1_1))
## 
## Call:
## glm(formula = Q1 ~ HomeAge, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9489  -0.7508  -0.7401  -0.7383   1.6929  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.1549745  0.0065030 -177.61   <2e-16 ***
## HomeAge      0.0054901  0.0004584   11.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 193924  on 173500  degrees of freedom
## Residual deviance: 193756  on 173499  degrees of freedom
## AIC: 193760
## 
## Number of Fisher Scoring iterations: 4
      # Quarter 2
        regQ2_1 = glm(formula = Q2 ~ HomeAge, data = Interview, family = binomial)
        print(summary(regQ2_1))
## 
## Call:
## glm(formula = Q2 ~ HomeAge, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8229  -0.8193  -0.8052   1.5842   3.3919  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.9192408  0.0062801 -146.37   <2e-16 ***
## HomeAge     -0.0050950  0.0004906  -10.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 205118  on 173500  degrees of freedom
## Residual deviance: 204999  on 173499  degrees of freedom
## AIC: 205003
## 
## Number of Fisher Scoring iterations: 4
      # Quarter 3
        regQ3_1 = glm(formula = Q3 ~ HomeAge, data = Interview, family = binomial)
        print(summary(regQ3_1))
## 
## Call:
## glm(formula = Q3 ~ HomeAge, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7712  -0.7701  -0.7664   1.6493   2.3332  
## 
## Coefficients:
##               Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -1.0635882  0.0063312 -167.993  < 2e-16 ***
## HomeAge     -0.0015902  0.0004519   -3.519 0.000433 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 196835  on 173500  degrees of freedom
## Residual deviance: 196821  on 173499  degrees of freedom
## AIC: 196825
## 
## Number of Fisher Scoring iterations: 4
      # Quarter 4
        regQ4_1 = glm(formula = Q4 ~ HomeAge, data = Interview, family = binomial)
        print(summary(regQ4_1))
## 
## Call:
## glm(formula = Q4 ~ HomeAge, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7774  -0.7059  -0.7055  -0.7055   1.7396  
## 
## Coefficients:
##               Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -1.2639358  0.0064814 -195.011   <2e-16 ***
## HomeAge      0.0002199  0.0004124    0.533    0.594    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 183085  on 173500  degrees of freedom
## Residual deviance: 183085  on 173499  degrees of freedom
## AIC: 183089
## 
## Number of Fisher Scoring iterations: 4
## Quarters Grouped (2 & 3) 
    regQ23_1 = glm(formula = Q23 ~ HomeAge, data = Interview, family = binomial)
    print(summary(regQ23_1))
## 
## Call:
## glm(formula = Q23 ~ HomeAge, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.253  -1.244   1.108   1.110   3.107  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.1654571  0.0056380   29.35   <2e-16 ***
## HomeAge     -0.0049842  0.0004201  -11.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 239786  on 173500  degrees of freedom
## Residual deviance: 239629  on 173499  degrees of freedom
## AIC: 239633
## 
## Number of Fisher Scoring iterations: 3
## Quarters Grouped (1, 2 & 3); Q123 is the complement of Q4
    regQ123_1 = glm(formula = Q123 ~ HomeAge, data = Interview, family = binomial)
    print(summary(regQ123_1))
## 
## Call:
## glm(formula = Q123 ~ HomeAge, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7396   0.7055   0.7055   0.7059   0.7774  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.2639358  0.0064814 195.011   <2e-16 ***
## HomeAge     -0.0002199  0.0004124  -0.533    0.594    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 183085  on 173500  degrees of freedom
## Residual deviance: 183085  on 173499  degrees of freedom
## AIC: 183089
## 
## Number of Fisher Scoring iterations: 4

By adding relevant controls to this model, I can account for factors that influence both the season of the policy quote and housing purchase behavior. The relevant controls available to me are the age and the credit score of the homeowner.

I create age dummies for the 30s and 40s, as homeowner quotes are frequently observed among these age groups. I do not have a strong prior on how being within these age groups affects the season in which a quote is sought.

I also create a dummy for Poor Credit, which is equal to 1 if the given Score is less than or equal to 579. Note that the summary above shows a minimum Score of -1, which is likely a placeholder rather than a real score and would also be coded as poor credit under this rule. I hypothesize that poor credit is negatively correlated with insurance quotes, particularly in this case, as the sample seems to be skewed towards homeownership, which is typically dependent upon a credit score threshold.

## Age
    Interview$Age30s <- ifelse(Interview$`Homeower Age`>=30 & Interview$`Homeower Age`<40,1,0)
    Interview$Age40s <- ifelse(Interview$`Homeower Age`>=40 & Interview$`Homeower Age`<50,1,0)

## Poor Credit    
    Interview$PoorCredit <- ifelse(Interview$Score<=579, 1, 0)
    
    
## Quarter (1-4, entered as a single numeric term)
    regQ_2 = glm(formula = Quarter.x ~ HomeAge + Age30s + Age40s + PoorCredit, data = Interview)
    print(summary(regQ_2))
## 
## Call:
## glm(formula = Quarter.x ~ HomeAge + Age30s + Age40s + PoorCredit, 
##     data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4798  -0.4798  -0.4210   0.5745   2.7340  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.4797701  0.0043230 573.617  < 2e-16 ***
## HomeAge     -0.0014820  0.0002001  -7.407 1.30e-13 ***
## Age30s      -0.0528263  0.0068607  -7.700 1.37e-14 ***
## Age40s      -0.0211524  0.0066851  -3.164 0.001556 ** 
## PoorCredit  -0.0283520  0.0078462  -3.613 0.000302 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.17692)
## 
##     Null deviance: 195680  on 166165  degrees of freedom
## Residual deviance: 195558  on 166161  degrees of freedom
##   (7335 observations deleted due to missingness)
## AIC: 498634
## 
## Number of Fisher Scoring iterations: 2
    ## Quarters Grouped/Dummy (2 & 3) 
          regQ23_2 = glm(formula = Q23 ~ HomeAge + Age30s + Age40s + PoorCredit, data = Interview, family = binomial)
          print(summary(regQ23_2))
## 
## Call:
## glm(formula = Q23 ~ HomeAge + Age30s + Age40s + PoorCredit, family = binomial, 
##     data = Interview)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.263  -1.244   1.096   1.109   2.980  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.1636300  0.0084139  19.448  < 2e-16 ***
## HomeAge     -0.0046178  0.0004538 -10.176  < 2e-16 ***
## Age30s       0.0302781  0.0128866   2.350  0.01879 *  
## Age40s       0.0265197  0.0124665   2.127  0.03340 *  
## PoorCredit  -0.0403949  0.0144937  -2.787  0.00532 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 229546  on 166165  degrees of freedom
## Residual deviance: 229377  on 166161  degrees of freedom
##   (7335 observations deleted due to missingness)
## AIC: 229387
## 
## Number of Fisher Scoring iterations: 3

Seasonality on Production (Conversion)

Conversion is dependent on whether the policy is filed. The outcome variable is the Written Indicator, where a value of ‘1’ indicates that State Farm wrote (converted) the policy. I capture seasonality through the quarter variables. The Quarter.x term, which enters the model as a single numeric 1-4 variable, has an insignificant coefficient, whereas the dummy for Quarters 2 & 3 is strongly significant. Based on this and other statistics, I elect to proceed with the dummy variable for Quarters 2 & 3 only.

## Quarter (1-4, entered as a single numeric term)
policy_reg = glm(formula = `Written Indicator` ~ Quarter.x, data = Interview, family = binomial)
    print(summary(policy_reg))
## 
## Call:
## glm(formula = `Written Indicator` ~ Quarter.x, family = binomial, 
##     data = Interview)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.157  -1.155  -1.154   1.200   1.201  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.058069   0.011836  -4.906  9.3e-07 ***
## Quarter.x    0.002200   0.004418   0.498    0.618    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 240403  on 173500  degrees of freedom
## Residual deviance: 240403  on 173499  degrees of freedom
## AIC: 240407
## 
## Number of Fisher Scoring iterations: 3
## Quarters 2 & 3
policy_regQ23_1 = glm(formula = `Written Indicator` ~ Q23, data = Interview, family = binomial)
    print(summary(policy_regQ23_1))
## 
## Call:
## glm(formula = `Written Indicator` ~ Q23, family = binomial, data = Interview)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.192  -1.192  -1.114   1.163   1.243  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.151910   0.007043  -21.57   <2e-16 ***
## Q23          0.186090   0.009639   19.31   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 240403  on 173500  degrees of freedom
## Residual deviance: 240030  on 173499  degrees of freedom
## AIC: 240034
## 
## Number of Fisher Scoring iterations: 3
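To read the Quarters 2 & 3 model on the probability scale, the fitted log-odds can be passed through the inverse logit, `plogis()` in base R. The sketch below plugs in the two coefficients printed above; the variable names are introduced only for illustration.

```r
# Coefficients copied from the policy_regQ23_1 output above
b0    <- -0.151910   # intercept: log-odds of conversion outside Q2/Q3
b_q23 <-  0.186090   # change in log-odds for quotes in Q2 or Q3

# Inverse logit turns log-odds into probabilities
p_other <- plogis(b0)
p_q23   <- plogis(b0 + b_q23)

print(round(c(other = p_other, q23 = p_q23, diff = p_q23 - p_other), 3))
```

Under this simple model, the implied conversion rate is roughly 46.2 percent outside Q2 and Q3 and roughly 50.9 percent within them, a gap of about 4.6 percentage points.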

Next, I explore two control variables that I believe to be relevant.

The first is Agent, which is equal to ‘1’ if a State Farm Agent is the ‘First Point of Contact’ (FPOC). I expect that if an Agent is the FPOC, then there will be a higher likelihood (a positive coefficient) of the policy being converted by State Farm.

The second is Home/Auto Discount, which is equal to ‘1’ if the quote received a discount for also quoting auto insurance. I expect that if auto insurance is also quoted, then there is greater likelihood (a positive coefficient) of the policy being converted by State Farm.

## Agent
policy_regQ23_2 = glm(formula = `Written Indicator` ~ Q23 + Agent, data = Interview, family = binomial)
    print(summary(policy_regQ23_2))    
## 
## Call:
## glm(formula = `Written Indicator` ~ Q23 + Agent, family = binomial, 
##     data = Interview)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.223  -1.145  -0.886   1.132   1.500  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.732533   0.015721  -46.60   <2e-16 ***
## Q23          0.184185   0.009689   19.01   <2e-16 ***
## Agent        0.655101   0.015701   41.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 240403  on 173500  degrees of freedom
## Residual deviance: 238215  on 173498  degrees of freedom
## AIC: 238221
## 
## Number of Fisher Scoring iterations: 4
## Discount
policy_regQ23_3 = glm(formula = `Written Indicator` ~ Q23 + `Home/Auto Discount`, data = Interview, family = binomial)
    print(summary(policy_regQ23_3))        
## 
## Call:
## glm(formula = `Written Indicator` ~ Q23 + `Home/Auto Discount`, 
##     family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2484  -1.1703  -0.7742   1.1081   1.6438  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -1.051290   0.015393  -68.30   <2e-16 ***
## Q23                   0.182035   0.009775   18.62   <2e-16 ***
## `Home/Auto Discount`  1.034531   0.015374   67.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 240403  on 173500  degrees of freedom
## Residual deviance: 235057  on 173498  degrees of freedom
## AIC: 235063
## 
## Number of Fisher Scoring iterations: 4
## Agent and Discount
policy_regQ23_4 = glm(formula = `Written Indicator` ~ Q23 + Agent + `Home/Auto Discount`, data = Interview, family = binomial)
    print(summary(policy_regQ23_4))   
## 
## Call:
## glm(formula = `Written Indicator` ~ Q23 + Agent + `Home/Auto Discount`, 
##     family = binomial, data = Interview)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2617  -1.1835  -0.6894   1.1713   1.7627  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -1.315931   0.019225  -68.45   <2e-16 ***
## Q23                   0.181597   0.009791   18.55   <2e-16 ***
## Agent                 0.393929   0.016513   23.86   <2e-16 ***
## `Home/Auto Discount`  0.936402   0.015882   58.96   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 240403  on 173500  degrees of freedom
## Residual deviance: 234479  on 173497  degrees of freedom
## AIC: 234487
## 
## Number of Fisher Scoring iterations: 4

When these two controls are included in the estimation of policy conversion, I find that the Quarters 2 & 3, Agent, and Home/Auto Discount dummies are each positively and significantly associated with policy conversion. Because these are logistic regressions, the coefficients are on the log-odds scale, and exponentiating them gives odds ratios. Specifically:

  • A homeowner policy quote in either Q2 or Q3 has about 20 percent higher odds of being converted by State Farm (coefficient 0.182, odds ratio ≈ 1.20).
  • A homeowner policy quote with a State Farm Agent as the First Point of Contact has about 48 percent higher odds of being converted by State Farm (coefficient 0.394, odds ratio ≈ 1.48).
  • A homeowner policy quote with a discount for also quoting auto insurance has about 2.55 times the odds of being converted by State Farm (coefficient 0.936, odds ratio ≈ 2.55).
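The coefficients in the last model are log-odds, so one way to express them as effect sizes is to exponentiate them into odds ratios. A minimal sketch using the three estimates printed above (the vector name `coefs` is introduced only for illustration):

```r
# Estimates copied from the policy_regQ23_4 output above (log-odds scale)
coefs <- c(Q23 = 0.181597, Agent = 0.393929, `Home/Auto Discount` = 0.936402)

# Exponentiating a logit coefficient gives the odds ratio
odds_ratios <- exp(coefs)
print(round(odds_ratios, 2))
```

An odds ratio of about 1.20 means the odds of conversion are about 20 percent higher, which is not the same as a 20 percentage point increase in the probability of conversion. With a fitted model object, `exp(coef(policy_regQ23_4))` would return the same quantities.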

Company Strategy

My findings suggest that Q2 and Q3 are likely the most productive quarters for State Farm, given the higher quote volume and the higher odds of conversion. The month of October appears to be an outlier, while Q4 shows a decrease overall. These patterns appear to track the housing market, as indicated by the statistically significant association with recent homeownership and by general research on seasonal patterns in the housing market.

The housing market is exogenous to State Farm, but company strategy could focus on controllable variables that may return greater profits. One suggestion is to offer more home and auto discounts, as the discount has a very strong, significant association with the policy being converted by State Farm. This could, in turn, boost auto quote volume in Q2 and Q3, which shows room for improvement, as it is visually ‘deflated’ compared to what I see in homeowners’ quotes.

Conclusion

In conclusion, I find that:

  • Unlike homeowners’ quotes, seasonality varies by month in the auto insurance market, although auto insurance discounts have positive, significant effects on policy conversion for homeowners.
  • Both auto and homeowners’ insurance see a drastic decrease occurring within Q4, from October to December.
  • Homeowner quotes are greatest in Q2 and Q3, which mimics the trends of the housing market.
  • The majority (52.2 percent) of homeowner quotes occur within the first 2 years of purchasing a home. Just under half (46.1 percent) occur within the first year.
  • A homeowner policy quote in either Q2 or Q3 has about 20 percent higher odds (odds ratio ≈ 1.20, p < 0.001) of being converted by State Farm.
  • A homeowner policy quote with a State Farm Agent as the First Point of Contact has about 48 percent higher odds (odds ratio ≈ 1.48, p < 0.001) of being converted by State Farm.
  • Having a discount on Home and Auto insurance multiplies the odds of State Farm converting the policy by about 2.55 (p < 0.001).

Further research could employ other models in this context, particularly a two-stage approach, since conversion is dependent on whether the policy is filed. I could predict whether a policy is filed using an Instrumental Variable approach and then see how this influences conversion. In this two-stage setup, Written Indicator is a function of the Quarter, which is in turn a function of the Year of Homeownership, Age Dummies, and Poor Credit Dummy. Another alternative would be a Mixed Effects model.