9/25/2019

Textbook discusses…

  • Linear Regression
    • This we understand—best linear fit
  • Logistic Regression
    • Convert binary classification into a linear regression problem via the logistic function
  • Support Vector Machines
    • Originally developed for data which is linearly separable in higher dimensions
      • Increase the number of dimensions to \(n\) where one can find a hyperplane of dimension \(n-1\) that separates the data as needed
    • Has been extended to non-linearly separable sets via the hinge loss
      • Hinge loss is continuous and piecewise linear, but not differentiable at the hinge point
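
To make the two link/loss functions above concrete, here is a minimal sketch. It assumes labels coded as -1/+1 for the hinge loss and a margin of 1; the function names are illustrative, not taken from the textbook.

```python
import math

def logistic(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a linear score."""
    return max(0.0, 1.0 - y * score)

# For a positive example the loss is zero once the score reaches the
# margin (score >= 1) and grows linearly as the score falls below it.
for score in (2.0, 1.0, 0.5, 0.0, -1.0):
    print(score, hinge_loss(+1, score), round(logistic(score), 3))
```

The kink at a score of 1 is the single point where the hinge loss has no derivative; everywhere else it is a straight line.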

Note the similarities…

  • If you really know how to use a hammer
    • you want everything to look like a nail
  • Statisticians REALLY KNOW linear modeling
    • So try to convert problems into some form of linear model
  • YOU get a linear model, and YOU get a linear model, and YOU get a linear model, EVERYBODY GETS A LINEAR MODEL!!

CC 2.0 BY - https://www.flickr.com/photos/aphrodite-in-nyc

Linear-type models work well with…

  • Known dependent variable

  • Many independent variables (features)

  • Classifying or describing in-sample

  • Model can be coerced into linear format

    • LM \(\rightarrow\) GLM \(\rightarrow\) GLMM \(\rightarrow\) GAM

What about…

  • Modeling outcomes with no clear or clean dependent/independent relationships
    • Hurricane-caused flood damage on the Florida Gulf Coast

    • Property damage due to electrical power plant turbine explosion

    • Corporate liability due to faulty products

  • Other kinds of modeling are necessary

Non-linear-regression based modeling techniques

Sample Data

Here is a summary of 25 synthetic years of commercial liability losses suffered by XYZ Manufacturing (assume all are brought to a common rate and exposure level…)

##        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
##   "194,994"   "531,144"   "907,272"   "948,864" "1,262,869" "2,386,361"

Histogram

Question

  • If XYZ wants to set aside enough money to cover its losses 99.5% of the time, how much does it need to reserve?

  • Not enough other information to relate losses to independent features like policyholder income or state of domicile

  • Need new methods to estimate probabilities

  • The following slides are a bit of a simplification 8-)

Method of Moments (MoM)

  • Select a probability distribution and solve for parameters which match the first \(n\) moments as needed

  • Normal Distribution:

    • \(\hat{\mu} = \bar{x}\) and \(\hat{\sigma}^2 = \bar{v}\), where \(\bar{v}\) is the biased empirical variance (divisor \(n\))
    • \(\hat{\mu}\): 948,864; \(\hat{\sigma}\): 523,150
  • Gamma Distribution

    • \(f(x) = \frac{x^{\alpha - 1}}{\Gamma(\alpha)\theta^\alpha}e^{-\frac{x}{\theta}}\)
    • Shape = \(\alpha\) = 3.28969; Scale = \(\theta\) = 288,436
    • \(\hat{\mu}\): 948,864; \(\hat{\sigma}\): 523,150
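
As a rough check of the Gamma figures above: a Gamma with shape \(\alpha\) and scale \(\theta\) has mean \(\alpha\theta\) and variance \(\alpha\theta^2\), so matching two moments gives \(\alpha = \mu^2/\sigma^2\) and \(\theta = \sigma^2/\mu\). The sketch below plugs in the rounded mean and SD quoted on this slide; the function name is illustrative, and small differences from the slide come from rounding of the inputs.

```python
def gamma_mom(mean, sd):
    """Method-of-moments Gamma fit: match mean = alpha*theta, var = alpha*theta^2."""
    variance = sd ** 2
    shape = mean ** 2 / variance   # alpha
    scale = variance / mean        # theta
    return shape, scale

# Rounded sample moments quoted on the slide (biased SD).
shape, scale = gamma_mom(948_864, 523_150)
print(f"shape = {shape:.5f}, scale = {scale:,.0f}")
```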

Maximum Likelihood Estimation (MLE)

Find the parameters that maximize the total likelihood of the observed data for the given distribution

  • A likelihood is the value of the probability density function at the observed data point
  • A data set's likelihood is the product of the likelihoods of the observations
  • This gets very hard to calculate very quickly because most values are well below one, so the product underflows toward zero
  • Logarithms to the rescue!
    • The log of a product is the sum of the logs of the components
  • Most common procedure is to minimize the sum of the negative log-likelihoods
    • Most non-linear optimizers are set up to minimize an objective function
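
A minimal sketch of why the log scale helps, using a Normal model and a hypothetical sample of 200 draws on a loss-like scale (this is not the XYZ data, and the helper names are illustrative). Densities on this scale are far below one, so their raw product underflows, while the sum of negative log-likelihoods remains an ordinary number that an optimizer can minimize.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Normal density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_nll(data, mu, sigma):
    """Sum of negative log-likelihoods of the data under Normal(mu, sigma)."""
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return 0.5 * n * math.log(2.0 * math.pi * sigma ** 2) + sq / (2.0 * sigma ** 2)

# Hypothetical sample, for illustration only.
random.seed(1)
data = [random.gauss(1_000_000, 500_000) for _ in range(200)]

# Closed-form Normal MLE: sample mean and biased standard deviation.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

# The raw product of densities underflows in floating point ...
raw_likelihood = math.prod(normal_pdf(x, mu_hat, sigma_hat) for x in data)
# ... while the negative log-likelihood stays finite.
nll_hat = normal_nll(data, mu_hat, sigma_hat)

print(raw_likelihood, nll_hat)
```

Moving either parameter away from the closed-form estimate, for example `normal_nll(data, 1.05 * mu_hat, sigma_hat)`, gives a larger objective value, which is what a numerical minimizer exploits for families without a closed form.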

MLE Example

  • Normal Distribution:
    • Can be shown that the MLE = MoM
    • \(\hat{\mu}\): 948,864; \(\hat{\sigma}\): 523,150
  • Gamma Distribution
    • Shape = \(\alpha\) = 3.35264; Scale = \(\theta\) = 283,020
    • \(\hat{\mu}\): 948,867; \(\hat{\sigma}\): 518,217

Tabular Comparison

Distribution    Mean Loss    SD of Loss
Empirical         948,864       533,938
Normal            948,864       523,150
Gamma MoM         948,864       523,150
Gamma MLE         948,867       518,217
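
The gap between the Empirical SD and the Normal / Gamma MoM SD appears consistent with the usual divisor difference. Assuming the empirical figure uses the unbiased \(n-1\) divisor while the fitted figure uses the biased \(n\) divisor, the two differ by a factor of \(\sqrt{n/(n-1)}\). A quick check with \(n = 25\):

```python
import math

n = 25                      # number of observations in the sample
biased_sd = 523_150         # SD with divisor n (Normal / Gamma MoM rows)

# Rescale to the divisor n - 1 convention.
unbiased_sd = biased_sd * math.sqrt(n / (n - 1))
print(f"{unbiased_sd:,.0f}")
```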

Graphical Comparison

Observations

  • Gamma options have a thicker tail than the Normal
  • In this case, MLE returned a slightly thinner tail than MoM for the Gamma (larger shape, smaller scale and SD)
    • Does tail thickness matter when trying to predict open-ended liabilities which haven’t happened yet?
  • Neither family fits that well
    • A lot of parameter risk with only 25 observations
    • Try other distributional families
    • Use various goodness-of-fit measures to select best distribution
      • Information Criteria
      • Q-Q plots
      • Cramér-von Mises
      • Anderson-Darling

Other Methods

  • Method of Maximum Spacing
    • Based on Probability Integral Transform
    • Under the true cumulative distribution function (CDF), the transformed observations \(F(x_i)\) are uniform on \([0, 1]\), so they should be as evenly spread out as possible
    • Choosing parameters to achieve this is equivalent to maximizing the geometric mean of the spacings between consecutive transformed observations
  • Bayesian Hierarchical Models
    • Specifies not only a family and a set of starting parameters but also a distribution around those parameters (the prior)
    • Uses the data to adjust the distribution of those parameters using Bayes' rule (the posterior)
    • Uses Markov Chain Monte Carlo to estimate otherwise intractable integrals
    • Can use each draw from the posterior parameter distribution to generate a simulated observation
    • Final collection of simulated observations is a sample from the posterior predictive distribution
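
A minimal sketch of the maximum-spacing idea, assuming a one-parameter exponential family and a crude grid search in place of a proper optimizer. The sample is hypothetical (not the XYZ data) and the function names are illustrative.

```python
import math
import random

def exp_cdf(x, scale):
    """Exponential CDF, F(x) = 1 - exp(-x / scale)."""
    return 1.0 - math.exp(-x / scale)

def mean_log_spacing(data, scale):
    """Log of the geometric mean of the spacings between consecutive
    CDF values, with 0 and 1 added as end points."""
    u = [0.0] + sorted(exp_cdf(x, scale) for x in data) + [1.0]
    spacings = [b - a for a, b in zip(u, u[1:])]
    # The floor guards against log(0) from ties or rounding.
    return sum(math.log(max(s, 1e-300)) for s in spacings) / len(spacings)

# Hypothetical sample with a known scale, for illustration only.
random.seed(7)
true_scale = 1.0
data = [random.expovariate(1.0 / true_scale) for _ in range(1000)]

# Pick the scale whose transformed observations are most evenly spread.
grid = [0.5 + 0.01 * i for i in range(101)]
best_scale = max(grid, key=lambda s: mean_log_spacing(data, s))
print(best_scale)
```

Maximizing the mean of the log spacings is the same as maximizing their geometric mean, since the logarithm is monotone.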

What about that reserve fund?

Modeled vs. “Actual” 99.5%-ile

Simulated is the 99.5%-ile of 10M simulations from the true generating distribution

Distribution    99.5%-ile    Estimated Error
Normal          2,296,410            -11.6%
Gamma MoM       2,820,494             8.59%
Gamma MLE       2,798,208             7.73%
Simulated       2,597,313
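
The Normal row and the error column can be roughly reproduced from figures already on the slides. This sketch assumes the Estimated Error is the relative difference from the simulated percentile, and it uses the rounded Normal parameters, so the last digits may differ slightly from the table.

```python
from statistics import NormalDist

# Fitted Normal parameters quoted on the earlier slides.
normal_q = NormalDist(mu=948_864, sigma=523_150).inv_cdf(0.995)

# Percentiles as quoted in the table.
simulated = 2_597_313
modeled = {
    "Normal": 2_296_410,
    "Gamma MoM": 2_820_494,
    "Gamma MLE": 2_798_208,
}

# Relative error of each modeled percentile against the simulated one.
errors = {name: 100.0 * (q / simulated - 1.0) for name, q in modeled.items()}

print(f"Normal 99.5%-ile from rounded parameters: {normal_q:,.0f}")
for name, err in errors.items():
    print(f"{name}: {err:+.2f}%")
```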

PDF comparison with true distribution

True distribution was a Weibull with shape = 2 and scale \(\approx\) 1,128,379
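
Because the Weibull CDF inverts in closed form, the simulated 99.5%-ile can be sanity-checked directly. A minimal sketch using the approximate scale quoted above (the function name is illustrative):

```python
import math

shape = 2.0
scale = 1_128_379          # approximate scale quoted above

def weibull_quantile(p, scale, shape):
    """Inverse of the Weibull CDF F(x) = 1 - exp(-(x / scale)^shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

true_q = weibull_quantile(0.995, scale, shape)
# Weibull mean is scale * Gamma(1 + 1/shape).
true_mean = scale * math.gamma(1.0 + 1.0 / shape)
print(f"99.5%-ile: {true_q:,.0f}  mean: {true_mean:,.0f}")
```

The closed-form percentile lands within a few units of the 10M-simulation figure, and the implied mean is very close to 1,000,000.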

Questions?