Abstract

This is a simple study to analyze the process of fitting probability distribution models to the severity of insurance claims in a set of automobile insurance claims data. The data set used in this work is found in the R package, “insuranceData”, and is called dataCar. It is a data set based on one-year vehicle insurance policies taken out in 2004 or 2005. There are 67,856 policies in the data set with 4,624 claims greater than 200 (we do not have information on the currency used).

A policy in this data set may have no claims, one claim, or more than one claim. An insurer, by design, needs to hold sufficient capital, funded by the premiums paid by policyholders, to pay out claims, and therefore needs to be able to accurately predict the number and cost of losses.

Auto claims data poses two obvious problems: finding the right probability distribution to fit the data and having a way to benchmark the fit to know how “correct” it is with some quantitative supporting information.

The ability to accurately evaluate claim frequency and loss severity affects the ability to accurately define how much premium to charge policyholders, how much reserves to hold, how profitable the insurer is, and how any policy modifications might impact the insurer.

The study examines the fitting of a few selected probability distributions to the cost of claims, using the maximum likelihood method to estimate parameters. We then apply the Kolmogorov-Smirnov and Anderson-Darling tests to the fitted claim severity distributions.

The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are used to choose between competing distributions.

Empirical results show that the data are better modeled using skewed and heavy-tailed distributions. We find that the lognormal model fits the size-of-loss data best, while the negative binomial and geometric distributions work best for the claim frequency data.

1. Introduction

The “big” idea of insurance is to create a fund from the monies collected from customers, called policyholders here, that is sufficient to pay out the cost of insurance claims made by policyholders for the covered losses they incur.

In order for the insurance company to succeed, it needs to charge enough to pay out those claims in a given time period, say one year, but not so much that policyholders take their business elsewhere. Because some, hopefully most, policyholders will not incur any losses in the given time period, this is a risk-sharing arrangement in which the “less risky” policyholders in essence subsidize the “more risky” ones, and the ability to identify a risky policyholder in advance of a loss occurring is limited by our inability to accurately predict the future.

The auto insurance industry has a number of risk events to manage including theft and collision accidents and also the cost of injuries to involved parties and property damage beyond the involved automobiles. Managing the risk involves collecting and effectively analyzing data about the policyholders and their behavior patterns as well as the insured vehicles and driving conditions in a way that can assist in predicting the future patterns of claims.

Claims modeling follows a general approach that separates the number of claims an insured might incur from the size of the claims that occur. We call these claim frequency and claim severity, and both are random variables whose past behavior can and likely will deviate from future patterns. Taken together they form what is referred to as aggregate claims data. Claim frequency is the total count of claims that occur within a set of customer policies, while claim severity is the monetary loss on each policy or on the set of customer policies. The random variables are assumed to follow certain probability distributions, referred to as loss distributions. These loss distributions are typically skewed to the right with relatively high right-tail probabilities.

We forgo analyzing the frequency of claims and just limit the scope of analysis to the severity of claims in this paper.

2. Modeling

The modeling process, as described by Klugman et al. in the course textbook, proceeds in several stages.

We are concerned with loss distribution modeling where the approach generally involves modeling aggregate losses by separately fitting a frequency distribution to the number of losses and a severity distribution to the size of the losses. The estimated aggregate loss distribution combines the two by a convolution process. Discrete distributions are used for frequency distributions and continuous distributions for losses and claim sizes.

2.1 Discussion of Distributions used to Fit the Data

Claim severity is best modeled using non-zero continuous distributions which are skewed to the right and which have heavy tails. Intuitively, this makes sense as most automobile accident claims are going to be relatively minor and the occurrence of very expensive claims is infrequent.

There is ample research and data that bear this out; therefore, selecting probability distributions that have these characteristics is an appropriate approach to modeling these claims.

Here, we have selected a number of standard probability distributions to approximate the distribution of claim size. There are many other possible distributions, including custom distributions that could be combinations of standard distributions or created using some other process known to the modeler. We limit the exploration to the exponential, gamma, Weibull, Pareto, and lognormal distributions.

2.2 Parameter Estimation

The parameters for the selected distributions are found using the maximum likelihood estimation (MLE) technique. There are several approaches that could be used, including the method of moments, the quantile method, and least squares estimation, but MLE is arguably the approach that best uses all the information in the data, and the estimator is considered to be very flexible with better asymptotic properties than the alternative approaches.

The MLE chooses an asymptotically efficient estimator \(\hat{\theta}\) of the parameter, or parameters, as the value of the unknown true parameter that makes the observed data most likely. Suppose we have a random sample of independent and identically distributed (iid) observations drawn from some unknown population, and let \(X = x\) be the realization of a random variable or vector with probability mass or density function \(f(x; \theta)\), where \(\theta\) is a scalar or vector with unknown value that we want to estimate. The goal is then to best infer \(\theta\) from the observed data. The MLE works through the likelihood function of the data: the likelihood function \(L(\theta)\) is the probability mass or density function of the observed data \(x\), expressed as a function of \(\theta\).

The likelihood function of \(\theta\) is defined by the form

\[L(\theta)=L(\theta|x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n|\theta) = \prod_{i=1}^n f(x_i|\theta)\]

The principle of maximum likelihood creates a path to select a value for an estimator of the parameter or parameters in question, \(\hat{\theta}\) that makes the observed data “most likely”.

Maximizing the likelihood function \(L(\theta)\),

\[\hat{\theta}(x) = \underset{\theta}{\arg\max}\ L(\theta)\]

To ease the computation, and because the logarithm is a monotonically increasing function (so maximizing \(\text{ ln }L(\theta)\) is equivalent to maximizing \(L(\theta)\)), we typically maximize the log-likelihood.

\[l(\theta) = \text{ ln }L(\theta) = \text{ ln }\prod_{i=1}^n f(x_i|\theta) = \sum_{i=1}^n \text{ ln }f(x_i|\theta)\]
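As a concrete illustration of maximizing a log-likelihood numerically, the following R sketch fits a two-parameter gamma model with optim(); the data vector x and the starting values are assumptions, and the parameters are optimized on the log scale to keep them positive.

# negative log-likelihood for a gamma model, with parameters on the log scale
negloglik <- function(logpar, x) {
  -sum(dgamma(x, shape = exp(logpar[1]), rate = exp(logpar[2]), log = TRUE))
}
# x is an assumed vector of positive observations; c(0, 0) is a hypothetical starting point
fit <- optim(c(0, 0), negloglik, x = x)
exp(fit$par)   # estimated shape and rate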

We detail the MLE for each of the probability distributions under consideration below.

2.3 Assessing the Models

We use goodness-of-fit (GoF) tests to describe how well a distribution fits a set of data. For continuous data, the Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D), and Cramer-von Mises test statistics are generated by the R packages and functions used, but we are going to assess just the first two of these.

The K-S and A-D use the cumulative distribution function and are used to test the null hypothesis that the unknown distribution is actually the known specified distribution being assessed.

For the GoF tests we have the following hypotheses: \(H_0\), the data follow the specified distribution, against \(H_1\), the data do not follow the specified distribution.

2.3.1 The Kolmogorov-Smirnov Test

The K-S test compares a fitted cdf \(\hat{F}(x)\) with an empirical cdf \(F_n(x)\) to assess the GoF of a given data set to a theoretical distribution. The cumulative distribution function (cdf) characterizes a distribution in a unique way.

The test statistic measures the maximum absolute distance between the empirical and fitted cdfs and computes a test statistic \(D\) as

\[D=\underset{t\leq x \leq u}{max}|F_n(x)-\hat{F}(x)|\]
where \(t\) is the left truncation point and \(u\) is the upper bound if there is any censoring of the data.
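For illustration, the K-S statistic can be computed directly from this definition; in the sketch below x is an assumed vector of losses and rate_hat is a hypothetical fitted exponential rate.

xs   <- sort(x)
n    <- length(xs)
Fhat <- pexp(xs, rate = rate_hat)        # fitted cdf evaluated at the ordered observations
# the empirical cdf jumps at each observation, so check the gap on both sides of each jump
D <- max(pmax(abs((1:n) / n - Fhat),
              abs((0:(n - 1)) / n - Fhat)))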

2.3.2 The Anderson-Darling Test

The A-D test uses a test statistic that is a weighted average of the squared differences between the empirical and the theoretical model cdfs. The test statistic is simplified as

\[A^2 = -n\hat{F}(u)+n\sum_{i=0}^n[1-F_n(x_i)]^2\big(\text{ ln }[1-\hat{F}(x_i)]-\text{ ln }[1-\hat{F}(x_{i+1})]\big)+n\sum_{i=1}^n F_n(x_i)^2\big[\text{ ln }\hat{F}(x_{i+1}) - \text{ ln }\hat{F}(x_i)\big]\]
where the unique non-censored data points are \(t=x_0<x_1<\cdots<x_n<x_{n+1} = u\). When \(u=\infty\) the last term of the first sum is zero. The critical values are \(1.933, 2.492,\) and \(3.857\) for the \(10\)%, \(5\)%, and \(1\)% significance levels, respectively.

2.3.3 Akaike’s Information Criterion

Akaike’s Information Criterion (AIC) is a tool for distribution selection. If several distributions are being fit to a given data set, the one with the smallest AIC represents the most appropriate distribution for modeling. The AIC is defined as

\[AIC=-2\text{ l }(\hat{\theta}|x)+2k\]

where \(\text{ l }(\theta)\) is the loglikelihood function, \(\hat{\theta}\) is the estimated parameter of the model, \(x\) is the empirical data, and \(k\) is the length of the parameter vector. The first part is the measure of GoF and the second is a penalty term that penalizes the complexity of the model.

2.3.4 Bayesian Information Criterion

The Bayesian Information Criterion (BIC) incorporates the number of observations \(n\) into the penalty term, so the penalty for additional parameters is stronger in the BIC. The two criteria are otherwise similar, and the form for the BIC is

\[BIC=-2\text{ l }(\hat{\theta}|x)+k\text{ log }(n)\]

For both the AIC and the BIC the smallest value implies the best fit.
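Both criteria are simple functions of the maximized log-likelihood; here is a small sketch using a hypothetical helper, where loglik, the number of parameters k, and the sample size n are supplied by the user.

info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,        # penalty grows with the number of parameters k
    BIC = -2 * loglik + k * log(n))   # penalty also grows with the sample size n
}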

2.4 Probability Distributions Used to Fit Severity Data

2.4.1 Exponential Distribution

The exponential distribution is a continuous distribution used commonly to model the time until the occurrence of some event. A continuous random variable \(X\) has an exponential distribution if the probability density function (pdf) is given by:

\[f_X(x) = \lambda e^{-\lambda x}, x > 0\] where \(\lambda\) is the rate parameter of the distribution. The exponential is a special case of the gamma and Weibull distributions.

The likelihood function is:

\[L(\lambda)=\prod_{i=1}^n \lambda e^{-\lambda x_i}=\lambda^n exp \big(-\lambda \sum_{i=1}^n x_i \big)\]

and the log-likelihood is therefore

\[l(\lambda)=n \text{ ln }\lambda - \lambda \sum_{i=1}^n x_i\] We find the parameter \(\hat{\lambda}\) by setting the derivative of the log-likelihood equal to zero and solving for \(\hat{\lambda}\). The resulting MLE is:

\[\frac{n}{\hat{\lambda}}-\sum_{i=1}^n x_i = 0 \implies \hat{\lambda} = \frac{1}{\bar{x}}, \text{ where } \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i \] is the mean of the sample data.
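In R the exponential MLE is immediate; x is an assumed vector of claim amounts.

lambda_hat <- 1 / mean(x)   # the MLE of the rate is the reciprocal of the sample mean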

2.4.2 Gamma Distribution

The gamma distribution is an extension of the exponential distribution, along with the Weibull. The gamma distribution is expressed in terms of the gamma function

\[\Gamma(x)=\int_0^\infty t^{x-1} e^{-t} dt, x > 0\]

We are concerned with the two parameter gamma distribution function given by

\[f_X(x, \alpha, \beta)=\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1} e^{-\beta x}, x>0\] where \(\alpha>0\) is the shape parameter and \(\beta>0\) is the rate parameter. When \(\alpha=1\) the gamma distribution becomes the exponential distribution with rate \(\lambda=\beta\), and when \(\alpha=n/2\) and \(\beta=1/2\) the gamma becomes the chi-square distribution with \(n\) degrees of freedom.

The likelihood function for the gamma distribution is given by

\[L(\alpha, \beta)=\prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)} x_i^{\alpha-1} e^{-\beta x_i}\] The log-likelihood function is then

\[l(\alpha, \beta)=n(\alpha \text{ ln } \beta - \text{ ln } \Gamma(\alpha))+(\alpha-1)\sum_{i=1}^n \text{ ln }x_i - \beta \sum_{i=1}^n x_i\] Taking the partial derivatives with respect to \(\alpha\) and \(\beta\), setting each equal to zero, and solving the resulting equations gives

\[\text{ with respect to } \alpha, n\big(\text{ ln }\hat{\beta}-\frac{d}{d\hat{\alpha}}\text{ ln }\Gamma(\hat{\alpha})\big)+\sum_{i=1}^n \text{ ln }x_i = 0\] and

\[ \text{ with respect to } \beta, \frac{n\hat{\alpha}}{\hat{\beta}}-\sum_{i=1}^n x_i = 0, \text{ which results in }\bar{x}=\frac{\hat{\alpha}}{\hat{\beta}}\] Performing the substitution \(\hat{\beta}=\hat{\alpha}/\bar{x}\), we have the equation to solve for \(\hat{\alpha}\)

\[n\big( \text{ ln }\hat{\alpha}-\text{ ln }\bar{x}-\frac{d}{d\hat{\alpha}} \text{ ln }\Gamma(\hat{\alpha})\big)+\sum_{i=1}^n\text{ ln }x_i=0\]

This is a nonlinear equation in \(\hat{\alpha}\) that doesn’t have a closed form solution and must be approximated with a numerical methods approach.
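Dividing the equation by \(n\) and using R's digamma() for \(\frac{d}{d\hat{\alpha}}\text{ ln }\Gamma(\hat{\alpha})\), the shape estimate can be found with a one-dimensional root finder. The sketch below uses a hypothetical helper; x is an assumed vector of positive losses and the search interval is an assumption.

gamma_mle <- function(x) {
  s <- log(mean(x)) - mean(log(x))           # log of the sample mean minus the mean of log(x)
  g <- function(a) log(a) - digamma(a) - s   # score equation for the shape, divided by n
  alpha_hat <- uniroot(g, interval = c(1e-6, 1e6))$root
  beta_hat  <- alpha_hat / mean(x)           # rate estimate from beta = alpha / xbar
  c(shape = alpha_hat, rate = beta_hat)
}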

2.4.3 Weibull Distribution

The Weibull distribution has two parameters, a shape and a scale, both of which are positive constants. The Weibull distribution is widely used in reliability modeling, weather forecasting, earthquake prediction, life expectancy modeling, and general insurance claims.

The probability density of the Weibull distribution is

\[f_X(x, \gamma, \beta)=\frac{\gamma}{\beta^\gamma}x^{\gamma-1}exp\big(-(\frac{x}{\beta})^\gamma\big); x>0, \gamma>0, \beta>0\] where \(\gamma\) is the shape parameter and \(\beta\) is the scale parameter. When \(\gamma=1\), the Weibull becomes the exponential distribution with rate \(\lambda=1/\beta\).

The likelihood function for the Weibull distribution is given by

\[L(\gamma, \beta)=\prod_{i=1}^n\big(\frac{\gamma}{\beta}\big) \big(\frac{x_i}{\beta}\big)^{\gamma-1} exp\big(-\big(\frac{x_i}{\beta}\big)^\gamma \big)\] Then the loglikelihood function is \[l(\gamma, \beta)=n\text{ ln }\gamma-n\gamma\text{ ln }\beta+(\gamma-1)\sum_{i=1}^n\text{ ln }x_i-\sum_{i=1}^n\big(\frac{x_i}{\beta}\big)^\gamma\] We find the partial derivatives with respect to the shape and scale parameters, \(\gamma\) and \(\beta\), and set each equal to zero.

With respect to gamma, we have

\[\frac{n}{\gamma}-n\text{ ln }\beta+\sum_{i=1}^n\text{ ln }x_i-\frac{1}{\beta^\gamma}\sum_{i=1}^n x_i^\gamma\text{ ln }\big(\frac{x_i}{\beta}\big)=0\]

and, with respect to beta, we have

\[-\frac{n\gamma}{\beta}+\frac{\gamma}{\beta^{\gamma+1}}\sum_{i=1}^n x_i^\gamma=0\]

We can eliminate \(\beta\) and simplify the equation to arrive at a non-linear equation that allows \(\hat{\gamma}\) to be estimated using a numerical optimization approach

\[\frac{\sum_{i=1}^nx_i^\gamma \text{ ln }x_i}{\sum_{i=1}^n x_i^\gamma}-\frac{1}{\gamma}-\frac{1}{n}\sum_{i=1}^n\text{ ln }x_i=0\]

The estimate for beta is then given by \(\hat{\beta} = \big(\frac{1}{n}\sum_{i=1}^n x_i^{\hat{\gamma}}\big)^{1/\hat{\gamma}}\).
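The profile equation above can likewise be solved numerically for \(\hat{\gamma}\), after which \(\hat{\beta}\) follows. The sketch below uses a hypothetical helper; x is an assumed vector of positive losses and the search interval is an assumption chosen to suit claim-sized data.

weibull_mle <- function(x) {
  # profile score equation for the shape parameter gamma
  g <- function(gam) sum(x^gam * log(x)) / sum(x^gam) - 1 / gam - mean(log(x))
  gamma_hat <- uniroot(g, interval = c(0.01, 10))$root
  beta_hat  <- mean(x^gamma_hat)^(1 / gamma_hat)   # scale estimate from the beta equation
  c(shape = gamma_hat, scale = beta_hat)
}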

2.4.4 Log-Normal Distribution

The lognormal distribution is commonly used in analyzing natural events. Observable natural growth processes tend to arise from the accumulation of a large number of small occurrences that become additive on a log scale. Because the lognormal is the transformation of the normal distribution through exponentiation, some of the useful properties of the normal distribution carry over to the lognormal distribution. Severity modeling involves right-skewed data that the normal distribution is not well suited for, but that the lognormal distribution can handle.

If the logarithm of a random variable \(X\) is normally distributed, then the positive random variable \(X\) is log-normally distributed with the probability density function given by

\[f_X(x; \mu, \sigma) = \frac{1}{x\sqrt{2\pi\sigma^2}}exp\big(-\frac{(\text{ ln }x-\mu)^2}{2\sigma^2}\big), x>0\]

Here we have two parameters, the location parameter \(\mu\) and the scale parameter \(\sigma>0\).

The likelihood function is

\[L(\mu, \sigma)=\prod_{i=1}^n\frac{1}{x_i\sqrt{2\pi\sigma^2}}exp\big(-\frac{(\text{ ln }x_i-\mu)^2}{2\sigma^2}\big)\]

And the loglikelihood function is

\[l(\mu, \sigma)=-n\text{ ln }\sigma-\frac{n}{2}\text{ ln }2\pi-\sum_{i=1}^n\big[\text{ ln }x_i+\frac{1}{2\sigma^2}(\text{ ln }x_i-\mu)^2\big]\]

If we set the partial derivatives equal to zero and solve for the parameters, as in

\[\frac{\partial\text{ ln } L(\mu, \sigma)}{\partial \mu} = 0, \text{ and } \frac{\partial\text{ ln }L(\mu, \sigma)}{\partial \sigma}=0\]

then we have estimates

\[\hat{\mu}=\frac{1}{n}\sum_{i=1}^n\text{ ln }x_i, \text{ and } \hat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^n(\text{ ln }x_i-\hat{\mu})^2}\]

respectively.
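These closed-form estimates translate directly into R; x is an assumed vector of positive losses. Note that the MLE of \(\sigma\) divides by n rather than n - 1.

mu_hat    <- mean(log(x))                       # MLE of the location parameter
sigma_hat <- sqrt(mean((log(x) - mu_hat)^2))    # MLE of the scale parameter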

2.4.5 Pareto Distribution

The Pareto distribution is a right skewed and heavy tailed distribution where the survival function decays slowly to zero. These attributes make it useful for many economic, financial, and insurance modeling applications.

The Pareto distribution with parameters \(\alpha>0\) and \(\gamma>0\) is given by

\[f_X(x; \alpha, \gamma)=\frac{\alpha \gamma^\alpha}{(x + \gamma)^{\alpha+1}}, x\geq 0\]

where \(\alpha>0\) represents the shape parameter and \(\gamma>0\) the scale parameter.

For maximum likelihood estimation we work with the classical (Type I) form of the Pareto, with density \(f_X(x; \alpha, \gamma)=\frac{\alpha\gamma^\alpha}{x^{\alpha+1}}\) for \(x\geq\gamma\), which admits a closed-form estimator. The likelihood function is then

\[L(\alpha, \gamma)=\prod_{i=1}^n\frac{\alpha\gamma^\alpha}{x_i^{\alpha+1}}, 0<\gamma<min(x_i), \alpha>0\]

The loglikelihood function is

\[l(\gamma, \alpha)=n\text{ ln }(\alpha)+\alpha n\text{ ln }(\gamma)-(\alpha+1)\sum_{i=1}^n\text{ ln }(x_i)\]

Because the loglikelihood is increasing in \(\gamma\), it is maximized by taking \(\hat{\gamma} = min(x_i)\), the largest value allowed by the constraint that \(\gamma\) is never larger than the smallest value of \(x\) in the data. We then set the partial derivative of the loglikelihood function with respect to \(\alpha\) equal to zero,

\[\frac{n}{\alpha}+n\text{ ln }(\gamma)-\sum_{i=1}^n\text{ ln }(x_i)=0\]

Then we have the estimator

\[\hat{\alpha}=\frac{n}{\sum_{i=1}^n\text{ ln }\big(\frac{x_i}{\hat{\gamma}}\big)}\]
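In R these Pareto estimates are one-liners, for an assumed vector of losses x.

gamma_hat <- min(x)                               # scale estimate: the smallest observed loss
alpha_hat <- length(x) / sum(log(x / gamma_hat))  # shape estimate from the score equation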

2.5 Using R for Model Fitting

R is a statistical programming language that is very popular in a broad array of scientific and mathematical areas, certainly including insurance modeling. It is open source software that allows independent contributors to add specific modeling and analysis applications to the ecosystem via packages.

An enormous number of packages have been written, covering almost any imaginable statistical, mathematical, scientific, economic, or financial modeling area of interest. There are well over 17,000 of these packages, and many of them depend on and extend other packages, further increasing the library.

We limit this paper to the use of a handful of them. First, we acquire a data set from the package insuranceData, and then we use the packages fitdistrplus, MASS, and actuar to perform the analysis.

2.5.1 Acquire dataCar dataset from insuranceData package in R

To begin, we need to install and load the packages of interest. This paper was written in RStudio, an IDE designed to work with R, using R Markdown, which allows the analysis to be compiled into a more readable document.

We used the dataCar data set from the insuranceData package but had some trouble acquiring it and running the modeling efforts, so we stored the data set as a local csv file for ease of use.

To begin, we load the data set dataCar and look at a summary of the data. We are concerned with the cost of claims, labeled claimcst0 in the data summary. We can see that across the 67,856 policies in the data set, if we include the policies with a claim cost of zero we have a mean cost of \(137.3\) with a maximum claim cost of \(55,922.1\).
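A minimal sketch of the setup described above; the csv path is hypothetical, and loading directly from the package is shown as the primary route.

library(insuranceData)   # provides the dataCar data set
library(fitdistrplus)    # fitdist(), gofstat(), descdist()
library(MASS)
library(actuar)          # dpareto()/ppareto() used later for the Pareto fit

data(dataCar)            # load directly from the package
df <- dataCar
# df <- read.csv("dataCar.csv")   # alternatively, read a locally saved copy (hypothetical path)

summary(df)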

##    veh_value         exposure             clm            numclaims      
##  Min.   : 0.000   Min.   :0.002738   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.: 1.010   1st Qu.:0.219028   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median : 1.500   Median :0.446270   Median :0.00000   Median :0.00000  
##  Mean   : 1.777   Mean   :0.468651   Mean   :0.06814   Mean   :0.07276  
##  3rd Qu.: 2.150   3rd Qu.:0.709103   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :34.560   Max.   :0.999316   Max.   :1.00000   Max.   :4.00000  
##                                                                         
##    claimcst0          veh_body        veh_age      gender    area     
##  Min.   :    0.0   SEDAN  :22233   Min.   :1.000   F:38603   A:16312  
##  1st Qu.:    0.0   HBACK  :18915   1st Qu.:2.000   M:29253   B:13341  
##  Median :    0.0   STNWG  :16261   Median :3.000             C:20540  
##  Mean   :  137.3   UTE    : 4586   Mean   :2.674             D: 8173  
##  3rd Qu.:    0.0   TRUCK  : 1750   3rd Qu.:4.000             E: 5912  
##  Max.   :55922.1   HDTOP  : 1579   Max.   :4.000             F: 3578  
##                    (Other): 2532                                      
##      agecat                     X_OBSTAT_    
##  Min.   :1.000   01101    0    0    0:67856  
##  1st Qu.:2.000                               
##  Median :3.000                               
##  Mean   :3.485                               
##  3rd Qu.:5.000                               
##  Max.   :6.000                               
## 

It is a good idea to look at the structure of the data set using the str() function to determine the class of each variable you are working with. In this case claimcst0 is numeric, so we do not have to make any transformations.
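For reference, the structure check is a single call, assuming the data frame df loaded above.

str(df)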

## 'data.frame':    67856 obs. of  11 variables:
##  $ veh_value: num  1.06 1.03 3.26 4.14 0.72 2.01 1.6 1.47 0.52 0.38 ...
##  $ exposure : num  0.304 0.649 0.569 0.318 0.649 ...
##  $ clm      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ numclaims: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ claimcst0: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ veh_body : Factor w/ 13 levels "BUS","CONVT",..: 4 4 13 11 4 5 8 4 4 4 ...
##  $ veh_age  : int  3 2 2 2 4 3 3 2 4 4 ...
##  $ gender   : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 2 1 1 ...
##  $ area     : Factor w/ 6 levels "A","B","C","D",..: 3 1 5 4 3 3 1 2 1 2 ...
##  $ agecat   : int  2 4 2 2 2 4 4 6 3 4 ...
##  $ X_OBSTAT_: Factor w/ 1 level "01101    0    0    0": 1 1 1 1 1 1 1 1 1 1 ...

To get a visual feel for the data of interest, we can look at a plot of the number of claims versus the cost of the claims for policies where the claim cost was greater than zero. We see that most of the losses occur where a policyholder made only one claim, the proportion of claims per policyholder drops rapidly, and a maximum of four claims per policyholder occurs in a small number of instances.
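A sketch of the subsetting and plot described here, assuming the column names shown in the structure output above; restricting to rows with a positive claim cost is an assumption consistent with the summaries below.

claims <- df[df$claimcst0 > 0, ]   # policies that incurred a claim
summary(claims$numclaims)          # number of claims per policy
summary(claims$claimcst0)          # claim cost per policy
plot(claims$numclaims, claims$claimcst0,
     xlab = "Number of claims", ylab = "Claim cost")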

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.068   1.000   4.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   200.0   353.8   761.6  2014.4  2091.4 55922.1

Using the descdist() function from the fitdistrplus package, we can generate the minimum, maximum, median, mean, sample standard deviation, and unbiased estimates of the skewness and Pearson's kurtosis. A skewness-kurtosis plot, called the Cullen and Frey plot, is also produced for the empirical distribution.

On the Cullen and Frey plot the values for common distributions are displayed alongside the empirical distribution as a tool to help understand the choice of distributions to fit the data.
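The summary statistics and Cullen and Frey plot below can be produced with a single call, assuming the claims subset constructed above.

descdist(claims$claimcst0)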

## summary statistics
## ------
## min:  200   max:  55922.13 
## median:  761.565 
## mean:  2014.404 
## estimated sd:  3548.907 
## estimated skewness:  5.04178 
## estimated kurtosis:  43.25942

This summary table neatly lays out the summary statistics of interest for the loss data.

2.5.2 Fitting Severity Data

At this point we have our vector of claim cost data (claimcst0 for the policies with a claim) to which we want to fit our probability models of interest.

To begin, we will fit an exponential distribution to the data using the fitdist() function from the fitdistrplus package, and then look at a summary of the fitted model and a graphical view of the density and distribution of the empirical and theoretical models, as well as a graphical representation of the fit via the probability-probability and quantile-quantile plots, which also map the empirical against the theoretical. A good fit would show a near-perfect overlay of the two.

2.5.2.1 Exponential

Fit an exponential distribution, naming the model FITexp, using the fitdist() function; then print a summary and a plot of the results.
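A minimal sketch of this step, assuming the claim cost vector claims$claimcst0 constructed above.

FITexp <- fitdist(claims$claimcst0, "exp")
summary(FITexp)
plot(FITexp)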

## Fitting of the distribution ' exp ' by maximum likelihood 
## Parameters : 
##         estimate   Std. Error
## rate 0.004964247 6.988737e-05
## Loglikelihood:  -29156.6   AIC:  58315.2   BIC:  58321.64

Next, we use the gofstat() function from the fitdistrplus package to assess the goodness-of-fit using statistics generated for the Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises methods as well as goodness-of-fit criteria represented by the Akaike’s Information Criterion and the Bayesian Information Criterion assessment tools.
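The call producing the statistics below is a one-liner, assuming the fitted object FITexp from above; the same pattern is used for the remaining fits.

gofstat(FITexp)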

## Goodness-of-fit statistics
##                                1-mle-exp
## Kolmogorov-Smirnov statistic   0.1870179
## Cramer-von Mises statistic    65.3104850
## Anderson-Darling statistic   341.7652234
## 
## Goodness-of-fit criteria
##                                1-mle-exp
## Akaike's Information Criterion  58315.20
## Bayesian Information Criterion  58321.64

2.5.2.2 Gamma

Name the gamma model FITgamma, fit it to the loss data, and print out a summary and a plot of the fitted model.
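A sketch of the gamma fit; supplying starting values for shape and scale (the values here are hypothetical) makes fitdist() estimate the shape/scale parameterization reported below.

FITgamma <- fitdist(claims$claimcst0, "gamma",
                    start = list(shape = 1, scale = 2000))   # hypothetical starting values
summary(FITgamma)
plot(FITgamma)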

## Fitting of the distribution ' gamma ' by maximum likelihood 
## Parameters : 
##           estimate Std. Error
## scale 2685.3322605         NA
## shape    0.7499285         NA
## Loglikelihood:  -39662.92   AIC:  79329.85   BIC:  79342.72 
## Correlation matrix:
## [1] NA

Next, print out the GoF statistics for the fitted gamma distribution.

## Goodness-of-fit statistics
##                              1-mle-gamma
## Kolmogorov-Smirnov statistic   0.1503186
## Cramer-von Mises statistic    34.0035836
## Anderson-Darling statistic   191.1180219
## 
## Goodness-of-fit criteria
##                                1-mle-gamma
## Akaike's Information Criterion    79329.85
## Bayesian Information Criterion    79342.72

2.5.2.3 Weibull

Name the Weibull model FITweibull, fit it to the loss data, and print out a summary and a plot of the fitted model.
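A sketch of the Weibull fit, again assuming the claims$claimcst0 vector.

FITweibull <- fitdist(claims$claimcst0, "weibull")
summary(FITweibull)
plot(FITweibull)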

## Fitting of the distribution ' weibull ' by maximum likelihood 
## Parameters : 
##           estimate Std. Error
## shape    0.7857048  0.0081805
## scale 1690.9055745 33.6783355
## Loglikelihood:  -39491.6   AIC:  78987.19   BIC:  79000.07 
## Correlation matrix:
##           shape     scale
## shape 1.0000000 0.3398823
## scale 0.3398823 1.0000000

Print out the GoF statistics using the gofstat() function.

## Goodness-of-fit statistics
##                              1-mle-weibull
## Kolmogorov-Smirnov statistic     0.1704634
## Cramer-von Mises statistic      21.3159533
## Anderson-Darling statistic     139.5430172
## 
## Goodness-of-fit criteria
##                                1-mle-weibull
## Akaike's Information Criterion      78987.19
## Bayesian Information Criterion      79000.07

2.5.2.4 Pareto

Name the Pareto model FITpareto, fit it to the loss data, and print out a summary and a plot of the fitted model.
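A sketch of the Pareto fit; the dpareto()/ppareto() functions come from the actuar package loaded earlier, and fitdist() needs starting values for this distribution (the values here are hypothetical).

FITpareto <- fitdist(claims$claimcst0, "pareto",
                     start = list(shape = 2, scale = 2000))  # hypothetical starting values
summary(FITpareto)
plot(FITpareto)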

## Fitting of the distribution ' pareto ' by maximum likelihood 
## Parameters : 
##          estimate   Std. Error
## shape    2.046569   0.08776192
## scale 2206.086511 132.31772724
## Loglikelihood:  -39169.85   AIC:  78343.7   BIC:  78356.58 
## Correlation matrix:
##           shape     scale
## shape 1.0000000 0.9393593
## scale 0.9393593 1.0000000

Run the gofstat() function on the fitted Pareto model to view the GoF statistics.

## Goodness-of-fit statistics
##                              1-mle-pareto
## Kolmogorov-Smirnov statistic    0.1627262
## Cramer-von Mises statistic     10.7299802
## Anderson-Darling statistic     87.9184259
## 
## Goodness-of-fit criteria
##                                1-mle-pareto
## Akaike's Information Criterion     78343.70
## Bayesian Information Criterion     78356.58

2.5.2.5 Lognormal

Name the lognormal model FITlnorm, fit it to the loss data, and print out a summary and a plot of the fitted model.
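A sketch of the lognormal fit, following the same pattern.

FITlnorm <- fitdist(claims$claimcst0, "lnorm")
summary(FITlnorm)
plot(FITlnorm)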

## Fitting of the distribution ' lnorm ' by maximum likelihood 
## Parameters : 
##         estimate Std. Error
## meanlog 6.810081 0.01748793
## sdlog   1.189179 0.01236580
## Loglikelihood:  -38852.15   AIC:  77708.31   BIC:  77721.19 
## Correlation matrix:
##         meanlog sdlog
## meanlog       1     0
## sdlog         0     1

Run the gofstat() function on the fitted lognormal distribution model, FITlnorm, and view the GoF statistics.

## Goodness-of-fit statistics
##                              1-mle-lnorm
## Kolmogorov-Smirnov statistic   0.1021038
## Cramer-von Mises statistic    10.5839198
## Anderson-Darling statistic    72.4949308
## 
## Goodness-of-fit criteria
##                                1-mle-lnorm
## Akaike's Information Criterion    77708.31
## Bayesian Information Criterion    77721.19

3. Summary

Of our fitted distributions, the exponential distribution has the largest loglikelihood value and the lowest AIC and BIC values, but the lognormal distribution has the lowest Kolmogorov-Smirnov and Anderson-Darling test statistics of the distributions we fit to the data.

However, as we can clearly see from the quantile-quantile and probability-probability plots for the exponential distribution shown below, the FITexp model is far from a perfect fit for the data. Taking the test statistics and the diagnostic plots together, the lognormal distribution provides the best fit among the models considered here, and attempting to fit additional distributions to find a better fit would be a desirable next step in finding the “best” model for this particular task.

plot(FITexp)

4. References