Fitting a Distribution

Now let us see how to find a distribution that fits the data. For that, I will draw 50,000 random values from a normal distribution with mean = 0 and sd = 1.

dat <- rnorm(50000, mean = 0, sd = 1)  # 50,000 draws from N(0, 1)
hist(dat)                             # quick look at the shape of the data

Now let us forget that we created the data and take up the task of finding the distribution that fits ‘dat’. We will use the ‘fitdist’ function from the ‘fitdistrplus’ package.

library(fitdistrplus)
## Loading required package: MASS
## Loading required package: survival
f1 <- fitdist(dat, "norm")    # normal
f2 <- fitdist(dat, "logis")   # logistic
f3 <- fitdist(dat, "cauchy")  # Cauchy

Now we have three objects, f1, f2 and f3, holding maximum-likelihood fits of a ‘normal’, a ‘logistic’ and a ‘cauchy’ distribution to ‘dat’, respectively. Printing them shows the estimated parameters (with standard errors) of each candidate distribution.
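With the default method ("mle"), fitting means finding the parameter values that maximise the log-likelihood of the data. The sketch below does this by hand for the logistic candidate with base R's ‘optim’. It only illustrates the idea: it assumes a freshly simulated sample (the seed is arbitrary) and moment-based starting values of my own choosing, and it is not a reproduction of fitdist's internals.

```r
# Sketch: maximum-likelihood fit of a logistic distribution with base R's optim().
# Assumption: illustrates the idea behind fitdist(..., method = "mle"),
# not its exact internals (starting values, optimiser settings, etc.).
set.seed(42)                        # arbitrary seed, for reproducibility only
dat <- rnorm(50000, mean = 0, sd = 1)

# Negative log-likelihood; the scale is optimised on the log scale so it stays positive
nll_logis <- function(par, x) {
  -sum(dlogis(x, location = par[1], scale = exp(par[2]), log = TRUE))
}

# Moment-based starting values: logistic variance = scale^2 * pi^2 / 3
start <- c(median(dat), log(sd(dat) * sqrt(3) / pi))
fit <- optim(start, nll_logis, x = dat)   # Nelder-Mead by default

logis_est <- c(location = fit$par[1], scale = exp(fit$par[2]))
print(logis_est)
```

Optimising log(scale) instead of scale avoids the need for a constrained optimiser. The estimates should come out close to the f2 values printed below, though not identical, since the sample differs.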

f1
## Fitting of the distribution ' norm ' by maximum likelihood 
## Parameters:
##         estimate  Std. Error
## mean 0.001550461 0.004476436
## sd   1.000961466 0.003165304
f2
## Fitting of the distribution ' logis ' by maximum likelihood 
## Parameters:
##             estimate  Std. Error
## location 0.002269943 0.004480285
## scale    0.572707809 0.002120105
f3
## Fitting of the distribution ' cauchy ' by maximum likelihood 
## Parameters:
##             estimate  Std. Error
## location 0.003795903 0.004494169
## scale    0.614418431 0.003472385
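For the normal candidate the maximum-likelihood estimates have a closed form: the sample mean, and the standard deviation computed with divisor n rather than n - 1. Their large-sample standard errors are sigma/sqrt(n) and sigma/sqrt(2n); plugging in the sd estimate from f1 gives 1.000961466/sqrt(50000) ≈ 0.004476 and 1.000961466/sqrt(100000) ≈ 0.003165, in line with the Std. Error column above. A minimal check in base R, assuming a freshly simulated sample (arbitrary seed):

```r
# Sketch: the normal MLE has a closed form, so the 'norm' fit can be checked by hand.
set.seed(1)                         # arbitrary seed; the article's sample is unseeded
dat <- rnorm(50000, mean = 0, sd = 1)
n <- length(dat)

mu_hat    <- mean(dat)                        # MLE of the mean
sigma_hat <- sqrt(mean((dat - mu_hat)^2))     # MLE of sd: divides by n, not n - 1

# Large-sample standard errors from the inverse Fisher information
se_mu    <- sigma_hat / sqrt(n)
se_sigma <- sigma_hat / sqrt(2 * n)

print(c(mean = mu_hat, sd = sigma_hat))
print(c(mean = se_mu, sd = se_sigma))
```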

Let us compare ‘dat’ with the fitted theoretical distributions.

plotdist(dat, "norm", para = as.list(f1$estimate))

plotdist(dat, "logis", para = as.list(f2$estimate))

plotdist(dat, "cauchy", para = as.list(f3$estimate))
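‘plotdist’ compares the data with each fitted distribution graphically. A rough numeric counterpart is to put empirical and fitted quantiles side by side, as a Q-Q plot does. The sketch below copies the parameter estimates from the f1, f2 and f3 printouts above and applies them to a fresh N(0, 1) sample (arbitrary seed), so its numbers will not match the article's run exactly.

```r
# Sketch: numeric stand-in for a Q-Q comparison.
# Parameter values are copied from the f1, f2, f3 printouts in the article.
set.seed(7)                         # arbitrary seed; a fresh N(0, 1) sample
dat <- rnorm(50000, mean = 0, sd = 1)

p   <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)
emp <- quantile(dat, probs = p, names = FALSE)   # empirical quantiles

theo <- cbind(
  norm   = qnorm(p,   mean = 0.001550461,     sd = 1.000961466),
  logis  = qlogis(p,  location = 0.002269943, scale = 0.572707809),
  cauchy = qcauchy(p, location = 0.003795903, scale = 0.614418431)
)

# Largest absolute gap between empirical and fitted quantiles, per candidate
# (emp has one entry per row of theo, so it is recycled down each column)
max_gap <- apply(abs(theo - emp), 2, max)
print(round(theo, 3))
print(round(max_gap, 3))
```

The logistic quantiles drift away from the data in the tails, and the Cauchy ones are far too extreme there (its fitted 1% quantile is around -19.5), reflecting the much heavier tails of the Cauchy distribution.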

Looking at the graphs, we can see that the ‘Normal Distribution’ fits the data well. But let us confirm it with goodness-of-fit statistics using the ‘gofstat’ function.

gofstat(list(f1, f2, f3), fitnames = c("norm", "logis", "cauchy"))
## Goodness-of-fit statistics
##                                     norm       logis       cauchy
## Kolmogorov-Smirnov statistic 0.002965509  0.01711589   0.07292582
## Cramer-von Mises statistic   0.050763624  5.53203700  57.27261334
## Anderson-Darling statistic   0.395298198 39.45320521 677.39288598
## 
## Goodness-of-fit criteria
##                                    norm    logis   cauchy
## Akaike's Information Criterion 141994.0 142966.9 160397.1
## Bayesian Information Criterion 142011.6 142984.5 160414.7
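The Kolmogorov-Smirnov statistic reported by ‘gofstat’ is the largest vertical distance between the empirical CDF of the data and the fitted CDF. A minimal sketch computing it by hand for a normal fit and cross-checking it against base R's ‘ks.test’, assuming a fresh simulated sample (arbitrary seed):

```r
# Sketch: Kolmogorov-Smirnov statistic = largest gap between empirical and fitted CDF.
set.seed(3)                         # arbitrary seed
dat <- rnorm(50000, mean = 0, sd = 1)
n <- length(dat)

mu_hat    <- mean(dat)
sigma_hat <- sqrt(mean((dat - mu_hat)^2))

x     <- sort(dat)
F_fit <- pnorm(x, mean = mu_hat, sd = sigma_hat)
i     <- seq_len(n)
# The empirical CDF jumps at each point: compare just after (i/n) and just before ((i-1)/n)
D <- max(max(i / n - F_fit), max(F_fit - (i - 1) / n))

# Cross-check with base R's ks.test (same statistic)
D_ref <- unname(ks.test(dat, "pnorm", mean = mu_hat, sd = sigma_hat)$statistic)
print(c(manual = D, ks.test = D_ref))
```

Because the parameters are estimated from the same data, the standard KS p-values do not apply directly; the statistic itself is still useful for ranking the candidates.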

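Judging by these statistics and criteria, the ‘Normal Distribution’ fits ‘dat’ best: it has the lowest value of every goodness-of-fit statistic (Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling) and of both information criteria (AIC, BIC), and lower is better for all of them.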
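Both information criteria penalise the maximised log-likelihood by the number of fitted parameters k: AIC = -2 logL + 2k and BIC = -2 logL + k log(n). Every candidate here has k = 2, so BIC - AIC = 2(log 50000 - 2) ≈ 17.6 in each column of the table above, and the two criteria rank the fits identically. A minimal base-R sketch for the normal fit, assuming a fresh simulated sample (arbitrary seed):

```r
# Sketch: AIC and BIC from the maximised log-likelihood of a normal fit.
set.seed(5)                         # arbitrary seed
dat <- rnorm(50000, mean = 0, sd = 1)
n <- length(dat)
k <- 2                              # number of estimated parameters (mean, sd)

mu_hat    <- mean(dat)
sigma_hat <- sqrt(mean((dat - mu_hat)^2))
loglik    <- sum(dnorm(dat, mean = mu_hat, sd = sigma_hat, log = TRUE))

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)
print(c(AIC = aic, BIC = bic))
```

For the normal MLE, -2 logL simplifies to n(log(2 pi sigma^2) + 1), which for a sample with sd close to 1 is roughly 50000 × 2.84 ≈ 141,900, the same order as the AIC reported for ‘norm’ above.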