The following code reads in the data:
## Loading required package: MASS
## Loading required package: survival
## Warning: Missing column names filled in: 'X1' [1]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## V1 = col_character(),
## V2 = col_double(),
## V3 = col_double(),
## V4 = col_double(),
## V5 = col_double(),
## V6 = col_double(),
## V7 = col_character(),
## `2015 Slant` = col_double(),
## `2016 Slant` = col_double(),
## `2017 Slant` = col_double(),
## `2018 Slant` = col_double(),
## `2019 Slant` = col_double()
## )
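The reading code itself is not reproduced here. A minimal sketch of a call that yields a column specification of this shape, assuming the `readr` package, is given below. The file and its two dummy rows are hypothetical stand-ins that only mimic the header layout (an unnamed first column, `V1` to `V7`, then one slant column per year); the names `path`, `slant` and `slant_2015` are illustrative. Newer `readr` versions name the unnamed column `...1` rather than `X1`.

```r
library(readr)

# Hypothetical stand-in file: same header layout as the specification above,
# with two dummy rows. None of these values come from the real data set.
path <- tempfile(fileext = ".csv")
writeLines(c(
  ",V1,V2,V3,V4,V5,V6,V7,2015 Slant,2016 Slant,2017 Slant,2018 Slant,2019 Slant",
  "1,a,0,0,0,0,0,b,0,0,0,0,0",
  "2,c,0,0,0,0,0,d,1.5,-2,0,0,0"
), path)

# read_csv() guesses column types and fills in a name for the unnamed column.
slant <- read_csv(path)

# Column names containing spaces are selected with [[ ]] or backticks.
slant_2015 <- slant[["2015 Slant"]]
```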
The data come from the r/AnythingGoesNews subreddit and cover January through March of each year from 2015 to 2019. Each year's computed slant score is fitted with a distribution, with the Cullen & Gray graph guiding the choice of candidates. For this subreddit, the summary of the 2015 slant score data that accompanies the plot and the Cullen & Gray graph is:
## summary statistics
## ------
## min: 5.784828e-06 max: 0.9999942
## median: 0.6978874
## mean: 0.6871052
## estimated sd: 0.08349227
## estimated skewness: -4.856242
## estimated kurtosis: 37.53401
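Summary statistics in this format, together with the Cullen & Gray graph, are what `descdist()` from the `fitdistrplus` package produces. The sketch below shows the call on simulated stand-in data, since the real slant scores are not reproduced here; the vector `slant_2015_scaled` and its beta parameters are assumptions chosen only so that the values lie in (0, 1).

```r
library(fitdistrplus)

# Stand-in data on (0, 1); NOT the real 2015 slant scores.
set.seed(1)
slant_2015_scaled <- rbeta(1000, shape1 = 20, shape2 = 9)

# Computes min, max, median, mean, sd, skewness and kurtosis, and draws the
# Cullen & Gray graph (kurtosis against squared skewness) with the regions
# occupied by common distributions.
stats_2015 <- descdist(slant_2015_scaled, discrete = FALSE)
print(stats_2015)
```

The position of the observed point relative to the beta region and the gamma and lognormal lines is what suggests the candidate distributions.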
The Cullen & Gray graph points to beta, gamma and lognormal as candidate distributions. The distributions we fit are Weibull, beta and gamma. Because the fits use maximum likelihood estimation (MLE), we first scale the data so that all values are positive. The 2015 slant scores are fitted first:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -120.640 0.000 0.000 -1.864 0.000 52.224
## 25% 50% 75% 90% 99%
## 0.00000 0.00000 0.00000 2.03150 18.60485
Here are the 2016 slant scores plotted:
## summary statistics
## ------
## min: 5.983903e-06 max: 0.999994
## median: 0.6330551
## mean: 0.6245723
## estimated sd: 0.08264903
## estimated skewness: -3.432226
## estimated kurtosis: 30.04946
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -105.792 0.000 0.000 -1.418 0.000 61.321
## 25% 50% 75% 90% 99%
## 0.0000 0.0000 0.0000 2.2865 30.4959
Here are the 2017 slant scores plotted:
## summary statistics
## ------
## min: 5.048414e-06 max: 0.999995
## median: 0.4533022
## mean: 0.4512509
## estimated sd: 0.07483273
## estimated skewness: 0.8263925
## estimated kurtosis: 28.50812
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -89.7900 0.0000 0.0000 -0.4063 0.0000 108.2900
## 25% 50% 75% 90% 99%
## 0.0000 0.0000 0.0000 2.1175 45.8463
Here are the 2018 slant scores plotted:
## summary statistics
## ------
## min: 4.546074e-06 max: 0.9999955
## median: 0.1584261
## mean: 0.1738089
## estimated sd: 0.1048226
## estimated skewness: 6.378991
## estimated kurtosis: 47.00718
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -34.848 0.000 0.000 3.384 0.000 185.120
## 25% 50% 75% 90% 99%
## 0.0000 0.0000 0.0000 0.0000 122.3242
Here are the 2019 slant scores plotted:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -57.0240 0.0000 0.0000 -0.3569 0.0000 39.1440
## 25% 50% 75% 90% 99%
## 0.0000 0.0000 0.0000 0.0000 0.1668