Distributions

While some distributions are technically best-suited or even off-limits for certain types of data, I’m usually most interested in using a model based on a distribution that fits the shape of the data as closely as possible.

Digging around further: even when we treat time as a continuous variable, we’re talking about the predictor, whereas the choice of distribution is about the response variable. Time is moot; the response is crime incidents, which are counts and therefore discrete.

Poisson distribution

Because the response is a count, the Poisson distribution should technically be appropriate. A Poisson distribution is described by a single parameter, lambda:

MASS::fitdistr(clev.t$hr2, "Poisson")
##      lambda  
##   18.6265560 
##  ( 0.2780083)
The fitted Poisson distribution puts too much weight on the middle of the afternoon, when the data actually show a lull in crime. Even though hours are discrete, there are too many categories for a discrete distribution to model them well.
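
For reference, the single-parameter fit can be written out. The maximum-likelihood estimate of lambda is the sample mean, and the number in parentheses in the output is its standard error. The sample size n is not stated here; the value n ≈ 241 below is an assumption back-calculated from the two printed numbers.

```latex
\hat{\lambda} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \approx 18.63,
\qquad
\operatorname{SE}(\hat{\lambda}) = \sqrt{\hat{\lambda}/n}
\quad\Rightarrow\quad
n \approx \frac{18.6265560}{0.2780083^{2}} \approx 241
```

A Poisson distribution has variance equal to its mean, so this one number fixes both the centre and the spread of the fitted curve.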


Gamma distribution

##      shape         rate   
##   9.98104533   0.53585042 
##  (0.89443710) (0.04924653)
The Gamma distribution errs less in overweighting the afternoon and evening, which actually show a lull in crime.
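
The call that produced the Gamma estimates is not shown. Here is a minimal sketch of how the two fits might be compared, assuming the Gamma fit mirrors the Poisson call through MASS::fitdistr. The vector `hr` is simulated, hypothetical data standing in for clev.t$hr2, so its numbers will not match the output above.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Hypothetical stand-in for clev.t$hr2: two clusters of hours with a lull between them
hr <- c(rpois(120, lambda = 13), rpois(120, lambda = 24))
hr <- hr[hr > 0]  # the gamma distribution needs strictly positive values

fit_pois  <- fitdistr(hr, "Poisson")  # closed form: lambda is the sample mean
fit_gamma <- fitdistr(hr, "gamma")    # shape and rate found by numerical optimisation

# Variance implied by each fit: lambda for Poisson, shape / rate^2 for gamma
pois_var  <- unname(coef(fit_pois)["lambda"])
gamma_var <- unname(coef(fit_gamma)["shape"] / coef(fit_gamma)["rate"]^2)

print(fit_pois)
print(fit_gamma)
print(c(poisson_var = pois_var, gamma_var = gamma_var, sample_var = var(hr)))
print(c(poisson_AIC = AIC(fit_pois), gamma_AIC = AIC(fit_gamma)))
```

Applying the same arithmetic to the estimates reported above, the Gamma fit implies a mean of 9.981 / 0.536 ≈ 18.63, the same as lambda, but a variance of 9.981 / 0.536² ≈ 34.8 rather than the Poisson's 18.63. That second parameter may be part of why it tracks the data more closely.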


Statistics

What does it matter if we just re-do it with the Poisson and don’t get into Gamma at all?

The results do change… a little. The crime type x time interaction (type.x.time) is still the top model under each distribution. The second-ranked model changes, though: type.game_day.time under Gamma, type.only under Poisson. Note also that all of the Gamma-based models have lower AICc scores than all of the Poisson-based models, and correspondingly higher log likelihoods (LL). This indicates to me that the Gamma model just fits the data better, as we see in the distribution plots.

## [1] "Gamma distribution:"
## 
## Model selection based on AICc:
## 
##                     K   AICc Delta_AICc AICcWt Cum.Wt      LL
## type.x.time        13 312.90       0.00   0.67   0.67 -142.19
## type.game_day.time 21 314.39       1.49   0.32   0.99 -132.80
## type.only          12 320.92       8.02   0.01   1.00 -147.38
## game_day.x.time     5 349.75      36.85   0.00   1.00 -169.68
## time.only           4 354.51      41.61   0.00   1.00 -173.13
## null                3 365.25      52.35   0.00   1.00 -179.55
## [1] "Poisson distribution:"
## 
## Model selection based on AICc:
## 
##                     K   AICc Delta_AICc AICcWt Cum.Wt      LL
## type.x.time        12 445.62       0.00   0.71   0.71 -209.73
## type.only          11 449.04       3.42   0.13   0.84 -212.61
## game_day.x.time     4 449.29       3.67   0.11   0.95 -220.51
## time.only           3 451.47       5.85   0.04   0.99 -222.66
## null                2 455.42       9.80   0.01   1.00 -225.67
## type.game_day.time 20 456.80      11.18   0.00   1.00 -205.33
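
The AICc column in the tables above can be reproduced from K and LL using the standard small-sample formula, AICc = -2·LL + 2K + 2K(K+1)/(n - K - 1). The sketch below is only a check on the arithmetic. The sample size is not given here, so n = 158 is an assumption back-calculated from the printed values, and the helper name `aicc` is hypothetical.

```r
# Small-sample corrected AIC from a log likelihood, parameter count, and sample size
aicc <- function(ll, k, n) {
  -2 * ll + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

n <- 158  # assumption: back-calculated from the tables, not stated in the text

# Top-ranked model under each distribution (K and LL copied from the tables)
gamma_top   <- aicc(ll = -142.19, k = 13, n = n)
poisson_top <- aicc(ll = -209.73, k = 12, n = n)

print(round(c(gamma = gamma_top, poisson = poisson_top), 2))
```

Under that assumed n, both values should land within rounding of the tabulated 312.90 and 445.62. Each Gamma model also carries one more parameter than its Poisson counterpart (K = 13 versus 12 for type.x.time), presumably the extra dispersion parameter.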