In the provided graph, the number of items per hour decreases as time increases (hours since the first item). A decaying, heavy-tailed pattern of this kind is consistent with a power-law distribution, although a decreasing trend alone does not establish one.
Working at minute resolution, we want to check whether the information is shared rapidly.
First, we find the time by which 95% of the information (RTs) has been shared.
## [1] "Time at which 95% of RTs occur: 165992 mins"
## [1] "equivalent to 2766.53333333333 hours"
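The code that produced this output is not shown. As a minimal Python sketch of the idea, assuming the data are available as a list of event times in minutes since the first item, the 95% point can be read off the sorted times. The function name `time_to_fraction` and the toy data are hypothetical and only illustrate the calculation.

```python
import math

def time_to_fraction(event_minutes, fraction=0.95):
    """Return the earliest event time by which `fraction` of all events have occurred."""
    if not event_minutes:
        raise ValueError("need at least one event")
    ordered = sorted(event_minutes)
    # Number of events needed to reach the target share, rounded up.
    needed = max(1, math.ceil(fraction * len(ordered)))
    return ordered[needed - 1]

# Toy data (illustrative only, not the cascade analysed in this report).
toy_minutes = [0, 1, 2, 4, 7, 12, 20, 45, 90, 300, 1500, 9000]
t95 = time_to_fraction(toy_minutes, 0.95)
print(f"Time at which 95% of RTs occur: {t95} mins")
print(f"equivalent to {t95 / 60} hours")
```

With a heavy-tailed cascade, this time can be very large even when most of the activity happens early, because the last few percent of events arrive slowly.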
Next, we estimate more precisely the point at which the curve flattens, using a simple iteration.
## Total Cumulative RTs in the subset (first 3000 minutes): 820
## Percentage of Total RTs in the subset: 91.62 %
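The iteration itself is not shown in the report. One plausible version, sketched here in Python under the assumption that event times are given in minutes, steps forward in fixed windows and stops when the cumulative share gains less than a chosen threshold in one step. The names `share_within` and `flattening_point`, the window size, and the threshold are all hypothetical choices.

```python
def share_within(event_minutes, cutoff):
    """Return (count, percentage) of events at or before `cutoff` minutes."""
    count = sum(1 for t in event_minutes if t <= cutoff)
    return count, 100.0 * count / len(event_minutes)

def flattening_point(event_minutes, step=60, min_gain_pct=1.0):
    """Return the first cutoff after which one more `step` adds < `min_gain_pct` points."""
    last = max(event_minutes)
    cutoff = 0
    while cutoff < last:
        _, share_now = share_within(event_minutes, cutoff)
        _, share_next = share_within(event_minutes, cutoff + step)
        if share_next - share_now < min_gain_pct:
            return cutoff
        cutoff += step
    return last

# Toy data (illustrative only): a burst of early events and one late straggler.
toy_minutes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 500]
cut = flattening_point(toy_minutes, step=10, min_gain_pct=1.0)
count, pct = share_within(toy_minutes, cut)
print(f"Total Cumulative RTs in the subset (first {cut} minutes): {count}")
print(f"Percentage of Total RTs in the subset: {pct:.2f} %")
```

A rule like this stops at the first quiet window, so a temporary lull would be reported as the flattening point; a larger `step` makes it less sensitive to such pauses.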
To formally assess whether the data fit a power law distribution, you can:
- Fit a Power Law Model: Use methods such as maximum likelihood estimation (MLE) to estimate the parameters of the power law.
- Statistical Tests: Conduct statistical tests (like the Kolmogorov-Smirnov test) to compare the empirical distribution with the power law distribution.
- Log-Log Plot: Generate a log-log plot of the data to visually check for linearity, which would suggest a power law.
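The first two steps can be sketched in Python as follows. This is a minimal illustration, not the code behind the results in this report: it assumes a continuous power law above a hand-picked lower cutoff `xmin`, uses the standard MLE estimator α = 1 + n / Σ ln(xᵢ / xmin), and computes the Kolmogorov-Smirnov distance between the empirical and fitted CDFs. The function names and the synthetic sample are hypothetical.

```python
import math
import random

def fit_power_law_mle(values, xmin):
    """MLE of the exponent alpha for a continuous power law on x >= xmin."""
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above xmin")
    return 1.0 + len(tail) / log_sum

def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst

def sample_power_law(n, xmin, alpha, seed=0):
    """Draw synthetic power-law values by inverse-transform sampling."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Synthetic data (illustrative only, not the cascade analysed here).
data = sample_power_law(5000, xmin=1.0, alpha=2.5)
alpha_hat = fit_power_law_mle(data, xmin=1.0)
print("estimated alpha:", alpha_hat)
print("KS distance:", ks_distance(data, 1.0, alpha_hat))
```

A small KS distance indicates that the fitted curve tracks the data closely; a distance near 1 indicates that the two distributions barely overlap.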
## [1] "The slope (α) for the cascade is: 4.62128575676145"
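How this slope was computed is not shown. A common approach, sketched below in Python, is an ordinary least-squares line through the points on log-log axes, where a power law y ∝ x^(−α) appears as a straight line with slope −α. The function name `loglog_slope` and the toy values are hypothetical, and the result depends on how the data are binned.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), skipping non-positive points."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points")
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    return sxy / sxx

# Toy values (illustrative only): counts that fall off as hours ** -2.
hours = [1, 2, 4, 8, 16]
counts = [h ** -2.0 for h in hours]
slope = loglog_slope(hours, counts)
print("log-log slope:", slope, "so alpha is about", -slope)
```

Slopes fitted this way are known to be sensitive to binning and to noise in the tail, which is one reason to compare them with a maximum likelihood estimate.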
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: cascade_data
## D = 0.95007, p-value < 2.2e-16
## alternative hypothesis: two-sided
##
## Chi-squared test for given probabilities
##
## data: observed_freq_valid
## X-squared = 4.0285, df = 3, p-value = 0.2584
##
## Chi-squared test for given probabilities with simulated p-value (based
## on 2000 replicates)
##
## data: observed_freq_valid
## X-squared = 4.0285, df = NA, p-value = 0.2454
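For readers who want to see what a chi-squared goodness-of-fit test with a simulated p-value does, here is a minimal Python sketch. It is not the code used for the output above: the observed counts and expected probabilities are toy values, the function names are hypothetical, and the p-value is a Monte Carlo estimate, (hits + 1) / (replicates + 1), from multinomial samples drawn under the expected probabilities.

```python
import random

def chi_squared_stat(observed, probs):
    """Pearson chi-squared statistic for observed counts against expected probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def simulated_p_value(observed, probs, replicates=2000, seed=0):
    """Monte Carlo p-value: share of simulated statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n = sum(observed)
    k = len(probs)
    stat = chi_squared_stat(observed, probs)
    hits = 0
    for _ in range(replicates):
        # Draw n category labels under the null and tabulate them.
        sim = [0] * k
        for label in rng.choices(range(k), weights=probs, k=n):
            sim[label] += 1
        if chi_squared_stat(sim, probs) >= stat:
            hits += 1
    return (hits + 1) / (replicates + 1)

# Toy counts in four bins (illustrative only), giving 3 degrees of freedom.
observed = [18, 25, 30, 27]
expected_probs = [0.25, 0.25, 0.25, 0.25]
print("X-squared:", chi_squared_stat(observed, expected_probs))
print("simulated p-value:", simulated_p_value(observed, expected_probs))
```

A large p-value means the binned counts are compatible with the expected probabilities; it does not by itself confirm the model, especially when there are only a few bins.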
Summary:
- alpha
- fit
- time
- time at max count