Exploring the Signatures

Plotting in hours

In the graph, the number of items per hour decreases as the time since the first item increases. A decaying count with a long tail of late items is consistent with a heavy-tailed distribution such as a power law, although a decreasing trend alone does not establish one.
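The hourly counts behind a plot like this can be produced by binning each item into the whole hour elapsed since the first item. The following is a minimal Python sketch under assumptions: the function name `items_per_hour` is hypothetical, timestamps are taken to be in seconds, and the example values are made up for illustration.

```python
from collections import Counter

def items_per_hour(timestamps_s):
    """Count items in each whole hour elapsed since the first item."""
    if not timestamps_s:
        return {}
    t0 = min(timestamps_s)
    # Integer division maps each timestamp to its hour bin (0 = first hour).
    bins = Counter(int((t - t0) // 3600) for t in timestamps_s)
    return dict(sorted(bins.items()))

# Made-up timestamps in seconds, for illustration only.
example_timestamps = [0, 120, 900, 3500, 3700, 4000, 7300, 20000]
hourly_counts = items_per_hour(example_timestamps)
print(hourly_counts)  # {0: 4, 1: 2, 2: 1, 5: 1}
```

Hours with no items are simply absent from the result, which matters later because empty bins cannot be shown on a logarithmic axis.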

Filtering and finding the turning point

At minute resolution, we want to check whether information is shared rapidly.

First, we looked for the time by which 95% of the RTs had occurred:

## [1] "Time at which 95% of RTs occur: 165992 mins"
## [1] "equivalent to 2766.53333333333 hours"
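One way to obtain such a figure is to sort the event times and return the first one at which the cumulative share reaches the target fraction. This is a minimal Python sketch, not necessarily the calculation used above; `time_at_fraction` is a hypothetical helper and the example times are made up.

```python
def time_at_fraction(minutes_since_first, fraction=0.95):
    """Smallest observed time by which at least `fraction` of events occurred."""
    ordered = sorted(minutes_since_first)
    if not ordered:
        raise ValueError("no events")
    n = len(ordered)
    for i, t in enumerate(ordered, start=1):
        # i / n is the cumulative share of events at or before time t.
        if i / n >= fraction:
            return t
    return ordered[-1]

# Made-up minutes since the first event, for illustration only.
example_minutes = [0, 1, 2, 3, 5, 8, 13, 21, 34, 55,
                   89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
t95 = time_at_fraction(example_minutes, 0.95)
print(f"Time at which 95% of events occur: {t95} mins")
print(f"equivalent to {t95 / 60} hours")
```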

Then we tried to pin down more precisely the point at which the curve flattens, using a simple iteration:

## Total Cumulative RTs in the subset (first 3000 minutes): 820
## Percentage of Total RTs in the subset: 91.62 %
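The iteration itself is not shown, so the following Python sketch is only one plausible reading of it: step through increasing cutoffs and stop at the first one after which a further step adds less than a chosen percentage of the total. The names `cumulative_share` and `flattening_point`, and the tuning values `step` and `min_gain_pct`, are assumptions, as is the made-up data.

```python
def cumulative_share(minutes_since_first, cutoff):
    """Return (count, percentage) of events at or before `cutoff` minutes."""
    total = len(minutes_since_first)
    count = sum(1 for t in minutes_since_first if t <= cutoff)
    return count, 100.0 * count / total

def flattening_point(minutes_since_first, step=500, min_gain_pct=1.0):
    """First cutoff after which one more `step` adds < `min_gain_pct` points.

    `step` and `min_gain_pct` are hypothetical tuning values.
    """
    last = max(minutes_since_first)
    cutoff = step
    while cutoff < last:
        _, share_now = cumulative_share(minutes_since_first, cutoff)
        _, share_next = cumulative_share(minutes_since_first, cutoff + step)
        if share_next - share_now < min_gain_pct:
            return cutoff
        cutoff += step
    return last

# Made-up event times (minutes), 100 events in total.
events = [10] * 50 + [400] * 20 + [900] * 15 + [1400] * 8 + [2600] * 4 + [9000] * 3
turning = flattening_point(events, step=500, min_gain_pct=5.0)
count, pct = cumulative_share(events, turning)
print(f"Curve flattens near {turning} mins: {count} events, {pct:.2f} % of total")
```

The result depends strongly on `step` and `min_gain_pct`, so any turning point found this way should be reported together with those settings.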

Indicators of a Power Law Distribution:

  1. Heavy Tail: A power law distribution typically shows that a few occurrences have high values, while most occurrences have low values.
  2. Log-Log Scale: On a log-log plot, a power law appears as an approximately straight line, so a linear relationship there is a sign of power-law behavior.
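The log-log indicator can be checked numerically by fitting a straight line to log10(count) against log10(time). This is a minimal Python sketch using ordinary least squares; `loglog_slope` is a hypothetical helper and the data are synthetic (an exact power law with exponent 2). Points with a zero time or zero count are dropped because their logarithm is undefined.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope and intercept of log10(y) against log10(x)."""
    pairs = [(math.log10(x), math.log10(y))
             for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data following y = 100 * x^(-2) exactly.
hours = [1, 2, 4, 8]
counts = [100.0, 25.0, 6.25, 1.5625]
slope, intercept = loglog_slope(hours, counts)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A straight line on this scale is suggestive only: other heavy-tailed distributions can look similar over a limited range, and a least-squares slope is generally a less reliable exponent estimate than a likelihood-based fit.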

Next Steps:

To formally assess whether the data fit a power law distribution, you can:

- Fit a Power Law Model: Use methods such as maximum likelihood estimation (MLE) to estimate the parameters of the power law.

- Statistical Tests: Conduct statistical tests (like the Kolmogorov-Smirnov test) to compare the empirical distribution with the power law distribution.

- Log-Log Plot: Generate a log-log plot of the data to visually check for linearity, which would suggest a power law.
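The first two steps can be sketched in Python for a continuous power law with a known lower cutoff. This is an illustration under assumptions, not the procedure that produced the results in this report: `fit_alpha_mle` and `ks_distance` are hypothetical helpers, `x_min` is assumed to be given, and the sample is synthetic. The estimator used is alpha = 1 + n / sum(ln(x_i / x_min)), and the KS distance is the largest gap between the empirical CDF and the fitted CDF 1 - (x / x_min)^(1 - alpha).

```python
import math
import random

def fit_alpha_mle(values, x_min):
    """Maximum likelihood exponent of a continuous power law above x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum

def ks_distance(values, x_min, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(v for v in values if v >= x_min)
    n = len(tail)
    d = 0.0
    for i, v in enumerate(tail):
        model_cdf = 1.0 - (v / x_min) ** (1.0 - alpha)
        # Compare the model with the empirical CDF just before and at v.
        d = max(d, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return d

# Synthetic sample drawn by inverse transform from a power law with alpha = 2.5.
rng = random.Random(42)
true_alpha, x_min = 2.5, 1.0
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(5000)]

alpha_hat = fit_alpha_mle(sample, x_min)
d_stat = ks_distance(sample, x_min, alpha_hat)
print(f"estimated alpha = {alpha_hat:.3f}, KS distance = {d_stat:.4f}")
```

A small KS distance means the fitted curve tracks the data closely. Because the exponent is estimated from the same data, standard KS p-value tables do not apply directly, and a simulation-based p-value is usually preferred.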

## [1] "The slope (α) for the cascade is: 4.62128575676145"
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  cascade_data
## D = 0.95007, p-value < 2.2e-16
## alternative hypothesis: two-sided
## 
##  Chi-squared test for given probabilities
## 
## data:  observed_freq_valid
## X-squared = 4.0285, df = 3, p-value = 0.2584

## 
##  Chi-squared test for given probabilities with simulated p-value (based
##  on 2000 replicates)
## 
## data:  observed_freq_valid
## X-squared = 4.0285, df = NA, p-value = 0.2454
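For readers unfamiliar with the two chi-squared variants above, the following Python sketch shows how a goodness-of-fit statistic and a simulated p-value can be computed. It is an assumption-laden illustration: the function names, the observed counts, and the expected probabilities are all made up, and the p-value uses the common add-one Monte Carlo correction.

```python
import random

def chi_squared_stat(observed, probs):
    """Pearson chi-squared statistic for counts against given probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def simulated_p_value(observed, probs, replicates=2000, seed=0):
    """Monte Carlo p-value: share of simulated tables at least as extreme."""
    rng = random.Random(seed)
    n = sum(observed)
    stat = chi_squared_stat(observed, probs)
    categories = list(range(len(probs)))
    hits = 0
    for _ in range(replicates):
        # Draw n items from the hypothesised distribution and tabulate them.
        counts = [0] * len(probs)
        for c in rng.choices(categories, weights=probs, k=n):
            counts[c] += 1
        if chi_squared_stat(counts, probs) >= stat:
            hits += 1
    return stat, (hits + 1) / (replicates + 1)

# Made-up counts in four bins, tested against equal probabilities.
observed = [30, 20, 28, 22]
probs = [0.25, 0.25, 0.25, 0.25]
stat, p_sim = simulated_p_value(observed, probs)
print(f"X-squared = {stat:.4f}, simulated p-value = {p_sim:.4f}")
```

With four bins the asymptotic test has 3 degrees of freedom, while the simulated version reports none because its p-value comes from resampling rather than from the chi-squared distribution.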

Summary:

  • alpha

  • fit

  • time

  • time at max count