Data description

# Directed graph (each unordered pair of nodes is saved once): web-Google.txt 
# Webgraph from the Google programming contest, 2002
# Nodes: 875713 Edges: 5105039
# FromNodeId  ToNodeId     

The data (first several rows/edges)

##      from     to
## 1       1  11343
## 2       1 824021
## 3       1 867924
## 4       1 891836
## 5   11343      1
## 6   11343  27470
## 7   11343  38717
## 8   11343 309565
## 9   11343 322179
## 10  11343 387544
## 11  11343 427437
## 12  11343 538215
## 13  11343 638707
## 14  11343 645019
## 15  11343 835221
## 16  11343 856658
## 17  11343 867924
## 18  11343 891836
## 19 824021      1
## 20 824021  91808
## 21 824021 322179
## 22 824021 387544
## 23 824021 417729
## 24 824021 438494
## 25 824021 500628
## 26 824021 535749
## 27 824021 695579
## 28 824021 867924
## 29 824021 891836
## 30 867924      1
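The listing above could be produced by reading the edge list while skipping the `#` header lines. The following is a minimal sketch, not the code used for this report: the helper name `read_edges` and the column names `from` and `to` are assumptions, and the demo writes a tiny temporary file instead of using web-Google.txt.

```r
# Sketch: read a SNAP-style edge list, skipping the "#" header lines.
# `read_edges` is a hypothetical helper; the column names are chosen to
# match the printed output above.
read_edges <- function(path) {
  read.table(path, header = FALSE, comment.char = "#",
             col.names = c("from", "to"),
             colClasses = c("integer", "integer"))
}

# Demo on a small temporary file that mimics the header and a few rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("# Nodes: 3 Edges: 4",
             "# FromNodeId\tToNodeId",
             "1\t11343",
             "1\t824021",
             "11343\t1",
             "824021\t1"), tmp)
edges <- read_edges(tmp)
print(head(edges))
```

For the real file, the call would presumably be `read_edges("web-Google.txt")`, with the path adjusted to wherever the data is stored.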

Number of edges

## [1] 5105039

Number of nodes

## [1] 875713
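Some node IDs in the sample rows (for example 891836) are larger than the node count, so the IDs are not contiguous and the node count has to come from the distinct IDs rather than from the largest ID. A small sketch on a toy edge list (the data frame `edges` below is illustrative, not the real data):

```r
# Toy edge list standing in for the real data (illustrative values only).
edges <- data.frame(from = c(1, 1, 2, 2, 3, 3, 4, 5),
                    to   = c(2, 3, 1, 3, 4, 6, 3, 3))

# Each row is one directed edge.
n_edges <- nrow(edges)

# A node is any ID that appears as a source or as a target.
n_nodes <- length(unique(c(edges$from, edges$to)))

print(c(edges = n_edges, nodes = n_nodes))
```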

Degree distributions (on a logarithmic scale)
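One way to obtain in- and out-degrees and the points for a log-log frequency plot is sketched below. This is an assumption about the workflow, using base R only and a toy edge list; nodes with degree 0 are dropped before taking logarithms.

```r
# Toy edge list (illustrative values only).
edges <- data.frame(from = c(1, 1, 2, 2, 3, 3, 4, 5),
                    to   = c(2, 3, 1, 3, 4, 6, 3, 3))

# Count edges per node; the factor levels keep nodes with degree 0.
nodes   <- sort(unique(c(edges$from, edges$to)))
out_deg <- as.vector(table(factor(edges$from, levels = nodes)))
in_deg  <- as.vector(table(factor(edges$to,   levels = nodes)))

# Frequency of each positive in-degree, on a log10 scale.
in_freq <- table(in_deg[in_deg > 0])
log_points <- data.frame(
  log_degree = log10(as.integer(names(in_freq))),
  log_count  = log10(as.vector(in_freq))
)
print(log_points)

# Draw the scatter plot only in an interactive session.
if (interactive()) {
  plot(log_points, xlab = "log10(in-degree)", ylab = "log10(number of nodes)")
}
```

A roughly linear pattern in such a plot is the usual informal sign of a power-law tail.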

Joint distribution

Hill plots (upper 10% of the data)

Stable distribution (\(\alpha = 0.2\)) as an example:

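library(stabledist)  # provides rstable()
# Simulate 10,000 draws from a stable law with tail index alpha = 0.2
x1 <- rstable(10000, alpha = 0.2, beta = 0.1, gamma = 1, delta = 0, pm = 0)
# hill.plot() is not part of stabledist; it is assumed to be defined elsewhere
hill.plot(x1, start = 1000, ylim = c(0, 0.3))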
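For reference, the standard Hill estimator based on the k largest observations is \(H_k = \frac{1}{k}\sum_{i=1}^{k} \log X_{(i)} - \log X_{(k+1)}\), with tail-index estimate \(\hat\alpha_k = 1/H_k\). The sketch below implements this in base R; `hill_alpha` is a hypothetical helper and is not claimed to match what `hill.plot` computes in this report. It is checked on exact Pareto data with a known tail index of 2.

```r
# Hill estimates of the tail index for k = 1, ..., k_max upper order
# statistics. Hypothetical helper; only positive observations are used.
hill_alpha <- function(x, k_max) {
  x <- sort(x[x > 0], decreasing = TRUE)
  stopifnot(k_max < length(x))
  logs <- log(x)
  k <- seq_len(k_max)
  # H_k = mean of the k largest log-values minus log of the (k+1)-th largest
  h <- cumsum(logs[k]) / k - logs[k + 1]
  1 / h
}

# Pareto sample with alpha = 2, drawn by inverse-CDF sampling.
set.seed(1)
x <- runif(20000)^(-1 / 2)
alpha_hat <- hill_alpha(x, k_max = 2000)
print(alpha_hat[c(500, 1000, 2000)])

# A Hill plot shows alpha_hat against k; drawn only interactively.
if (interactive()) {
  plot(seq_along(alpha_hat), alpha_hat, type = "l",
       xlab = "k (number of upper order statistics)", ylab = "Hill estimate")
}
```

With integer-valued data such as degrees, ties among the largest values can make \(H_k\) equal to zero for small k, so the first few estimates may need to be skipped.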

Correlation between in and out degrees

##            in       out
## in  1.0000000 0.1365364
## out 0.1365364 1.0000000
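A correlation matrix of this shape can be obtained by binding the two degree vectors into a matrix and calling `cor()`. The degree vectors below are toy values, one entry per node, and are not taken from the data.

```r
# Toy in- and out-degrees, one entry per node (illustrative values only).
in_deg  <- c(1, 1, 4, 1, 0, 1)
out_deg <- c(2, 2, 2, 1, 1, 0)

# Pearson correlation matrix with the same row/column labels as above.
deg <- cbind("in" = in_deg, out = out_deg)
r <- cor(deg)
print(r)
```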

Quantiles of in degrees

##   0%   5%  10%  15%  20%  25%  30%  35%  40%  45%  50%  55%  60%  65%  70% 
##    0    0    0    0    1    1    1    1    1    1    1    2    2    3    3 
##  75%  80%  85%  90%  95% 100% 
##    4    6    8   12   20 6326

Quantiles of out degrees

##   0%   5%  10%  15%  20%  25%  30%  35%  40%  45%  50%  55%  60%  65%  70% 
##    0    0    0    0    1    1    1    2    2    3    4    5    5    7    8 
##  75%  80%  85%  90%  95% 100% 
##    9   10   12   14   18  456
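Quantile tables in 5% steps like the two above correspond to `quantile()` with `probs = seq(0, 1, by = 0.05)`. A short sketch on toy degree vectors (illustrative values, not the real degrees):

```r
# Toy degree vectors (illustrative values only).
in_deg  <- c(1, 1, 4, 1, 0, 1)
out_deg <- c(2, 2, 2, 1, 1, 0)

# Quantiles at 0%, 5%, ..., 100%.
probs <- seq(0, 1, by = 0.05)
q_in  <- quantile(in_deg,  probs = probs)
q_out <- quantile(out_deg, probs = probs)
print(q_in)
print(q_out)
```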

Number of nodes with outgoing edges only (in-degree 0, i.e. not followed by others)

## [1] 161168

Quantiles of out degrees for such nodes

##   0%   5%  10%  15%  20%  25%  30%  35%  40%  45%  50%  55%  60%  65%  70% 
##    1    1    1    1    1    2    2    2    2    3    3    4    4    5    5 
##  75%  80%  85%  90%  95% 100% 
##    6    8    9   11   14  245

Number of nodes with incoming edges only (out-degree 0, i.e. not following others)

## [1] 136259

Quantiles of in degrees for such nodes

##   0%   5%  10%  15%  20%  25%  30%  35%  40%  45%  50%  55%  60%  65%  70% 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    2 
##  75%  80%  85%  90%  95% 100% 
##    2    2    2    3    6 4847
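The two groups of nodes above can be identified with set differences on the edge-list columns: a node with only out-degree appears in `from` but never in `to`, and the reverse holds for a node with only in-degree. The sketch below uses a toy edge list; the variable names are illustrative.

```r
# Toy edge list (illustrative values only).
edges <- data.frame(from = c(1, 1, 2, 2, 3, 3, 4, 5),
                    to   = c(2, 3, 1, 3, 4, 6, 3, 3))

# Nodes that only point to others (in-degree 0).
only_out <- setdiff(edges$from, edges$to)
# Nodes that are only pointed to (out-degree 0).
only_in  <- setdiff(edges$to, edges$from)

# Degrees restricted to each group, then the same 5%-step quantiles.
out_deg_only_out <- as.vector(table(edges$from[edges$from %in% only_out]))
in_deg_only_in   <- as.vector(table(edges$to[edges$to %in% only_in]))
probs <- seq(0, 1, by = 0.05)
print(length(only_out))
print(quantile(out_deg_only_out, probs = probs))
print(length(only_in))
print(quantile(in_deg_only_in, probs = probs))
```

By construction the minimum degree within each group is at least 1, which agrees with the 0% quantiles reported above.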

Plots of sub-graphs

First 1000 edges

Randomly select 200 edges

3D interactive plot of first 300 edges in the data
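The sub-graphs above are defined by subsets of the edge list. The plotting package is not named here, so the sketch below only shows how the edge subsets might be selected in base R; `first_k` and `random_k` are hypothetical helpers, and the toy sizes stand in for 1000, 200 and 300.

```r
# Toy edge list (illustrative values only).
edges <- data.frame(from = c(1, 1, 2, 2, 3, 3, 4, 5),
                    to   = c(2, 3, 1, 3, 4, 6, 3, 3))

# First k edges in file order.
first_k <- function(edges, k) head(edges, k)

# k edges sampled uniformly at random without replacement.
random_k <- function(edges, k) {
  edges[sample(nrow(edges), min(k, nrow(edges))), ]
}

set.seed(42)
sub_first <- first_k(edges, 5)
sub_rand  <- random_k(edges, 3)

# Nodes that would appear in the randomly sampled sub-graph.
sub_nodes <- unique(c(sub_rand$from, sub_rand$to))
print(sub_first)
print(sub_rand)
```

The first k edges tend to cluster around a few source nodes because the file is ordered by source ID, whereas a random sample of edges is usually much sparser.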

Conditional distributions of neighbors’ types

Now, for each type of node, we calculate the conditional distribution of the types of its followers (from) and of the nodes it follows (to):
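The node types are not defined in this part of the report. The sketch below assumes three types based on degree ("out only", "in only", "both") and computes the conditional distributions at the edge level with `prop.table()`. Both the type definition and the variable names are assumptions made for illustration.

```r
# Toy edge list (illustrative values only).
edges <- data.frame(from = c(1, 1, 2, 2, 3, 3, 4, 5),
                    to   = c(2, 3, 1, 3, 4, 6, 3, 3))

# Assumed node types: defined by whether a node has in- and/or out-edges.
nodes   <- sort(unique(c(edges$from, edges$to)))
has_out <- nodes %in% edges$from
has_in  <- nodes %in% edges$to
type <- ifelse(has_in & has_out, "both",
               ifelse(has_out, "out only", "in only"))
names(type) <- nodes

# Type of the source and of the target of every edge.
lv <- c("out only", "in only", "both")
from_type <- factor(type[as.character(edges$from)], levels = lv)
to_type   <- factor(type[as.character(edges$to)],   levels = lv)
joint <- table(from = from_type, to = to_type)

# P(type of target | type of source): each non-empty row sums to 1.
# The "in only" row is NaN because such nodes are never a source.
to_given_from <- prop.table(joint, margin = 1)

# P(type of source | type of target): each non-empty column sums to 1.
# The "out only" column is NaN because such nodes are never a target.
from_given_to <- prop.table(joint, margin = 2)

print(to_given_from)
print(from_given_to)
```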