Developing a strategy

Data description

# Directed graph (each unordered pair of nodes is saved once): web-Google.txt 
# Webgraph from the Google programming contest, 2002
# Nodes: 875713 Edges: 5105039
# FromNodeId  ToNodeId     

Step 0: randomly sample 20 nodes and neighbors they follow

## [1] "frequency and distribution of types"
## 
## both   in none  out 
##    0    0   18    2
## 
## both   in none  out 
##  0.0  0.0  0.9  0.1
## [1] "number of edges"
## [1] 178   2
## [1] "number of neighbors they follow"
## [1] 178
## [1] "frequency and distribution of types in neighbors (level 1)"
## 
## both   in none  out 
##   29   73   58   18
## 
##      both        in      none       out 
## 0.1629213 0.4101124 0.3258427 0.1011236
## [1] "Distribution of types in all nodes (level 0 and level 1)"
## 
##      both        in      none       out 
## 0.1464646 0.3686869 0.3838384 0.1010101

Step 1: get all neighbors of the target nodes in step 0

## [1] "no. of level 1 nodes of type in that actually have a neighbor"
## [1] 71  3
## [1] "number of neighbors"
## [1] 371
## [1] "frequency and distribution of types of neighbors"
## 
## both   in none  out 
##   39  116  199   17
## 
##      both        in      none       out 
## 0.1051213 0.3126685 0.5363881 0.0458221
## [1] "distribution of types in level 1 and level 2 nodes"
## 
##       both         in       none        out 
## 0.10077519 0.34108527 0.51421189 0.04392765

Step 2: get all neighbors of the target neighbor nodes in step 1

## [1] "number of target (in) nodes in level 2"
## [1] 116
## [1] "number of neighbors"
## [1] 533
## [1] "frequency and distribution of types of neighbors"
## 
## both   in none  out 
##   42  165  297   29
## 
##       both         in       none        out 
## 0.07879925 0.30956848 0.55722326 0.05440901
## [1] "frequency and distribution of types in all nodes involved in level 1, 2, and 3 (including non-target neighbors reviewed)"
## 
## both   in none  out 
##   56  190  370   32
## 
##       both         in       none        out 
## 0.08641975 0.29320988 0.57098765 0.04938272

## [1] "number of all nodes involved (including step 0)"
## [1] 733   1
## [1] "Frequency and distribution of types of all nodes involved in level 0, 1, 2, and 3"
## 
## both   in none  out 
##   67  190  426   50
## 
##       both         in       none        out 
## 0.09140518 0.25920873 0.58117326 0.06821282

Compare with the strategy that includes all neighbors observed

Step 0 is the same as the previous strategy: randomly sample 20 nodes and neighbors they follow

Whereas in step 1, all level 1 nodes observed in step 0 are selected, and so are all of their neighbors

## [1] "number of neighbors"
## [1] 1004
## [1] "frequency and distribution of types in neighbors"
## 
## both   in none  out 
##  144  213  557   90
## 
##       both         in       none        out 
## 0.14342629 0.21215139 0.55478088 0.08964143
## [1] "frequency and distribution of types in level 1 and level 2"
## 
##       both         in       none        out 
## 0.14160156 0.21777344 0.55273438 0.08789062

## [1] "number of all nodes involved (including step 0)"
## [1] 1040    1
## [1] "total number of nodes selected"
## [1] 1040
## [1] "Frequency and distribution of types of all nodes involved in level 0, 1, 2"
## 
## both   in none  out 
##  145  223  582   90
## 
##       both         in       none        out 
## 0.13942308 0.21442308 0.55961538 0.08653846