Developing a strategy
Data description
# Directed graph (each unordered pair of nodes is saved once): web-Google.txt
# Webgraph from the Google programming contest, 2002
# Nodes: 875713 Edges: 5105039
# FromNodeId ToNodeId
Step 0: randomly sample 20 nodes and neighbors they follow
## [1] "frequency and distribution of types"
##
## both in none out
## 0 0 18 2
##
## both in none out
## 0.0 0.0 0.9 0.1
## [1] "number of edges"
## [1] 178 2
## [1] "number of neighbors they follow"
## [1] 178
## [1] "frequency and distribution of types in neighbors (level 1)"
##
## both in none out
## 29 73 58 18
##
## both in none out
## 0.1629213 0.4101124 0.3258427 0.1011236
## [1] "Distribution of types in all nodes (level 0 and level 1)"
##
## both in none out
## 0.1464646 0.3686869 0.3838384 0.1010101

Step 1: get all neighbors of the target nodes in step 0
## [1] "no. of level 1 nodes of type in that actually have a neighbor"
## [1] 71 3
## [1] "number of neighbors"
## [1] 371
## [1] "frequency and distribution of types of neighbors"
##
## both in none out
## 39 116 199 17
##
## both in none out
## 0.1051213 0.3126685 0.5363881 0.0458221
## [1] "distribution of types in level 1 and level 2 nodes"
##
## both in none out
## 0.10077519 0.34108527 0.51421189 0.04392765

Step 2: get all neighbors of the target neighbor nodes in step 1
## [1] "number of target (in) nodes in level 2"
## [1] 116
## [1] "number of neighbors"
## [1] 533
## [1] "frequency and distribution of types of neighbors"
##
## both in none out
## 42 165 297 29
##
## both in none out
## 0.07879925 0.30956848 0.55722326 0.05440901
## [1] "frequency and distribution of types in all nodes involved in level 1, 2, and 3 (including non-target neighbors reviewed)"
##
## both in none out
## 56 190 370 32
##
## both in none out
## 0.08641975 0.29320988 0.57098765 0.04938272

## [1] "number of all nodes involved (including step 0)"
## [1] 733 1
## [1] "Frequency and distribution of types of all nodes involved in level 0, 1, 2, and 3"
##
## both in none out
## 67 190 426 50
##
## both in none out
## 0.09140518 0.25920873 0.58117326 0.06821282
Compare with the strategy that includes all neighbors observed
Step 0 is the same as the previous strategy: randomly sample 20 nodes and neighbors they follow

Whereas in step 1, all level 1 nodes observed in step 0 are selected, and so are all of their neighbors
## [1] "number of neighbors"
## [1] 1004
## [1] "frequency and distribution of types in neighbors"
##
## both in none out
## 144 213 557 90
##
## both in none out
## 0.14342629 0.21215139 0.55478088 0.08964143
## [1] "frequency and distribution of types in level 1 and level 2"
##
## both in none out
## 0.14160156 0.21777344 0.55273438 0.08789062

## [1] "number of all nodes involved (including step 0)"
## [1] 1040 1
## [1] "total number of nodes selected"
## [1] 1040
## [1] "Frequency and distribution of types of all nodes involved in level 0, 1, 2"
##
## both in none out
## 145 223 582 90
##
## both in none out
## 0.13942308 0.21442308 0.55961538 0.08653846