wallposts.counts <- (function() {
cnt <- count(wallposts, vars=c("f","t"))
comb <- merge(cnt, cnt, by.x=c("f","t"), by.y=c("t","f"))
names(comb) <- c("a","b","ab","ba")
comb
})()
The subnetworks are defined by randomly selecting an ego, and their friends, and the friends of their friends. If this network is sizable enough, but not too sizable (500 < N < 2000) we calculate a whole bunch of network statistics from it and save these.
get_random_subnetworks <- function (
graph,
N=100,
plot_degree_distribution=F,
chop_min_degree=1
) {
ldply(1:N, function(i){
# select a random node
rnode <- sample(V(graph), 1)
# collect the 3rd-order ego-net, and its induced subgraph
nbd.nodes <- neighborhood(graph, order=3, nodes=rnode)[[1]]
nbd = induced_subgraph(graph=graph, vids=nbd.nodes)
# we only keep sizable subgraphs
n <- length(V(nbd))
if(!(n < 2000 & n > 500)){
return(NULL);
}
# print the degree distribution of each neighborhood, for debugging
if(plot_degree_distribution) {
plot(degree_distribution(nbd), type='l')
}
# chop off those with low degree, who aren't strongly connected to the cluster
if( chop_min_degree > 1 ) {
nbd <- network.analysis.iteratively_chop(
nbd,
min_degree=chop_min_degree
)
}
# compute stats about ego-nets et al
attr(nbd, "friend_compare") <- network.analysis.friend_compare.fast(nbd)
fr = network.analysis.generate_feld_row(nbd)
# return the damn thing
ret <- fr$value
names(ret) <- fr$stat
ret
})
}
g.friends <- graph_from_data_frame(
friendships[,c("f","t")],
directed = F
)
nets.friends <- get_random_subnetworks(g.friends, N=1000)
f_recip.nonzero <- wallposts.counts[
with(wallposts.counts,
ab > 0 & ba > 0
), c("a","b")
]
f_recip.notunbalanced <- wallposts.counts[
with(wallposts.counts,
ab > 0 & ba > 0 & (ab/ba < 3 & ba/ab < 3)
), c("a","b")
]
ggplot(nets.friends) +
geom_point(aes(x=variance_degree, y=p_meanfdeg_gt_mydeg))
ggplot(nets.friends) +
geom_histogram(aes(x=deg_meanfdeg_b), fill='gray', color='black')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(nets.friends) +
geom_histogram(aes(x=assortativity), fill='gray', color='black')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The following is simply asking what one of these networks look like. What does the plot of degree against mean friend degree look like in a single network?
# select a random node
rnode <- sample(V(g.friends), 1)
# collect the 3rd-order ego-net, and its induced subgraph
nbd.nodes <- neighborhood(g.friends, order=3, nodes=rnode)[[1]]
nbd = induced_subgraph(graph=g.friends, vids=nbd.nodes)
friend_compare <- network.analysis.friend_compare.fast(nbd)
attr(nbd, "friend_compare") <- friend_compare
network.plot.degree_slices(friend_compare, meanfd_max=250, cuts=8, deg_max=200)
## [1] "Creating 0"
## [1] "Creating 25"
## [1] "Creating 50"
## [1] "Creating 75"
## [1] "Creating 100"
## [1] "Creating 125"
## [1] "Creating 150"
## [1] "Creating 175"
## Warning: Removed 952 rows containing non-finite values (stat_density).
We’ll first look at a network which is by its nature symmetric – the simple friendship network. Without limiting this network at all in terms of activity, it’s likely to include bots and other abberations, but it should be a good guide nonetheless.
For any subnet, the mean for friends is the mean plus var/mean for people. The proportion expected to have friends with mean more than they have is the proportion of people below that mean for friends, which can be calculated from the distribution for people.
Note that
crit_p_overall_trueis the proportion of individuals in the population who have degree less than the “mean for friends”.p_meanfdeg_gt_mydeg, as the name suggests, is the proportion of individuals whose ego-net mean is greater than their own degree.
ggplot(nets.friends) +
geom_point(aes(x=crit_p_overall_true, y=p_meanfdeg_gt_mydeg)) +
geom_abline()
Now, the mean mean is the mean of the mean for friends for all the people. That mean mean is equal to the overall mean for friends when there is no correlation between people and and the mean of their friends. We can calculate the mean mean divided by the overall mean. That quotient goes down with assortativity.
nets.friends$quotient <- nets.friends$mean_meanfdeg / nets.friends$mean_degree
ggplot(nets.friends) +
geom_point(aes(x=assortativity, y=quotient))
Limiting to those with positive assortativity (defined here as the slope in a regression of degree and mean friend’s degree).
ggplot(nets.friends) +
scale_x_continuous(limits=c(0,0.5)) +
scale_y_continuous(limits=c(0,4)) +
geom_point(aes(x=assortativity, y=quotient))
## Warning: Removed 65 rows containing missing values (geom_point).
We can calculate the proportion of people with friend mean greater than their own number divided by the expected number calculated above from the distribution for people. I claim that number is NOT predicted to go down as assortativity goes up, even though the mean mean does go down with assortativity.
nets.friends$quotient2 <- nets.friends$p_meanfdeg_gt_mydeg / nets.friends$crit_p_overall_true
ggplot(nets.friends) +
geom_point(aes(x=assortativity, y=quotient2))
The prediction is that with increasing assortativity (or possibly the correlation between degree and mean friends’ degree) we should see increasing “critical degree,” beyond which a person has a better than worse chance (under the linear model) of having mean friends’ degree less than theirs. Our prediction is that holding all else equal (is this possible?) increasing assortativity means increasing proportion with degree less than the mean of their friends’.
First notice that there are no values of crit_deg_overall_theory within a reasonable range
crits <- nets.friends$crit_deg_overall_theory
length(crits[crits < nets.friends$mean_degree*3 & crits > 0])
## [1] 0
hist(log(nets.friends$crit_deg_overall_theory))
## Warning in log(nets.friends$crit_deg_overall_theory): NaNs produced