new orleans facebook - exploratory analysis

Helpers
- Count wall posts
- Random subnetworks
Extract the subnetworks we need to test hypotheses
- Friendship network
- Reciprical posting networks
Sanity Checks
- The friendship paradox is present
- Positive assortativity is common
Testing some of our basic theoretical predictions
More to the point. Directly assessing theoretical prediction

Helpers

Count wall posts

wallposts.counts <- (function() {
  
  cnt <- count(wallposts, vars=c("f","t"))
  comb <- merge(cnt, cnt, by.x=c("f","t"), by.y=c("t","f"))
  
  names(comb) <- c("a","b","ab","ba")
  comb

})()

Random subnetworks

The subnetworks are defined by randomly selecting an ego, and their friends, and the friends of their friends. If this network is sizable enough, but not too sizable (500 < N < 2000) we calculate a whole bunch of network statistics from it and save these.

get_random_subnetworks <- function (
  graph,
  N=100,
  plot_degree_distribution=F, 
  chop_min_degree=1
) {
  
  ldply(1:N, function(i){
    
    # select a random node
    rnode <- sample(V(graph), 1)
    
    # collect the 3rd-order ego-net, and its induced subgraph
    nbd.nodes <- neighborhood(graph, order=3, nodes=rnode)[[1]]
    nbd = induced_subgraph(graph=graph, vids=nbd.nodes)
    
    # we only keep sizable subgraphs
    n <- length(V(nbd))
    if(!(n < 2000 & n > 500)){
      return(NULL);
    }
    
    # print the degree distribution of each neighborhood, for debugging
    if(plot_degree_distribution) {
      plot(degree_distribution(nbd), type='l')
    }
    
    # chop off those with low degree, who aren't strongly connected to the cluster
    if( chop_min_degree > 1 ) {
      nbd <- network.analysis.iteratively_chop(
        nbd, 
        min_degree=chop_min_degree
      )
    }
    
    # compute stats about ego-nets et al
    attr(nbd, "friend_compare") <- network.analysis.friend_compare.fast(nbd)
    fr = network.analysis.generate_feld_row(nbd)
    
    # return the damn thing
    ret <- fr$value
    names(ret) <- fr$stat
    
    ret
  })
  
}

Extract the subnetworks we need to test hypotheses

Friendship network

g.friends <- graph_from_data_frame(
  friendships[,c("f","t")],
  directed = F
)
nets.friends <- get_random_subnetworks(g.friends, N=1000)

Reciprical posting networks

f_recip.nonzero <- wallposts.counts[ 
  with(wallposts.counts, 
    ab > 0 & ba > 0
  ), c("a","b")
]

f_recip.notunbalanced <- wallposts.counts[ 
  with(wallposts.counts, 
    ab > 0 & ba > 0 & (ab/ba < 3 & ba/ab < 3)
  ), c("a","b")
]

Sanity Checks

The friendship paradox is present

ggplot(nets.friends) + 
  geom_point(aes(x=variance_degree, y=p_meanfdeg_gt_mydeg))

Positive assortativity is common

ggplot(nets.friends) + 
  geom_histogram(aes(x=deg_meanfdeg_b), fill='gray', color='black')

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(nets.friends) + 
  geom_histogram(aes(x=assortativity), fill='gray', color='black')

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The following is simply asking what one of these networks look like. What does the plot of degree against mean friend degree look like in a single network?

# select a random node
rnode <- sample(V(g.friends), 1)

# collect the 3rd-order ego-net, and its induced subgraph
nbd.nodes <- neighborhood(g.friends, order=3, nodes=rnode)[[1]]
nbd = induced_subgraph(graph=g.friends, vids=nbd.nodes)

friend_compare <- network.analysis.friend_compare.fast(nbd)
attr(nbd, "friend_compare") <- friend_compare

network.plot.degree_slices(friend_compare, meanfd_max=250, cuts=8, deg_max=200)

## [1] "Creating 0"
## [1] "Creating 25"
## [1] "Creating 50"
## [1] "Creating 75"
## [1] "Creating 100"
## [1] "Creating 125"
## [1] "Creating 150"
## [1] "Creating 175"

## Warning: Removed 952 rows containing non-finite values (stat_density).

Testing some of our basic theoretical predictions

We’ll first look at a network which is by its nature symmetric – the simple friendship network. Without limiting this network at all in terms of activity, it’s likely to include bots and other abberations, but it should be a good guide nonetheless.

The proportion below their friends’ means corresponds (almost) with the proportion below the mean friend’s degree

For any subnet, the mean for friends is the mean plus var/mean for people. The proportion expected to have friends with mean more than they have is the proportion of people below that mean for friends, which can be calculated from the distribution for people.

Note that crit_p_overall_true is the proportion of individuals in the population who have degree less than the “mean for friends”. p_meanfdeg_gt_mydeg, as the name suggests, is the proportion of individuals whose ego-net mean is greater than their own degree.

ggplot(nets.friends) + 
  geom_point(aes(x=crit_p_overall_true, y=p_meanfdeg_gt_mydeg)) +
  geom_abline()

Mean mean / mean degree goes down with assortativity (but only for negative assortativity)

Now, the mean mean is the mean of the mean for friends for all the people. That mean mean is equal to the overall mean for friends when there is no correlation between people and and the mean of their friends. We can calculate the mean mean divided by the overall mean. That quotient goes down with assortativity.

nets.friends$quotient <- nets.friends$mean_meanfdeg / nets.friends$mean_degree

ggplot(nets.friends) + 
  geom_point(aes(x=assortativity, y=quotient))

Limiting to those with positive assortativity (defined here as the slope in a regression of degree and mean friend’s degree).

ggplot(nets.friends) + 
  scale_x_continuous(limits=c(0,0.5)) +
  scale_y_continuous(limits=c(0,4)) +
  geom_point(aes(x=assortativity, y=quotient))

## Warning: Removed 65 rows containing missing values (geom_point).

Proportion with friend mean greater than they / expected number should be constant with assortativity.

We can calculate the proportion of people with friend mean greater than their own number divided by the expected number calculated above from the distribution for people. I claim that number is NOT predicted to go down as assortativity goes up, even though the mean mean does go down with assortativity.

nets.friends$quotient2 <- nets.friends$p_meanfdeg_gt_mydeg / nets.friends$crit_p_overall_true

ggplot(nets.friends) + 
  geom_point(aes(x=assortativity, y=quotient2))

More to the point. Directly assessing theoretical prediction

The prediction is that with increasing assortativity (or possibly the correlation between degree and mean friends’ degree) we should see increasing “critical degree,” beyond which a person has a better than worse chance (under the linear model) of having mean friends’ degree less than theirs. Our prediction is that holding all else equal (is this possible?) increasing assortativity means increasing proportion with degree less than the mean of their friends’.

First notice that there are no values of crit_deg_overall_theory within a reasonable range

crits <- nets.friends$crit_deg_overall_theory
length(crits[crits < nets.friends$mean_degree*3 & crits > 0])

## [1] 0

hist(log(nets.friends$crit_deg_overall_theory))

## Warning in log(nets.friends$crit_deg_overall_theory): NaNs produced