Quadratic Assignment Procedure (QAP) in statnet

Practicum Setup

This practicum is still in the early stages. So, you may note that it is a little on the terse at the moment. That will be rectified in future iterations. But, for now, we will run through a few options that you are likely to find helpful if you plan to use statistical analysis to confirm any suspicions that you developed during background research, or perhaps during your exploratory work.

This part is important. Please be sure to do this part again

Create a folder labeled “QAP_Practicum” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in R Studio:

Session
- Set Working Directory
- Choose Directory…

Well-intentioned tip for those just starting out in R Studio: You will notice that the History tab in R Studio now has a line that looks something like setwd("~/Desktop/Practicum 3"). If you highlight that and click “To Source”, that line will be included in your script. Any time you run the script in the future, you will not need to manually set your working directory every time you begin to go through an older script. There is, of course, a much better way to do this for the long term. But, this will save you a little time in the interim.
Studio

Getting Started

For the sake of familiarity and expediency, we’ll, once again, stick with Padgett’s Florentine families data. The object of this practicum is to familiarize you with a variety of statistical tools that should be helpful to one or more of your projects. For that reason, it is likely that some of these analyses may seem a little pithy or odd. Please understand that the examples are being provided to demonstrate how you may use R to conduct these analyses. Bear with me, then use these with your own data for something a little more worthy of inference!

Data for this Practicum

Load Padgett data. These have been saved as data that are appropriate for statnet. To access them, go to the Network Data link on the course website and download the padbus.rda and padmar.rda files for business ties, and marriage ties, respectively.

These will load data objects named “Padgett_Business” and “Padgett_Marriage” in R’s memory.

load("padbus.rda") # "Padgett_Business"
load("padmar.rda") # "Padgett_Marriage"

You will also need to convert an attribute that we used in the past to a network. Although this practicum will be conducted almost entirely in statnet, igraph offers the most efficient means of converting a two-mode network to one-mode. So, we’ll use igraph to preprocess our data, and then convert it to a statnet object, using the intergraph package.

To begin, load the party.csv file. We have used this before to study brokerage in Practicum 6, so you can find it in the Practicum 6 folder on the Network Data page of the course website.

# Load igraph
library(igraph)

party <- read.csv(file.choose(), header=FALSE) # Load "party.csv"
g <- graph.data.frame(party, directed=FALSE) # convert to igraph object
plot(g)
V(g)$type <- bipartite_mapping(g)$type
proj <- bipartite.projection(g)
Party_Membership <- proj$proj1

# Convert the igraph object to a statnet object
library(intergraph)
Padgett_Party <- asNetwork(Party_Membership)

# Make sure you got the right network
plot(Padgett_Party, displaylabels=TRUE)

# Finally, clean up your memory by detaching both packages.
detach("package:igraph")
detach("package:intergraph")

Now that your data are ready for use, move on to statnet.

Load statnet

All of the following analyses may be found in statnet. This package was developed on top of Carter Butts’ network and sna packages. Each of those packages are dependencies for statnet, so they load with it. Below, I may refer to various functions as residing in statnet, despite the fact that they are actually part of sna or network. The reference to statnet is therefore intended to be encompassing of all three packages. When in doubt, just use the help function (?) to learn more about any individual command that we cover below.

library(statnet)

Quadratic Assignment Procedure (QAP)

Quadratic assignment procedure (QAP) is similar to CUG, in that it simulation in order to generate a distribution of hypothetical networks. But QAP controls for network structure, as compared with CUG, which controls for size, the number of edges, or dyad census. (Note: CUG can be made to condition on other properties. But size, edges, and dyad census are the options available through statnet.)

QAP controls for network structure by repermuting the network over a large number of iterations - usually around 1000. For more details, check out chapter 9 of the book.

QAP is useful for running a variety of statistical functions. Below, are three of the options that you have available in statnet.

QAP Correlation

Did the Florentine families base their business dealings on the marriage ties? (Or maybe their marriages are based on their business ties?)

# Get the Correlation value
gcor(Padgett_Business, Padgett_Marriage)

## [1] 0.3718679

Marriage and business ties are moderately correlated (r=0.37).

# Is it significant?
Medici.Cor <- qaptest(list(Padgett_Business, Padgett_Marriage), gcor, g1=1, g2=2, reps=1000)
Medici.Cor

## 
## QAP Test Results
## 
## Estimated p-values:
##  p(f(perm) >= f(d)): 0.002 
##  p(f(perm) <= f(d)): 1

The correlation is significant at the 0.05 alpha level. We know this because less than 5% the permuted networks - or in this case, all of them - exhibited correlation coefficients that were either, greater than, or less than that of the value we calculated for these networks.

For a visual of this comparison, see below.

plot(Medici.Cor, xlim=c(-0.25, 0.4))

Note: I widened the x-axis a little so that the dotted line would show better. That is what the xlim= argument does in the script above.

QAP Linear Regression

We are finally ready to use the “Padgett_Party” network that you constructed from an attribute. If you have not already created this variable, go back to the top and do that now. We’ll wait.

Okay, now that you have the variables ready, you may use it to run a multiple regression.

Strictly speaking, what we are about to do is inappropriate. It is inappropriate because the ties within this network are binary, and multiple linear regression is meant for continuous measures. So, linear regression works well if your dependent network has valued ties (e.g., trade networks, layered networks, etc.)

Because we are using a binary network as our dependent variable, we can interpret the estimates as explaining the probability of a tie in the dependent network, controlling for other variables in the model.

nl<-netlm(Padgett_Business,           # Dependent variable/network
          list(Padgett_Marriage, Padgett_Party), # List the independent variables/networks
          reps=1000)                  # select the number of permutations

#Examine the results
summary(nl)

## 
## OLS Network Model
## 
## Residuals:
##          0%         25%         50%         75%        100% 
## -0.40038536 -0.07067437 -0.07067437 -0.06874759  0.93125241 
## 
## Coefficients:
##             Estimate     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.070674374 0.790   0.210   0.210    
## x1           0.329710983 1.000   0.000   0.000    
## x2          -0.001926782 0.506   0.494   0.975    
## 
## Residual standard error: 0.3089 on 237 degrees of freedom
## Multiple R-squared: 0.1383   Adjusted R-squared: 0.131 
## F-statistic: 19.02 on 2 and 237 degrees of freedom, p-value: 2.188e-08 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)       x1       x2
## Min       -2.40800 -3.07233 -4.36057
## 1stQ       1.10035 -0.73929 -0.92043
## Median     1.82746 -0.22967 -0.07877
## Mean       1.80254  0.03366  0.03020
## 3rdQ       2.52536  0.73238  0.94588
## Max        4.73166  4.81348  5.32695

In the output above, x1 represents the first variable in the model (marriage ties), and x2 represents the second variable that we entered (party ties).

The F-statistic tells us that the model is useful (p<0.05). When we consider the individual predictors, it is apparent from the two-tailed p-values (Pr(>=|b|)) that party ties do not explain business ties (p=0.97) and marriage ties do explain business ties (p<0.0001). We can, therefore, interpret the model estimates as telling us that the presence of a marriage tie increases the probability of a business tie by 0.33. Conversely, we interpret party ties as not explaining business ties, since party was not significant.

Again, this example would have been much better if we had valued ties so that we could predict or explain the value of a tie, given the various independent variables. For a much better way to analyze a binary network, see below.

QAP Logistic Regression

If you have a network with binary ties, this is the regression method that you should use to predict ties in the outcome network (dependent variable).

Below, we run the same model. However, this time, we are using the appropriate distribution. The process is the same, but the interpretation is somewhat more involved.

nlog<-netlogit(Padgett_Business, list(Padgett_Marriage, Padgett_Party),reps=1000)

#Examine the results
summary(nlog)

## 
## Network Logit Model
## 
## Coefficients:
##             Estimate    Exp(b)    Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -2.57893819 0.0758545 0.000   1.000   0.000    
## x1           2.17792200 8.8279428 1.000   0.000   0.000    
## x2          -0.02228475 0.9779617 0.527   0.473   0.967    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 332.7106 on 240 degrees of freedom
## Residual deviance: 155.2943 on 237 degrees of freedom
## Chi-Squared test of fit improvement:
##   177.4164 on 3 degrees of freedom, p-value 0 
## AIC: 161.2943    BIC: 171.7362 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.4250345 
##  (Dn-Dr)/Dn: 0.5332452 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##       0     1
## 0   210    30
## 1     0     0
## 
##  Total Fraction Correct: 0.875 
##  Fraction Predicted 1s Correct: NaN 
##  Fraction Predicted 0s Correct: 0.875 
##  False Negative Rate: 1 
##  False Positive Rate: 0 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)       x1       x2
## Min       -7.33007 -2.56766 -3.18722
## 1stQ      -6.56498 -1.15953 -0.99335
## Median    -6.25282 -0.30637 -0.17438
## Mean      -6.22982  0.03888 -0.05625
## 3rdQ      -5.92024  0.77360  0.85316
## Max       -4.67951  4.33847  4.42885

The logistic model looks fairly similar to the linear regression model. The estimates, however, are interpreted very differently.

As above, party ties do not explain business ties (p=0.98) and marriage ties do explain business ties (p<0.0001). In this output, the estimates for this test are given in two columns. The first column, labeled “Estimate”, gives the log odds. You can interpret this as a tendency towards forming ties (for positive values), or a tendency away from forming ties (for negative values). It is, however, much easier to read the second column, labeled “Exp(b)”. The values in the second are the log odds that have been converted to odds ratios by taking their exponent. To interpret the odds ratios, think of them as the likelihood of a tie forming, given the presence of some other factor; as compared with the absence of that factor.

Using the second column of estimates, the output indicates that a business tie is 8.83 times more likely to form in the presence of a marriage tie, than in the absence of a marriage tie. If you are interested in betting, then think of this as the odds of forming a business tie being 8.83:1 in the presence of a marriage tie. Alternatively, you could say that the formation of business ties is 8.83 times more likely when there is a marriage tie, then when there is not a marriage tie. Party is not significant (p=0.97), so we interpret party ties as not explaining business ties in this model.

Bonus: Visualizing Two Networks, Side-by_side, Using the Same Coordiantes

Sometimes, it is handy to compare networks. If both networks contain all the same nodes, then it can be very helpful to use the same coordinates in order to help display the differences in the structure. Doing this requires just a couple of visualizaiton tools.

Let’s start with the coordinates. In statnet, preserving coordinates is done by saving a visualization layout as an object. There really is little more to it than plotting the network as usual, and saving that into an object. The thing to keep in mind is that the layout will be different each time you plot. The starting point for the coordinates is randomly shuffled somewhat each time you run the gplot() function. So, it is a good idea to use the set.seed() function to make that a little more predictable.

set.seed(8675309)
gplot(Padgett_Marriage, 
      usearrows = FALSE, 
      displaylabels = TRUE)

When you have an orientation you like, preserve it. Don’t worry if the labels go outside of the plot area at this point. We will address that momentarily, when we visualize the networks side-by-side.

set.seed(8675309) # Use this for consistency
coords <- gplot(Padgett_Marriage)

The coordiantes are saved as x and y values. To see them, just type in whatever name you used to store them.

You may also notice that the labels are running off of the plot a little. We can fix that, but we need to understand where the outer limits of the coordinates are.

coords

##                 x          y
##  [1,]  0.42960984 -1.1655433
##  [2,]  3.99289748 -6.5720179
##  [3,]  4.48341293  0.1109883
##  [4,]  9.61417820 -3.8142220
##  [5,]  7.64374231  0.8103331
##  [6,]  2.83180022 -9.1016792
##  [7,]  7.21734643 -6.3821748
##  [8,]  8.54768171 -8.7424325
##  [9,]  2.88080419 -3.0613211
## [10,] -2.11546422 -5.3040005
## [11,] 10.10252220 -0.8581780
## [12,]  1.82348621  3.6843265
## [13,]  5.52941168 -2.4556741
## [14,] -0.06871684 -4.1530135
## [15,]  8.32981470 -1.5682125
## [16,]  5.07794042 -4.7788305

To see this, you can read through the coordinates, or you can use R to check the minimum (min()) and maximum (max()) values in the coordinates.

min(coords[,1]) # minimum X coordinate (column 1)
## [1] -2.115464
max(coords[,1]) # maximum X coordinate (column 1)
## [1] 10.10252
min(coords[,2]) # minimum X coordinate (column 2)
## [1] -9.101679
max(coords[,2]) # maximum X coordinate (column 2)
## [1] 3.684326

We can see that the x coordinates range from -2.12 to 10.10, and the y coordinates range from -9.10 to 3.68. We will use those in a moment.

In the meantime, we will focus on displaying the networks side-by-side. The par() function tells the graphics window how to behave. Among other things, you can include the option mfrow=c(nrows, ncols) to create a matrix of visualizations. Using this opiton, the graphics window will be filled in by row.

Let’s use all of this to create our side-by-side plots.

Notice, we will use the coords object we created earler for both plots. You will also notice an additional argument: xlim(). The xlim() and ylim() arguments can be included in the plot to widen (xlim) the plot area, or increase the vertical dimensions (ylim).

This time, we are using only the xlim argument to essentially widen the plot area a little. As we saw above, the x axis for the plot ranges from -2.12 to 10.10. Below, we can set the limits just a little wider (-4 to 14) to give a little more space for the labels. We are also decreasing the label size (label.cex) a little. The default size for the labels is 1, so we are decreasing that about 25%.

Your own adjustments are likely to be different. So, consider the steps we took here when you plot your networks.

par(mfrow=c(1, 2), mar=c(1,1,1,1)) # Set the window to have one row and two columns

gplot(Padgett_Marriage, 
      usearrows = FALSE, 
      displaylabels = TRUE,
      coord = coords,      # Use the coordinates we saved
      label.cex = 0.75,    # Decrease label size
      xlim=c(-4, 14),      # Widen x axis
      main="Marriage Ties")# Add a label

gplot(Padgett_Business, 
      usearrows = FALSE, 
      displaylabels = TRUE,
      coord = coords,      # Use the coordinates we saved
      label.cex = 0.75,    # Decrease label size
      xlim=c(-4, 14),      # Widen x axis
      main="Business Ties")# Add a label

par(mfrow=c(1, 1)) # Reset the graphics window to one plot
                   # (one row and one column)