This practicum is an exercise that supports the upcoming book, “Introduction to Social Network Analysis using R”. So, you may note that it is a little on the terse side when it comes to explanation. Please use the draft chapter as a reference while you proceed. But, for now, we will run through a few options that you are likely to find helpful if you plan to use statistical analysis to confirm any suspicions that you developed during background research, or perhaps during your exploratory work.
Create a folder labeled “CUG_Practicum” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in RStudio:
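A minimal sketch of that step, assuming the folder was created on your Desktop. The path and the `practicum_dir` name are hypothetical, so adjust them to match your own machine.

```r
# Hypothetical path: this assumes the folder was created on your Desktop.
practicum_dir <- file.path(path.expand("~"), "Desktop", "CUG_Practicum")

if (dir.exists(practicum_dir)) {
  setwd(practicum_dir)   # make it the working directory
} else {
  message("Folder not found; check the path: ", practicum_dir)
}

getwd()                  # confirm where R is currently pointed
```

In RStudio, you can reach the same result through the Session menu if you prefer not to type the path.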
For the sake of familiarity and expediency, we’ll stick with Padgett’s Florentine families data. The object of this practicum is to familiarize you with a variety of statistical tools that should be helpful to one or more of your projects. For that reason, some of these analyses may seem a little trivial or odd. Please understand that the examples are being provided to demonstrate how you may use R to conduct these analyses. Bear with me, then use these with your own data for something a little more worthy of inference!
Load the Padgett data. These have been saved in a format that is appropriate for statnet. To access them, go to the Network Data link on the course website and download the padbus.rda, padmar.rda, and padpar.rda files for business ties, marriage ties, and party ties, respectively.
These will load data objects named “Padgett_Business”, “Padgett_Marriage”, and “Padgett_Party” into R’s memory.
load("padbus.rda") # "Padgett_Business"
load("padmar.rda") # "Padgett_Marriage"
load("padpar.rda") # "Padgett_Party"
Now that your data are ready for use, move on to statnet.
All of the following analyses may be found in statnet. This package was developed on top of Carter Butts’ network and sna packages. Both of those packages are dependencies of statnet, so they load with it. Below, I may refer to various functions as residing in statnet, despite the fact that they are actually part of sna or network. The reference to statnet is therefore intended to encompass all three packages. When in doubt, just use the help function (?) to learn more about any individual command that we cover below.
library(statnet)
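Conditional uniform graphs (CUGs) are simple in their essence. By running the commands below, you are essentially running through a three-step process in one brief step. The process for running a CUG test is: (1) calculate the measure of interest on the observed network; (2) simulate a large number of random networks that share the chosen conditioning property with the observed network, and calculate the same measure on each of them; and (3) compare the observed measure with the distribution of simulated measures.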
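To make the logic concrete, here is a rough, by-hand sketch of a CUG test conditioned on the number of edges: measure the observed network, simulate random networks with the same size and edge count, then compare. It is an illustration only, not a replacement for cug.test. The "observed" network is a randomly generated stand-in, and the names `n_nodes`, `n_edges`, `observed`, and `sim_stat` are hypothetical. It loads sna directly, which statnet would otherwise load for you.

```r
library(sna)
set.seed(2024)

# Hypothetical stand-in for an observed, undirected network.
n_nodes  <- 16
n_edges  <- 15
observed <- rgnm(1, n_nodes, n_edges, mode = "graph")

# Step 1: calculate the measure on the observed network.
obs_stat <- centralization(observed, betweenness, mode = "graph")

# Step 2: simulate networks with the same size and number of edges,
# and calculate the same measure on each one.
reps <- 200
sim_stat <- replicate(reps, {
  g <- rgnm(1, n_nodes, n_edges, mode = "graph")
  centralization(g, betweenness, mode = "graph")
})

# Step 3: compare the observed value with the simulated distribution.
p_greater <- mean(sim_stat >= obs_stat)  # share of simulations >= observed
p_less    <- mean(sim_stat <= obs_stat)  # share of simulations <= observed
c(observed = obs_stat, p_greater = p_greater, p_less = p_less)
```

Because the stand-in network is itself random, the two proportions carry no substantive meaning here; the point is only the sequence of steps.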
In frequentist statistics, a single sample t-test compares some reference measure with a sample distribution. The null hypothesis for this test is “no difference.” The network-friendly analog of the single sample t-test is pretty much the same, with the caveat that the result cannot be extrapolated to the wider population. Rather, the comparison is between what we observe in the network we are analyzing and the distribution of simulated networks that are conditioned on a particular parameter.
H0: There is no difference between the measure in the observed network and the same type of measures in the simulated networks of this (conditioning parameter).
You may use a CUG test of this sort with pretty much any global measure. The example below uses only betweenness centralization as the global measure. There are many more that may be more of a match to your research. If you are a little rusty on global measures, take a look back at Chapter 4 of Understanding Dark Networks or look back at Practicum 3.
Keep in mind that some, but not all, functions that calculate global measures require two arguments. For example, centralization in statnet requires both the name of the network being analyzed and a second argument giving the name of the centrality measure being applied. To tell the cug.test function how to supply that second argument, you will need to include the FUN.args=list() argument (R’s partial argument matching also accepts the shorter FUN.arg), as we do below.
Note that centralization is a generalized, graph-level measure that summarizes the variation in centrality measures over the entire network. If you are unsure of what centrality measures are available in statnet, then check out the practicum on centrality measures in statnet. There, you will find a list of centrality measures that you can use with the centralization function in statnet.
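The short sketch below illustrates what the list of extra arguments is for: calling centralization directly with FUN=betweenness gives the same value as handing it the network plus a list of additional arguments. As far as I can tell, cug.test forwards its list to the function in roughly this way, but treat that as an assumption. The network `g` is a randomly generated, hypothetical stand-in.

```r
library(sna)
set.seed(11)

# Hypothetical stand-in network: 16 nodes, 15 undirected edges.
g <- rgnm(1, 16, 15, mode = "graph")

# Calling the two-argument function directly...
direct <- centralization(g, FUN = betweenness)

# ...and calling it with the second argument supplied through a list.
extra_args <- list(FUN = betweenness)
via_list   <- do.call(centralization, c(list(g), extra_args))

all.equal(direct, via_list)
```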
You will also note that the example is conditioning the simulated networks on all three of the possible modes: size; number of edges; and the distribution of dyads. This is for demonstration purposes. Under normal circumstances, we would just condition the simulated networks on any one of the three. We are running all three to demonstrate how they differ. Select the conditioning mode according to what you suspect about your own network of interest. Your analysis should be meaningful, not just a whole battery of tests that were run to see if anything “looks good.”
cugBetSize <- cug.test(Padgett_Business,
                       centralization,
                       FUN.args=list(FUN=betweenness),
                       mode="graph",
                       cmode="size")
cugBetEdges <- cug.test(Padgett_Business,
                        centralization,
                        FUN.args=list(FUN=betweenness),
                        mode="graph",
                        cmode="edges")
cugBetDyad <- cug.test(Padgett_Business,
                       centralization,
                       FUN.args=list(FUN=betweenness),
                       mode="graph",
                       cmode="dyad.census")
To see the results, enter the name of any of the above objects (e.g., cugBetSize) in the console.
If, however, you would prefer to get fancy, then you can combine the output for all three into a table, as below. The process looks annoyingly involved, but the result can be worth it.
# Aggregate the findings...if you prefer.
Bet_Centralization <- c(cugBetSize$obs.stat, # Combine output
cugBetEdges$obs.stat,
cugBetDyad$obs.stat)
PctGreater <- c(cugBetSize$pgteobs, # Combine pseudo p-values (right tail)
cugBetEdges$pgteobs,
cugBetDyad$pgteobs)
PctLess <- c(cugBetSize$plteobs, # Combine pseudo p-values (left tail)
cugBetEdges$plteobs,
cugBetDyad$plteobs)
Betweenness <- cbind(Bet_Centralization, # Bind them all together
PctGreater,
PctLess)
rownames(Betweenness) <- c("Size", "Edges", "Dyads") # Change the row names
# write.csv(Betweenness, file="CUGoutput.csv") # Save the output, if you wish
round(Betweenness, 2) # Take a look
## Bet_Centralization PctGreater PctLess
## Size 0.21 0.00 1.00
## Edges 0.21 0.73 0.27
## Dyads 0.21 0.75 0.25
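If the aggregation feels long-winded, the same table can be assembled more compactly by looping over a named list of results. The sketch below uses stand-in lists holding the rounded values printed above, so that it runs without the data files. The names `results`, `summarise_cug`, and `Betweenness_alt` are hypothetical.

```r
# Stand-in results: rounded values copied from the table above.
# With real output, use list(Size = cugBetSize, Edges = cugBetEdges,
# Dyads = cugBetDyad) instead.
results <- list(
  Size  = list(obs.stat = 0.21, pgteobs = 0.00, plteobs = 1.00),
  Edges = list(obs.stat = 0.21, pgteobs = 0.73, plteobs = 0.27),
  Dyads = list(obs.stat = 0.21, pgteobs = 0.75, plteobs = 0.25)
)

# Pull the three values of interest out of one result.
summarise_cug <- function(x) {
  c(Bet_Centralization = x$obs.stat,
    PctGreater         = x$pgteobs,
    PctLess            = x$plteobs)
}

# One row per conditioning mode, one column per value.
Betweenness_alt <- t(sapply(results, summarise_cug))
round(Betweenness_alt, 2)
```

The row names come from the names of the list, so there is no separate step to relabel them.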
Consider the visualization of the network, below. Given the network’s structure, it is at least somewhat dominated by the Barbadori and Medici families (betweenness centralization = 0.21). But is that level of centralization special to the Florentine business network, or is this something that we would normally expect for a network this size? Is it something that we would normally expect for a network with this number of edges? Is it something that we would normally expect for a network with this distribution of dyads?
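As we can see in the output above, this level of centralization is very uncommon in a network this size: virtually none of the simulated networks conditioned on size were as centralized as the observed network (PctGreater = 0.00). It is not at all uncommon, however, in networks with the same number of edges or the same distribution of dyads, where roughly three quarters of the simulated networks were at least as centralized as the observed one.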
We can depict the same information graphically by displaying the distribution of betweenness centralization measures for the randomly generated networks, and indicating where the betweenness centralization measure of 0.21 lies in comparison to each distribution.
par(mfrow=c(1,3))
plot(cugBetSize, main="Betweenness \nConditioned on Size" )
plot(cugBetEdges, main="Betweenness \nConditioned on Edges" )
plot(cugBetDyad, main="Betweenness \nConditioned on Dyads" )
par(mfrow=c(1,1))
Of course, you don’t have to produce side-by-side plots. If you wish to take a look at the histogram plots, you can enter minimal information into R.
plot(cugBetSize)
Correlations are inherently descriptive measures that describe a relationship between two variables. When this is translated into network terms, they describe the similarity between two network matrices. Provided the two matrices are in the same order, we can compare them for similarity.
The correlation function included in the sna package in the statnet suite is gcor(). The gcor function is designed to correlate a pair of networks, or a list of more than two networks. For pairs of networks, you may enter the networks directly into the function. To create a correlation matrix, however, it is necessary to first combine the networks into a single object using the list function. Once the networks are combined into a single object, they may be analyzed using a correlation matrix, or you may specify which networks in the list should be analyzed using the g1= and g2= arguments.
You will also notice the round() function. That function will round the output to a certain number of digits (two in this case). Feel free to try gcor(nets) to see what the matrix looks like when it is not rounded. Either way, you will need to keep up with which network is which. Statnet will not do it for you.
gcor(Padgett_Business, Padgett_Marriage)
## [1] 0.3718679
nets <- list(Padgett_Business,
Padgett_Marriage,
Padgett_Party)
gcor(nets, g1=1, g2=2) # specify the list and which to analyze
## [1] 0.3718679
round(gcor(nets), 2) # Correlation matrix, rounded to 2 digits
## 1 2 3
## 1 1.00 0.37 -0.05
## 2 0.37 1.00 -0.12
## 3 -0.05 -0.12 1.00
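To see what a graph correlation amounts to, the sketch below correlates the off-diagonal cells of two small toy matrices using base R only. It assumes, as the gcor help page describes for its default settings, that the diagonal is ignored; the matrices `a` and `b` are randomly generated and hypothetical. Running gcor(a, b) from sna on the same matrices should give a comparable value, though that comparison is not made here.

```r
set.seed(7)

# Two hypothetical 6 x 6 binary networks with nodes in the same order.
n <- 6
a <- matrix(rbinom(n * n, 1, 0.4), n, n)
b <- matrix(rbinom(n * n, 1, 0.4), n, n)

# Keep only the off-diagonal cells (ties from a node to itself are ignored).
off_diag <- row(a) != col(a)

# Pearson correlation between the matching cells of the two matrices.
r_manual <- cor(a[off_diag], b[off_diag])
r_manual
```

This is also why the node order matters: the correlation pairs each cell in one matrix with the cell in the same position in the other.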
To make this a correlation test, we need to incorporate the correlation into the CUG function. At present, the cug.test function does not allow us to test two networks like this. So, we need to use the older cugtest function (note the lack of a period) instead.
The cugtest function operates in pretty much the same fashion as the newer cug.test function, with the exception that it uses three different terms with which to condition the simulated networks. A breakdown of the differences is given below. Note that, if the cmode= argument is omitted from either version of the function, the default value will be applied. Each version is idiosyncratic, in that each has its own three conditioning modes and the two sets of terms do not overlap. For either function, you cannot substitute a term that is not among its own three.
| Meaning | cug.test (newer) | cugtest (older) |
|---|---|---|
| Number of nodes in the network | “size” (default) | “order” |
| Distribution of edge values in the network | | “ties” |
| Distribution of edge values, or number of edges in the network | “edges” | |
| Density of the network | | “density” (default) |
| Number (or value) of dyads in the network | “dyad.census” | |
Running a correlation test with the cugtest() function will provide a pseudo p-value. The null hypothesis for the correlation test is:
H0: r=0
or
H0: There is no relationship. (The observed measure is spurious.)
Although it is possible to run the set of networks as a batch, that will give you only one, overall pseudo p-value. For pairwise comparison tests between networks, it is important to run them two at a time. In this case, we are conditioning all simulations on the “order” of the networks so that their results can be compared.
nets <- list(Padgett_Business, # You already did this, above.
Padgett_Marriage, # It is included here for your
Padgett_Party) # cognitive convenience.
corOrd13 <- cugtest(nets, gcor,
g1=1, g2=3, # The first and third network
cmode= "order")
corOrd12 <- cugtest(nets, gcor,
g1=1, g2=2, # The first and second network
cmode= "order")
corOrd23 <- cugtest(nets, gcor,
g1=2, g2=3, # The second and third network
cmode= "order")
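To look at the outcome of one of these tests, you can print the object or pull out its components. The sketch below does this on two randomly generated toy networks so that it runs without the data files. The component names `testval`, `pgreq`, and `pleeq` are taken from the cugtest help page as I understand it; check ?cugtest if they differ in your version. The names `toy_nets` and `toy_test` are hypothetical.

```r
library(sna)
set.seed(99)

# Two hypothetical 10-node networks, stacked in a 2 x 10 x 10 array.
toy_nets <- rgraph(10, m = 2, tprob = 0.3)

# Correlation test between the first and second network,
# conditioning the simulations on the order (number of nodes).
toy_test <- cugtest(toy_nets, gcor,
                    g1 = 1, g2 = 2,
                    cmode = "order",
                    reps = 200)

toy_test$testval   # observed correlation
toy_test$pgreq     # share of simulated values >= observed
toy_test$pleeq     # share of simulated values <= observed
```

With the practicum data, the same three components can be read from corOrd12, corOrd13, and corOrd23.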
It is also possible to plot the distribution of simulation values using plot(test), though the plots will appear somewhat different from the newer versions. As with other CUG tests, however, the plots are best only for illustration purposes and unnecessary for inference.
statnet Object from an Edgelist or a Matrix
Import the data in either format to create a network data object.
For this example, I am using a network created from the movie Death of Stalin.
net <- read.csv(file.choose(),
header=FALSE,
check.names=TRUE) # First find and load your data.
If you imported an edgelist, you may now convert it to a network object.
Here, we are saving it as an object called “talked_to”. Be sure to give a unique, descriptive name to every network you import.
talked_to <- as.network(net, matrix.type="edgelist")
If, instead, you imported an adjacency matrix, you may convert it to a network object as follows.
Start by making sure that the imported object is stored as a matrix.
net <- as.matrix(net)
Next, create the network object.
talked_to <- as.network(net,
                        matrix.type="adjacency",
                        directed=FALSE) # Modify this as necessary. This is set for an undirected network.
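The two import routes should lead to the same network. The sketch below checks this on a toy triangle, built once as an edgelist and once as an adjacency matrix. The data and the object names (`edge_list`, `adjacency`, `from_edgelist`, `from_matrix`) are hypothetical, and the network package is loaded directly rather than through statnet.

```r
library(network)

# Hypothetical toy data: three nodes, all connected to one another.
edge_list <- cbind(c(1, 1, 2),
                   c(2, 3, 3))

# Route 1: build the network from the edgelist.
from_edgelist <- as.network(edge_list,
                            matrix.type = "edgelist",
                            directed = FALSE)

# Route 2: build the same network from a symmetric adjacency matrix.
adjacency <- matrix(0, 3, 3)
adjacency[edge_list] <- 1               # fill in one cell per listed tie
adjacency <- adjacency + t(adjacency)   # mirror it for an undirected network
from_matrix <- as.network(adjacency,
                          matrix.type = "adjacency",
                          directed = FALSE)

# Both routes should report three nodes and three edges.
c(network.size(from_edgelist), network.edgecount(from_edgelist))
c(network.size(from_matrix), network.edgecount(from_matrix))
```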
To save the network in your working folder, use the save() function. Remember, no matter what you name the file, R will restore the object under the name it had when it was saved. So, if you create a network named “network” and then save it as “Stalins_Network.rda”, it will still appear in R as “network” when you reload it. (Just an FYI.)
save(talked_to,
file="talked_to-DeathOfStalin.rda")
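The point about names is easy to check with a quick round trip. This sketch saves a small toy object to a temporary file, removes it from memory, and loads it back; `toy_network` and `tmp_file` are hypothetical names, and the temporary file stands in for a file in your working folder.

```r
# A hypothetical toy object standing in for a network.
toy_network <- matrix(c(0, 1,
                        1, 0), nrow = 2, byrow = TRUE)

# Save it under an unrelated file name, then remove it from memory.
tmp_file <- tempfile(fileext = ".rda")
save(toy_network, file = tmp_file)
rm(toy_network)

# load() restores the object and returns the name(s) it restored.
restored <- load(tmp_file)
restored
exists("toy_network")
```

The object comes back under its original name, whatever the file was called.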
Statnet actually has great functionality for visualizing networks. It does, however, render a little more slowly than igraph. Just to keep it a little odd, the arguments in statnet’s gplot() function are not very similar to those in igraph.
To see a list of options for plotting and visualizing networks in statnet, enter ?gplot into the R console. Then, try as many of these as you can…
In the meantime, try a few of the options selected for the visualization below.
gplot(talked_to,
displaylabels=TRUE, # include node labels
label.cex=0.6, # decrease the size of labels a little
vertex.cex=1.5, # increase node size a little
arrowhead.cex=0.75, # decrease the size of the arrows a little
edge.col="gray") # change the color of the edges