In this project, you will explore a network of friends, classmates, and roommates. Your instructor will provide you with a data set (matrix) of several relationship types among individuals: friendship, classmate, and roommate. Refer to the Exponential Random Graph Model (ERGM) discussed in class and read the tutorial on ERGM in R (included). Apply ERGM to the given data set to explain the friendship network in relation to the roommate and classmate matrices. Your instructor will provide additional guidance on expectations for further analysis, interpretation, and visualization of your results.
To understand what one social network tells us about other, similar networks, we need inferential statistics. However, the four tests most commonly used for statistical significance (the chi-square test, z test, t test, and F-ratio test) cannot be applied directly to social networks: “Most significance tests presume that the units of analysis are independent of each other, which is exactly contrary to the assumption of interdependence between cases in social network data” (Social Network Analysis: Methods and Examples, 88). For this reason, specialized methods are needed to generate the sampling distribution for social network data and to infer the significance level of test statistics from it.
My major at GCU, computer science, has three emphases: big data, game design, and business. One class, CST-405, is required for every CS major regardless of emphasis. I want to answer the following question:
“Do people sit at the same table as others from their own emphasis?”
To find out, I created two adjacency matrices that connect everyone in CST-405: one by major (emphasis) and one by seating arrangement.
This is the adjacency matrix that connects everyone by the table they sit at:
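```r
library(knitr)   # kable()

# Node list (one row per student) and the table adjacency matrix;
# row.names = 1 uses the first CSV column as the row names
class.nodes = read.csv("classNodes.csv")
class.tables = read.csv("classLinkedByTables.csv", row.names = 1)

# Display copy with first names as columns and full names as rows
class.tables.df = as.data.frame(class.tables)
colnames(class.tables.df) = class.nodes$firstname
rownames(class.tables.df) = paste(class.nodes$firstname, class.nodes$lastname)
kable(class.tables.df)
```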
| | Aaron | Alec | Andrew | Brian | Connor | Daniel | Daniel | Erik | Garret | Justen | Kevin | Kyle | Lamarr | Michael | Nvart | Shawn | Timothy | Jacob |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aaron Scirocco | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| Alec Ferko | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Andrew Parasadayan | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Brian Kurowski | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Connor Segneri | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Daniel Briscoe | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Daniel Stagnaro | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Erik Weimer | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| Garret Grundeis | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| Justen Johns | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Kevin Hoskins | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Kyle Bewley | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| Lamarr Pace | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Michael Hesseltine | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| Nvart Kahkedjian | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Shawn Kurowski | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Timothy Lowther | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| Jacob Slaton | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
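```r
library(igraph)  # graph_from_adjacency_matrix() and plotting of igraph objects

# Draw the table network with first names as vertex labels
class.tables.g = graph_from_adjacency_matrix(as.matrix(class.tables))
par(bg = "gray70")
plot(class.tables.g, edge.arrow.size = 0.4, edge.color = "white", vertex.shape = "none",
     vertex.label = class.nodes$firstname, vertex.label.color = "black")
```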
This is the adjacency matrix that connects everyone by their major:
```r
# Emphasis adjacency matrix: 1 if two students share an emphasis
class.majors = read.csv("classLinkedByMajor.csv", row.names = 1)
class.majors.df = as.data.frame(class.majors)
colnames(class.majors.df) = class.nodes$firstname
rownames(class.majors.df) = paste(class.nodes$firstname, class.nodes$lastname)
kable(class.majors.df)
```
| | Aaron | Alec | Andrew | Brian | Connor | Daniel | Daniel | Erik | Garret | Justen | Kevin | Kyle | Lamarr | Michael | Nvart | Shawn | Timothy | Jacob |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aaron Scirocco | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Alec Ferko | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Andrew Parasadayan | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Brian Kurowski | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Connor Segneri | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Daniel Briscoe | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Daniel Stagnaro | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Erik Weimer | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Garret Grundeis | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Justen Johns | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kevin Hoskins | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kyle Bewley | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Lamarr Pace | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| Michael Hesseltine | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| Nvart Kahkedjian | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| Shawn Kurowski | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| Timothy Lowther | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| Jacob Slaton | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
```r
class.majors.g = graph_from_adjacency_matrix(as.matrix(class.majors))
par(bg = "gray70")
plot(class.majors.g, edge.arrow.size = 0.4, edge.color = "white", vertex.shape = "none",
     vertex.label = class.nodes$firstname, vertex.label.color = "black")
```
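Both matrices above are "same-group" matrices: a cell is 1 exactly when two different students share a table (or an emphasis). As a minimal sketch, each matrix can be rebuilt in base R with `outer()` from one group label per student. The labels below are my reading of the printed matrices, and the numbers are arbitrary IDs, not real table numbers or official emphasis codes. The check also confirms that both matrices are symmetric, so every tie is stored twice, once in each direction.

```r
# Group labels read off the printed matrices, in the row order of classNodes.csv
# (Aaron, Alec, Andrew, Brian, Connor, Daniel B., Daniel S., Erik, Garret, Justen,
#  Kevin, Kyle, Lamarr, Michael, Nvart, Shawn, Timothy, Jacob).
# The numbers are arbitrary IDs chosen for this sketch.
table_id    <- c(1, 2, 3, 4, 2, 3, 3, 1, 1, 3, 2, 1, 2, 1, 3, 4, 1, 1)
emphasis_id <- c(1, 1, 1, 3, 1, 2, 2, 3, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3)

# 1 when two students share a label, 0 otherwise; the diagonal is set to 0
# because a student is not tied to themselves
same_group_matrix <- function(labels) {
  m <- outer(labels, labels, FUN = "==") * 1
  diag(m) <- 0
  m
}

tables_m <- same_group_matrix(table_id)
majors_m <- same_group_matrix(emphasis_id)

isSymmetric(tables_m)   # TRUE: every tie appears in both directions
sum(tables_m)           # 76 directed ties (38 undirected pairs)
sum(majors_m)           # 116 directed ties (58 undirected pairs)
```

Rebuilding the matrices from labels is a quick way to catch data-entry mistakes in a hand-made CSV, since any cell that disagrees with the group structure stands out.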
“Quadratic Assignment Procedure (QAP) was developed by Krackhardt (1987), and its logic draws from that of the permutation test” (Social Network Analysis: Methods and Examples). This method is used to determine whether a correlation between two networks is statistically significant.
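QAP first computes the statistic of interest, such as a correlation or regression coefficient, on the observed matrices. It then relabels the nodes of one matrix at random, permuting its rows and columns together so that the network's structure is preserved, and recomputes the statistic many times. The p-value is the share of permuted statistics that are at least as extreme as the observed one. The `netlm()` function from the sna package applies this procedure to a network regression: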
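```r
library(sna)  # netlm()

# netlm(y, x): class.majors is the dependent network, class.tables the predictor
netlm(class.majors, class.tables)
```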
```
##
## OLS Network Model
##
## Coefficients:
##             Estimate   Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.35652174 1.000   0.000   0.000
## x1          0.09084668 0.869   0.131   0.244
##
## Residual standard error: 0.4852 on 304 degrees of freedom
## F-statistic: 2.003 on 1 and 304 degrees of freedom, p-value: 0.158
## Multiple R-squared: 0.006546 Adjusted R-squared: 0.003278
```
The `netlm()` function fits an ordinary least squares regression of one network on another. It is suitable for testing the association between the two matrices because it computes the p-values for each coefficient by QAP permutation instead of assuming independent observations.
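The multiple R-squared is 0.006546, so sitting at the same table accounts for well under 1% of the variation in shared emphasis. The QAP p-value for the coefficient (Pr(>=|b|) = 0.244, two-sided) is above conventional thresholds (e.g., p < .05, .01, or .001), so we fail to reject the null hypothesis that the association between the two networks arises purely by chance. The F-statistic p-value of 0.158 on the last lines of the output comes from the classical OLS test, which assumes independent observations; the QAP p-values in the coefficient table are the ones to rely on here. If there is an association between the two matrices, it is very small, but further investigation may still be worthwhile, since a larger data set with more observations could produce a different result.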
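The permutation logic behind QAP can be sketched in a few lines of base R. This is a minimal illustration using a simple correlation, not a reimplementation of `netlm()`. It relabels the nodes of one matrix by permuting its rows and columns together many times and counts how often the permuted correlation is at least as large in magnitude as the observed one. The group labels are my reading of the printed matrices, and the seed and the 1000 permutations are arbitrary choices.

```r
# Same-group matrices rebuilt from labels read off the printed matrices
table_id    <- c(1, 2, 3, 4, 2, 3, 3, 1, 1, 3, 2, 1, 2, 1, 3, 4, 1, 1)
emphasis_id <- c(1, 1, 1, 3, 1, 2, 2, 3, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3)
same_group_matrix <- function(labels) {
  m <- outer(labels, labels, FUN = "==") * 1
  diag(m) <- 0
  m
}
tables_m <- same_group_matrix(table_id)
majors_m <- same_group_matrix(emphasis_id)

# Correlation over the off-diagonal cells only (self-ties are undefined)
off_diag_cor <- function(a, b) {
  keep <- row(a) != col(a)
  cor(a[keep], b[keep])
}

# QAP: permute rows and columns of one matrix with the SAME permutation,
# which relabels the nodes but keeps the network's structure intact
qap_cor_test <- function(a, b, reps = 1000) {
  observed <- off_diag_cor(a, b)
  permuted <- replicate(reps, {
    p <- sample(nrow(b))
    off_diag_cor(a, b[p, p])
  })
  list(r = observed,
       p_two_sided = mean(abs(permuted) >= abs(observed)))
}

set.seed(405)
result <- qap_cor_test(majors_m, tables_m)
result$r            # about 0.081; its square matches the R-squared of 0.006546
result$p_two_sided
```

Because the permutations are random, the p-value varies slightly from run to run and will not exactly match the `netlm()` output, but the logic is the same: the observed statistic is compared against a null distribution built from relabeled copies of the real network rather than from an independence assumption.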
“The exponential random graph model (ERGM)/P* attempts to explain how and why social network ties arise. The main goal of ERGMs is to understand a given observed network, that is, the empirical network measured by researchers, and to obtain insight into the underlying process that creates and sustains its ties” (Social Network Analysis: Methods and Examples, 93).
An ERGM assigns a probability to each possible graph according to the following formula:
\[P_\theta(G)=ce^{\theta_1Z_1(G)+\theta_2Z_2(G)+...+\theta_pZ_p(G)}\]
Here, G is the specific network being analyzed, the Zs are network statistics (for example, the number of edges), the θs are the parameters that weight each statistic, and c is a normalizing constant that makes the probabilities of all possible graphs sum to 1. “ERGM includes those inter-dependencies of network ties, represented in various network configurations that explain the overall network structures, and measures their respective importance in the network formation process” (Social Network Analysis: Methods and Examples, 94). This is an advantage of ERGM: the overall network structure can be analyzed as the result of multiple processes operating simultaneously.
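For a model with only an edges term, the formula can be checked by brute force on a tiny network. The sketch below uses a hypothetical 3-node directed network (6 possible ties, so 2^6 = 64 possible graphs) and an arbitrary coefficient θ₁ = -1.1. It computes c by summing over every possible graph and shows that the resulting probabilities equal those of independent ties, each present with probability e^θ₁ / (1 + e^θ₁). This is why the edges coefficient can be read as the log-odds of a tie.

```r
theta <- -1.1          # hypothetical edges coefficient
n_possible <- 3 * 2    # ordered pairs in a 3-node directed network

# Every possible graph: one row per graph, one column per possible tie
graphs <- as.matrix(expand.grid(rep(list(0:1), n_possible)))
edges  <- rowSums(graphs)              # Z_1(G): number of ties in each graph

# Unnormalized weights exp(theta * Z_1(G)); c rescales them to sum to 1
weights <- exp(theta * edges)
c_const <- 1 / sum(weights)
p_ergm  <- c_const * weights

# Same probabilities if each tie is an independent coin flip with
# probability plogis(theta) = exp(theta) / (1 + exp(theta))
p_tie  <- plogis(theta)
p_bern <- p_tie^edges * (1 - p_tie)^(n_possible - edges)

sum(p_ergm)                   # 1
all.equal(p_ergm, p_bern)     # TRUE
```

Enumeration only works for toy networks: an 18-node directed network already has 2^306 possible graphs, which is why ergm estimates models with dependence terms by MCMC instead.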
When explaining network ties in a full network, ERGM looks at three aspects:
Below is the first ERGM of the class network, in which a tie means that two students sit at the same table.
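```r
library(ergm)  # also loads the network package (as.network, set.vertex.attribute)

# Convert the table matrix to a network object. as.matrix() makes sure the
# data frame is read as an adjacency matrix rather than as an edge list.
# Note: the matrix is symmetric, so with directed = TRUE each shared table
# counts as two ties (one per direction) out of 18 * 17 = 306 ordered pairs.
class.tables.net = as.network(x = as.matrix(class.tables),
                              directed = TRUE,
                              loops = FALSE,
                              matrix.type = "adjacency")

# Attach each student's emphasis as a vertex attribute
set.vertex.attribute(class.tables.net,
                     "Major",
                     as.character(class.nodes$major))

# Model 1: edges only (baseline tie probability, i.e. density)
class.tables.01 = ergm(class.tables.net ~ edges)
summary(class.tables.01)
```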
```
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: class.tables.net ~ edges
##
## Iterations: 5 out of 20
##
## Monte Carlo MLE Results:
##       Estimate Std. Error MCMC % p-value
## edges  -1.1073     0.1323      0  <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 424.2 on 306 degrees of freedom
## Residual Deviance: 343.0 on 305 degrees of freedom
##
## AIC: 345 BIC: 348.8 (Smaller is better.)
```
This first model, which contains only an edges term, describes the density of the network. Its edges coefficient, -1.1073, is the log-odds that any given tie exists, so the density is the inverse logit of that value:
\[\frac{e^{-1.1073}}{1+e^{-1.1073}}\]
This comes out to about 0.248, meaning that about 24.8% of all possible ties are present; this matches the observed 76 ties out of 306 possible ordered pairs. The network is therefore fairly sparse. The very small p-value means that the edges coefficient differs significantly from 0, that is, a tie is significantly less likely than a coin flip (probability 0.5). It does not by itself show that the seating pattern is non-random.
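The conversion from log-odds to a probability is the inverse logit, which base R provides as `plogis()` (with `qlogis()` as its inverse). The sketch below applies it to the edges estimate and compares the result with the density counted directly from the table matrix: 76 of the 18 × 17 = 306 ordered pairs, a count I took from the printed matrix. Because each table tie is stored in both directions, the undirected density (38 of 153 pairs) is the same.

```r
edges_coef <- -1.1073          # edges estimate from summary(class.tables.01)

# Inverse logit: exp(x) / (1 + exp(x))
density_from_model <- plogis(edges_coef)

# Observed densities counted from the table matrix
n_students     <- 18
directed_obs   <- 76 / (n_students * (n_students - 1))   # 76 of 306 ordered pairs
undirected_obs <- 38 / choose(n_students, 2)             # 38 of 153 unordered pairs

round(c(model = density_from_model,
        directed = directed_obs,
        undirected = undirected_obs), 3)
# all three round to 0.248

# Going the other way, the log-odds of the observed density recovers the estimate
qlogis(directed_obs)   # about -1.107
```

For an edges-only model the fitted tie probability always equals the observed density, so this model serves as a baseline for the models that follow rather than as a finding in itself.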
The next model also evaluates homophily, the idea that similar nodes are more likely to be connected than dissimilar nodes. Here, homophily in the table network is examined by emphasis (the `Major` attribute), with a separate term for each emphasis (`diff = T`).
```r
# Model 2: edges plus one homophily (nodematch) term per emphasis
class.tables.02 = ergm(class.tables.net ~ edges + nodematch("Major", diff = T))
summary(class.tables.02)
```
```
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: class.tables.net ~ edges + nodematch("Major", diff = T)
##
## Iterations: 13 out of 20
##
## Monte Carlo MLE Results:
##                                    Estimate Std. Error MCMC % p-value
## edges                              -1.25954    0.17483      0  <1e-04 ***
## nodematch.Major.CS in big data      0.09639    0.40226      0   0.811
## nodematch.Major.CS in business     16.16194  738.50272      0   0.983
## nodematch.Major.CS in game design   0.43856    0.30987      0   0.158
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 424.2 on 306 degrees of freedom
## Residual Deviance: 335.5 on 302 degrees of freedom
##
## AIC: 343.5 BIC: 358.4 (Smaller is better.)
```
Of the estimates above, the business emphasis has by far the largest coefficient, which would suggest that business students sit together more than students in the other emphases do (recall that ties in this network represent seating arrangements). That estimate is not reliable, however: its standard error (738.5) is enormous, which typically happens when every possible same-group tie is observed. In the emphasis matrix, the only two-student group is Daniel Briscoe and Daniel Stagnaro, and they also sit at the same table. None of the p-values is below 0.05, so we fail to reject the null hypothesis and cannot confidently say that these homophily results are anything but chance. In other words, the observations evaluated do not provide enough evidence that students sit with others from their own emphasis. The outcome is the same as in the QAP. The game design p-value (0.158) may warrant further investigation, and the results could differ if more students were observed and included in the analysis.
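Because the edges and nodematch terms do not depend on other ties, this model is equivalent to a logistic regression on dyads, so its estimates can be reproduced by hand as log-odds. The sketch below does this in base R with group labels read off the printed matrices. The mapping of labels to emphases (1 = big data, 2 = business, 3 = game design) is my inference, chosen because it is the assignment that reproduces the estimates above; treat it as an assumption. The calculation also shows why the business estimate is unstable: both possible business-business ties are observed, so the log-odds is infinite, and the fitted value is simply a large number with a huge standard error.

```r
# Group labels read off the printed matrices (student order as in classNodes.csv)
table_id    <- c(1, 2, 3, 4, 2, 3, 3, 1, 1, 3, 2, 1, 2, 1, 3, 4, 1, 1)
emphasis_id <- c(1, 1, 1, 3, 1, 2, 2, 3, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3)
# ASSUMED mapping, inferred because it reproduces the ergm estimates
emphasis_name <- c("CS in big data", "CS in business", "CS in game design")

# One row per ordered pair of distinct students (the model treats ties as directed)
dyads <- expand.grid(i = 1:18, j = 1:18)
dyads <- dyads[dyads$i != dyads$j, ]
dyads$tie   <- table_id[dyads$i] == table_id[dyads$j]
dyads$match <- ifelse(emphasis_id[dyads$i] == emphasis_id[dyads$j],
                      emphasis_name[emphasis_id[dyads$i]], "mismatch")

# Log-odds of a tie within each category: log(ties / non-ties)
log_odds <- function(x) log(sum(x) / sum(!x))
lo <- sapply(split(dyads$tie, dyads$match), log_odds)

edges_est     <- lo[["mismatch"]]               # about -1.2595
nodematch_est <- lo[emphasis_name] - edges_est  # about 0.0964, Inf, 0.4386
edges_est
nodematch_est

# Tie counts per category: big data 10 of 42, business 2 of 2, game design 22 of 72
table(dyads$match, dyads$tie)
```

Reading the coefficients this way makes their meaning concrete: each nodematch estimate is how much higher the log-odds of sharing a table is for two students in that emphasis than for two students in different emphases.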
(n.d.). Retrieved February 21, 2018, from http://www.mjdenny.com/Preparing_Network_Data_In_R.html
Homophily. (2018, February 20). Retrieved February 21, 2018, from https://en.wikipedia.org/wiki/Homophily
Irvine, C. S. (n.d.). Retrieved February 21, 2018, from https://statnet.org/trac/raw-attachment/wiki/Sunbelt2016/ergm_tutorial.html#terms-provided-with-ergm
Yang, S., Zhang, L., & Keller, F. B. (2017). Social network analysis: methods and examples. Los Angeles: Sage.