Part A

Introduction

Social Network Analysis was conducted on a professional network consisting of 109 employees, each with two possible relationship types–information seeking and collaboration.

Attributes

The attributes for this network include the following:

  • Department: 1-5

  • Management Level: 1-3

  • Tenure: 1-3

  • Gender: 1 or 2

  • Individual: satisfaction with individual productivity

  • Group: satisfaction with group productivity

  • Organization: satisfaction with organization productivity

Questions to be Answered

The types of questions that can be answered about this network using SNA are:

  • Does the makeup of the departments differ? E.g., does each department contain the same number of high-level managers?

  • What attributes, if any, impact the existence of a certain relationship type? E.g. does a person’s gender impact whether they have a collaborative or information seeking relationship with their manager?

  • Are there employees who play certain roles, such as information brokers or hubs?

  • Do people tend to connect with people in or outside of their own department?

  • If someone has an info-seeking relationship are they likely to also have a collaboration relationship and vice versa?

Overall Findings

  • D1R69 from Department 1 may manage Department 2 because they are structurally equivalent to several members within that department.

  • D1R66 from Department 1 may be the CEO of this company, since they are responsible for all significant communication within that department.

  • Employees D4R149, D4R189, D4R53 are likely members of the same team who collaborate with each other often.

  • The odds of an infoseek tie forming are positively associated with department and with satisfaction with group productivity, and negatively associated with satisfaction with individual and organization productivity.

  • The longer an employee stays with the company, the more selective they are about the employees with whom they collaborate. They first build up a large network after entering the company, and then, after gaining some experience, limit their collaboration to a select few.

Analysis Methods and Approach

To better understand this network, descriptive analysis was conducted on the attribute data to see what it can tell us about the network, such as whether members of each department have the same attributes. Next, we calculated network-level metrics such as density and the dyad and triad censuses to identify how well connected the network is and how connections are distributed. Node-level analysis using Gephi was used to discover which employees play which roles and who controls the information channels within this company. To dive further, a block model was created to view how the networks are organized, and whether that organization follows departments or some other mechanism. Lastly, we used QAP analysis to measure the impact of attributes on connection formation, if such an impact exists.

Descriptive Analysis

Management Level and Departments

In this network, there are five departments, with three levels of management. Based on the chart below, it can be inferred that the departments move from the highest level of management (CEO-level) to the lowest. Department 1 consists only of level 1 management, while the remaining departments contain all three management levels, with Department 2 having the highest proportion of level 2 managers. The number of level 1 managers in Departments 2-5 is relatively consistent.
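The department-by-level comparison above amounts to a cross-tabulation. The following is a minimal sketch of that idea; the `employees` records, `crosstab`, and `level_shares` are hypothetical and are not the study's data or code.

```python
from collections import Counter

# Hypothetical (department, management_level) pairs; NOT the study's data.
employees = [(1, 1), (1, 1), (2, 1), (2, 2), (2, 2), (3, 3), (3, 2), (4, 3)]

def crosstab(pairs):
    """Count employees for every (department, level) combination."""
    return Counter(pairs)

def level_shares(pairs, department):
    """Proportion of each management level within one department."""
    counts = Counter(level for dept, level in pairs if dept == department)
    total = sum(counts.values())
    return {level: n / total for level, n in sorted(counts.items())}

counts = crosstab(employees)
dept2_shares = level_shares(employees, 2)
```

Comparing proportions rather than raw counts keeps departments of different sizes comparable.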

Management Level and Gender

Overall, Gender 1 employees outnumber Gender 2 employees at every level of management.

Tenure and Departments

The tenure attribute describes how long each employee has been a part of this organization, with 1 being the shortest and 3 the longest. Department 1 consists entirely of employees with the higher tenure levels (2 and 3). The remaining departments contain all three levels of tenure, with Department 2 containing the highest number of new employees.

Satisfaction with Productivity

Each employee rated their satisfaction with the productivity of themselves (individual), their working groups (group), as well as the entire organization.

              Mean       Std.Dev.  Min     Q1      Median  Q3     Max
group          9.20e-06  1.004625  -2.895  -0.507  0.058   0.653  1.859
individual     0.00e+00  1.004623  -4.052  -0.457  0.217   0.652  1.564
organization  -2.75e-05  1.004594  -3.075  -0.703  0.155   0.724  2.086

Across the organization, individual satisfaction had the lowest minimum rating at -4.052, while organization satisfaction had the highest maximum rating at 2.086. The mean for all three of the productivity rating types was approximately 0.

Several individuals from Department 3 have very low ratings for all three areas, while most of the other departments range from mild individual and organization dissatisfaction (-2 to 0) to mild individual satisfaction (0 to 1). Only Departments 2 and 5 have employees who are very satisfied with both organizational and individual productivity (1+ for each). This trend continues with group satisfaction as well.

Network-Level Analysis

For the two networks, edges that were weighted 4 and above were considered true relationships. Edges below that threshold were not considered as a strong indication of a relationship and were thus removed. Additionally, self-ties were removed.
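The filtering rule described above can be sketched as follows. The function name `filter_edges` and the sample edge list are hypothetical; only the threshold of 4 and the removal of self-ties come from the text.

```python
def filter_edges(weighted_edges, threshold=4):
    """Keep directed edges with weight >= threshold and drop self-ties."""
    return [
        (src, dst, w)
        for src, dst, w in weighted_edges
        if w >= threshold and src != dst
    ]

# Hypothetical raw edge list: (source, target, weight).
raw = [
    ("A", "B", 5),
    ("B", "A", 3),   # below the threshold, removed
    ("C", "C", 6),   # self-tie, removed
    ("A", "C", 4),   # exactly at the threshold, kept
]
kept = filter_edges(raw)
```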

Collaboration Network

Metrics

Vertices  Edges  Network Density  Mutual Dyads  Asymmetric Dyads  Null Dyads  Complete Triads  Transitivity
104       569    0.0526072        132           305               4919        48               0.2952323
  • The collaboration network has 104 vertices and 569 edges.
  • 5.26% of all possible directed ties among members of this network are present.
  • Based on the transitivity, we can conclude that 29.52% of the connected triples of nodes in this network are closed into triangles.

Information Seeking Network

Metrics

Vertices  Edges  Network Density  Mutual Dyads  Asymmetric Dyads  Null Dyads  Complete Triads  Transitivity
108       769    0.0659294        195           379               5204        118              0.3233634
  • The information seeking network has a total of 108 vertices and 769 edges.
  • 6.59% of all possible directed ties among members of this network are present.
  • Based on the transitivity, we can conclude that 32.34% of the connected triples of nodes in this network are closed into triangles.
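The metrics in the two tables above are linked by simple identities: a directed network with n nodes has n(n-1)/2 dyads, and each mutual dyad contributes two edges while each asymmetric dyad contributes one. The sketch below checks the reported figures against those identities. The function name `summarize` is hypothetical. Both reported densities match edges divided by n squared, which is the value obtained when self-ties are counted as possible ties. That is an observation from the arithmetic and not something the source states; dividing by n(n-1) instead gives roughly 0.0531 and 0.0665.

```python
def summarize(n, mutual, asymmetric, null):
    """Derive edge count and density from a directed dyad census."""
    dyads = n * (n - 1) // 2
    assert mutual + asymmetric + null == dyads, "census must cover all dyads"
    edges = 2 * mutual + asymmetric
    return {
        "edges": edges,
        "density_no_loops": edges / (n * (n - 1)),   # self-ties excluded
        "density_with_loops": edges / (n * n),       # self-ties counted as possible
    }

# Values copied from the metric tables for the two networks.
collab = summarize(104, 132, 305, 4919)
infoseek = summarize(108, 195, 379, 5204)
```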

Node Level Analysis

Collaboration Network

Node    Clustering Coefficient  Degree  Betweenness  Eigenvector
D5R136  0.1884058               36      1503.6977    0.8166203
D4R149  0.2898551               34      769.1425     1.0000000
D4R189  0.2727273               34      884.3010     0.9390535
D3R10   0.2597403               31      1117.8375    0.8051194
D4R53   0.1688312               28      752.6055     0.6630539
D2R58   0.2000000               27      582.6637     0.5227066

Infoseeking Network

Node    Clustering Coefficient  Degree  Betweenness  Eigenvector
D4R149  0.2717718               53      913.7338     1.0000000
D5R136  0.2269841               52      1455.0020    0.8171295
D4R189  0.3181818               49      867.7845     0.9595118
D4R53   0.2318548               45      1137.2573    0.7627640
D5R152  0.2580645               44      1149.8826    0.6936223
D5R50   0.2413793               39      816.5781     0.5871588
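To make two of the columns above concrete, here is a minimal sketch of degree and the local clustering coefficient. It assumes an undirected toy graph for simplicity, whereas the networks in this report are directed and were analyzed in Gephi, so it illustrates the definitions only and does not reproduce the reported values. The graph `adj` is hypothetical.

```python
from itertools import combinations

def degree(adj, node):
    """Number of distinct neighbors of a node."""
    return len(adj[node])

def clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)

# Hypothetical undirected graph stored as neighbor sets.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

A node with many neighbors who rarely connect to each other, like "A" here, has high degree but a low clustering coefficient. The same combination appears for D5R136 in the collaboration table.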

Comparison of Centralities

Degree centrality and betweenness centrality are generally positively correlated in both networks. Only a small number of nodes have high centrality values. One person in Department 5 (D5R136) is an extreme outlier in both networks. Since this individual is in the highest level of management, they could be a leader with considerable power and access to the most information in this company.

Degree and Betweenness Centrality

Degree centrality and closeness centrality are mostly positively correlated in both networks. There are a few outliers with low degree and high closeness centrality in both networks, which could result from the close relationships that people at high management levels have with other important or active alters.

Degree and Closeness Centrality
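The low-degree, high-closeness pattern can be illustrated with a small sketch. It assumes an undirected toy graph and defines closeness as the number of reachable nodes divided by the sum of shortest-path lengths. The graph and the node names are hypothetical.

```python
from collections import deque

def closeness(adj, source):
    """(reachable nodes) / (sum of shortest-path lengths), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical graph: "hub" touches most nodes; "aide" and "tail" both have degree 1.
adj = {
    "hub": {"a", "b", "c", "aide"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "tail"},
    "aide": {"hub"},
    "tail": {"c"},
}
```

Here "aide" and "tail" each have a single tie, yet "aide" is closer to everyone else because its one tie goes to the hub.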

In both networks, the majority of people have low betweenness centrality while only a few have very high betweenness centrality. Those with high betweenness, such as D5R136 and D3R10, could be intermediaries in the collaboration network and important information holders in the info-seeking network. Also, because the network contains multiple alternative paths (probably reflecting close relationships between directors of different departments), several outliers have low betweenness and high closeness centrality.

Betweenness and Closeness Centrality

Other Analysis

Subgroup Analysis

Community detection algorithms identify the community structure of a network. They can help show how nodes within a community interact, which nodes act as information brokers between separate cliques, and so on. Because both networks are directed, we chose the walktrap algorithm to give a general picture of community composition in the two networks. It found roughly three main communities within the collaboration network and four within the information seeking network.

Role Analysis

Hub in Collaboration Network

This graph gives a general view of the hubs in the collaboration network. The most important node is D5R136, who is in management level 1 and has a tenure level of 3. The graph also suggests that the higher the management level and the longer the tenure, the more likely a node is to be highly connected in the collaboration network.

D5R136 Role

This Gephi image shows the connections of node D5R136. It clearly shows that D5R136 has a very high level of authority and is directly connected with D3R10, D4R53, D4R189, and others. The high authority indicates that D5R136 could be an important information broker. The nodes connected with D5R136 are mostly of high authority as well, which reflects the hierarchy of collaboration in this company.

Hub in Information Seeking Network

Unlike in the collaboration network, the most important node and possible information hub here is D5R50, who is in management level 1 and has a tenure level of 2. This graph also suggests that a high management level and longer tenure are associated with more important positions in the information seeking network. Interestingly, people in the same department tend to have closer information seeking relationships with each other, regardless of management level.

We also found that D1R66 may be the CEO, since they are the only Department 1 member who connects the members of Department 1 to one another.

Block Model Analysis

In Block Model Analysis, actors are placed into groups based on the types and degree of relationships they have. Those within the same block are said to be structurally equivalent: they share the same relationships with the same nodes. This can help answer the question of whether employees tend to share the same relationships as others within their department. The graphs below show each department and block membership by color.


[Block 1 = Orange, Block 2 = Blue, Block 3 = Green, Block 4 = Yellow]
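As a minimal sketch of the structural equivalence idea behind the blocks, the check below treats two nodes as equivalent when they send ties to, and receive ties from, exactly the same other nodes. This is the strict textbook definition. Block models fitted to real data usually group nodes that are only approximately equivalent, and the procedure used for this report is not shown in the source. The edge set and function names are hypothetical.

```python
def profile(edges, node, ignore):
    """Out- and in-neighbors of `node`, excluding the node named in `ignore`."""
    out_n = {t for s, t in edges if s == node and t != ignore}
    in_n = {s for s, t in edges if t == node and s != ignore}
    return out_n, in_n

def structurally_equivalent(edges, a, b):
    """True when a and b have identical ties to all other nodes."""
    return profile(edges, a, b) == profile(edges, b, a)

# Hypothetical directed ties: x and y both exchange ties with m only.
edges = {("x", "m"), ("y", "m"), ("m", "x"), ("m", "y"), ("z", "m")}
```

In this toy network "x" and "y" would land in the same block, while "z" would not, because "z" sends a tie to "m" but receives none.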


The majority of employees within this network belong to the same block (Block 1). Density within this block is very low, with only a select number of employees communicating with employees of other blocks. Two of the three members of Department 1 are among the members of Block 1. The remaining member, D1R69, belongs to the same block as several employees from Department 2. This may mean that they are responsible for managing this department. It also may be safe to assume that D1R66 is the CEO of this company, as they are responsible for all communication within Department 1. Several employees from Departments 3 and 4 (D4R149, D4R189, D4R53) make up Blocks 2 and 4, respectively.

The above image shows that there are no ties between members of Block 1, indicating that the majority of the company does not communicate with one another. Additionally, members of Block 1 only communicate with those in blocks 2 and 4. The remaining three blocks do have ties among fellow members, with the highest density being among those in Block 4. This may mean that the employees in that block form a collaborative team within their department.
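The within-block and between-block comparison described here rests on a block density matrix: for each ordered pair of blocks, the share of possible directed ties that are present. The sketch below shows one way to compute it. The membership map and edge list are hypothetical and much smaller than the real network.

```python
def block_densities(edges, block_of):
    """Density of directed ties from each block to each block (no self-ties)."""
    blocks = sorted(set(block_of.values()))
    size = {b: sum(1 for v in block_of.values() if v == b) for b in blocks}
    count = {(r, c): 0 for r in blocks for c in blocks}
    for s, t in edges:
        if s != t:
            count[(block_of[s], block_of[t])] += 1
    dens = {}
    for r in blocks:
        for c in blocks:
            # Within a block a node cannot tie to itself, hence size - 1.
            possible = size[r] * (size[c] - 1 if r == c else size[c])
            dens[(r, c)] = count[(r, c)] / possible if possible else 0.0
    return dens

# Hypothetical example: block 1 has no internal ties, block 2 is fully connected.
block_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
edges = [("a", "d"), ("b", "d"), ("d", "e"), ("e", "d")]
dens = block_densities(edges, block_of)
```

A zero on the diagonal, as for block 1 in this example, corresponds to the "no ties between members" pattern reported for Block 1.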

Statistical Analysis - QAP

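Quadratic Assignment Procedure (QAP) is a permutation-based technique for regression on network data. It assesses the impact of independent variables on a target relationship by comparing the observed coefficients with those obtained after repeatedly permuting the node labels (rows and columns together) of a network matrix. This accounts for the dependence among ties that standard regression ignores.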
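The permutation idea behind QAP can be sketched in a few lines. This simplified version uses the count of shared ties as the test statistic instead of a logistic regression coefficient, so it illustrates the mechanism only and is not the model reported in this section. The matrices, function names, and seed are hypothetical.

```python
import random

def overlap(x, y):
    """Number of ordered pairs (i, j), i != j, tied in both matrices."""
    n = len(x)
    return sum(x[i][j] * y[i][j] for i in range(n) for j in range(n) if i != j)

def permute(matrix, order):
    """Relabel nodes: rows and columns are reordered with the same order."""
    return [[matrix[i][j] for j in order] for i in order]

def qap_p_value(x, y, reps=100, seed=0):
    """Share of permutations whose overlap is >= the observed overlap."""
    rng = random.Random(seed)
    observed = overlap(x, y)
    order = list(range(len(y)))
    hits = 0
    for _ in range(reps):
        rng.shuffle(order)
        if overlap(x, permute(y, order)) >= observed:
            hits += 1
    return observed, hits / reps

# Hypothetical 5-node directed networks with identical ties.
x = [
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
y = [row[:] for row in x]
observed, p_value = qap_p_value(x, y)
```

Permuting rows and columns together keeps each network's structure intact and only breaks the alignment between the two networks, which is what the null hypothesis of a random relationship assumes.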

Collaboration Network

## IGRAPH 1ad9165 DN-- 104 569 -- 
## + attr: name (v/c), id (v/c), department (v/n), management_level
## | (v/n), tenure (v/n), gender (v/n), individual (v/n), group
## | (v/n), organization (v/n), block (v/n), collab_degree (v/n),
## | info_degree (v/n), value (e/n)

In the Collaboration Network, there are 569 edges among 104 actors. Five of the 109 employees in this company do not maintain significant collaborative relationships with their colleagues.

Does having an info-seeking relationship increase the chance of having a collaborative relationship as well?

Our hypotheses to test are:

  • Ho: the relationship between info-seek and collaboration is random.

  • H1: the relationship between info-seek and collaboration is non-random.

## 
## Network Logit Model
## 
## Coefficients:
##             Estimate  Exp(b)       Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -3.688786   0.02500233 0       1       0        
## x1           5.685885 294.67839103 1       0       0        
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 16020.02 on 11556 degrees of freedom
## Residual deviance: 2936.278 on 11554 degrees of freedom
## Chi-Squared test of fit improvement:
##   13083.74 on 2 degrees of freedom, p-value 0 
## AIC: 2940.278    BIC: 2954.988 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.5310015 
##  (Dn-Dr)/Dn: 0.8167119 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##          Actual
## Predicted       0       1
##         0   10719     268
##         1      68     501
## 
##  Total Fraction Correct: 0.9709242 
##  Fraction Predicted 1s Correct: 0.8804921 
##  Fraction Predicted 0s Correct: 0.9756075 
##  False Negative Rate: 0.3485046 
##  False Positive Rate: 0.006303884 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 100 
##  Distribution Summary:
## 
##        (intercept)        x1
## Min      -58.73599  -3.02312
## 1stQ     -58.47972  -1.18180
## Median   -58.35588   0.02339
## Mean     -58.33671  -0.07359
## 3rdQ     -58.17415   0.88518
## Max      -57.88408   3.27182

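Dyads with an info-seeking tie are far more likely to also have a collaboration tie: Exp(b) = 294.68, so the odds of collaboration are roughly 295 times higher when an info-seeking tie is present. The model's pseudo-R² ((Dn-Dr)/Dn) is 0.8167, meaning info-seeking accounts for about 81.67% of the null deviance. Info-seeking is significant: the QAP p-value is reported as 0 over 100 replications, which is below the 0.05 level, so we reject the null hypothesis. This relationship can be summarized by: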

\[logodds(collaboration)=-3.689+5.686*infoseek\]
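With a single binary predictor, the slope of a logit model equals the log of the odds ratio in the 2x2 cross-tabulation of the two tie types. The sketch below applies that identity to the cell counts of the contingency table above. In this model the predicted class coincides with the predictor value, so the table can be read as such a cross-tabulation. The variable names are hypothetical. Because the odds ratio is symmetric in the two tie types, the same slope appears in the reverse model reported later for the information-seeking network.

```python
import math

# Cell counts copied from the contingency table above.
neither, both = 10719, 501
off_diagonal_a, off_diagonal_b = 268, 68

odds_ratio = (both * neither) / (off_diagonal_a * off_diagonal_b)
slope = math.log(odds_ratio)   # compare with the x1 Estimate

def logistic(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Predicted tie probability without and with the predictor tie present,
# using the reported intercept and slope.
p_without = logistic(-3.688786)
p_with = logistic(-3.688786 + 5.685885)
```

The value of `p_with` agrees with the "Fraction Predicted 1s Correct" line of the output, which is expected when the model has only this one binary predictor.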

Which attributes, if any, contribute to the existence of a collaborative relationship?

## 
## Network Logit Model
## 
## Coefficients:
##             Estimate     Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -2.652570673 0.07046983 1.00    0.00    1.00     
## x1           0.030304559 1.03076842 0.76    0.24    0.59     
## x2           0.054681178 1.05620382 0.95    0.05    0.07     
## x3          -0.005557107 0.99445831 0.42    0.58    0.95     
## x4           0.051741102 1.05310306 0.77    0.23    0.56     
## x5           0.022881694 1.02314549 0.64    0.36    0.71     
## x6           0.097379018 1.10227808 0.88    0.12    0.18     
## x7          -0.137887003 0.87119713 0.00    1.00    0.00     
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 16020.02 on 11556 degrees of freedom
## Residual deviance: 5634.293 on 11548 degrees of freedom
## Chi-Squared test of fit improvement:
##   10385.73 on 8 degrees of freedom, p-value 0 
## AIC: 5650.293    BIC: 5709.132 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.4733322 
##  (Dn-Dr)/Dn: 0.6482967 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##         0       1
## 0   10787     769
## 1       0       0
## 
##  Total Fraction Correct: 0.9334545 
##  Fraction Predicted 1s Correct: NaN 
##  Fraction Predicted 0s Correct: 0.9334545 
##  False Negative Rate: 1 
##  False Positive Rate: 0 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 100 
##  Distribution Summary:
## 
##        (intercept)        x1        x2        x3        x4        x5
## Min      -70.30961  -3.07088  -3.50072  -3.05298  -4.43821  -3.62898
## 1stQ     -70.30961  -1.39838  -0.83871  -0.58935  -1.41386  -0.89466
## Median   -70.30961  -0.36227   0.27177   0.14166  -0.24359   0.15155
## Mean     -70.30961  -0.26444   0.10948   0.25325  -0.22768   0.08856
## 3rdQ     -70.30961   0.74337   1.00563   1.23542   1.05594   0.90876
## Max      -70.30961   3.21258   3.26685   3.28495   3.75380   3.73761
##               x6        x7
## Min     -3.81094  -3.07917
## 1stQ    -0.86013  -0.76274
## Median  -0.00178   0.10789
## Mean     0.03166   0.06022
## 3rdQ     0.73719   0.91928
## Max      3.33440   3.06334

Apart from x7 (orgmat), which is significant (two-sided p = 0.00 in the output above), none of the predictors in this model are significant, because their p-values exceed the 0.05 significance level.

The x7 (orgmat) estimate indicates that for every one-unit increase in satisfaction with organization productivity, the odds of a collaboration tie forming are multiplied by 0.87 (Exp(b) = 0.871), a decrease of about 13%. Since the other predictors are not significant, we conclude that they have no detectable relationship with collaboration tie formation.
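The Exp(b) column of the output can be reproduced from the Estimate column, which makes the direction of an effect easy to check. A minimal sketch with hypothetical helper names:

```python
import math

def odds_multiplier(estimate, delta=1.0):
    """Factor applied to the odds when the predictor rises by `delta`."""
    return math.exp(estimate * delta)

def percent_change_in_odds(estimate, delta=1.0):
    """Percentage change in the odds for a `delta` increase in the predictor."""
    return (odds_multiplier(estimate, delta) - 1.0) * 100.0

x7 = -0.137887003                     # Estimate for x7 in the output above
factor = odds_multiplier(x7)          # compare with Exp(b) for x7
change = percent_change_in_odds(x7)
```

A negative estimate always gives a multiplier below 1, so the odds fall as the predictor rises.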

Inferential Statistics

Association: The scatterplot shows a moderately strong negative relationship between satisfaction with organization productivity and collaboration degree for Departments 2, 4, and 5. There is a positive relationship between satisfaction with organization productivity and collaboration degree for Departments 1 and 3. There are a few points with high leverage that are potentially influential. These outliers and influential points are in Departments 1, 2, and 5.

The boxplots comparing collaboration ties by gender show that the range for gender 1 is larger than that for gender 2. Additionally, the mean for gender 1 is higher than for gender 2, meaning gender 2 has fewer collaboration ties than gender 1. The distributions are not symmetric. The small circles outside the whiskers for each gender represent outliers.

In this box plot of collaboration by tenure, the mean for tenure 2 is higher than for tenure 1 and tenure 3, meaning employees at tenure 1 and tenure 3 have fewer collaboration ties than those at tenure 2. Tenure 3 has the fewest collaboration ties in the company. This may indicate that the longer an employee stays with the company, the more selective they become about the employees with whom they collaborate. Employees who have just entered the company (tenure 1) have not yet had time to form a solid collaboration network, so they go on to collaborate with as many fellow employees as possible (tenure 2) before narrowing their network (tenure 3).

Information Seeking Network

## IGRAPH 1aeb173 DN-- 108 769 -- 
## + attr: name (v/c), id (v/c), department (v/n), management_level
## | (v/n), tenure (v/n), gender (v/n), individual (v/n), group
## | (v/n), organization (v/n), block (v/n), collab_degree (v/n),
## | info_degree (v/n), value (e/n)

In the Information Seeking Network, there are 769 edges among 108 actors, indicating that employees within this company seek information from colleagues more often than they collaborate with them.

Does having a collaborative relationship increase the chance of having an info-seeking relationship as well?

Dyads with a collaboration tie are far more likely to also have an info-seeking tie: the model's pseudo-R² ((Dn-Dr)/Dn) is 0.8865, meaning collaboration accounts for about 88.65% of the null deviance. Collaboration is significant, with a QAP p-value of 0.00, which is below the 0.05 level.

## 
## Network Logit Model
## 
## Coefficients:
##             Estimate  Exp(b)       Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -5.060265 6.343876e-03 0.99    0.01    0.99     
## x1           5.685884 2.946784e+02 1.00    0.00    0.00     
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 16020.02 on 11556 degrees of freedom
## Residual deviance: 1818.968 on 11554 degrees of freedom
## Chi-Squared test of fit improvement:
##   14201.05 on 2 degrees of freedom, p-value 0 
## AIC: 1822.968    BIC: 1837.678 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.5513461 
##  (Dn-Dr)/Dn: 0.8864565 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##          Actual
## Predicted       0       1
##         0   10719      68
##         1     268     501
## 
##  Total Fraction Correct: 0.9709242 
##  Fraction Predicted 1s Correct: 0.6514954 
##  Fraction Predicted 0s Correct: 0.9936961 
##  False Negative Rate: 0.1195079 
##  False Positive Rate: 0.02439246 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 100 
##  Distribution Summary:
## 
##        (intercept)        x1
## Min      -44.94247  -3.02141
## 1stQ     -43.79353  -1.01039
## Median   -43.26494  -0.14912
## Mean     -43.22534  -0.09176
## 3rdQ     -42.56451   0.92820
## Max      -41.38110   3.44011
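The goodness-of-fit figures in the output above follow from the two deviances. The sketch below recomputes the (Dn-Dr)/Dn pseudo-R² and the AIC, assuming the usual identity for a binary logit that the residual deviance equals -2 times the log-likelihood. The function names are hypothetical.

```python
def pseudo_r2(null_deviance, residual_deviance):
    """Share of the null deviance removed by the model: (Dn - Dr) / Dn."""
    return (null_deviance - residual_deviance) / null_deviance

def aic(residual_deviance, n_params):
    """AIC for a binary logit, where deviance = -2 * log-likelihood."""
    return residual_deviance + 2 * n_params

# Deviances copied from the output above; the model has 2 parameters
# (intercept and x1).
null_dev, resid_dev = 16020.02, 1818.968
r2 = pseudo_r2(null_dev, resid_dev)
model_aic = aic(resid_dev, 2)
```

This measure describes how much deviance the predictor removes. It is not the probability that one tie type accompanies the other.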

Which attributes, if any, contribute to the existence of an information-seeking relationship?

## 
## Network Logit Model
## 
## Coefficients:
##             Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -2.98363651 0.05060846 0.00    1.00    0.00     
## x1           0.08701166 1.09090940 0.99    0.01    0.01     
## x2           0.14044675 1.15078780 0.98    0.02    0.02     
## x3          -0.14171968 0.86786451 0.01    0.99    0.01     
## x4          -0.11335325 0.89283521 0.02    0.98    0.04     
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 16020.02 on 11556 degrees of freedom
## Residual deviance: 4508.509 on 11551 degrees of freedom
## Chi-Squared test of fit improvement:
##   11511.51 on 5 degrees of freedom, p-value 0 
## AIC: 4518.509    BIC: 4555.283 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.4990356 
##  (Dn-Dr)/Dn: 0.7185703 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##         0       1
## 0   10987     569
## 1       0       0
## 
##  Total Fraction Correct: 0.9507615 
##  Fraction Predicted 1s Correct: NaN 
##  Fraction Predicted 0s Correct: 0.9507615 
##  False Negative Rate: 1 
##  False Positive Rate: 0 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 100 
##  Distribution Summary:
## 
##        (intercept)        x1        x2        x3        x4
## Min      -67.91694  -2.94602  -2.23476  -3.17605  -4.21023
## 1stQ     -67.91694  -0.95716  -0.75406  -1.21374  -0.71678
## Median   -67.91694  -0.19594   0.05705  -0.29185   0.10738
## Mean     -67.91694  -0.09305   0.15064  -0.22880   0.10461
## 3rdQ     -67.91694   0.64767   0.98077   0.61610   0.97924
## Max      -67.91694   3.40961   3.54292   2.24416   2.92978

The log odds of an information-seeking relationship occurring in this network can be summarized by:

\[logodds(infoseek)=-2.98+0.087*dept+0.14*group-0.14*individual-0.11*organization\]

The odds of an infoseek tie forming increase with department and with satisfaction with group productivity, and decrease with satisfaction with individual and organization productivity. All four predictors are significant at the 0.05 level.
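To show how the fitted equation turns into a tie probability, the sketch below plugs values into it. It assumes each predictor enters as a single numeric value per dyad. The source does not state how the node attributes were converted into dyadic covariates, so the inputs used here are purely illustrative and the function names are hypothetical.

```python
import math

# Estimates copied from the coefficient table above.
COEF = {
    "intercept": -2.98363651,
    "dept": 0.08701166,
    "group": 0.14044675,
    "individual": -0.14171968,
    "organization": -0.11335325,
}

def log_odds(dept, group, individual, organization):
    """Linear predictor of the fitted model for one dyad."""
    return (COEF["intercept"] + COEF["dept"] * dept + COEF["group"] * group
            + COEF["individual"] * individual
            + COEF["organization"] * organization)

def probability(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative input: department value 3, all satisfaction scores at 0.
z = log_odds(dept=3, group=0.0, individual=0.0, organization=0.0)
p = probability(z)
```

Even with the positive department term, the large negative intercept keeps the predicted probability low, which is consistent with the sparse network.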