library(igraph)
## Warning: package 'igraph' was built under R version 3.4.4
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
About Social Network Analysis:
Social network analysis is a framework for studying the relationships between nodes in a network structure by applying mathematical models. It is also used to study how entities influence one another within the clusters of a network.
Purpose: The dataset consists of 17 observations from a student survey conducted at Harrisburg University. For each respondent it records the student's name and, for each peer they worked with, the peer's name, how often they met, the peer's gender, how far the peer lives from the university, and the peer's professional field. With this social network analysis, I want to identify clusters within this student sample using the variables collected in the survey.
Part I: Data Processing
Here, I imported the survey dataset, renamed the columns, and removed any duplicates.
sna <- read.csv('F:/THESIS/Tanya/R-Folder/data/snaSpring2019.csv')
# Columns 3-18: the respondent's name, then five fields for each of up to three peers
colnames(sna)[3:18] <- c("StudentName",
                         "who1", "numTimes1", "gender1", "field1", "miles1",
                         "who2", "numTimes2", "gender2", "field2", "miles2",
                         "who3", "numTimes3", "gender3", "field3", "miles3")
# Returns 0 when no row is an exact duplicate of an earlier row
anyDuplicated(sna)
## [1] 0
The check returns 0, so the imported data contains no exact duplicate rows. One student's duplicate response was deleted, leaving 17 unique students who participated in the survey. I then prepared the nodes and edges files for the social network analysis.
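The preparation of the nodes and edges files is not shown in this report. A minimal sketch of one way to do it, assuming the renamed column layout above, is to stack the repeated peer columns into one row per named peer. The toy `survey` table and its values are invented for illustration and use only two peer slots.

```r
# Toy stand-in for the survey table; names and values are hypothetical.
survey <- data.frame(
  StudentName = c("A", "B"),
  who1 = c("B", "C"), numTimes1 = c(5, 2),  field1 = c("Yes", "No"),
  who2 = c("C", NA),  numTimes2 = c(1, NA), field2 = c("No", NA),
  stringsAsFactors = FALSE
)

# Stack the repeated who/numTimes/field column groups into one row per named peer.
edge_list <- do.call(rbind, lapply(1:2, function(i) {
  data.frame(
    from     = survey$StudentName,
    to       = survey[[paste0("who", i)]],
    numtimes = survey[[paste0("numTimes", i)]],
    field    = survey[[paste0("field", i)]],
    stringsAsFactors = FALSE
  )
}))
edge_list <- edge_list[!is.na(edge_list$to), ]  # drop unanswered peer slots

# Every name that appears as a respondent or as a peer becomes a node.
node_list <- data.frame(name = unique(c(edge_list$from, edge_list$to)),
                        stringsAsFactors = FALSE)
```

Because peers who did not answer the survey still become nodes, a network built this way can contain more students than there are survey respondents.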
Part II: Social Network Analysis
snanodes<-read.csv('F:/THESIS/Tanya/R-Folder/data/sna_nodesproj.csv')
snaedges<-read.csv('F:/THESIS/Tanya/R-Folder/data/sna_edgesproj.csv')
schoolnetwork<- graph_from_data_frame(d=snaedges, vertices = snanodes, directed=F)
plot(schoolnetwork)
E(schoolnetwork)$color <- ifelse(E(schoolnetwork)$field == "Yes", "green", ifelse(E(schoolnetwork)$field == "No", "red","blue"))
V(schoolnetwork)$color <- ifelse(V(schoolnetwork)$gender == "male", "blue", ifelse(V(schoolnetwork)$gender == "female", "yellow","black"))
plot(schoolnetwork, vertex.frame.color="white")
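# pch=21 draws filled circles, matching the vertex shape
legend("bottomright", c("Male","Female", "Don't Know"), pch=21,
col="#777777", pt.bg=c("blue","yellow","black"), title="Gender",pt.cex=2, cex=.8)
# Edge legend colors follow the ifelse() mapping above: Yes = green, No = red, otherwise blue
legend("topleft", c("Similar","Not Similar", "Don't Know"),
col=c("green","red","blue"),title="Professional Field", lty=1, cex=.8)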
The network map above connects the students with edges colored by the similarity of their professional fields, and the vertices are colored by gender. Two vertices are black, meaning no gender was recorded for them, presumably because no respondent listed these two individuals among the three peers described in the survey. Most of the edges are blue, the color the code assigns when the field value is neither "Yes" nor "No", so most students in the group either do not know their peer's professional field or do not share it.
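Keeping the color mapping in one named vector makes it harder for the plot and its legend to drift apart. The following is a sketch on a small made-up graph, not the survey data; the names `edge_palette` and `edge_cols` are introduced here for illustration.

```r
library(igraph)

# Hypothetical three-person network with a "field" attribute on each edge.
edges <- data.frame(from = c("A", "B", "C"), to = c("B", "C", "A"),
                    field = c("Yes", "No", "Unknown"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges, directed = FALSE)

# One lookup table holds the mapping used by both the plot and the legend.
edge_palette <- c(Yes = "green", No = "red")
edge_cols <- unname(edge_palette[as.character(E(g)$field)])
edge_cols[is.na(edge_cols)] <- "blue"   # any other value counts as "Don't Know"
E(g)$color <- edge_cols
```

A legend can then be drawn from `c(edge_palette, "Don't Know" = "blue")`, so its labels and colors come from the same source as the edge colors.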
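E(schoolnetwork)$color <- ifelse(E(schoolnetwork)$numtimes >=4, "green", "blue")
# Note: miles is read from the edges here, so ifelse() returns one color per edge, not one
# per vertex. The lengths do not match, which triggers the warning below, and a vertex
# color may not correspond to the right student.
V(schoolnetwork)$color <- ifelse(E(schoolnetwork)$miles == -999, "blue", ifelse(E(schoolnetwork)$miles >1000, "yellow","pink"))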
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
plot(schoolnetwork, vertex.frame.color="white")
# Legend follows the ifelse() mapping above: -999 = blue, > 1000 = yellow, otherwise pink
legend("bottomright", c("Don't Know","Miles>1000","Miles<=1000"), pch=21,
col="#777777", pt.bg=c("blue","yellow","pink"), title="Distance",pt.cex=2, cex=.8)
legend("topleft", c("4 or more","Fewer than 4"),
col=c("green","blue"),title="Number of times", lty=1, cex=.8)
The figure highlights that many students do not know how far away their peers live. The blue edges show that most pairs of students met fewer than four times. Vinay does not interact with the other students. It is interesting that in some clusters the members do not know where their peers live.
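Vertex colors need exactly one value per vertex. One possible way to get there is to look up, for each vertex, a distance reported for that person on an edge. This sketch assumes that `miles` describes the peer named in the `to` column and that -999 codes "don't know"; the toy edge table is invented.

```r
library(igraph)

# Hypothetical edge table: miles describes the peer in the "to" column.
edges <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                    miles = c(10, 2000, -999), stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges, directed = FALSE)

# Take the first distance reported for each vertex; vertices never named
# as a peer get the "unknown" code.
peer_miles <- edges$miles[match(V(g)$name, edges$to)]
peer_miles[is.na(peer_miles)] <- -999

V(g)$color <- ifelse(peer_miles == -999, "blue",
                     ifelse(peer_miles > 1000, "yellow", "pink"))
```

Using only the first report per peer is a simplification; conflicting reports about the same peer would need an explicit rule, such as taking the median.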
components(schoolnetwork)  # connected components; clusters() is an older name for the same function
## $membership
## Barbara Susuhwe Asiedu Oluwatobi Akinyemi Oluwadamilola Akanni
## 1 1 1
## Kanika Bhalla Subhash Pemmaraju Raghu Mohan Sanugommula
## 1 1 1
## Karan Ketan Parekh Vinay Kumar Aavula Jiaoyan Zhang
## 1 2 3
## Jiaqi Fan Doruk Sinan Adali Nazli Mergen
## 1 1 1
## Pavan Lakhubhai Chavda Tongtian Fan Shriya Deshmukh
## 1 3 1
## Yash Rajiv Pillai Tianhe Wang Zeeshan Bashir
## 1 3 1
## Paritosh Tiwari Rohan Mayank Mashru Lan Shen
## 1 1 3
## Tu Ngoc Anh Nguyen
## 1
##
## $csize
## [1] 17 1 4
##
## $no
## [1] 3
clusterschool<-cluster_walktrap(schoolnetwork)
modularity(clusterschool)
## [1] 0.6445568
membership(clusterschool)
## Barbara Susuhwe Asiedu Oluwatobi Akinyemi Oluwadamilola Akanni
## 3 3 1
## Kanika Bhalla Subhash Pemmaraju Raghu Mohan Sanugommula
## 1 4 4
## Karan Ketan Parekh Vinay Kumar Aavula Jiaoyan Zhang
## 4 6 5
## Jiaqi Fan Doruk Sinan Adali Nazli Mergen
## 1 2 2
## Pavan Lakhubhai Chavda Tongtian Fan Shriya Deshmukh
## 3 5 2
## Yash Rajiv Pillai Tianhe Wang Zeeshan Bashir
## 1 5 3
## Paritosh Tiwari Rohan Mayank Mashru Lan Shen
## 1 4 5
## Tu Ngoc Anh Nguyen
## 2
plot(clusterschool,schoolnetwork)
The walktrap algorithm finds six communities containing between 1 and 5 students each, as shown in the diagram. The largest community has 5 students, while the largest connected component in the component output above contains 17 of the 22 students in the network. Students in the smaller, tightly knit communities may be working together on group projects, given how closely they are connected.
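Connected components and walktrap communities answer different questions: a component is a set of vertices that can reach each other at all, while a community is a densely connected group inside a component. A small sketch on an invented graph (two triangles joined by a bridge, plus one isolated vertex) shows how to read both results.

```r
library(igraph)

# Two triangles joined by the edge C-D, plus an isolated vertex G (toy data).
g <- graph_from_literal(A-B-C-A, D-E-F-D, C-D, G)

comp <- components(g)        # which vertices can reach each other
wt   <- cluster_walktrap(g)  # densely connected groups found by random walks

comp$no      # number of connected components
comp$csize   # size of each component
sizes(wt)    # size of each walktrap community
```

Here the bridge keeps both triangles in a single component, but a community detection method is free to split that component into smaller groups.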
Part III: Social Network Analysis Statistics
Describing the network
In network analysis, it is important to understand the position of each node. Centrality measures describe that position, that is, whether an individual sits near the core of the network or on its periphery.
Degree centrality
Degree centrality counts the connections between one node and the other nodes. In this network, Oluwatobi Akinyemi, Oluwadamilola Akanni, and Subhash Pemmaraju have the highest degree, with 6 connections each. These students may be collaborating actively with other class members.
degree(schoolnetwork, mode="all")
## Barbara Susuhwe Asiedu Oluwatobi Akinyemi Oluwadamilola Akanni
## 5 6 6
## Kanika Bhalla Subhash Pemmaraju Raghu Mohan Sanugommula
## 4 6 5
## Karan Ketan Parekh Vinay Kumar Aavula Jiaoyan Zhang
## 5 2 4
## Jiaqi Fan Doruk Sinan Adali Nazli Mergen
## 1 3 3
## Pavan Lakhubhai Chavda Tongtian Fan Shriya Deshmukh
## 4 4 5
## Yash Rajiv Pillai Tianhe Wang Zeeshan Bashir
## 3 5 1
## Paritosh Tiwari Rohan Mayank Mashru Lan Shen
## 3 3 3
## Tu Ngoc Anh Nguyen
## 1
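To avoid missing ties when reading the table by eye, the students with the highest degree can be extracted directly. This is a sketch on a made-up five-person graph, not the survey network.

```r
library(igraph)

# Toy graph: A and B are tied for the most connections.
g <- graph_from_literal(A-B, A-C, B-C, B-D, A-E)

deg <- degree(g, mode = "all")
top <- names(deg)[deg == max(deg)]  # every vertex tied for the maximum
avg <- mean(deg)                    # equals 2 * ecount(g) / vcount(g)
```

The same two lines applied to `schoolnetwork` would list all students who share the top degree.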
Closeness Centrality
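Closeness centrality is the inverse of a node's average shortest-path distance to the other nodes, so a higher value means the node can reach the others in fewer steps. Because this network is disconnected, igraph warns that the measure is not well defined, so the values should be compared with caution. Shriya Deshmukh has the highest closeness centrality in this network (0.141), indicating that she is within short reach of most class members.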
closeness(schoolnetwork, mode = "all", weights = NA, normalized = T)
## Warning in closeness(schoolnetwork, mode = "all", weights = NA, normalized
## = T): At centrality.c:2784 :closeness centrality is not well-defined for
## disconnected graphs
## Barbara Susuhwe Asiedu Oluwatobi Akinyemi Oluwadamilola Akanni
## 0.11797753 0.12650602 0.13375796
## Kanika Bhalla Subhash Pemmaraju Raghu Mohan Sanugommula
## 0.14000000 0.13043478 0.12068966
## Karan Ketan Parekh Vinay Kumar Aavula Jiaoyan Zhang
## 0.12068966 0.04545455 0.05263158
## Jiaqi Fan Doruk Sinan Adali Nazli Mergen
## 0.12500000 0.12883436 0.12883436
## Pavan Lakhubhai Chavda Tongtian Fan Shriya Deshmukh
## 0.11731844 0.05263158 0.14093960
## Yash Rajiv Pillai Tianhe Wang Zeeshan Bashir
## 0.13815789 0.05263158 0.10880829
## Paritosh Tiwari Rohan Mayank Mashru Lan Shen
## 0.13725490 0.12068966 0.05263158
## Tu Ngoc Anh Nguyen
## 0.12804878
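One way to sidestep the disconnected-graph warning is to compute closeness inside the largest connected component only. The sketch below uses an invented graph and also reproduces the normalized value by hand as (n - 1) divided by the sum of shortest-path distances.

```r
library(igraph)

# Toy graph: a path A-B-C-D and a separate pair E-F.
g <- graph_from_literal(A-B-C-D, E-F)

# Keep only the largest connected component.
comp  <- components(g)
giant <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))

# Normalized closeness by hand: (n - 1) / sum of distances to the others.
d      <- distances(giant)
manual <- (vcount(giant) - 1) / rowSums(d)

builtin <- closeness(giant, normalized = TRUE)
```

Restricting the analysis to one component is a design choice: values stay comparable within that component, but students outside it receive no score.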
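Betweenness Centrality
Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes. Kanika Bhalla has the highest betweenness centrality in the network (0.281), which suggests that she acts as a bridge between groups of students.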
betweenness(schoolnetwork, directed=F, weights=NA, normalized = T)
## Barbara Susuhwe Asiedu Oluwatobi Akinyemi Oluwadamilola Akanni
## 0.07142857 0.18571429 0.22857143
## Kanika Bhalla Subhash Pemmaraju Raghu Mohan Sanugommula
## 0.28095238 0.18571429 0.00000000
## Karan Ketan Parekh Vinay Kumar Aavula Jiaoyan Zhang
## 0.00000000 0.00000000 0.00000000
## Jiaqi Fan Doruk Sinan Adali Nazli Mergen
## 0.00000000 0.00000000 0.00000000
## Pavan Lakhubhai Chavda Tongtian Fan Shriya Deshmukh
## 0.00000000 0.00000000 0.26666667
## Yash Rajiv Pillai Tianhe Wang Zeeshan Bashir
## 0.24761905 0.00000000 0.00000000
## Paritosh Tiwari Rohan Mayank Mashru Lan Shen
## 0.14285714 0.00000000 0.00000000
## Tu Ngoc Anh Nguyen
## 0.00000000
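A small worked case makes the definition concrete. On an invented path of four people, only the two middle vertices lie on shortest paths between others.

```r
library(igraph)

# Toy path graph: A - B - C - D.
g <- graph_from_literal(A-B-C-D)

# B lies on the shortest paths A-C and A-D; C lies on A-D and B-D.
raw <- betweenness(g, directed = FALSE)

# Normalizing divides by the number of vertex pairs that exclude the
# vertex itself, (n - 1)(n - 2) / 2, which is 3 for n = 4.
norm <- betweenness(g, directed = FALSE, normalized = TRUE)
```

The end vertices score 0 because no shortest path between two other people passes through them, which is also why many students in the output above have a betweenness of 0.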
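Dendrogram of various clusters in the schoolnetwork
# Note: hclust() uses complete linkage by default; pass method = "average" for average linkage.
# dist() cannot use non-numeric columns and coerces them to NA, hence the warning below.
hclust_avg<- hclust(dist(snanodes))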
## Warning in dist(snanodes): NAs introduced by coercion
cut_avg <- cutree(hclust_avg, k = 3)
plot(hclust_avg)
rect.hclust(hclust_avg , k = 3)
abline(h = 20, col = 'red')
This dendrogram shows how the students are split into three clusters based on the attributes in the nodes table, which complements the community map above.
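Hierarchical clustering is easier to interpret when only numeric columns enter the distance calculation and are put on a common scale. This is a sketch under those assumptions; the toy `nodes` table and its columns are invented and do not reflect the actual nodes file.

```r
# Hypothetical node attributes (not the survey data).
nodes <- data.frame(name = c("A", "B", "C", "D"),
                    miles = c(1, 2, 50, 52),
                    numtimes = c(5, 4, 1, 2),
                    stringsAsFactors = FALSE)

# Keep numeric columns only and label the rows with the student names.
num <- nodes[sapply(nodes, is.numeric)]
rownames(num) <- nodes$name

# Scale the columns so no single unit dominates, then cluster.
hc     <- hclust(dist(scale(num)), method = "average")
groups <- cutree(hc, k = 2)  # cluster label for each student
```

Calling `plot(hc)` followed by `rect.hclust(hc, k = 2)` would draw the corresponding dendrogram with the two groups outlined.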
hist(degree(schoolnetwork))
From the histogram, the students' degrees range from 1 to 6. A degree of 3 is the most common (six students), followed by 5 (five students) and 4 (four students). The average degree is 3.72, so a student has between three and four connections on average; this is a mean rather than a minimum, and three students have only one connection. The only exception is Vinay, an outlier with no connections to other students: he forms a component of his own, so his degree of 2 appears to come from an edge that links him to himself.
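A vertex can show a degree of 2 while being alone in its component when it has a self-loop, because igraph counts a loop twice. The sketch below reproduces that situation on invented data; it is an illustration of the mechanism, not a reconstruction of the survey edges.

```r
library(igraph)

# Toy edge list in which V names only themselves (a self-loop).
edges <- data.frame(from = c("A", "B", "V"), to = c("B", "C", "V"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges, directed = FALSE)

with_loops    <- degree(g)                 # the V-V loop counts twice
without_loops <- degree(g, loops = FALSE)  # ties to other people only

table(without_loops)   # frequency of each degree, as summarized by hist()
components(g)$csize    # V sits alone in a component of size 1
```

Using `loops = FALSE` for the histogram and the average would make isolated students appear with a degree of 0.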
Part IV: Results
With the help of network analysis tools, the survey dataset yielded several insights into social network behavior. To begin with, there are more male students than female students in the class. A good number of students either do not know their peers' professional field or follow a different career path. Only a few students worked with other members of the class more than four times, which may indicate that they did not study together for long periods at the university. Even when students are in the same cluster, the majority do not know where their cluster members live or how far away they are. This suggests that the students keep their private lives separate from their academic lives.
Part V: Conclusion
From the social network analysis, I was able to understand the degree of connection within the student network; an average of 3.72 connections per node suggests a reasonably well-connected group. The centrality measures provided useful insight into closeness within the group. I would conclude that this was a very useful exercise in applying the fundamental principles of social network analysis to a dataset.