Introduction

The FIRM-HI-TECH dataset is a social network dataset with 33 nodes. It was obtained from the Network Repository [1], an online platform that provides network datasets for research purposes.

Methodology

To analyze the FIRM-HI-TECH dataset, we used the R programming language and the igraph package, which is a widely used package for network analysis.

The dataset was downloaded from the Network Repository and converted into a format that R can read. It consists of 33 nodes, which represent firms in the high-tech industry, and edges, which represent the relationships between the firms; the exact edge count is given by ecount(network) once the graph is loaded.
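
As a minimal sketch of that conversion step, assuming the downloaded file is a whitespace-separated edge list with one pair of node IDs per line. The inline text below is a hypothetical stand-in, not the real data, and the names raw_text, edge_table, g_raw and g are illustrative:

```r
library(igraph)

# Hypothetical stand-in for the downloaded edge list: one "from to" pair per line.
# The last line repeats the first tie in the opposite direction.
raw_text <- "10 28
10 2
28 2
2 23
23 15
28 10"

edge_table <- read.table(text = raw_text, col.names = c("from", "to"))

# Node IDs become vertex names; directed = FALSE treats each row as an undirected tie
g_raw <- graph_from_data_frame(edge_table, directed = FALSE)

# simplify() merges repeated ties and drops self-loops
g <- simplify(g_raw)

vcount(g)      # number of nodes
ecount(g_raw)  # number of rows read, including the repeated tie
ecount(g)      # number of distinct undirected ties
```

Comparing ecount() before and after simplify() is a quick way to see whether a file lists some ties more than once, for example once in each direction.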

We characterize the network by its degree distribution, density, and average path length.
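
These three characteristics can each be computed with a single igraph call. The sketch below uses a hypothetical five-node toy network rather than the FIRM-HI-TECH data, so the numbers it produces are illustrative only:

```r
library(igraph)

# Hypothetical toy network with 5 nodes and 5 ties (not the FIRM-HI-TECH data)
edges <- data.frame(
  from = c("10", "10", "28", "2", "23"),
  to   = c("28", "2", "2", "23", "15")
)
g <- graph_from_data_frame(edges, directed = FALSE)

n <- vcount(g)
m <- ecount(g)

net_density <- edge_density(g)         # m / (n * (n - 1) / 2)
mean_degree <- mean(degree(g))         # equals 2 * m / n
deg_dist    <- degree_distribution(g)  # share of nodes with degree 0, 1, 2, ...
avg_path    <- mean_distance(g)        # mean shortest-path length over node pairs
```

For an undirected network without loops or repeated ties, density equals the mean degree divided by n - 1, so the two statistics can be cross-checked against each other.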

Results

library(igraph)
network_data <- read.table("C:/Users/ADS/Downloads/soc-firm-hi-tech/soc-firm-hi-tech.txt")

network <- graph_from_data_frame(network_data, directed = FALSE)


# Visualize the network
plot(network, vertex.label=NA, vertex.size=10, edge.arrow.size=0.5)

# Compute degree centrality for every node (unnamed vector, in vertex order)
centrality <- centr_degree(network, mode="all")$res
# Position in the vertex sequence (not the label) of the highest-degree node
max_degree_node <- which.max(centrality)
max_degree_node
## [1] 9
# Perform community detection on the network
communities <- cluster_walktrap(network)
plot(network, vertex.color = communities$membership)

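The first result, max_degree_node, gives the position of the highest-degree node in the vertex sequence, which is 9. Because which.max() returns an index rather than a label, this is the node labeled 29 (the ninth name in the closeness output), not the node labeled 9. This node is directly connected to more nodes than any other node in the network.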
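
The vector returned by centr_degree()$res carries no names, so which.max() reports a position in the vertex sequence rather than a node label. A minimal sketch of recovering the label, using a hypothetical toy edge list whose labels are deliberately not 1..n:

```r
library(igraph)

# Hypothetical toy network (not the FIRM-HI-TECH data)
edges <- data.frame(
  from = c("10", "10", "28", "2", "23"),
  to   = c("28", "2", "2", "23", "15")
)
g <- graph_from_data_frame(edges, directed = FALSE)

res <- centr_degree(g, mode = "all")$res  # degrees in vertex order
top_index <- which.max(res)               # position of the highest-degree node
top_label <- V(g)$name[top_index]         # label stored for that position

deg <- degree(g)                          # same degrees, named by node label
top_label_direct <- names(deg)[which.max(deg)]
```

In this toy network the highest-degree node sits at position 3 but carries the label "2", which is the kind of mismatch that arises whenever labels do not follow the order in which nodes first appear in the edge list.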

The second result, communities, holds the output of the community detection analysis performed with the cluster_walktrap function. The function detected 4 communities in the network and assigned each node to exactly one of them based on its connections.

The output lists the nodes in each community. For example, the first community contains nodes 21, 35, 16, 36, 31, 5, 19, 12, and 26. The mod value of 0.25 is the modularity score of the detected partition, which measures how well the network divides into communities. Scores closer to 1 indicate a cleaner partition.
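
The pieces of a community detection result can be pulled out with accessor functions. The sketch below runs walktrap on a hypothetical toy network of two triangles joined by one tie; it is not the FIRM-HI-TECH data:

```r
library(igraph)

# Hypothetical toy network: two triangles joined by a single bridging tie
g <- graph_from_literal(A - B, B - C, A - C, D - E, E - F, D - F, C - D)

comm <- cluster_walktrap(g)

node_groups <- membership(comm)  # community ID for each node, named by node label
group_sizes <- sizes(comm)       # number of nodes per community
n_groups    <- length(comm)      # number of communities
q           <- modularity(comm)  # modularity of the partition walktrap selected

# The same score recomputed from the graph and the membership vector
q_check <- modularity(g, as.integer(node_groups))
```

Passing as.integer(node_groups) as vertex.color to plot() colors each node by its community, as in the plot call above.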

network_diameter <- diameter(network)
network_diameter
## [1] 5

The diameter is the length of the longest shortest path between any two nodes in the network. Here it is 5, meaning that every node can reach every other node in at most 5 steps, so the network is compact rather than spread out.
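
To make the definition concrete, the diameter can be read off the matrix of shortest-path lengths. A minimal sketch on a hypothetical toy network (not the FIRM-HI-TECH data):

```r
library(igraph)

# Hypothetical toy network (not the FIRM-HI-TECH data)
edges <- data.frame(
  from = c("10", "10", "28", "2", "23"),
  to   = c("28", "2", "2", "23", "15")
)
g <- graph_from_data_frame(edges, directed = FALSE)

d <- distances(g)            # shortest-path length between every pair of nodes
longest <- max(d)            # largest entry; all pairs are reachable in this toy graph
diam <- diameter(g)          # the same quantity computed by igraph
path <- get_diameter(g)      # one shortest path that realizes the diameter
path_labels <- as_ids(path)  # node labels along that path
```

In a disconnected network distances() contains Inf for unreachable pairs, so the max() shortcut only applies when the network is connected.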

betweenness_scores <- betweenness(network)
plot(betweenness_scores, main = "Scatter Plot of Betweenness Centrality", xlab = "Node ID", ylab = "Betweenness Centrality")

Betweenness centrality measures the extent to which a node lies on the shortest paths between other nodes in the network. Nodes with high betweenness centrality are important for the flow of information and can act as brokers in the network.

In this case, node 29 has the highest betweenness centrality value of 104.03, while nodes 10 and 28 have the lowest betweenness centrality values of 0.
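
Extremes such as these can be read from the named vector that betweenness() returns. A minimal sketch on a hypothetical toy network (not the FIRM-HI-TECH data), with illustrative variable names:

```r
library(igraph)

# Hypothetical toy network (not the FIRM-HI-TECH data)
edges <- data.frame(
  from = c("10", "10", "28", "2", "23"),
  to   = c("28", "2", "2", "23", "15")
)
g <- graph_from_data_frame(edges, directed = FALSE)

b <- betweenness(g)                    # named by node label
top_broker <- names(b)[which.max(b)]   # label of the node with the highest score
zero_nodes <- names(b)[b < 1e-9]       # nodes that lie on no shortest path between others
```

Here node "2" scores 4 because it sits on the shortest path of four node pairs, while the nodes in zero_nodes bridge nothing.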

closeness_scores <- closeness(network)
closeness_scores
##          10          28           2          23          15           7 
## 0.009900990 0.009900990 0.014285714 0.012658228 0.012820513 0.013333333 
##          14          34          29          18          27           4 
## 0.014705882 0.014705882 0.018867925 0.013157895 0.013333333 0.016129032 
##          13          24          11          20          22           9 
## 0.015384615 0.017543860 0.014925373 0.016666667 0.011764706 0.013698630 
##          21          33          35          30          16          36 
## 0.016666667 0.016666667 0.015625000 0.015151515 0.013888889 0.009708738 
##          31           5          19          12          26           6 
## 0.011494253 0.012658228 0.015873016 0.012987013 0.013157895 0.013157895 
##           1           8           3 
## 0.009345794 0.009708738 0.011627907

Closeness centrality measures how close a node is to all other nodes in the network. Nodes with high closeness centrality can reach the rest of the network in fewer steps, and so can access information more efficiently, than nodes with low closeness centrality. In this case, node 29 has the highest closeness centrality (0.0189), while node 1 has the lowest (0.0093), followed by nodes 8 and 36 (0.0097).
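
The raw values are small because unnormalized closeness is the reciprocal of a node's summed distances to all other nodes; a value of 0.018867925, for instance, corresponds to a distance sum of 53. A minimal sketch of that relationship on a hypothetical toy network (not the FIRM-HI-TECH data):

```r
library(igraph)

# Hypothetical toy network (not the FIRM-HI-TECH data)
edges <- data.frame(
  from = c("10", "10", "28", "2", "23"),
  to   = c("28", "2", "2", "23", "15")
)
g <- graph_from_data_frame(edges, directed = FALSE)

cl_raw  <- closeness(g)                     # 1 / (sum of distances to the other nodes)
cl_norm <- closeness(g, normalized = TRUE)  # raw value multiplied by (n - 1)
by_hand <- 1 / rowSums(distances(g))        # same quantity from the distance matrix

most_central  <- names(which.max(cl_raw))
least_central <- names(which.min(cl_raw))
```

The normalized version lies between 0 and 1, which makes it easier to compare across networks of different sizes.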

eigenvector <- centr_eigen(network)$vector
eigenvector
##  [1] 0.050524259 0.025262130 0.345679041 0.202319475 0.337078538 0.267700797
##  [7] 0.506814690 0.626584164 1.000000000 0.364266961 0.273007104 0.619366685
## [13] 0.521226080 0.768476089 0.698024107 0.826363571 0.217582857 0.266748948
## [19] 0.316930479 0.679087327 0.369007217 0.395386987 0.187923183 0.027466750
## [25] 0.039197430 0.107088132 0.348442120 0.121911785 0.146394789 0.076993975
## [31] 0.005626699 0.013278584 0.052352995

Eigenvector centrality measures the influence of a node in the network based on the connections it has to other high-scoring nodes.

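Nodes with high eigenvector centrality are important in spreading information throughout the network. The values are listed in vertex order rather than by node label, so position 9, which has the highest value of 1.0, is the node labeled 29 (the ninth name in the closeness output). The lowest value, 0.0056, is at position 31, which is the node labeled 1, followed by 0.0133 at position 32, the node labeled 8.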
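
Attaching node labels to the scores avoids confusing positions with labels. The sketch below does so on a hypothetical toy network (not the FIRM-HI-TECH data) and also checks the defining property that the scores form an eigenvector of the adjacency matrix:

```r
library(igraph)

# Hypothetical toy network (not the FIRM-HI-TECH data)
edges <- data.frame(
  from = c("10", "10", "28", "2", "23"),
  to   = c("28", "2", "2", "23", "15")
)
g <- graph_from_data_frame(edges, directed = FALSE)

res <- eigen_centrality(g)   # scores are scaled so that the maximum is 1
ev <- res$vector
names(ev) <- V(g)$name       # attach labels explicitly so lookups use labels

ranked <- sort(ev, decreasing = TRUE)  # most to least central, by label

# Defining property: A %*% ev equals lambda * ev for the largest eigenvalue lambda
A <- as_adjacency_matrix(g, sparse = FALSE)
residual <- max(abs(A %*% res$vector - res$value * res$vector))
```

With labels attached, names(ranked)[1] gives the most central node directly, with no need to map a position back to a label.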

modularity(communities)
## [1] 0.2521403

Modularity measures the strength of division of a network into communities or groups of nodes that are more densely connected within than between groups.

A higher modularity value indicates a stronger division of the network into communities. In this case, the modularity value of 0.252 indicates a moderate level of community structure in the network.
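
As a worked illustration of what the score measures, modularity can be computed by hand as the sum over communities of (share of ties inside the community) minus (share of total degree held by the community, squared). The sketch below uses a hypothetical toy network and an assumed partition, not the FIRM-HI-TECH data:

```r
library(igraph)

# Hypothetical toy network: two triangles joined by a single bridging tie
g <- graph_from_literal(A - B, B - C, A - C, D - E, E - F, D - F, C - D)
groups <- c(1, 1, 1, 2, 2, 2)  # assumed partition: one group per triangle, vertex order A..F

m   <- ecount(g)
deg <- degree(g)
el  <- as_edgelist(g, names = FALSE)  # tie endpoints as vertex positions

q_by_hand <- sum(sapply(unique(groups), function(k) {
  inside <- sum(groups[el[, 1]] == k & groups[el[, 2]] == k)  # ties inside group k
  d_k    <- sum(deg[groups == k])                             # total degree of group k
  inside / m - (d_k / (2 * m))^2
}))

q_igraph <- modularity(g, groups)  # igraph's value for the same partition
```

Both values equal 5/14, about 0.357: each triangle holds 3 of the 7 ties and half of the total degree, so it contributes 3/7 - 0.25.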

mean_distance(network)
## [1] 2.359848

The average path length is the mean shortest-path distance over all pairs of nodes in the network. Here the value of 2.36 means that two nodes are typically between two and three steps apart, so the network is well connected and information can travel between nodes efficiently.

Conclusion

Based on the results of the network analysis, it can be concluded that the network is moderately dense, with a density of 0.22. The network also has a high average degree of 6.6, a small diameter of 5, and an average path length of 2.36, indicating that information can be disseminated quickly through the network. The network appears to have a few central nodes; node 29 has the highest degree, betweenness, closeness, and eigenvector centrality.

Furthermore, the community detection analysis identified 4 distinct communities within the network, each composed of nodes that have strong connections with each other. This suggests that there are distinct subgroups within the larger network that may have their own unique characteristics and roles.

Overall, the findings suggest that the network is well-connected, efficient, and contains distinct subgroups. These insights can be useful for understanding the structure and dynamics of the network, and for developing strategies to optimize communication and information dissemination within it.

References

[1] Rossi, R. A. and Ahmed, N. K. (2015). The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.