R Packages for Network Analysis
Francisco Cardozo
University of Miami
January 9, 2025
Who I Am
Francisco Cardozo
PhD Candidate in Prevention Science
Data Scientist
My research focuses on evaluating health interventions and integrating AI into prevention work.
I first encountered network analysis during my master’s degree in Psychology and have been using it ever since. I have also found R to be an excellent tool for network analysis.
GitHub | LinkedIn
Introduction
Goal
Present an overview of R packages for network analysis
Approach
- Extract package descriptions from CRAN
- Use embeddings to identify relevant network packages
- Build two networks: package dependencies and author collaborations
- Create a Shiny app that visualizes these results
Motivation
- R’s Popularity:
  - R is one of the most widely used languages in statistics and the social sciences.
  - Rich ecosystem with thousands of CRAN packages for many types of analysis.
- Growing Field of Network Analysis:
  - Networks model relationships in the social sciences, biology, computer science, and more.
  - R offers multiple packages to visualize, analyze, and measure network structures.
- Leverage for Collaboration:
  - Understanding package authors and their networks can highlight clusters of expertise.
  - These insights can guide potential collaborations and reveal new development directions.
What is CRAN?
- Comprehensive R Archive Network (CRAN) is the main repository for R packages
- Requires packages to pass quality checks (including R CMD check) before publication
- Offers thousands of packages covering a wide range of disciplines
- Is central to R’s ecosystem: users install packages directly from CRAN with install.packages()
- Facilitates community-driven development and broad collaboration
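The DESCRIPTION File
Every CRAN package includes a DESCRIPTION file with metadata fields such as Package, Title, Description, Author, Depends, and Imports. The descriptions, dependencies, and authors used in this project come from these files.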
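DESCRIPTION files use the Debian Control File (DCF) format: a line of the form "Field: value" starts a field, and an indented line continues the previous value. The deck does not show how these files were read (in R, tools::CRAN_package_db() returns this metadata for all CRAN packages), so the Python sketch below only illustrates the format. The package toynet and its authors are hypothetical.

```python
def parse_dcf(text: str) -> dict[str, str]:
    """Parse one DCF record (the format of R DESCRIPTION files) into a dict.

    'Field: value' starts a new field; a line beginning with whitespace
    continues the value of the previous field.
    """
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines; a single record is assumed
        if line[0] in " \t" and current is not None:
            # Continuation line: append it to the previous value with one space.
            fields[current] += " " + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a DCF field line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Hypothetical DESCRIPTION record for illustration (not a real CRAN package).
example = """\
Package: toynet
Title: Toy Network Tools
Description: Functions to build, summarize, and
    plot small networks.
Author: Ana Gomez [aut, cre], Luis Perez [ctb]
Depends: R (>= 4.1.0), igraph
"""

desc = parse_dcf(example)
print(desc["Description"])  # Functions to build, summarize, and plot small networks.
```

Folding continuation lines into a single space keeps multi-line Description text intact for the embedding step.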
Method
- Package Descriptions: Collected text descriptions from CRAN.
- Embeddings: Converted descriptions into numeric vectors that capture semantic meaning.
  - Used a sentence-transformers model from Hugging Face.
  - Selected a reference subset of popular network analysis packages.
- Cosine Similarity: Measured how closely each package’s embedding matches those of the reference network packages.
- Selection: Kept the packages with the highest similarity scores (top 5%).
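The scoring and selection steps can be sketched in Python, with cosine similarity defined as cos(u, v) = u·v / (|u| |v|). Two assumptions are not stated in the deck: each package is scored by its highest similarity to any reference package (a mean or a centroid would also be reasonable), and "top 5%" is taken by rank. Real embeddings would come from the sentence-transformers model (for example via SentenceTransformer(...).encode(descriptions)); the 2-dimensional vectors and package names below are hypothetical stand-ins.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity u.v / (|u| |v|); vectors are assumed non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relevance_scores(candidates: dict[str, list[float]],
                     reference: list[list[float]]) -> dict[str, float]:
    """Score each candidate by its highest cosine similarity to any reference vector.

    ASSUMPTION: max over the reference set; the deck does not say how
    similarities to several reference packages were combined.
    """
    return {name: max(cosine(vec, ref) for ref in reference)
            for name, vec in candidates.items()}


def top_fraction(scores: dict[str, float], fraction: float = 0.05) -> list[str]:
    """Return the highest-scoring names, ceil(fraction * N) of them (at least one)."""
    k = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 2-d "embeddings"; real sentence-transformers vectors are much longer.
reference = [[1.0, 0.0], [0.8, 0.6]]  # stand-ins for known network packages
candidates = {
    "pkgA": [0.9, 0.1],
    "pkgB": [0.0, 1.0],
    "pkgC": [0.6, 0.8],
    "pkgD": [-1.0, 0.0],
}

scores = relevance_scores(candidates, reference)
print(top_fraction(scores, fraction=0.5))  # ['pkgA', 'pkgC']
```

With only four toy packages the demo keeps half of them; with the default fraction of 0.05 on a full CRAN listing, the same function keeps the top 5%.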
Dependencies Network
Used the “Depends” field of each package’s DESCRIPTION file to map how packages depend on each other.
- Node: Package name
- Edge: Dependency
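Edges are directed from a package (source) to each package it depends on (target), so a package’s out-degree counts its dependencies and its in-degree counts the packages that depend on it.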
Created a graph illustrating the interplay of R packages.
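A Python sketch of turning Depends fields into a directed edge list and counting degrees. It assumes each field has already been read from the DESCRIPTION file, strips version constraints, and drops the "R" entry, which is a requirement on R itself rather than on a package (an assumption about how the deck handled it). In R, tools::package_dependencies(..., which = "Depends") performs a similar lookup. The package names are hypothetical.

```python
import re
from collections import Counter


def parse_depends(field: str) -> list[str]:
    """Split a Depends value such as 'R (>= 4.1.0), igraph' into package names.

    Version constraints in parentheses are removed. ASSUMPTION: the 'R' entry
    (a requirement on R itself, not a package) is dropped.
    """
    names = []
    for entry in field.split(","):
        name = re.sub(r"\(.*?\)", "", entry).strip()
        if name and name != "R":
            names.append(name)
    return names


def dependency_edges(depends_by_pkg: dict[str, str]) -> list[tuple[str, str]]:
    """Directed edges (package -> dependency), one per Depends entry."""
    return [(pkg, dep)
            for pkg, field in depends_by_pkg.items()
            for dep in parse_depends(field)]


# Hypothetical packages and their Depends fields.
depends = {
    "toynet": "R (>= 4.1.0), igraph",
    "toyviz": "R (>= 3.5.0), toynet, igraph",
    "toystats": "toynet",
}

edges = dependency_edges(depends)
out_degree = Counter(src for src, _ in edges)  # dependencies per package
in_degree = Counter(dst for _, dst in edges)   # packages depending on each target
print(out_degree["toyviz"], in_degree["igraph"])  # 2 2
```

High in-degree flags packages that many others build on; high out-degree flags packages that rely on many others.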
Author Collaboration Network
Extracted the authors, contributors, and others involved in package development from each DESCRIPTION file.
This produced a bipartite (two-mode) network linking each package to its authors.
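The Author field is free text, so extracting names takes some cleanup. A Python sketch, assuming roles appear in square brackets (e.g. [aut, cre]), notes such as ORCID links appear in parentheses, and names are separated by commas or "and". Real fields are messier, and for packages that use Authors@R it may be easier to parse that field instead. The packages and people below are hypothetical.

```python
import re


def parse_authors(author_field: str) -> list[str]:
    """Extract person names from a free-text Author field.

    ASSUMPTIONS: roles appear in square brackets ('[aut, cre]'), notes such as
    ORCID links appear in parentheses, and names are separated by commas or 'and'.
    """
    cleaned = re.sub(r"\[.*?\]|\(.*?\)", "", author_field)
    parts = re.split(r",|\band\b", cleaned)
    return [p.strip() for p in parts if p.strip()]


def package_author_edges(authors_by_pkg: dict[str, str]) -> list[tuple[str, str]]:
    """Bipartite edge list: one (package, author) pair per listed author."""
    return [(pkg, name)
            for pkg, field in authors_by_pkg.items()
            for name in parse_authors(field)]


# Hypothetical Author fields.
authors = {
    "toynet": "Ana Gomez [aut, cre], Luis Perez [ctb]",
    "toyviz": "Luis Perez [aut, cre] and Mei Chen [aut]",
}

print(package_author_edges(authors))
# [('toynet', 'Ana Gomez'), ('toynet', 'Luis Perez'), ('toyviz', 'Luis Perez'), ('toyviz', 'Mei Chen')]
```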
Projected the package-author network onto the authors to obtain the collaboration network: two authors are linked if they share at least one package.
- Node: Author
- Edge: Collaboration
Created a graph illustrating the collaboration network.
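A Python sketch of the projection step: two authors are linked when they appear on the same package, and the edge weight counts how many packages they share. Authors whose packages have no co-authors end up with no edges, which is one way single-author packages appear as isolates. In R, igraph’s bipartite_projection() does this for a bipartite igraph object. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_authors(pkg_author_edges: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """One-mode projection of a package-author bipartite network onto authors.

    Two authors are linked if they appear on at least one common package;
    the weight is the number of packages they share.
    """
    authors_by_pkg: dict[str, set[str]] = defaultdict(set)
    for pkg, author in pkg_author_edges:
        authors_by_pkg[pkg].add(author)

    weights: dict[tuple[str, str], int] = defaultdict(int)
    for authors in authors_by_pkg.values():
        # sorted() gives each pair a canonical (alphabetical) order
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical bipartite edge list (package, author).
edges = [
    ("toynet", "Ana Gomez"), ("toynet", "Luis Perez"),
    ("toyviz", "Luis Perez"), ("toyviz", "Mei Chen"),
    ("toystats", "Ana Gomez"), ("toystats", "Luis Perez"),
    ("toysolo", "Sam Lee"),
]

collab = project_onto_authors(edges)
all_authors = {author for _, author in edges}
linked = {name for pair in collab for name in pair}
print(collab)                        # {('Ana Gomez', 'Luis Perez'): 2, ('Luis Perez', 'Mei Chen'): 1}
print(sorted(all_authors - linked))  # ['Sam Lee']
```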
The Shiny App
- Purpose: Provide an interactive dashboard to explore:
  - Package similarity and relevance to network analysis.
  - Dependencies among packages.
  - Author collaboration relationships.
Visit the app here.
Conclusion
Key Findings:
- R’s vast ecosystem offers numerous tools for network analysis.
- Embedding-based similarity searches can efficiently identify related packages.
- Dependency networks show how these packages build on one another.
- Author collaboration networks reveal limited collaboration among the authors of these packages.