R Packages for Network Analysis

Francisco Cardozo

University of Miami

January 9, 2025

Who I Am

Francisco Cardozo

PhD Candidate in Prevention Science

Data Scientist

My research focuses on evaluating health interventions and integrating AI into prevention work.

I first encountered network analysis during my master’s degree in Psychology and have been using it ever since. I have also found R to be an excellent tool for network analysis.

GitHub | LinkedIn

Introduction

Goal

Present an overview of R packages for network analysis

Approach

  1. Extract package descriptions from CRAN
  2. Use embeddings to identify relevant network packages
  3. Build two networks: package dependencies and author collaborations
  4. Create a Shiny app that visualizes these results

Motivation

  • R’s Popularity:
    • R is one of the most widely used languages in statistics and social sciences.
    • Rich ecosystem with thousands of CRAN packages for various types of analysis.
  • Growing Field of Network Analysis:
    • Networks help model relationships in social sciences, biology, computer science, and more.
    • R offers multiple packages with functions to visualize, analyze, and measure network structures.

  • Leverage for Collaboration:
    • Understanding package authors and their networks can highlight expertise clusters.
    • These insights can guide potential collaborations and reveal new development directions.

What is CRAN?

  • Comprehensive R Archive Network (CRAN) is the main repository for R packages
  • Ensures packages meet certain standards and quality checks
  • Offers thousands of packages covering a wide range of disciplines
  • Is central to R’s ecosystem: users install packages directly from CRAN
  • Facilitates community-driven development and broad collaboration

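Description file

Every CRAN package ships a DESCRIPTION file with metadata fields such as Title, Description, Depends, and Author. The steps that follow draw on these fields.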

Method

  1. Package Descriptions: Collected text descriptions from CRAN.
  2. Embeddings: Converted descriptions into numeric vectors to capture semantic meaning.
    2.1. Used a sentence-transformers model from Hugging Face.
    2.2. Created a reference set of popular network analysis packages.
  3. Cosine Similarity: Measured how closely each package's description relates to those of the reference network packages.
  4. Selection: Kept the packages in the top 5% by similarity.
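Steps 3 and 4 can be sketched in a few lines. This is a minimal illustration in Python, not the code used for the talk: the vectors are made-up 3-dimensional stand-ins for real sentence embeddings, the package names are hypothetical, and scoring each package by its highest similarity to any reference package is an assumption (the source does not say how similarities to several reference packages were combined).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def score_packages(embeddings, reference_names):
    """Score each non-reference package by its best match among the reference packages.

    Taking the maximum is an assumption; a mean over references would also be reasonable.
    """
    refs = [embeddings[name] for name in reference_names]
    return {
        name: max(cosine_similarity(vec, ref) for ref in refs)
        for name, vec in embeddings.items()
        if name not in reference_names
    }

def top_fraction(scores, fraction=0.05):
    """Return the names in the top `fraction` of scores (at least one name)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
toy_embeddings = {
    "ref_network_pkg": [1.0, 0.0, 0.0],
    "pkg_a": [0.9, 0.1, 0.0],
    "pkg_b": [0.0, 1.0, 0.0],
    "pkg_c": [0.5, 0.5, 0.0],
}
scores = score_packages(toy_embeddings, {"ref_network_pkg"})
selected = top_fraction(scores, fraction=0.05)
```

With only three candidate packages, the top 5% rounds up to a single package, so `selected` holds the one whose vector points most nearly in the same direction as the reference.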

Dependencies Network

Used the “Depends” field of each package's DESCRIPTION file to map how packages depend on each other.

  • Node: Package name
  • Edge: Dependency

Edges are directed from a package (source) to the package it depends on (target), so a node's out-degree is the number of packages it depends on.

Created a graph illustrating the interplay of R packages.
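A rough sketch of how a “Depends” field becomes directed edges, written in Python for illustration only. The package names and field values are hypothetical, and dropping the entry for R itself is an assumption about how the base language requirement is treated.

```python
import re
from collections import defaultdict

def parse_depends(field):
    """Split a Depends field into package names.

    Version constraints such as "(>= 1.2)" are removed, and the entry for
    R itself is dropped (an assumption: it is not a package node).
    """
    names = []
    for entry in field.split(","):
        name = re.sub(r"\(.*?\)", "", entry, flags=re.S).strip()
        if name and name != "R":
            names.append(name)
    return names

def dependency_edges(depends_by_package):
    """Build (source, target) pairs: source depends on target."""
    return [
        (pkg, dep)
        for pkg, field in depends_by_package.items()
        for dep in parse_depends(field)
    ]

def out_degree(edges):
    """Count outgoing edges per source package."""
    counts = defaultdict(int)
    for source, _target in edges:
        counts[source] += 1
    return dict(counts)

# Hypothetical Depends fields, in the comma-separated style used by DESCRIPTION files.
toy_depends = {
    "pkgA": "R (>= 3.5.0), pkgB, pkgC (>= 1.2)",
    "pkgB": "R (>= 4.0.0), pkgC",
    "pkgC": "",
}
edges = dependency_edges(toy_depends)
degrees = out_degree(edges)
```

Here `pkgC` has no outgoing edges, so it does not appear in `degrees`, although other packages point to it.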

Author Collaboration Network

Extracted the authors, contributors, and others involved in package development from each package's DESCRIPTION file.

This produced a bipartite network, where the package is the group and the authors are the members.

Projected the bipartite network onto the authors to obtain the collaboration network, in which two authors are linked if they worked on the same package.

  • Node: Author
  • Edge: Collaboration

Created a graph illustrating the collaboration network.
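The projection step can be illustrated with a short Python sketch. The author and package names are hypothetical, and weighting each edge by the number of shared packages is an assumption, since the source only states that a projection was taken.

```python
from collections import Counter
from itertools import combinations

def project_onto_authors(authors_by_package):
    """One-mode projection of a package-author bipartite network.

    Two authors are linked if they appear on the same package. The edge
    weight counts how many packages they share (an assumption).
    """
    weights = Counter()
    for authors in authors_by_package.values():
        # Sort so each undirected pair is always stored in the same order.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical membership data: package -> people listed in its DESCRIPTION file.
toy_membership = {
    "pkgA": ["Author 1", "Author 2", "Author 3"],
    "pkgB": ["Author 2", "Author 3"],
    "pkgC": ["Author 4"],
}
collaborations = project_onto_authors(toy_membership)
```

A package with a single author, like `pkgC` here, contributes no edges, so its author is an isolated node in the collaboration network.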

The Shiny App

  • Purpose: Provide an interactive dashboard to explore:
    • Package similarity and relevance to network analysis.
    • Dependencies among packages.
    • Author collaboration relationships.

Visit the app here.

Conclusion

Key Findings:
  • R’s vast ecosystem offers numerous tools for network analysis.
  • Embedding-based similarity searches can efficiently identify related packages.
  • Dependency networks reveal deeper connections in the R community.
  • Author collaboration networks reveal a lack of collaboration in the R community.

Thank You!

Questions?