install.packages("igraph")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::%--%()      masks igraph::%--%()
## ✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
## ✖ purrr::compose()       masks igraph::compose()
## ✖ tidyr::crossing()      masks igraph::crossing()
## ✖ dplyr::filter()        masks stats::filter()
## ✖ dplyr::lag()           masks stats::lag()
## ✖ purrr::simplify()      masks igraph::simplify()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Explanation

In this step we install and load the packages needed for the project.

The igraph package is used for network analysis. It allows us to create graphs, calculate centrality measures, and visualize relationships between nodes.

The tidyverse package is used for working with data. It includes tools that make it easier to organize, manipulate, and analyze datasets.

These packages give us the main tools we need to build and analyze the social network.
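The startup messages above show that several igraph functions are masked. For example, purrr::compose() masks igraph::compose(), and purrr::simplify() masks igraph::simplify(). When the masked version is the one we need, we can call it with an explicit namespace prefix. The minimal sketch below uses only base packages so it runs without igraph installed. The igraph calls named in the comments are examples of the same pattern, not code used elsewhere in this project.

```r
# Explicit namespacing avoids ambiguity when attached packages mask each other.
# With the tidyverse attached, plain filter() means dplyr::filter(), but
# stats::filter() still reaches the time-series moving-average filter.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))  # centered 3-point average
print(moving_avg)

# Likewise, igraph::union() masks base::union() once igraph is attached;
# base set-union stays reachable through its namespace.
set_union <- base::union(c("a", "b"), c("b", "c"))
print(set_union)

# The same pattern applies to graph code, e.g. igraph::simplify(g) or
# igraph::as_data_frame(g) when the purrr/dplyr versions would be picked up.
```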

url <- "https://raw.githubusercontent.com/bb2955/Data-620/main/plot_davis_club.py"

lines <- readLines(url)

head(lines)
## [1] "\"\"\""                    "=========="               
## [3] "Davis Club"                "=========="               
## [5] ""                          "Davis Southern Club Women"

Explanation

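In this step we read the reference file directly from GitHub.

The url variable stores the address of the file. Using the raw.githubusercontent.com link instead of the normal GitHub page returns the plain file contents, so R can read them directly.

The readLines() function reads the file line by line and stores the result as a character vector called lines.

The head() function prints the first six lines so we can quickly check that the file loaded correctly. The output shows the header of a Python script titled "Davis Club", which refers to the Davis Southern Club Women data.

These lines are only inspected here and are not parsed. The attendance data used in the analysis is entered by hand in the next steps. This step simply lets us view the reference file without downloading it manually.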
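Because readLines(url) needs network access, a failed download stops the whole script. A guarded wrapper can help with this. The sketch below is a minimal example that assumes we would rather continue with an empty result than stop. The helper name read_lines_safely is hypothetical. The demonstration uses a temporary local file so it runs offline.

```r
# Hypothetical helper: read a text file or URL, returning character(0)
# instead of stopping when the source cannot be opened (e.g. offline).
read_lines_safely <- function(path) {
  tryCatch(
    suppressWarnings(readLines(path, warn = FALSE)),
    error = function(e) {
      message("Could not read: ", path)
      character(0)
    }
  )
}

# Offline demonstration with a temporary file standing in for the URL
tmp <- tempfile(fileext = ".py")
writeLines(c('"""', "==========", "Davis Club"), tmp)

lines_ok <- read_lines_safely(tmp)
lines_missing <- read_lines_safely(file.path(tempdir(), "no_such_file.py"))

# Quick content check, similar in spirit to head(lines)
any(grepl("Davis Club", lines_ok, fixed = TRUE))
```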

women <- c(
"Evelyn Jefferson","Laura Mandeville","Theresa Anderson","Brenda Rogers",
"Charlotte McDowd","Frances Anderson","Eleanor Nye","Pearl Oglethorpe",
"Ruth DeSand","Verne Sanderson","Myra Liddel","Katherina Rogers",
"Sylvia Avondale","Nora Fayette","Helen Lloyd","Dorothy Murchison",
"Olivia Carleton","Flora Price"
)

events <- paste0("E",1:14)

Explanation

This step defines the nodes that will appear in our network.

The women object is a character vector containing the names of the 18 women who participated in the social events.

The events object creates labels for the 14 events. The paste0() function automatically generates names from E1 to E14.

These two groups represent the two types of nodes in our bipartite network.
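The following sketch shows how paste0() builds the event labels. Because paste0() is vectorised, it pastes "E" onto every number in 1:14 in a single call. The sketch also runs a few simple sanity checks on the event node set.

```r
# paste0() is vectorised: "E" is pasted onto each number in 1:14
events <- paste0("E", 1:14)
head(events, 3)   # "E1" "E2" "E3"
tail(events, 1)   # "E14"

# Sanity checks: 14 labels, none duplicated
length(events) == 14
anyDuplicated(events) == 0
```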

attendance <- matrix(0, nrow=18, ncol=14)

# Davis Southern Women attendance: each line lists the row numbers of the
# women who attended one event (the column number after the comma)
attendance[c(1,2,4),1] <- 1
attendance[c(1,2,3),2] <- 1
attendance[c(1,2,3,4,5,6),3] <- 1
attendance[c(1,3,4,5),4] <- 1
attendance[c(1,2,3,4,5,6,7,9),5] <- 1
attendance[c(1,2,3,4,6,7,8,14),6] <- 1
attendance[c(2,3,4,5,7,9,10,13,14,15),7] <- 1
attendance[c(1,2,3,4,6,7,8,9,10,11,12,13,15,16),8] <- 1
attendance[c(1,3,8,9,10,11,12,13,14,16,17,18),9] <- 1
attendance[c(11,12,13,14,15),10] <- 1
attendance[c(14,15,17,18),11] <- 1
attendance[c(10,11,12,13,14,15),12] <- 1
attendance[c(12,13,14),13] <- 1
attendance[c(12,13,14),14] <- 1

rownames(attendance) <- women
colnames(attendance) <- events

Explanation

The first line creates an empty matrix that will store the attendance data.

The matrix has 18 rows representing the women and 14 columns representing the events.

All values start at 0, which means no attendance has been recorded yet.

Later we change some values to 1 to show when a woman attended a particular event.

This matrix is the main structure that represents the network.

The assignment lines that follow add the attendance information to the matrix.

In each line, the vector c(...) inside the brackets lists the row numbers of the women who attended. The number after the comma is the column, which identifies the event.

For example, the first of these lines sets the value to 1 in column 1 for each woman listed, meaning those women attended Event 1.

A value of 1 means attendance, while 0 means the woman did not attend.

By filling in the matrix this way, we recreate the original dataset.

The rownames() and colnames() lines add labels to the rows and columns of the matrix.

The rows are labeled with the names of the women, and the columns are labeled with the event names.

This makes the data easier to read and allows us to identify each node when building the network.
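Typing row numbers for each column is easy to get wrong. An alternative is to build the same kind of matrix from a long edge list with one row per (woman, event) pair. The minimal sketch below uses hypothetical toy data (three placeholder women and three events), not the Davis data. It relies on base R's character-matrix indexing, which looks up cells by their row and column names.

```r
# Hypothetical toy edge list: one row per (woman, event) attendance
edges <- data.frame(
  woman = c("Woman A", "Woman A", "Woman B", "Woman B", "Woman C", "Woman C"),
  event = c("E1", "E2", "E2", "E3", "E2", "E3")
)
women_toy  <- c("Woman A", "Woman B", "Woman C")
events_toy <- paste0("E", 1:3)

# Start from an all-zero matrix whose dimnames are the node labels
inc <- matrix(0, nrow = length(women_toy), ncol = length(events_toy),
              dimnames = list(women_toy, events_toy))

# A two-column character matrix indexes cells by row and column names,
# so each attendance pair sets exactly one cell to 1
inc[cbind(edges$woman, edges$event)] <- 1
inc
```

One advantage of this approach is that each attendance record is stored once, by name. This makes the data easier to check against a source table than lists of row numbers.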

g <- graph_from_incidence_matrix(attendance)
## Warning: `graph_from_incidence_matrix()` was deprecated in igraph 1.6.0.
## ℹ Please use `graph_from_biadjacency_matrix()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
plot(g,
     vertex.color=ifelse(V(g)$type,"pink","lightblue"),
     vertex.size=7,
     vertex.label.cex=.7,
     main="Southern Women Bipartite Network")

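Explanation

This code chunk converts the attendance matrix into a network graph.

The graph_from_incidence_matrix() function treats the matrix as a bipartite relationship between two groups: the rows (women) and the columns (events). Each 1 in the matrix becomes an edge between a woman and an event. Each vertex also receives a logical type attribute that is FALSE for the women and TRUE for the events. As the warning shows, this function was deprecated in igraph 1.6.0. The recommended replacement is graph_from_biadjacency_matrix(), which accepts the same matrix.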

The result is stored in the object g, which is our network graph.

This graph structure allows us to analyze the relationships between nodes.

Next, the plot code creates a visual representation of the network.

The plot() function draws the network graph.

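The vertex.color argument colors the nodes by type. Because V(g)$type is TRUE for events and FALSE for women, ifelse() makes the events pink and the women light blue.

The vertex.size argument sets the node size to 7. This is smaller than igraph's default of 15, so the 32 nodes overlap less.

The vertex.label.cex argument scales the label text to 70% of its default size.

The main argument adds a title that explains what the graph represents.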

Visualizing the network makes it easier to understand the relationships between participants and events.
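The color rule in the plot() call is a vectorised ifelse() over the logical type attribute. The sketch below reproduces that mapping in base R. It assumes the vertex ordering igraph uses for a matrix-built bipartite graph: the 18 row vertices (women) come first, followed by the 14 column vertices (events).

```r
# Assumed layout of V(g)$type for the 18 x 14 attendance matrix:
# 18 women (type FALSE) followed by 14 events (type TRUE)
node_type <- c(rep(FALSE, 18), rep(TRUE, 14))

# Same rule as the plot() call: TRUE -> "pink", FALSE -> "lightblue"
node_color <- ifelse(node_type, "pink", "lightblue")

table(node_color)
```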

degree_values <- degree(g)

centrality <- data.frame(
Node = names(degree_values),
DegreeCentrality = degree_values
)

centrality %>% arrange(desc(DegreeCentrality))

Explanation

This section of code calculates and organizes the degree centrality of each node in the network.

First, the degree() function calculates how many connections each node has in the graph. In this dataset, degree centrality tells us how many events each woman attended or how many women attended each event.

Next, the results are placed into a data frame using the data.frame() function. This step makes the results easier to read and analyze by organizing them into a table with two columns: the node name and its degree centrality value.

Finally, the arrange() function from the tidyverse is used to sort the results in descending order. This allows us to quickly identify the most connected nodes in the network, meaning the women who attended the most events and the events that had the highest attendance.

This step identifies the most active participants and the best-attended events. These are the nodes most likely to play central roles in the network.
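In a bipartite graph built from an incidence matrix, degree can be read straight off the matrix. A woman's degree is her row sum (events attended), and an event's degree is its column sum (attendees). The sketch below checks this on a hypothetical 3 x 3 toy matrix rather than the Davis data. Every edge is counted once from each side, so the two groups' degree totals are always equal.

```r
# Hypothetical 3 x 3 toy incidence matrix (1 = attended)
A <- matrix(c(1, 1, 0,
              0, 1, 1,
              0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Woman A", "Woman B", "Woman C"),
                            c("E1", "E2", "E3")))

# Women's degrees are row sums; events' degrees are column sums
degree_toy <- c(rowSums(A), colSums(A))

# Same idea as arrange(desc(...)): largest degree first
sort(degree_toy, decreasing = TRUE)

# Each edge touches one woman and one event, so the totals match
sum(rowSums(A)) == sum(colSums(A))
```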

women_network <- bipartite_projection(g)$proj1

plot(women_network,
     vertex.color="lightblue",
     vertex.size=8,
     main="Women Social Network")

Explanation

This code creates and displays a network that shows how the women are connected to each other.

The function bipartite_projection(g) takes the original network, which includes both women and events, and separates it into two new networks. One network contains only the women and the other contains only the events.

The $proj1 part selects the network that contains only the women. In this new network, two women are connected if they attended at least one event in common. This shows the social relationships between the women based on shared event participation.

The plot() function then draws the network so we can see the connections visually. The argument vertex.color="lightblue" colors the nodes light blue, matching the color used for the women in the bipartite plot. The argument vertex.size=8 sets the node size, which is smaller than igraph's default of 15 and keeps the 18 nodes from crowding each other. The main argument adds a title so it is clear that the visualization represents the women's social network.
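The women projection can also be computed with matrix algebra. If A is the incidence matrix, then A %*% t(A) (tcrossprod(A) in base R) counts the events shared by every pair of women. By default (multiplicity = TRUE), igraph's bipartite_projection() stores this count as an edge attribute called weight. The minimal base-R sketch below uses the same hypothetical toy matrix as before, not the Davis data.

```r
# Hypothetical toy incidence matrix (rows = women, columns = events)
A <- matrix(c(1, 1, 0,
              0, 1, 1,
              0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Woman A", "Woman B", "Woman C"),
                            c("E1", "E2", "E3")))

# Entry [i, j] counts the events attended by both woman i and woman j;
# the diagonal is each woman's own number of events
shared <- tcrossprod(A)   # same as A %*% t(A)

# Women projection: connect i and j (i != j) when they share >= 1 event
women_adj <- (shared > 0) * 1
diag(women_adj) <- 0

shared
women_adj
```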

event_network <- bipartite_projection(g)$proj2

plot(event_network,
     vertex.color="pink",
     vertex.size=8,
     main="Event Network")

Explanation

This code creates and shows a network that focuses only on the events.

The function bipartite_projection(g) takes the original network, which contains both women and events, and splits it into two separate networks. One network shows relationships between the women, and the other shows relationships between the events.

The $proj2 part selects the network that contains only the events. In this network, two events are connected if at least one woman attended both of them. This shows which events had overlapping participants.

The plot() function is used to draw the network so we can see the connections visually. The argument vertex.color="pink" colors the event nodes pink, matching the color used for the events in the bipartite plot. The argument vertex.size=8 sets the node size, which is smaller than igraph's default of 15 and keeps the 14 nodes from crowding each other. The main argument adds a title so it is clear that the visualization represents the event network.
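The event projection is the mirror image of the women projection. t(A) %*% A (crossprod(A) in base R) counts, for each pair of events, how many women attended both. This is the quantity bipartite_projection() keeps as the weight attribute by default. The sketch below uses the same hypothetical toy matrix. In it, E1 and E3 share no attendees, so they would not be connected.

```r
# Hypothetical toy incidence matrix (rows = women, columns = events)
A <- matrix(c(1, 1, 0,
              0, 1, 1,
              0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Woman A", "Woman B", "Woman C"),
                            c("E1", "E2", "E3")))

# Entry [i, j] counts the women who attended both event i and event j;
# the diagonal is each event's total attendance
shared_events <- crossprod(A)   # same as t(A) %*% A

# Event projection: connect two events when they share >= 1 attendee
event_adj <- (shared_events > 0) * 1
diag(event_adj) <- 0

shared_events
event_adj
```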

Relationships Between Women

The projected network of women shows how the women are connected through the social events they attended. Two women are connected if they attended the same event. This allows us to see which women were most socially active and which ones interacted with many other members of the group.

In the bipartite graph, a woman's degree centrality is simply the number of events she attended. Women who attended many events also tend to have more connections in the projected network, because each additional event gives them a chance to share attendance with more people. These women likely played an important role in the social structure of the group and may have acted as connectors or bridges, linking different social circles together.

For example, women who appear in many events likely knew most of the other women in the group. Because they attended several gatherings, they had more opportunities to interact with others and build relationships. This increases their importance in the network.

On the other hand, some women attended fewer events. These women have lower centrality values and fewer connections in the network. This suggests they may have been part of smaller social circles or were less active in the community gatherings.

Overall, the network suggests that the group had a core set of highly active members who regularly attended events and helped connect the rest of the group. Around this core group are other women who participated less frequently and therefore had fewer connections.

Relationships Between Social Events

The event projection shows how the events are connected based on shared attendees. Two events are connected if the same women attended both events.

Events that share many of the same attendees will have stronger connections to other events. These events likely had larger attendance or were especially popular among the women in the group. Because many of the same participants attended these events, they helped strengthen the social connections between members of the community.

Events with higher centrality can be thought of as important gathering points. These events likely brought together many women who may not have interacted otherwise. In this way, these events played an important role in maintaining the social structure of the group.

Events with lower centrality had fewer shared attendees with other events. This suggests that these gatherings may have been smaller, more selective, or attended by a specific subgroup of women.

Looking at the network overall, it appears that some events served as major social hubs, while others were more limited in participation.

Overall Network Structure

When looking at the entire network, a clear pattern appears. A group of highly active women attended many events, and these events often had overlapping participants. This creates a dense central part of the network where many nodes are connected to each other.

Around this core structure are smaller groups of women who attended fewer events and therefore have fewer connections. This pattern is common in real social networks, where a small number of individuals are highly connected while others participate less frequently.

This structure suggests that the social group likely had key individuals and key events that helped keep the community connected.

Conclusion

This project used the Southern Women dataset to examine how relationships can form through shared participation in social events. By modeling the data as a bipartite network, we were able to analyze connections between two different types of nodes: women and social events. This approach helps show how individual participation in events can create a broader social structure.

The results show that some women attended many events and therefore became highly connected within the network. These women had higher degree centrality values, meaning they shared events with many other participants. Because of this, they likely played an important role in linking different social circles together. Women who attended fewer events had fewer connections, suggesting they may have been less involved in the overall social group or interacted with a smaller number of people.

The analysis of the events also revealed important patterns. Some events were connected to many other events through shared attendees. These events likely had higher participation and served as important meeting points for the group. Events with lower centrality likely had smaller attendance or involved a more specific subset of women.

When looking at the entire network, the structure suggests that the social group had a core group of highly active participants and several key events that helped maintain the connections within the community. This pattern is common in many real-world social networks, where a small number of individuals and activities play a central role in maintaining relationships across a larger group.

Overall, the Southern Women dataset provides a clear example of how network analysis can be used to understand social relationships. Even though the dataset is small, it demonstrates how patterns of interaction can be identified using tools such as bipartite graphs and centrality measures. These techniques are widely used in fields such as sociology, data science, and network analysis to study how people, organizations, or events are connected.

By analyzing this dataset, we can see how shared participation in activities helps create a network of relationships, highlighting the importance of both individuals and events in shaping social structures.