Network Analysis Assignment - James Anderson, 3/19/2024
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(igraph) # This is the package to analyze the network
Attaching package: 'igraph'
The following objects are masked from 'package:lubridate':
%--%, union
The following objects are masked from 'package:dplyr':
as_data_frame, groups, union
The following objects are masked from 'package:purrr':
compose, simplify
The following object is masked from 'package:tidyr':
crossing
The following object is masked from 'package:tibble':
as_data_frame
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
library(visNetwork) # Creates visualizations of the network
library(DT)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:igraph':
groups
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(broom) # includes some of our regression functions
The data set used in this assignment comes from an article that studied college students’ perceptions of their friendships. The students rated each other on a scale of 0 to 5, with 0 meaning they did not know the person, 3 meaning they saw the person as a friend, and 5 meaning the person was their best friend.
students <- read_csv('friendship_nodes.csv')
Rows: 84 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): id, name
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
student_links <- read_csv('friendships.csv')
Rows: 6972 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): nominator, nominated, score, expected_score
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
1. Create a data table with scores for nominator and nominated, and be sure to include an appropriate title.
student_links |> datatable(rownames = FALSE, caption = 'Data Table for Nominator and Nominated Scores and Expected Scores')
This data table lists the actual and expected scores for each nominator and nominated pair in the study. For the most part, the nominator’s actual and expected scores are the same.
2. Plot the scores and the expected_scores, and include a regression line.
friendship_model <- lm(expected_score ~ score, data = student_links)
tidy(friendship_model)
student_links %>%
  plot_ly(x = ~score, y = ~expected_score) %>%
  add_markers() %>%
  add_lines(y = fitted(friendship_model))
This plot shows the relationship between the actual score (x-axis) and the expected score (y-axis). The two are strongly related (R-squared = 0.91): the expected scores track the actual scores almost one to one.
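The R-squared value quoted above does not appear in the tidy() output, which reports only the coefficient table. The following is a minimal sketch of one way to retrieve it with broom's glance(). The toy_links data frame and its values are hypothetical and only stand in for student_links.

```r
library(broom)

# Hypothetical toy data standing in for student_links (not the study data)
toy_links <- data.frame(
  score          = c(0, 1, 2, 3, 4, 5),
  expected_score = c(0.2, 0.9, 2.1, 2.8, 4.1, 4.9)
)

# Same model form as in the assignment
toy_model <- lm(expected_score ~ score, data = toy_links)

coef_table <- tidy(toy_model)    # coefficient table: intercept and slope
fit_stats  <- glance(toy_model)  # one-row summary of the whole model

coef_table
fit_stats$r.squared              # share of variance in expected_score explained by score
```

Applied to the assignment's model, glance(friendship_model) would presumably be where the R-squared figure comes from, although the original output does not show that step.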
3. After creating the network object, use the degree function to create a data table with named_friends, named_by_others, and popularity.
Popularity is the number of times a student was named by others minus the number of friends that student named (named_by_others - named_friends). The vast majority of students have a negative popularity value, meaning that most students name more friends than the number of other students who name them as a friend.
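The code for this step is not shown in the report, so the following is only a sketch of how the network object and the degree table could be built. It uses hypothetical toy data: the toy_ names, the label column, and the values are made up. It also assumes that a rating of 3 or higher counts as a friendship, which the report does not confirm.

```r
library(dplyr)
library(igraph)

# Hypothetical toy data; nominator/nominated/score mirror the columns of friendships.csv
toy_students <- data.frame(id = 1:4, label = c("A", "B", "C", "D"))
toy_links <- data.frame(
  nominator = c(1, 2, 1, 3, 2, 4, 4, 3),
  nominated = c(2, 1, 3, 1, 3, 1, 2, 4),
  score     = c(5, 3, 4, 0, 3, 5, 2, 1)
)

# Assumption: a friendship is any rating of 3 or higher
toy_friends <- toy_links %>% filter(score >= 3)

# Directed graph: each edge points from the nominator to the nominated student
toy_network <- graph_from_data_frame(toy_friends, vertices = toy_students, directed = TRUE)

degree_table <- toy_students %>%
  mutate(named_friends   = as.numeric(degree(toy_network, mode = "out")),  # friends this student named
         named_by_others = as.numeric(degree(toy_network, mode = "in")),   # times named by others
         popularity      = named_by_others - named_friends)

degree_table
```

Every edge adds one to an out-degree and one to an in-degree, so the popularity values in any such table sum to zero. A majority of negative values therefore implies that a smaller group of students holds large positive values.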
4. Create a histogram using plotly of the number of friends that students named.
students %>%
  mutate(named_friends = degree(friendship_network, mode = "out")) %>%
  mutate(named_by_others = degree(friendship_network, mode = "in")) %>% # mode belongs inside degree(), not mutate()
  plot_ly(x = ~named_friends) %>%
  add_histogram() %>%
  layout(title = "Amount of Friends Students Named")
This histogram shows how many other participants each student named as a friend. Most students named between 10 and 20 friends.
5. Find the reciprocity in the network.
reciprocity(friendship_network, mode = 'ratio')
[1] 0.5205479
The reciprocity of this network is about 52%. Because mode = 'ratio' was used, this means that in a little more than half of the pairs of students connected by a friendship, the friendship was mutual; in the remaining pairs, only one student named the other.
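The two modes of igraph's reciprocity() answer slightly different questions, which affects how the 52% should be read. The following sketch compares them on a hypothetical three-student network. The toy_ names and edges are made up for illustration.

```r
library(igraph)

# Hypothetical network: A and B name each other, and A also names C
toy_edges <- data.frame(
  from = c("A", "B", "A"),
  to   = c("B", "A", "C")
)
toy_network <- graph_from_data_frame(toy_edges, directed = TRUE)

# Default mode: share of directed edges that are reciprocated (2 of 3 edges)
edge_reciprocity <- reciprocity(toy_network)

# Ratio mode: mutual pairs / (mutual pairs + one-way pairs) = 1 / (1 + 1)
pair_reciprocity <- reciprocity(toy_network, mode = "ratio")

edge_reciprocity
pair_reciprocity
```

The ratio mode counts each mutual friendship once rather than twice, so it never exceeds the default value on the same network.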
6. Because there are too many students to visualize all friendships clearly, create a new network using only best friends. Size the nodes by degree, and include different colors for the communities.
On the rating scale used in this study, only a score of 5 counts as a ‘best friend’.
students %>%
  mutate(group = membership(cluster_infomap(bestfriendship_network))) %>% # community of each student
  mutate(value = degree(bestfriendship_network)) %>% # node size by degree
  visNetwork(friends, main = "Network of Best Friendships") %>% # friends is passed as the edges argument
  visIgraphLayout(layout = "layout_nicely") %>%
  visEdges(arrows = "to") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
This network map visualizes the best friendships in the data set. Because only a score of 5 counts as a ‘best friend’, the links are filtered to scores greater than 4.
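The filtering step described above is not shown in the report. The following sketch shows one way it could look, using hypothetical toy data: the toy_ names, the label column, and the values are made up. It assumes the scores are whole numbers, so score > 4 selects the same rows as score == 5.

```r
library(dplyr)
library(igraph)

# Hypothetical toy data; nominator/nominated/score mirror the columns of friendships.csv
toy_students <- data.frame(id = 1:4, label = c("A", "B", "C", "D"))
toy_links <- data.frame(
  nominator = c(1, 2, 1, 3, 4),
  nominated = c(2, 1, 3, 4, 3),
  score     = c(5, 5, 4, 5, 3)
)

# Keep only best-friend ratings (assumes whole-number scores on the 0 to 5 scale)
toy_best <- toy_links %>% filter(score > 4)

toy_best_network <- graph_from_data_frame(toy_best, vertices = toy_students, directed = TRUE)

# Node table: community membership for color, total degree for size
toy_nodes <- toy_students %>%
  mutate(group = as.integer(membership(cluster_infomap(toy_best_network))),
         value = as.numeric(degree(toy_best_network)))

# Edge table with the from/to column names that visNetwork expects
toy_edges <- toy_best %>% rename(from = nominator, to = nominated)

toy_nodes
toy_edges
```

visNetwork reads node identifiers from an id column and edge endpoints from from and to columns, so a pair of tables shaped like toy_nodes and toy_edges could be passed to it directly.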