Network Analysis Assignment- James Anderson 3/19/2024

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(igraph)                    # This is the package to analyze the network


Attaching package: 'igraph'

The following objects are masked from 'package:lubridate':

    %--%, union

The following objects are masked from 'package:dplyr':

    as_data_frame, groups, union

The following objects are masked from 'package:purrr':

    compose, simplify

The following object is masked from 'package:tidyr':

    crossing

The following object is masked from 'package:tibble':

    as_data_frame

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

library(visNetwork)                # Creates visualizations of the network
library(DT)
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:igraph':

    groups

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(broom)                     # includes some of our regression functions

The data set used in this assignment comes from an article that studied college students' perceptions of their friendships. The students rated each other on a scale of 0 to 5, where 0 means they did not know the person, 3 means they saw the person as a friend, and 5 means the person was their best friend.

students <- read_csv('friendship_nodes.csv')

Rows: 84 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): id, name

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

student_links <- read_csv('friendships.csv')

Rows: 6972 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): nominator, nominated, score, expected_score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Create a data table with scores for nominator and nominated, and be sure to include an appropriate title.

student_links |>
  datatable(rownames = F,
            caption = 'Data Table for Nominator and Nominated Scores and Expected Scores')

This data table lists the actual score and the expected score for each nominator and nominated pair in the study. For most pairs, the actual and expected scores are the same.

2. Plot the scores and the expected_scores, and include a regression line.

friendship_model <- lm(expected_score ~ score, data = student_links)
tidy(friendship_model)

# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   0.0987   0.00739      13.4 3.22e-40
2 score         0.940    0.00352     267.  0

glance(friendship_model)

# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.911         0.911 0.339    71178.       0     1 -2353. 4713. 4733.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

student_links %>% 
  plot_ly(x = ~score, y = ~expected_score) %>% 
  add_markers() %>% 
  add_lines(y = fitted(friendship_model))

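This plot shows the relationship between the actual score (x-axis) and the expected score (y-axis), together with the fitted regression line. The relationship is close to one-to-one: the slope is 0.94 and the intercept is about 0.10, and the actual score explains about 91% of the variance in the expected score (R-squared = 0.911).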
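The slope and R-squared answer different questions: the slope says how many points the expected score changes per point of actual score, while R-squared says how tightly the points follow the line. A minimal sketch on toy data (the values below are hypothetical and not taken from the friendship data) shows how both can be read from a fitted model using base R only:

```r
# Toy data: hypothetical scores, NOT from the friendship study
toy <- data.frame(
  score          = c(0, 1, 2, 3, 4, 5),
  expected_score = c(0.2, 1.0, 1.9, 3.1, 3.8, 4.9)
)

# Same model form as in the assignment: expected_score regressed on score
toy_model <- lm(expected_score ~ score, data = toy)

# Slope: change in expected_score per one-point change in score
toy_slope <- unname(coef(toy_model)["score"])

# R-squared: share of variance in expected_score explained by score
toy_r2 <- summary(toy_model)$r.squared

print(c(slope = toy_slope, r_squared = toy_r2))
```

A slope near 1 with an intercept near 0 is what "almost one-to-one" means; a high R-squared alone would not show that, since a line with any nonzero slope can fit tightly.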

3. After creating the network object, use the degree function to create a data table with named_friends, named_by_others, and popularity.

# Keep nominations scored above 3 (that is, 4 or 5) as directed edges
friends <- student_links %>% 
  filter(score > 3) %>% 
  select(from = nominator, to = nominated)

friendship_network <- graph_from_data_frame(friends, vertices = students, directed = TRUE)

students %>% 
  mutate(named_friends = degree(friendship_network, mode = "out")) %>% 
  mutate(named_by_others = degree(friendship_network, mode = "in")) %>% 
  mutate(popularity = named_by_others - named_friends) %>% 
  select(named_friends, named_by_others, popularity) %>% 
  datatable(rownames = F)

Subtracting ‘named friends’ from ‘named by others’ gives each student's popularity. A negative value means the student named more friends than the number of classmates who named them, and a positive value means the reverse. Because every nomination adds one to someone's ‘named friends’ and one to someone else's ‘named by others’, the popularity values sum to zero across the class.
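To make the degree calculation concrete, here is a small sketch on a hypothetical four-student network (the names A to D and the edges are made up for illustration). Note that `mode` must be passed inside `degree()`, otherwise the default total degree is returned:

```r
library(igraph)

# Hypothetical nominations: A names B and C, B names C, D names C
toy_edges <- data.frame(
  from = c("A", "A", "B", "D"),
  to   = c("B", "C", "C", "C")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# Out-degree: how many friends each student named
named_friends <- degree(toy_net, mode = "out")

# In-degree: how many classmates named each student
named_by_others <- degree(toy_net, mode = "in")

# Popularity: positive when a student is named more often than they name others
popularity <- named_by_others - named_friends
print(popularity)
```

In this toy network C is named three times and names no one, so C has popularity 3, while A names two classmates and is named by none, giving -2.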

Create a histogram using plotly of the number of friends that students named.

students %>% 
  mutate(named_friends = degree(friendship_network, mode = "out")) %>% 
  mutate(named_by_others = degree(friendship_network, mode = "in")) %>% 
  plot_ly(x = ~named_friends) %>% 
  add_histogram() %>% 
  layout(title = "Number of Friends Students Named")

This histogram shows how many classmates each student named as a friend. Most students named between 10 and 20 friends.

Find the reciprocity in the network.

reciprocity(friendship_network, mode = 'ratio')

[1] 0.5205479

The reciprocity of this network is about 52%. With mode = 'ratio', this means that among pairs of students connected by at least one friendship nomination, a little more than half nominated each other.
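The two modes of `reciprocity()` count different things: the default counts reciprocated edges out of all edges, while `mode = "ratio"` counts mutual pairs out of all connected pairs. A minimal sketch on a hypothetical three-student network (vertex names are made up) illustrates the difference:

```r
library(igraph)

# Hypothetical nominations: A and B name each other, A also names C
toy_edges <- data.frame(
  from = c("A", "B", "A"),
  to   = c("B", "A", "C")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# Default mode: 2 of the 3 edges are reciprocated
edge_reciprocity <- reciprocity(toy_net)

# Ratio mode: 1 mutual pair (A, B) out of 2 connected pairs (A-B and A-C)
pair_reciprocity <- reciprocity(toy_net, mode = "ratio")

print(c(default = edge_reciprocity, ratio = pair_reciprocity))
```

Here the default mode gives 2/3 and the ratio mode gives 1/2, so the ratio figure is best read as a statement about pairs of students rather than about individual nominations.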

Because there are too many students to visualize all friendships clearly, create a new network using only best friends. Size the nodes by degree, and include different colors for the communities.

On the scale used in this study, only a score of 5 counts as ‘best friends’.

bestfriends <- student_links %>% 
  filter(score > 4) %>% 
  select(from = nominator, to = nominated)

bestfriendship_network <- graph_from_data_frame(bestfriends, vertices = students, directed = TRUE)

students %>% 
  mutate(group = membership(cluster_infomap(bestfriendship_network))) %>% 
  mutate(value = degree(bestfriendship_network)) %>% 
  visNetwork(bestfriends, main = "Network of Best Friendships") %>% 
  visIgraphLayout(layout = "layout_nicely") %>% 
  visEdges(arrows = "to") %>% 
  visOptions(highlightNearest = T, nodesIdSelection = T)

This network plot visualizes the best-friend ties in the data set, with nodes sized by degree and colored by community. Because ‘best friends’ corresponds to a score of 5, filtering for scores greater than 4 keeps only those nominations.
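As a rough illustration of how the `group` and `value` columns are built, the sketch below runs `cluster_infomap()` and `degree()` on a hypothetical six-student network made of two separate triangles. The vertex names and edges are assumptions for the example, and the exact community labels may differ between runs:

```r
library(igraph)

# Hypothetical best-friend nominations forming two separate triangles
toy_edges <- data.frame(
  from = c("A", "B", "C", "D", "E", "F"),
  to   = c("B", "C", "A", "E", "F", "D")
)
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# Community label for each student, as a plain integer vector
toy_groups <- as.integer(membership(cluster_infomap(toy_net)))

# Total degree (in plus out), used to size nodes in the plot
toy_sizes <- degree(toy_net)

# One row per student, in the shape visNetwork expects for its nodes
toy_nodes <- data.frame(
  id    = V(toy_net)$name,
  group = toy_groups,
  value = toy_sizes
)
print(toy_nodes)
```

In visNetwork, the `group` column controls node color and the `value` column controls node size, which is why those two column names are used in the plot above.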