friend network

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(igraph)                    # This is the package to analyze the network


Attaching package: 'igraph'

The following objects are masked from 'package:lubridate':

    %--%, union

The following objects are masked from 'package:dplyr':

    as_data_frame, groups, union

The following objects are masked from 'package:purrr':

    compose, simplify

The following object is masked from 'package:tidyr':

    crossing

The following object is masked from 'package:tibble':

    as_data_frame

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

library(visNetwork)                # Creates visualizations of the network
library(DT)
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:igraph':

    groups

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(broom)                     # includes some of our regression functions

friendships <- read_csv("friendships.csv")

Rows: 6972 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): nominator, nominated, score, expected_score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

friendship_nodes <- read_csv("friendship_nodes.csv")

Rows: 84 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): id, name

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

friendships |> 
  group_by(nominator, nominated )

# A tibble: 6,972 × 4
# Groups:   nominator, nominated [6,972]
   nominator nominated score expected_score
       <dbl>     <dbl> <dbl>          <dbl>
 1         1         2     0              0
 2         1         3     0              0
 3         1         4     4              4
 4         1         5     1              1
 5         1         6     1              1
 6         1         7     2              2
 7         1         8     4              4
 8         1         9     3              3
 9         1        10     3              3
10         1        11     5              5
# ℹ 6,962 more rows

friendships |> 
  group_by(nominator, nominated) |> 
  summarize(Mean = round(mean(score), 1)) |> 
  arrange(-Mean) |> 
  datatable(rownames = F,
            caption = 'Average friendship score given/nominated')

`summarise()` has grouped output by 'nominator'. You can override using the
`.groups` argument.

This data tables shows us average friendship scores that were given/nominated. We are also able to see the mean score of the friendships.

friendships_model <- lm(expected_score ~ score, data = friendships)

friendships |> 
  plot_ly(x = ~score, y = ~expected_score) |> 
  add_markers() |> 
  add_lines(y = fitted(friendships_model))

With this plot graph the data shows us the scores that were all given were spread out with one another. The line through the middle of the graph is a regression line. The regression line helps show us an average middle point with the line but there is no correlation because the plots are all spread out.

friends <- friendships |> 
  filter(score > 2) |> 
  select(nominator, nominated) |> 
  rename(from = nominator) |> 
  rename(to= nominated)

friend_network <- graph_from_data_frame(friends,friendship_nodes, directed = T)

friend_network

IGRAPH 4a84f65 DN-- 84 1353 -- 
+ attr: name (v/n)
+ edges from 4a84f65 (vertex names):
  [1] 1-> 4 1-> 8 1-> 9 1->10 1->11 1->16 1->22 1->25 1->29 1->30 1->38 1->40
 [13] 1->43 1->45 1->46 1->52 1->59 1->66 1->67 1->69 1->71 1->72 1->75 1->80
 [25] 1->82 1->83 2->26 2->34 2->57 2->62 2->65 3-> 6 3-> 7 3->19 3->24 3->28
 [37] 3->31 3->35 3->36 3->38 3->41 3->43 3->48 3->50 3->51 3->54 3->61 3->74
 [49] 3->76 3->77 3->84 4-> 1 4->15 4->20 4->30 4->36 4->40 4->52 4->58 4->59
 [61] 4->66 4->70 4->72 4->79 5->21 5->24 5->32 5->34 5->39 5->46 5->56 5->62
 [73] 5->65 5->80 6-> 3 6-> 5 6-> 9 6->12 6->15 6->19 6->20 6->27 6->28 6->35
 [85] 6->36 6->38 6->41 6->44 6->48 6->50 6->51 6->52 6->54 6->58 6->60 6->66
+ ... omitted several edges

degree(friend_network)

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
58  9 39 29 22 51 30 19 39 28 78 28 28 18 30 36 22 30 32 43 24 41 12 55 42 22 
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
 7 23 26 60 43 22 11 22 28 42  9 62 49 46 42 40 33 38 36 25 21 38 16 42 45 50 
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 
23 56  4 78  9 29 41 28 29 20 41 12 32 32 31 38 33 29 45 59  2 23 53 25 26 18 
79 80 81 82 83 84 
19 26  0 39 37 28

degree(friend_network, mode = "in")

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
32  4 19 16 12 20 15 12 23 12 26 14  9 12 11 14 11 12 19 13  9 23  3 25 28 12 
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
 5 16 13 27 25 13  7 14 12 20  3 24 20 26 20 13 19 20 21 15 18 14  5 18 21 28 
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 
13 29  2 23  6 14 24 17 16 11 30  9 16 21 18 22 16 10 19 28  1 15 33 10 19  4 
79 80 81 82 83 84 
14 16  0 16 23 15

degree(friend_network, mode = "out")

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
26  5 20 13 10 31 15  7 16 16 52 14 19  6 19 22 11 18 13 30 15 18  9 30 14 10 
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
 2  7 13 33 18  9  4  8 16 22  6 38 29 20 22 27 14 18 15 10  3 24 11 24 24 22 
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 
10 27  2 55  3 15 17 11 13  9 11  3 16 11 13 16 17 19 26 31  1  8 20 15  7 14 
79 80 81 82 83 84 
 5 10  0 23 14 13

friendship_nodes |> 
  mutate(named_friends = degree ((friend_network), mode = "out"))

# A tibble: 84 × 3
      id  name named_friends
   <dbl> <dbl>         <dbl>
 1     1     1            26
 2     2     2             5
 3     3     3            20
 4     4     4            13
 5     5     5            10
 6     6     6            31
 7     7     7            15
 8     8     8             7
 9     9     9            16
10    10    10            16
# ℹ 74 more rows

friendship_nodes |> 
    mutate(named_friends = degree((friend_network), mode = "out")) |> 
  mutate(named_by_others = degree((friend_network), mode = "in")) |> 
  mutate(popularity = named_by_others - named_friends) |> 
  select(name, named_friends, named_by_others, popularity) |> 
  datatable(rownames = F)

Above is a data table. The data table helps show the names which is in numbers by numbering the student. Then the next column is them naming their friend. The other column is named friends by others. And the final column is the popularity of that person.

friendship_nodes |> 
    mutate(named_friends = degree((friend_network), mode = "out")) |> 
  mutate(named_by_others= degree((friend_network), mode = "in")) |> 
  plot_ly(x = ~named_friends) |> 
  add_histogram() |> 
  layout(title = "How many college students named friends")

Above is a histogram of the college students that named other students as their friends. With this data we are able to see that the highest about friends named was 21. We are able to see that the histogram shows some outliers but a big chunk of friends named is towards the beginning of the histogram.

reciprocity(friend_network, mode = "ratio")

[1] 0.527088

Above is the reciprocity in the network of friends. The data helps show that it is 0.52. So this tells us that half of the friendships are reciprocal.

best_friends <- friendships |> 
  filter(score > 4) |> 
  select(nominator, nominated) |> 
  rename(from = nominator) |> 
  rename(to= nominated)

best_friend_network <- graph_from_data_frame(best_friends,friendship_nodes, directed = T)

friendship_nodes|> 
  mutate(group = membership(infomap.community(best_friend_network))) |> 
  visNetwork(best_friends, main = "Best Friends Network") |> 
  visIgraphLayout(layout = "layout_with_kk") |> 
  visEdges(arrows = "to") |> 
  visOptions(highlightNearest = T, nodesIdSelection = T)

Warning: There was 1 warning in `mutate()`.
ℹ In argument: `group = membership(infomap.community(best_friend_network))`.
Caused by warning:
! `infomap.community()` was deprecated in igraph 2.0.0.
ℹ Please use `cluster_infomap()` instead.

Above is a graph layout that is color coded and the arrows help show us who is best friends with who. By clicking on a circle you are able to see who is friends with who. We are able to see who they surround themselves with and we can see who the outliers are.