Dataset used

The data set used for this activity, amazon0302, was downloaded from the SNAP (Stanford Network Analysis Project) repository. It is an Amazon product co-purchasing network dated March 2, 2003. Each row is a directed edge between two items: the first column is the ID of an item, and the second column is the ID of another item that was purchased along with the first one. For this activity, the two columns are treated as two separate sets of nodes, so the data are analyzed as a bipartite network. Because the full data set is extremely large, it was trimmed to its first 20 rows.
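Because the file is so large, read.table() can also be told to stop after the rows that are needed (its nrows argument) instead of reading everything and slicing afterwards. The sketch below assumes the SNAP layout of comment lines starting with "#" followed by one whitespace-separated "from to" pair per line; the few lines it reads are made up for illustration and stand in for the real file.

```r
# Illustrative stand-in for the top of a SNAP edge-list file (not the real header)
snap_lines <- c(
  "# Illustrative comment line",
  "# FromNodeId\tToNodeId",
  "0\t1",
  "0\t2",
  "0\t3",
  "0\t4"
)

# comment.char = "#" (the default) skips the comment lines;
# nrows = 3 stops reading after three data rows
edges <- read.table(text = snap_lines, header = FALSE,
                    comment.char = "#", nrows = 3)
print(edges)
```

With the real file, the same idea would be read.table("Amazon0302.txt", header = FALSE, nrows = 20).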

library(kableExtra)
library(tidyverse)
library(bipartite)
library(ggnetwork)
library(igraph)
library(ggplot2)
library(network)
library(sna)

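# Read the SNAP edge list; read.table() skips lines starting with "#" by default
data <- read.table("Amazon0302.txt", header = FALSE)
# Keep only the first 20 edges
sub <- data %>% slice(1:20)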
kable(head(sub))
V1 V2
0 1
0 2
0 3
0 4
0 5
1 0
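# Cross-tabulate the edge list: one row per V1 item ID, one column per V2 item ID,
# with each cell counting how often that pair occurs (here 0 or 1)
matrix = table(sub)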

matrix = as.matrix(matrix)

matrix
##    V2
## V1  0 1 2 3 4 5 11 12 13 14 15 63 64 65 66 67
##   0 0 1 1 1 1 1  0  0  0  0  0  0  0  0  0  0
##   1 1 0 1 0 1 1  0  0  0  0  1  0  0  0  0  0
##   2 1 0 0 0 0 0  1  1  1  1  0  0  0  0  0  0
##   3 0 0 0 0 0 0  0  0  0  0  0  1  1  1  1  1
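The way table() turns an edge list into this incidence matrix can be seen on a smaller scale. This sketch retypes the six rows shown in the kable output above and cross-tabulates them; the variable names are illustrative.

```r
# The first six rows of the trimmed edge list, as shown in the kable output above
edges <- data.frame(V1 = c(0, 0, 0, 0, 0, 1),
                    V2 = c(1, 2, 3, 4, 5, 0))

# Cross-tabulate: one row per distinct V1 ID, one column per distinct V2 ID;
# each cell counts how many times that (V1, V2) pair occurs
inc <- table(edges)
print(inc)

# Row sums give how many co-purchased items each V1 item has in this sample
print(rowSums(inc))
```

Item 0 appears both as a row label and as a column label. Because rows and columns are treated as two separate node sets, a bipartite representation of such a matrix turns them into two distinct nodes even though they refer to the same product.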

Analysis

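Next, the closeness and betweenness of each node are calculated. Closeness measures how near a node is to all the other nodes in the network; it is computed from the inverse of the sum of shortest-path distances from that node to the others, so a higher score means a shorter total distance. As seen below, node 3 has a lower closeness score (0.33) than the other two nodes (0.50), meaning it is, on average, farther from the rest of the network. Betweenness measures how many shortest paths between other pairs of nodes pass through a given node. Based on these calculations, nodes 1 and 4 have the highest betweenness (16 and 15). The node column in both outputs numbers nodes from 1, so these labels are indices assigned by the functions rather than the Amazon item IDs.

The plotweb() function then draws the bipartite graph, with the item IDs from the first column (the rows of the matrix) along the bottom and their co-purchased items (the columns) along the top; only four distinct items appear in the first column of the first 20 rows. This shows visually how the nodes are connected to one another. Finally, graph_from_incidence_matrix() converts the same matrix into an igraph object, and plot.igraph() draws a more detailed node-link diagram showing the groups within the network and how they are connected.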

closeness_w(matrix)
##      node closeness n.closeness
## [1,]    1 0.5000000   0.2500000
## [2,]    2 0.5000000   0.2500000
## [3,]    3 0.3333333   0.1666667
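To make the closeness idea concrete, here is a minimal base-R sketch of one common definition, closeness = 1 / (sum of shortest-path distances to all other nodes), on a small hypothetical incidence matrix. It is not the implementation behind closeness_w() and is only meant to illustrate the idea, so it is not guaranteed to reproduce the values above. The helper names are illustrative.

```r
# Turn an incidence matrix (rows = one node set, columns = the other) into a
# symmetric 0/1 adjacency matrix over all row nodes followed by all column nodes
incidence_to_adjacency <- function(inc) {
  r <- nrow(inc)
  k <- ncol(inc)
  adj <- matrix(0, r + k, r + k)
  adj[seq_len(r), r + seq_len(k)] <- inc > 0
  adj[r + seq_len(k), seq_len(r)] <- t(inc > 0)
  adj
}

# Breadth-first search: shortest-path distance (in edges) from `src` to every
# node; unreachable nodes keep distance Inf
bfs_distances <- function(adj, src) {
  dist <- rep(Inf, nrow(adj))
  dist[src] <- 0
  queue <- src
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(adj[v, ] > 0)) {
      if (is.infinite(dist[w])) {
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  dist
}

# Closeness = 1 / sum of distances to all other nodes
# (0 if some node is unreachable, because the sum is then Inf)
closeness_scores <- function(adj) {
  sapply(seq_len(nrow(adj)), function(v) 1 / sum(bfs_distances(adj, v)[-v]))
}

# Hypothetical example: one item (row) bought with three others (columns),
# i.e. a star with the row node in the centre
inc <- matrix(1, nrow = 1, ncol = 3)
adj <- incidence_to_adjacency(inc)
print(closeness_scores(adj))
```

The centre of the star scores 1/3 because it is one step from each of the other three nodes, while each leaf scores 1/5 (distances 1, 2 and 2). This is the sense in which a node with a lower closeness score, such as node 3 in the output above, is farther from the rest of the network.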
betweenness_w(matrix)
##       node betweenness
##  [1,]    1          16
##  [2,]    2           2
##  [3,]    3           8
##  [4,]    4          15
##  [5,]    5           0
##  [6,]    6           0
##  [7,]    7           0
##  [8,]    8           0
##  [9,]    9           0
## [10,]   10           0
## [11,]   11           0
## [12,]   12           0
## [13,]   13           0
## [14,]   14           0
## [15,]   15           0
## [16,]   16           0
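Betweenness can be sketched the same way. The brute-force base-R version below assumes an undirected graph, counts each unordered pair of other nodes once, and adds to each node the fraction of shortest paths between that pair passing through it. It is meant for small graphs only and is not the algorithm used by betweenness_w(); conventions such as counting ordered versus unordered pairs differ between packages, so it will not necessarily reproduce the values above. The helper names and the example matrix are illustrative.

```r
# Symmetric 0/1 adjacency matrix over row nodes followed by column nodes
incidence_to_adjacency <- function(inc) {
  r <- nrow(inc)
  k <- ncol(inc)
  adj <- matrix(0, r + k, r + k)
  adj[seq_len(r), r + seq_len(k)] <- inc > 0
  adj[r + seq_len(k), seq_len(r)] <- t(inc > 0)
  adj
}

# BFS from `src`: distances and the number of shortest paths to every node
bfs_paths <- function(adj, src) {
  n <- nrow(adj)
  dist <- rep(Inf, n)
  sigma <- numeric(n)
  dist[src] <- 0
  sigma[src] <- 1
  queue <- src
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(adj[v, ] > 0)) {
      if (is.infinite(dist[w])) {
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
      # every shortest path to v extends to a shortest path to w
      if (dist[w] == dist[v] + 1) sigma[w] <- sigma[w] + sigma[v]
    }
  }
  list(dist = dist, sigma = sigma)
}

betweenness_scores <- function(adj) {
  n <- nrow(adj)
  paths <- lapply(seq_len(n), function(s) bfs_paths(adj, s))
  D <- t(sapply(paths, function(p) p$dist))   # D[s, u]: distance from s to u
  S <- t(sapply(paths, function(p) p$sigma))  # S[s, u]: number of shortest s-u paths
  bc <- numeric(n)
  for (s in seq_len(n - 1)) {
    for (u in seq(s + 1, n)) {
      if (is.infinite(D[s, u])) next  # no path between s and u
      for (v in setdiff(seq_len(n), c(s, u))) {
        # v lies on a shortest s-u path iff d(s, v) + d(v, u) == d(s, u)
        if (D[s, v] + D[v, u] == D[s, u]) {
          bc[v] <- bc[v] + S[s, v] * S[v, u] / S[s, u]
        }
      }
    }
  }
  bc
}

# Hypothetical example: one item bought with three others (a star)
print(betweenness_scores(incidence_to_adjacency(matrix(1, nrow = 1, ncol = 3))))
```

The centre of the star lies on the only shortest path between each of the three pairs of leaves, so it scores 3 while the leaves score 0. High scores such as those of nodes 1 and 4 above likewise indicate nodes that sit on many shortest paths between other nodes.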
plotweb(matrix)

data.g = graph_from_incidence_matrix(matrix)
plot.igraph(data.g)
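Reading the groups in this plot as connected components (sets of nodes joined by some path), they can also be counted directly from the matrix. The minimal base-R sketch below retypes the printed 4 x 16 matrix by hand and labels components with breadth-first search. igraph's components() function would be the usual tool; base R is used here only to keep the sketch dependency-free, and the helper names are illustrative.

```r
# The 4 x 16 incidence matrix printed earlier, retyped by hand
# (rows: V1 item IDs 0-3; columns: V2 item IDs)
col_ids <- c("0", "1", "2", "3", "4", "5", "11", "12", "13", "14", "15",
             "63", "64", "65", "66", "67")
inc <- matrix(0, nrow = 4, ncol = length(col_ids),
              dimnames = list(c("0", "1", "2", "3"), col_ids))
inc["0", c("1", "2", "3", "4", "5")] <- 1
inc["1", c("0", "2", "4", "5", "15")] <- 1
inc["2", c("0", "11", "12", "13", "14")] <- 1
inc["3", c("63", "64", "65", "66", "67")] <- 1

# Symmetric 0/1 adjacency matrix over row nodes followed by column nodes
incidence_to_adjacency <- function(inc) {
  r <- nrow(inc)
  k <- ncol(inc)
  adj <- matrix(0, r + k, r + k)
  adj[seq_len(r), r + seq_len(k)] <- inc > 0
  adj[r + seq_len(k), seq_len(r)] <- t(inc > 0)
  adj
}

# Label connected components by repeated breadth-first search
component_labels <- function(adj) {
  n <- nrow(adj)
  label <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(label[start])) next  # already reached from an earlier start
    current <- current + 1L
    label[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(adj[v, ] > 0)) {
        if (is.na(label[w])) {
          label[w] <- current
          queue <- c(queue, w)
        }
      }
    }
  }
  label
}

groups <- component_labels(incidence_to_adjacency(inc))
print(table(groups))  # number of nodes in each component
```

For this sample it finds two groups: items 0, 1 and 2 are linked through shared co-purchases (for example, items 0 and 1 were both bought with item 2), while item 3 and its five co-purchased items (63 to 67) form a separate cluster.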