Main topic: Introduction to social networks
Learning objectives:
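rm(list = ls(all.names = TRUE))   # start from an empty workspace
Sys.setlocale("LC_ALL", "C")
## [1] "C"
options(digits = 4, scipen = 12)
library(magrittr)   # provides the %>% pipe used with degree(g) below
library(igraph)
#library(rgl)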
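Load the data from edges.csv into a data frame called edges, and load the data from users.csv into a data frame called users. (In the code below, the edge data frame is named edge.)
edge = read.csv("data/edges.csv")    # one row per friendship: a pair of user ids
users = read.csv("data/users.csv")   # one row per user, with gender, school and locale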
How many Facebook users are there in our dataset?
In our dataset, what is the average number of friends per user? Hint: this question is tricky, and it might help to start by thinking about a small example with two users who are friends.
2*nrow(edge)/nrow(users)
## [1] 4.949
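The hint can be checked on a small example. The sketch below uses a hypothetical toy edge list (not the course data) with the same two-column layout assumed for edges.csv, and counts each user's friends by tallying how often their id appears in either column. Because every friendship adds one friend to both endpoints, the counts sum to twice the number of edges, which is why the average is computed as 2*nrow(edge)/nrow(users).

```r
# Hypothetical toy network (not the Facebook data): 4 users, 3 friendships.
toy_users <- data.frame(id = c(1, 2, 3, 4))
toy_edges <- data.frame(V1 = c(1, 1, 2), V2 = c(2, 3, 3))

# Tally each id over BOTH columns; factor() keeps user 4, who has no friends.
friends <- table(factor(c(toy_edges$V1, toy_edges$V2), levels = toy_users$id))
friends

# Every friendship is counted once for each endpoint.
sum(friends) == 2 * nrow(toy_edges)    # TRUE

# Average number of friends per user, computed two equivalent ways.
mean(as.vector(friends))               # 1.5
2 * nrow(toy_edges) / nrow(toy_users)  # 1.5
```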
Out of all the students who listed a school, what was the most common locale?
table(users$school, users$locale)
##
##        A  B
##      3 6 31
##   A  0 0 17
##   AB 0 0  2
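In this output the unlabeled first row and column collect users whose school or locale field is blank. A minimal sketch of answering the question directly, assuming (as the table suggests) that a missing value is stored as an empty string "", is to drop those users before counting. The toy data frame below is hypothetical.

```r
# Hypothetical toy users; "" marks a school or locale that was not listed.
toy <- data.frame(school = c("A", "", "AB", "A", ""),
                  locale = c("B", "A", "B", "A", "B"))

# Keep only users who listed a school, then count their locales.
listed <- toy[toy$school != "", ]
locale_counts <- table(listed$locale)
locale_counts

# Name of the most common locale among those users.
names(which.max(locale_counts))   # "B"
```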
Is it possible that either school A or B is an all-girls or all-boys school?
table(users$gender, users$school)
##
##         A AB
##      1  1  0
##   A 11  3  1
##   B 28 13  1
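Reading this table requires one extra step: a user whose school is AB attended both schools, so they count toward school A and toward school B. The sketch below, on hypothetical toy data and again assuming blanks are stored as "", lists the genders observed at each school.

```r
# Hypothetical toy users (not the Facebook data); "" means the field was left blank.
toy <- data.frame(gender = c("A", "B", "B", "A", ""),
                  school = c("A", "A", "AB", "", "AB"))

# Keep users whose gender and school are both known.
known <- subset(toy, gender != "" & school != "")

# An "AB" user attended both schools, so they count for A and for B.
attended_A <- known$school %in% c("A", "AB")
attended_B <- known$school %in% c("B", "AB")

# Genders seen at each school; if both appear, that school cannot be single-sex.
table(known$gender[attended_A])
table(known$gender[attended_B])
```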
We will be using the igraph package to visualize networks; install and load this package using the install.packages and library commands.
We can create a new graph object using the graph.data.frame() function (in recent igraph releases the same function is also available as graph_from_data_frame()). Based on ?graph.data.frame, which of the following commands will create a graph g describing our social network, with the attributes of each user correctly loaded?
Note: In a directed graph, each edge goes only one way: it points from one vertex to another. In an undirected graph, the relations between vertices are symmetric. Facebook friendship is mutual, so our network is undirected.
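#install.packages("igraph")
library(igraph)
# Friendships are mutual (directed = FALSE); vertices = users attaches gender, school and locale to each vertex.
g = graph.data.frame(d = edge, directed = FALSE, vertices = users)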
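To see what the vertices argument does, here is a sketch on hypothetical toy data shaped like the assumed layout of the course files: the first two columns of the edge data frame hold user ids, and the first column of the vertex data frame holds the ids they refer to. It uses graph_from_data_frame(), the newer name of graph.data.frame(); both take the same d, directed and vertices arguments.

```r
library(igraph)

# Hypothetical toy data: ids in the edge list must match the first column of toy_users.
toy_users <- data.frame(id = c("u1", "u2", "u3", "u4"),
                        gender = c("A", "B", "B", ""))
toy_edges <- data.frame(from = c("u1", "u1"), to = c("u2", "u3"))

# Friendship is mutual, so the graph is undirected; every row of toy_users
# becomes a vertex, and its remaining columns become vertex attributes.
toy_g <- graph_from_data_frame(d = toy_edges, directed = FALSE,
                               vertices = toy_users)

vcount(toy_g)      # 4: u4 is kept although it has no friends
ecount(toy_g)      # 2
V(toy_g)$name      # "u1" "u2" "u3" "u4"
V(toy_g)$gender    # "A" "B" "B" ""
```

Without the vertices argument, a user with no friends (like u4) would not appear in the graph at all, and the gender, school and locale attributes would be missing.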
Use the correct command from Problem 2.1 to load the graph g.
Now, we want to plot our graph. By default, the vertices are large and have text labels of a user’s identifier. Because this would clutter the output, we will plot with no text labels and smaller vertices:
plot(g, vertex.size=5, vertex.label=NA)
In this graph, there are a number of groups of nodes where all the nodes in each group are connected but the groups are disjoint from one another, forming “islands” in the graph. Such groups are called “connected components,” or “components” for short. How many connected components with at least 2 nodes are there in the graph?
How many users are there with no friends in the network?
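Both questions can also be answered from the graph object rather than by counting islands in the plot. A sketch on a hypothetical toy graph: igraph's components() returns the size of every connected component in csize, so components with at least two nodes are those with csize >= 2, and users with no friends are exactly the vertices of degree 0.

```r
library(igraph)

# Hypothetical toy graph: a triangle (1-2-3), one edge (4-5), and two isolated vertices (6, 7).
toy_g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5), n = 7, directed = FALSE)

comp <- components(toy_g)
sort(comp$csize, decreasing = TRUE)   # 3 2 1 1
sum(comp$csize >= 2)                  # 2 components with at least 2 nodes

# Users with no friends have degree 0 (each forms a component of size 1).
sum(degree(toy_g) == 0)               # 2
```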
In our graph, the “degree” of a node is its number of friends. We have already seen that some nodes in our graph have degree 0 (these are the nodes with no friends), while others have much higher degree. We can use degree(g) to compute the degree of all the nodes in our graph g.
degree(g) %>% sort
## 3984 4008 4010 4015 4022 4024 4035 3983 3987 4001 4006 4012 4028 4029 4032
## 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
## 4034 4036 3991 3992 4005 4033 3990 594 3996 3999 4007 4011 4016 4025 4037
## 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3
## 4003 3985 3989 3993 4013 3988 4002 4018 4027 3981 4019 4020 3986 3995 4000
## 4 5 5 5 5 6 6 6 6 7 7 7 8 8 8
## 4017 4026 4038 4004 4009 3994 3997 4021 4031 4014 3982 3998 4023 4030
## 8 8 8 9 9 10 10 10 10 11 13 13 17 18
How many users are friends with 10 or more other Facebook users in this network?
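Since degree(g) returns one count per vertex, questions like this reduce to a logical comparison added up with sum(). A minimal sketch on a hypothetical star-shaped toy graph, where one hub is linked to every other vertex:

```r
library(igraph)

# Hypothetical toy graph: vertex 1 is friends with vertices 2 to 6.
toy_g <- make_star(6, mode = "undirected")

deg <- degree(toy_g)
deg             # 5 1 1 1 1 1
table(deg)      # how many vertices have each degree
sum(deg >= 5)   # 1 vertex has 5 or more friends
```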
In a network, it’s often visually useful to draw attention to “important” nodes in the network. While this might mean different things in different contexts, in a social network we might consider a user with a large number of friends to be an important user. From the previous problem, we know this is the same as saying that nodes with a high degree are important users.
To visually draw attention to these nodes, we will change the size of the vertices so the vertices with high degrees are larger. To do this, we will change the “size” attribute of the vertices of our graph to be an increasing function of their degrees:
V(g)$size = degree(g)/2+2
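Storing the size as a vertex attribute keeps it attached to the graph, so plot() picks it up without a vertex.size argument. A short sketch on a hypothetical toy graph shows the mapping: a vertex with degree d gets size d/2 + 2, so isolated vertices still get size 2 and remain visible.

```r
library(igraph)

# Hypothetical toy graph: vertex 1 linked to vertices 2 to 5, plus an isolated vertex 6.
toy_g <- make_graph(c(1, 2, 1, 3, 1, 4, 1, 5), n = 6, directed = FALSE)

V(toy_g)$size <- degree(toy_g) / 2 + 2
V(toy_g)$size          # 4.0 2.5 2.5 2.5 2.5 2.0
range(V(toy_g)$size)   # 2 4
# plot(toy_g, vertex.label = NA)   # sizes come from the "size" attribute
```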
Now that we have specified the vertex size of each vertex, we will no longer use the vertex.size parameter when we plot our graph:
plot(g, vertex.label=NA)
table(degree(g))
##
## 0  1 2 3 4 5 6 7 8 9 10 11 13 17 18
## 7 10 4 9 1 4 4 3 6 2  4  1  2  1  1
18/2+2
## [1] 11
0/2+2
## [1] 2
#From table(degree(g)) or summary(degree(g)), we see that the maximum degree of any node in the graph is 18 and the minimum degree of any node is 0. Therefore, the maximum size of any point is 18/2+2=11, and the minimum size is 0/2+2=2
# range(V(g)$size)
What is the largest size we assigned to any node in our graph?
What is the smallest size we assigned to any node in our graph?
Thus far, we have changed the “size” attributes of our vertices. However, we can also change the colors of vertices to capture additional information about the Facebook users we are depicting.
When changing the size of nodes, we first obtained the vertices of our graph with V(g) and then accessed the size attribute with V(g)$size. To change the color, we will update the attribute V(g)$color.
To color the vertices based on the gender of the user, we need access to that variable. When we created our graph g, we provided it with the data frame users, which had variables gender, school, and locale. These are now stored as the attributes V(g)$gender, V(g)$school, and V(g)$locale.
We can update the colors by first setting the color to black for all vertices, then setting it to red for the vertices with gender A and to gray for the vertices with gender B:
V(g)$color = "black"
V(g)$color[V(g)$gender == "A"] = "red"
V(g)$color[V(g)$gender == "B"] = "gray"
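These three assignments work by logical indexing: every vertex starts black, and only vertices whose gender matches a test are overwritten, so users with a blank gender stay black. The base-R sketch below shows the same pattern on a plain vector of hypothetical genders, followed by an equivalent lookup-table version that may be easier to maintain when there are many categories.

```r
# Hypothetical genders; "" means the user did not list one.
gender <- c("A", "B", "", "B", "A")

# Pattern used above: default color, then overwrite by logical index.
color <- rep("black", length(gender))
color[gender == "A"] <- "red"
color[gender == "B"] <- "gray"
color   # "red" "gray" "black" "gray" "red"

# Equivalent lookup-table version: unmatched values give NA, which we map to black.
palette <- c(A = "red", B = "gray")
color2 <- unname(palette[gender])
color2[is.na(color2)] <- "black"
identical(color, color2)   # TRUE
```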
Plot the resulting graph.
plot(g, vertex.label=NA)
What is the gender of the users with the highest degree in the graph?
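Instead of reading the answer off the plot, the vertices with the maximum degree can be selected directly and their attribute inspected. A sketch on a hypothetical toy graph with a gender attribute:

```r
library(igraph)

# Hypothetical toy graph: vertices 1 and 2 are hubs with 3 friends each.
toy_g <- make_graph(c(1, 2, 1, 3, 1, 4, 2, 5, 2, 6), n = 6, directed = FALSE)
V(toy_g)$gender <- c("B", "B", "A", "A", "", "B")

deg <- degree(toy_g)
top <- which(deg == max(deg))   # all vertices tied for the highest degree
V(toy_g)$gender[top]            # "B" "B"
```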
Now, color the vertices based on the school that each user in our network attended.
V(g)$color = "black"
V(g)$color[V(g)$school == "A"] = "purple"
V(g)$color[V(g)$school == "AB"] = "orange"
plot(g, vertex.label=NA)
Are the two users who attended both schools A and B Facebook friends with each other?
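One way to check this without inspecting the plot is to take the subgraph induced by the AB users and count its edges: an edge is kept only if both of its endpoints are in the selection. A sketch on a hypothetical toy graph:

```r
library(igraph)

# Hypothetical toy graph: a path 1-2-3-4 with a school attribute.
toy_g <- make_graph(c(1, 2, 2, 3, 3, 4), n = 4, directed = FALSE)
V(toy_g)$school <- c("AB", "A", "AB", "")

ab <- which(V(toy_g)$school == "AB")   # vertices 1 and 3
ab_graph <- induced_subgraph(toy_g, ab)
ecount(ab_graph)   # 0: the two AB users are not friends with each other
```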
What best describes the users with highest degree?
Now, color the vertices based on the locale of the user.
V(g)$color = "black"
V(g)$color[V(g)$locale == "A"] = "purple"
V(g)$color[V(g)$locale == "B"] = "orange"
plot(g, vertex.label=NA)
The large connected component is most associated with which locale?
The 4-user connected component is most associated with which locale?
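The association between components and locales can also be tabulated: components() labels each vertex with the component it belongs to in membership, and cross-tabulating that label against the locale attribute counts users per component and locale. A sketch on hypothetical toy data:

```r
library(igraph)

# Hypothetical toy graph: component {1, 2, 3}, component {4, 5}, isolated vertex 6.
toy_g <- make_graph(c(1, 2, 2, 3, 4, 5), n = 6, directed = FALSE)
V(toy_g)$locale <- c("B", "B", "A", "A", "A", "")

memb <- components(toy_g)$membership
tab <- table(component = memb, locale = V(toy_g)$locale)
tab   # rows: component id, columns: locale ("" = not listed)
```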
igraph Plotting
Which igraph plotting function would enable us to plot our graph in 3-D?
#install.packages("rgl")
#library(rgl)
#rglplot(g, vertex.label=NA)
What parameter to the plot() function would we use to change the edge width when plotting g?
plot(g, edge.width=2, vertex.label=NA)
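As with vertex size, edge width can instead be stored as an edge attribute named width, which plot() uses when edge.width is not given; this allows a different width per edge. A sketch on a hypothetical toy ring graph:

```r
library(igraph)

# Hypothetical toy graph: 4 vertices arranged in a ring, so 4 edges.
toy_g <- make_ring(4)

# One width per edge, stored on the graph like V(g)$size above.
E(toy_g)$width <- c(1, 2, 3, 4)
E(toy_g)$width
# plot(toy_g, vertex.label = NA)   # drawn with the stored widths
```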