Main topic: Introduction to social networks

Learning objectives:

rm(list=ls(all=T))
Error in names(frame) <- `*vtmp*` : names() applied to a non-vector
Sys.setlocale("LC_ALL","C")
[1] "C"
options(digits=4, scipen=12)
library(magrittr)
library(igraph)
library(rgl)


1 Summarizing the Data

1.1

Load the data from edges.csv into a data frame called edges, and load the data from users.csv into a data frame called users.

edges = read.csv('data/edges.csv')
users = read.csv('data/users.csv')

How many Facebook users are there in our dataset?

nrow(users)
[1] 59

In our dataset, what is the average number of friends per user? Hint: this question is tricky, and it might help to start by thinking about a small example with two users who are friends.

MIT
str(edges) 
'data.frame':   146 obs. of  2 variables:
 $ V1: int  4019 4023 4023 4027 3988 3982 3994 3998 3993 3982 ...
 $ V2: int  4026 4031 4030 4032 4021 3986 3998 3999 3995 4021 ...
nrow(edges)
[1] 146
146*2/59
[1] 4.949
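# Friendship is mutual: if A is B's friend, then B is also A's friend. There are 146 friend pairs, and each pair (A,B) counts twice, so 146 * 2 = 292, which is then divided by 59 users.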
2*nrow(edges)/nrow(users)
[1] 4.949
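
The two-user example suggested in the hint can be checked directly. This is a minimal sketch in base R; toy_users and toy_edges are made-up data frames for illustration, not part of the dataset:

```r
# Toy data (hypothetical): two users who are friends with each other
toy_users <- data.frame(id = c(1, 2))
toy_edges <- data.frame(V1 = 1, V2 = 2)

# Each edge gives one friend to each of its two endpoints,
# so the total friend count is twice the number of edges.
avg_friends <- 2 * nrow(toy_edges) / nrow(toy_users)
avg_friends  # 1: each of the two users has exactly one friend
```

Dividing the number of edges by the number of users without doubling would give 0.5 here, which cannot be right since both users have one friend.
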
1.2

Out of all the students who listed a school, what was the most common locale?

MIT
table(users$locale, users$school)
   
        A AB
     3  0  0
  A  6  0  0
  B 31 17  2
print("Locale B")
[1] "Locale B"
table(users$locale)

    A  B 
 3  6 50 
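
The cross-tabulation above includes users who left the school field blank. One way to restrict the count to students who listed a school is to subset first. The sketch below assumes a blank field is stored as an empty string, as the unlabeled row and column in the tables suggest; toy_users is hypothetical data:

```r
# Toy data (hypothetical) mimicking the structure of users:
# an empty string means the field was not filled in.
toy_users <- data.frame(
  locale = c("A", "B", "B", "B", ""),
  school = c("",  "A", "A", "AB", ""),
  stringsAsFactors = FALSE
)

# Keep only users who listed a school, then tabulate their locale
listed <- subset(toy_users, school != "")
locale_counts <- table(listed$locale)
names(which.max(locale_counts))  # "B" in this toy example
```
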
1.3

Is it possible that either school A or B is an all-girls or all-boys school?

table(gender=users$gender, school=users$school)
      school
gender     A AB
        1  1  0
     A 11  3  1
     B 28 13  1
print("No")
[1] "No"
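# Both genders A and B appear in school A as well as in school B (via the AB column), so neither school can be single-gender.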


2 Creating a Network

We will be using the igraph package to visualize networks; install and load this package using the install.packages and library commands.

2.1 Construct a Graph

We can create a new graph object using the graph.data.frame() function. Based on ?graph.data.frame, which of the following commands will create a graph g describing our social network, with the attributes of each user correctly loaded?

  • g = graph.data.frame(edges, FALSE, users)

library(igraph)
g = graph.data.frame(d=edges, directed=FALSE, vertices=users)

Note: A directed graph is one where the edges only go one way – they point from one vertex to another. The other option is an undirected graph, which means that the relations between the vertices are symmetric.
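
The difference between the two settings can be seen on a single friendship. This sketch uses graph_from_data_frame(), the current igraph name for graph.data.frame(); the toy data frames are hypothetical:

```r
library(igraph)

# Toy data (hypothetical): one friendship between users 1 and 2
toy_edges <- data.frame(from = 1, to = 2)
toy_users <- data.frame(id = c(1, 2))

g_undirected <- graph_from_data_frame(toy_edges, directed = FALSE,
                                      vertices = toy_users)
g_directed   <- graph_from_data_frame(toy_edges, directed = TRUE,
                                      vertices = toy_users)

is_directed(g_undirected)         # FALSE
degree(g_undirected)              # both users have degree 1
degree(g_directed, mode = "out")  # only user 1 has an outgoing edge
```

Because Facebook friendships are mutual, the undirected version is the one that matches this dataset.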

2.2 Components

Use the correct command from Problem 2.1 to load the graph g.

Now, we want to plot our graph. By default, the vertices are large and have text labels of a user’s identifier. Because this would clutter the output, we will plot with no text labels and smaller vertices:

plot(g, vertex.size=5, vertex.label=NA)

In this graph, there are a number of groups of nodes where all the nodes in each group are connected but the groups are disjoint from one another, forming “islands” in the graph. Such groups are called “connected components,” or “components” for short. How many connected components with at least 2 nodes are there in the graph?

  • 4

How many users are there with no friends in the network?

  • 7
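
Both counts can also be obtained without reading them off the plot. The sketch below uses igraph's components() on a small made-up graph (toy_g is hypothetical); the same two expressions should apply to g:

```r
library(igraph)

# Toy graph (hypothetical): a triangle 1-2-3, a pair 4-5, and isolated user 6
toy_edges <- data.frame(from = c(1, 2, 1, 4), to = c(2, 3, 3, 5))
toy_users <- data.frame(id = 1:6)
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE,
                               vertices = toy_users)

comp <- components(toy_g)
n_components <- sum(comp$csize >= 2)     # components with at least 2 nodes
n_isolated   <- sum(degree(toy_g) == 0)  # users with no friends
c(n_components, n_isolated)  # 2 components with 2+ nodes, 1 isolated user
```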

2.3 Degree

In our graph, the “degree” of a node is its number of friends. We have already seen that some nodes in our graph have degree 0 (these are the nodes with no friends), while others have much higher degree. We can use degree(g) to compute the degree of all the nodes in our graph g.

MIT
table(degree(g))

 0  1  2  3  4  5  6  7  8  9 10 11 13 17 18 
 7 10  4  9  1  4  4  3  6  2  4  1  2  1  1 
table(degree(g) >= 10)

FALSE  TRUE 
   50     9 
# Users with 10 or more friends: 4+1+2+1+1 = 9

How many users are friends with 10 or more other Facebook users in this network?

sum(degree(g) >= 10)
[1] 9
2.4 Size by Degree

In a network, it’s often visually useful to draw attention to “important” nodes in the network. While this might mean different things in different contexts, in a social network we might consider a user with a large number of friends to be an important user. From the previous problem, we know this is the same as saying that nodes with a high degree are important users.

To visually draw attention to these nodes, we will change the size of the vertices so the vertices with high degrees are larger. To do this, we will change the “size” attribute of the vertices of our graph to be an increasing function of their degrees:

V(g)$size = degree(g)/2+2 
plot(g, vertex.label=NA)

# Enlarge the vertices of users who have many Facebook friends

What is the largest size we assigned to any node in our graph?
What is the smallest size we assigned to any node in our graph?

MIT
table(degree(g))

 0  1  2  3  4  5  6  7  8  9 10 11 13 17 18 
 7 10  4  9  1  4  4  3  6  2  4  1  2  1  1 
summary(degree(g))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    1.00    3.00    4.95    8.00   18.00 
range(V(g)$size)
[1]  2 11
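
The reported range follows from applying the size formula to the smallest and largest degrees in the table (0 and 18). A quick check, where size_from_degree is a hypothetical helper name:

```r
# Size formula from this section, applied to the extreme degrees (0 and 18)
size_from_degree <- function(d) d / 2 + 2  # helper name is hypothetical

size_from_degree(c(0, 18))  # 2 and 11
```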


3 Coloring Vertices

3.1 Colored by Gender

Thus far, we have changed the “size” attributes of our vertices. However, we can also change the colors of vertices to capture additional information about the Facebook users we are depicting.

When changing the size of nodes, we first obtained the vertices of our graph with V(g) and then accessed the size attribute with V(g)$size. To change the color, we will update the attribute V(g)$color.

To color the vertices based on the gender of the user, we will need access to that variable. When we created our graph g, we provided it with the data frame users, which had variables gender, school, and locale. These are now stored as attributes V(g)$gender, V(g)$school, and V(g)$locale.

We can update the colors by setting the color to black for all vertices, then setting it to red for the vertices with gender A and setting it to gray for the vertices with gender B:

Plot the resulting graph.

V(g)$color = "black"
V(g)$color[V(g)$gender == "A"] = "red"
V(g)$color[V(g)$gender == "B"] = "gray"
plot(g, vertex.label=NA)

What is the gender of the users with the highest degree in the graph?

  • B
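
The answer can also be cross-checked without the plot by looking up the gender attribute of the maximum-degree vertices. A minimal sketch on made-up data (toy_g is hypothetical):

```r
library(igraph)

# Toy graph (hypothetical): user 1 is friends with users 2, 3 and 4
toy_users <- data.frame(id = 1:4, gender = c("B", "A", "A", "B"),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(from = c(1, 1, 1), to = c(2, 3, 4))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE,
                               vertices = toy_users)

# Gender of the vertex (or vertices) with the maximum degree
top <- which(degree(toy_g) == max(degree(toy_g)))
V(toy_g)$gender[top]  # "B" in this toy example
```
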
3.2 Colored by School

Now, color the vertices based on the school that each user in our network attended.

MIT
V(g)$color = "black"
V(g)$color[V(g)$school == "A"] = "red"
V(g)$color[V(g)$school == "AB"] = "gray"
plot(g, vertex.label=NA)

par(mar=c(1,1,1,1))
table(V(g)$school, useNA="ifany")

    A AB 
40 17  2 
V(g)$color = "gray"
V(g)$color[V(g)$school == "A"] = "green"
V(g)$color[V(g)$school == "AB"] = "red"
plot(g, vertex.label=NA)

Are the two users who attended both schools A and B Facebook friends with each other?

  • Yes
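
When two vertices are hard to pick out in the plot, the same question can be checked by keeping only the "AB" vertices and counting the edges that remain. This sketch uses igraph's induced_subgraph() on hypothetical toy data:

```r
library(igraph)

# Toy graph (hypothetical): users 2 and 3 attended both schools and are friends
toy_users <- data.frame(id = 1:4, school = c("A", "AB", "AB", ""),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(from = c(1, 2), to = c(2, 3))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE,
                               vertices = toy_users)

# Keep only the "AB" vertices; any remaining edge is a friendship between them
ab <- induced_subgraph(toy_g, which(V(toy_g)$school == "AB"))
ecount(ab) > 0  # TRUE here
```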

What best describes the users with highest degree?

  • Some, but not all, of the high-degree users attended school A
3.3 Colored by Locale

Now, color the vertices based on the locale of the user.

MIT
V(g)$color = "black"
V(g)$color[V(g)$locale == "A"] = "red"
V(g)$color[V(g)$locale == "B"] = "gray"
plot(g, vertex.label=NA)

table(V(g)$locale, useNA="ifany")

    A  B 
 3  6 50 
V(g)$color = "gray"
V(g)$color[V(g)$locale == "A"] = "red"
V(g)$color[V(g)$locale == "B"] = "green"
plot(g, vertex.label=NA)

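# The MIT solution uses red for A and gray for B, which is harder to read; the instructor's color scheme makes the groups easier to tell apart.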

The large connected component is most associated with which locale?

  • B

The 4-user connected component is most associated with which locale?

  • A
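
The association between components and locales can also be tabulated rather than judged by color. A minimal sketch, assuming the component membership from components() is crossed with the locale attribute; the toy data are hypothetical:

```r
library(igraph)

# Toy graph (hypothetical): a triangle in locale B and a pair in locale A
toy_users <- data.frame(id = 1:5, locale = c("B", "B", "B", "A", "A"),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(from = c(1, 2, 1, 4), to = c(2, 3, 3, 5))
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE,
                               vertices = toy_users)

# Rows are component ids, columns are locales
comp_by_locale <- table(component = components(toy_g)$membership,
                        locale = V(toy_g)$locale)
comp_by_locale
```

Each row of the resulting table shows how many users of each locale fall in that component, so the dominant locale of a component is the largest entry in its row.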


4 Help Page for igraph Plotting

Which igraph plotting function would enable us to plot our graph in 3-D?

library(rgl)
rglplot(g, vertex.label=NA) # did not work on Windows

What parameter to the plot() function would we use to change the edge width when plotting g?

plot(g, edge.width=2, vertex.label=NA)








