When you only roughly know a function's name, use two question marks to search the help pages.
??average
??tail
Install igraph, haven, and tidyverse:
install.packages('tidyverse')
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/tidyverse_1.2.1.tgz'
Content type 'application/x-gzip' length 89217 bytes (87 KB)
==================================================
downloaded 87 KB
The downloaded binary packages are in
/var/folders/zm/zhc689mj20sf8gz6nfjt09r40000gn/T//Rtmpy32DoF/downloaded_packages
install.packages('igraph')
Error in install.packages : Updating loaded packages
install.packages("haven")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/haven_2.2.0.tgz'
Content type 'application/x-gzip' length 1077133 bytes (1.0 MB)
==================================================
downloaded 1.0 MB
The downloaded binary packages are in
/var/folders/zm/zhc689mj20sf8gz6nfjt09r40000gn/T//Rtmpy32DoF/downloaded_packages
Set seed:
set.seed(09032019)
Load igraph, haven, and tidyverse OR attach data already loaded in R:
library(igraph)
Attaching package: ‘igraph’
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.2
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
✖ purrr::compose()       masks igraph::compose()
✖ tidyr::crossing()      masks igraph::crossing()
✖ dplyr::filter()        masks stats::filter()
✖ dplyr::groups()        masks igraph::groups()
✖ dplyr::lag()           masks stats::lag()
✖ purrr::simplify()      masks igraph::simplify()
library(haven)
library(readxl)
attach(mtcars)
The following object is masked from package:ggplot2:
mpg
Read in data from a desktop:
data2 <- read.delim('~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv', sep=',')
Error in file(file, "rt") : cannot open the connection
cannot open file '/Users/Al/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv': No such file or directory
Read in data from a website:
# This step only needed to read excel files from the internet:
download.file("https://stats.idre.ucla.edu/stat/data/hsb2.xls", "~/Downloads/myfile.xls", mode="wb")
trying URL 'https://stats.idre.ucla.edu/stat/data/hsb2.xls'
Content type 'application/vnd.ms-excel' length 41984 bytes (41 KB)
==================================================
downloaded 41 KB
Use read_graph to load network data. Create a network from an edgelist. Load data at https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins. Open that link first in your web browser so you know what the data look like. The data are undirected (two options):
data <- graph_from_edgelist(as.matrix(data), directed = FALSE)
Error in as.vector(x, mode) :
cannot coerce type 'closure' to vector of type 'any'
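The error above appears because `data` has not been assigned yet, so R finds the built-in function `data()` (a closure) instead of a data frame. The two loading options can be sketched as follows. This is a minimal illustration on a small temporary file rather than the dolphin URL; the file contents and the names `tmp`, `el`, `g_a`, and `g_b` are assumptions made for the example.

```r
library(igraph)

# Hypothetical three-line edge list standing in for the dolphin file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2", "2 3", "1 3"), tmp)

# Option 1: let igraph parse the file directly.
# The "edgelist" format treats vertex ids as zero-based, so a file that
# starts numbering at 1 gains an extra, isolated vertex for id 0.
g_a <- read_graph(tmp, format = "edgelist", directed = FALSE)

# Option 2: read the file into a data frame, convert it to a matrix,
# and build the graph from that edge list.
el <- as.matrix(read.table(tmp))
g_b <- graph_from_edgelist(el, directed = FALSE)

vcount(g_a)  # counts ids 0 through 3
vcount(g_b)  # counts ids 1 through 3
```

The zero-based reading in option 1 is a likely reason the dolphin network below reports 63 nodes with one node of degree 0.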
Get the adjacency matrix from the graph above. Tell the function not to return a sparse matrix.
amps2 <- as_adjacency_matrix(dataps2, sparse = FALSE)
Load the dolphin data again, this time telling R the network is directed.
dataps2d <- read_graph("https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins", directed = TRUE)
Load the SCAD data from my Github Data repository. The link is https://raw.githubusercontent.com/ZacharyST/Data/master/scad_latinamerica.
data <- read.csv("https://raw.githubusercontent.com/ZacharyST/Data/master/scad_latinamerica")
INSPECTING A NETWORK (edges, nodes, components, size, degree)
How many nodes does the network have (two options)?
length(V(dataps2))
[1] 63
How many nodes does the network have (two options) using an adjacency matrix?
ncol(amps2)
[1] 63
How many edges does the network have (three options)?
sum(colSums(amps2))/2
[1] 159
Does the network have any components?
count_components(dataps2)
[1] 2
Save the results from above as a new object. Call the entry of that object that tells you how many components there are.
c$no
[1] 2
What is the size of the component(s)?
c$csize
[1] 1 62
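The object queried with `$no` and `$csize` is not created in the code shown. One way to produce such an object is igraph's `components()`, sketched here on a toy graph. The graph and the names `g_toy` and `comp` are assumptions for illustration, not the dolphin data.

```r
library(igraph)

# Hypothetical toy graph: a triangle (1, 2, 3) plus a separate pair (4, 5).
g_toy <- make_graph(c(1, 2, 2, 3, 1, 3, 4, 5), directed = FALSE)

# components() returns a list; store it so its entries can be reused.
comp <- components(g_toy)

comp$no          # number of components
comp$csize       # size of each component
comp$membership  # component id of each vertex
```

Storing the result once avoids recomputing the components for every follow-up question.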
What is the degree of each node? Use the object created when you read in the file.
degree(dataps2)
[1] 0 6 8 4 3 1 4 6 5 6 7 5 1 1 8 12 7 6 9 7 4 9 6 1 3 6 3
[28] 3 5 5 9 5 1 3 10 5 1 7 11 8 2 8 5 6 7 4 11 2 6 1 2 7 10 4
[55] 2 7 2 2 9 1 5 1 3
What is the degree of each node? Use the adjacency matrix.
rowSums(amps2)
[1] 0 6 8 4 3 1 4 6 5 6 7 5 1 1 8 12 7 6 9 7 4 9 6 1 3 6 3
[28] 3 5 5 9 5 1 3 10 5 1 7 11 8 2 8 5 6 7 4 11 2 6 1 2 7 10 4
[55] 2 7 2 2 9 1 5 1 3
What is the degree of the most social dolphin?
max(degree(dataps2))
[1] 12
Which dolphin is the most social?
V(dataps2)[degree(dataps2) == max(degree(dataps2))]
+ 1/63 vertex, from c1bd9b9:
[1] 16
Repeat the last two queries above, but save the result of the most social dolphin as an object.
V(dataps2)[degree(dataps2) == social]
+ 1/63 vertex, from c1bd9b9:
[1] 16
Load and inspect the KARATE data.
Make column names match the row numbers:
Now, let us count how many nodes there are, the number of components, and the degree of each node:
library(igraph) # To load igraph functions
g <- graph(d) # Fails. Requires edgelist.
g <- graph_from_adjacency_matrix(d) # Fails because d is a dataframe.
g <- graph_from_adjacency_matrix(as.matrix(d), mode=) # Works. Do not forget to tell it the kind of network you have.
To just get the count of components:
count_components(g)
[1] 1
The most connections and its node or nodes (two options):
V(g)[degree(g) == maximum]
+ 1/34 vertex, named, from b64e902:
[1] 34
Loop, check work, correct:
# Using the apply function:
# MARGIN=2 means columns.
apply(d, MARGIN=2, FUN=sum) # You have to capitalize the argument names.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4
29 30 31 32 33 34
3 4 4 6 12 17
Find the nodes in the karate network that have the minimum degree.
V(g)[which.min(degree(g))]
+ 1/34 vertex, named, from b64e902:
[1] 12
RANDOM NUMBER GENERATOR (igraph, adjacency matrix)
A binomial distribution describes the number of successes in a fixed number of independent trials; rbinom() draws random numbers from it according to the parameters you pass it. We can easily make it produce 1s and 0s and then populate an adjacency matrix with them. The last argument is the probability of drawing a 1. With .5, half the output will be 1 and half will be 0, on average.
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 0
Let’s speed that up. Why does the below line produce more numbers? 10 replaces 1.
rbinom(10,1,.5)
[1] 0 0 1 0 1 1 0 1 1 1
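Let’s understand rbinom a little more. The second argument is the number of trials per draw, so each number below counts the successes out of 3 trials and can range from 0 to 3.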
rbinom(10,3,.5)
[1] 2 1 2 0 1 1 2 1 0 3
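To make the role of each rbinom() argument concrete, here is a small sketch that draws many values and summarizes them. The seed and the names `coin_flips` and `three_flips` are arbitrary choices for the example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# rbinom(n, size, prob): n draws, each counting successes in `size` trials.
coin_flips  <- rbinom(1000, size = 1, prob = .5)  # each draw is 0 or 1
three_flips <- rbinom(1000, size = 3, prob = .5)  # each draw is 0, 1, 2, or 3

table(coin_flips)
table(three_flips)

# The average draw should be close to size * prob.
mean(coin_flips)   # near 0.5
mean(three_flips)  # near 1.5
```

For adjacency matrices only the `size = 1` case is needed, since each cell should hold a 0 or a 1.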
We can now make a matrix of arbitrary size, so long as rbinom() creates a perfect-square count of numbers.
temp
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 0 0 0 0 0 1 1
[2,] 0 1 0 1 1 1 1 0 1
[3,] 0 0 1 0 0 0 1 1 1
[4,] 0 0 0 0 0 0 1 1 1
[5,] 1 0 0 1 1 1 1 1 0
[6,] 0 0 1 1 0 0 0 1 0
[7,] 1 1 1 1 0 0 1 0 1
[8,] 1 1 0 0 0 0 1 1 1
[9,] 0 0 0 1 1 0 0 1 0
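The code that built the 9 x 9 matrix above is not shown. A plausible sketch, assuming it follows the same pattern as the 25 x 25 example later in this document (the names `n_nodes` and `temp_sketch` are hypothetical):

```r
set.seed(2)  # arbitrary seed for repeatability

n_nodes <- 9  # hypothetical name for the number of rows and columns

# Draw n_nodes^2 zeros and ones, then fold them into a square matrix.
temp_sketch <- matrix(rbinom(n_nodes^2, 1, .5), nrow = n_nodes)

dim(temp_sketch)
```

Squaring the node count guarantees that rbinom() returns exactly enough values to fill the matrix.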
Maybe this approach is a little more intuitive:
temp
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 0 1 0 0 0 1 1 1 0 1
[2,] 0 0 1 0 1 1 0 1 0 0 0 1 1 1
[3,] 0 1 0 1 0 0 0 1 0 0 0 1 0 0
[4,] 0 1 1 0 1 0 0 0 0 1 1 1 1 0
[5,] 0 0 0 0 0 0 1 0 1 0 0 1 1 1
[6,] 0 0 0 0 1 0 0 0 1 1 0 0 1 1
[7,] 0 0 1 1 1 1 0 1 0 0 0 1 0 1
[8,] 1 1 0 0 1 0 0 0 1 0 0 1 1 0
[9,] 0 0 1 0 0 1 1 1 0 0 0 1 0 0
[10,] 1 0 1 1 1 1 0 0 1 0 1 1 1 0
[11,] 1 0 0 1 1 1 1 0 0 1 0 0 1 1
[12,] 1 0 0 0 0 1 1 1 0 1 0 0 0 1
[13,] 0 1 1 0 1 1 0 1 0 0 1 0 0 0
[14,] 0 1 0 0 0 1 0 0 0 1 0 0 1 0
[15,] 1 1 1 1 0 1 1 1 1 1 0 0 0 0
[,15]
[1,] 1
[2,] 0
[3,] 0
[4,] 0
[5,] 1
[6,] 1
[7,] 0
[8,] 0
[9,] 0
[10,] 1
[11,] 0
[12,] 1
[13,] 0
[14,] 1
[15,] 0
Convert to igraph object.
g_temp <- graph_from_adjacency_matrix(temp, mode=)
Note that this approach requires using a directed network.
(colSums(temp) + rowSums(temp)) == degree(g_temp)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
You can also make an undirected graph from the adjacency matrix. igraph will handle the construction of edges by assuming any edge is symmetric.
degree(g_temp2) # Gives the same as the above for graph
[1] 7 9 9 9 12 11 10 9 8 11 8 12 9 10 12
But calculation of other statistics will be tricky. It is better to just make the matrix symmetric right away, which requires some massaging. For the bottom left half of the matrix, make it the same as the top right.
temp2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 0 0 0 0 1 0 0 0 1 1 1 0 1
[2,] 0 0 1 0 1 1 0 1 0 0 0 1 1 1
[3,] 0 1 0 1 0 0 0 1 0 0 0 1 0 0
[4,] 0 0 1 0 1 0 0 0 0 1 1 1 1 0
[5,] 0 1 0 1 0 0 1 0 1 0 0 1 1 1
[6,] 1 1 0 0 0 0 0 0 1 1 0 0 1 1
[7,] 0 0 0 0 1 0 0 1 0 0 0 1 0 1
[8,] 0 1 1 0 0 0 1 0 1 0 0 1 1 0
[9,] 0 0 0 0 1 1 0 1 0 0 0 1 0 0
[10,] 1 0 0 1 0 1 0 0 0 0 1 1 1 0
[11,] 1 0 0 1 0 0 0 0 0 1 0 0 1 1
[12,] 1 1 1 1 1 0 1 1 1 1 0 0 0 1
[13,] 0 1 0 1 1 1 0 1 0 1 1 0 0 0
[14,] 1 1 0 0 1 1 1 0 0 0 1 1 0 0
[15,] 1 0 0 0 1 1 0 0 0 1 0 1 0 1
[,15]
[1,] 1
[2,] 0
[3,] 0
[4,] 0
[5,] 1
[6,] 1
[7,] 0
[8,] 0
[9,] 0
[10,] 1
[11,] 0
[12,] 1
[13,] 0
[14,] 1
[15,] 0
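The massaging described above can be sketched with lower.tri() and t(), the same idiom that appears in the random-graph section of this document. The matrix size, seed, and the names `m` and `m_sym` are assumptions chosen to keep the example small.

```r
set.seed(3)  # arbitrary seed for repeatability

n_nodes <- 6  # hypothetical size, kept small so the result is easy to read
m <- matrix(rbinom(n_nodes^2, 1, .5), nrow = n_nodes)
diag(m) <- 0  # no self-loops

# Copy the upper-right triangle into the lower-left triangle.
# t(m_sym)[lower.tri(m_sym)] picks, for each cell below the diagonal,
# its mirror image above the diagonal.
m_sym <- m
m_sym[lower.tri(m_sym)] <- t(m_sym)[lower.tri(m_sym)]

isSymmetric(m_sym)
```

Only the lower triangle is overwritten, so every tie recorded in the upper triangle survives unchanged.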
ADJACENCY MATRIX EXAMPLE
Create an adjacency matrix that has 25 rows and columns, where each cell has a .8 probability of containing a 1. Set the diagonal to zero.
probability <- .8
thisnumber <- 25
thismatrix <- matrix(rbinom(thisnumber^2, 1, probability), nrow=thisnumber)
diag(thismatrix) <- 0
thismatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 1 1 0 0 0 1 1 0 1 1 1 1 0
[2,] 1 0 1 1 1 1 0 1 0 1 1 0 1 1
[3,] 0 1 0 1 1 1 0 1 1 1 1 1 1 1
[4,] 1 1 1 0 1 1 1 1 1 1 1 1 1 0
[5,] 1 1 1 1 0 1 1 1 1 0 1 1 1 0
[6,] 1 1 1 1 1 0 1 0 1 1 1 0 1 1
[7,] 1 1 0 0 1 0 0 1 1 1 1 1 1 1
[8,] 0 0 1 0 1 1 1 0 1 0 1 0 0 1
[9,] 1 1 1 1 1 1 1 0 0 1 1 1 1 1
[10,] 0 1 0 0 1 1 1 1 1 0 1 0 1 1
[11,] 1 1 1 1 1 1 1 1 1 1 0 1 1 1
[12,] 1 1 1 1 1 1 1 0 1 1 1 0 1 1
[13,] 1 1 1 1 1 1 1 1 1 1 1 0 0 1
[14,] 1 1 1 1 1 1 1 1 1 1 0 1 1 0
[15,] 1 1 1 1 1 1 1 1 1 1 1 1 0 1
[16,] 1 1 1 1 1 1 0 1 1 0 1 0 1 1
[17,] 1 1 1 1 0 1 0 1 0 1 0 1 0 0
[18,] 0 1 0 0 1 1 1 1 0 1 0 1 1 0
[19,] 1 1 1 1 1 1 1 1 1 1 0 1 1 1
[20,] 1 0 1 1 1 0 0 0 0 1 1 1 1 1
[21,] 1 1 1 1 1 1 1 1 1 0 1 0 0 1
[22,] 0 1 1 1 1 1 1 0 1 0 1 1 1 1
[23,] 1 1 1 1 1 1 1 1 1 1 1 1 0 1
[24,] 1 1 1 1 1 1 1 1 1 1 1 0 1 1
[25,] 1 0 1 0 0 1 1 1 1 1 1 1 1 1
[,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
[1,] 0 1 1 0 1 1 1 1 1 1 0
[2,] 1 1 1 1 1 0 0 1 1 1 1
[3,] 1 0 1 1 1 1 0 0 0 1 0
[4,] 1 1 0 0 1 1 1 1 1 1 0
[5,] 0 1 1 0 1 1 1 1 0 0 1
[6,] 1 0 0 1 1 1 1 1 0 1 0
[7,] 1 1 0 1 0 1 1 1 1 0 1
[8,] 0 1 1 0 1 1 1 1 1 1 1
[9,] 1 0 1 0 1 0 1 1 1 1 1
[10,] 0 0 1 1 1 1 1 0 1 1 1
[11,] 1 1 1 0 1 1 1 0 0 1 0
[12,] 1 1 0 1 1 1 1 1 1 0 1
[13,] 1 1 1 1 1 1 1 1 1 0 1
[14,] 0 1 1 1 1 0 1 1 1 1 1
[15,] 0 1 0 1 1 1 1 1 1 1 1
[16,] 0 0 1 1 1 1 1 1 1 1 1
[17,] 1 1 0 1 0 1 1 0 1 1 1
[18,] 0 0 1 0 1 1 1 1 0 1 1
[19,] 1 0 1 1 0 1 1 0 1 1 0
[20,] 1 1 1 0 1 0 1 1 1 1 0
[21,] 1 0 0 0 1 1 0 1 0 1 1
[22,] 1 1 1 1 1 0 1 0 1 1 1
[23,] 1 0 0 1 1 1 1 1 0 1 1
[24,] 1 0 1 1 1 0 1 1 1 0 1
[25,] 1 1 1 0 1 1 1 0 1 1 0
Convert it to an undirected network.
thismatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 1 1 0 0 0 1 1 0 1 1 1 1 0
[2,] 1 0 1 1 1 1 0 1 0 1 1 0 1 1
[3,] 1 1 0 1 1 1 0 1 1 1 1 1 1 1
[4,] 0 1 1 0 1 1 1 1 1 1 1 1 1 0
[5,] 0 1 1 1 0 1 1 1 1 0 1 1 1 0
[6,] 0 1 1 1 1 0 1 0 1 1 1 0 1 1
[7,] 1 0 0 1 1 1 0 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 0 1 0 1 0 1 0 0 1
[9,] 0 0 1 1 1 1 1 1 0 1 1 1 1 1
[10,] 1 1 1 1 0 1 1 0 1 0 1 0 1 1
[11,] 1 1 1 1 1 1 1 1 1 1 0 1 1 1
[12,] 1 0 1 1 1 0 1 0 1 0 1 0 1 1
[13,] 1 1 1 1 1 1 1 0 1 1 1 1 0 1
[14,] 0 1 1 0 0 1 1 1 1 1 1 1 1 0
[15,] 0 1 1 1 0 1 1 0 1 0 1 1 1 0
[16,] 1 1 0 1 1 0 1 1 0 0 1 1 1 1
[17,] 1 1 1 0 1 0 0 1 1 1 1 0 1 1
[18,] 0 1 1 0 0 1 1 0 0 1 0 1 1 1
[19,] 1 1 1 1 1 1 0 1 1 1 1 1 1 1
[20,] 1 0 1 1 1 1 1 1 0 1 1 1 1 0
[21,] 1 0 0 1 1 1 1 1 1 1 1 1 1 1
[22,] 1 1 0 1 1 1 1 1 1 0 0 1 1 1
[23,] 1 1 0 1 0 0 1 1 1 1 0 1 1 1
[24,] 1 1 1 1 0 1 0 1 1 1 1 0 0 1
[25,] 0 1 0 0 1 0 1 1 1 1 0 1 1 1
[,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
[1,] 0 1 1 0 1 1 1 1 1 1 0
[2,] 1 1 1 1 1 0 0 1 1 1 1
[3,] 1 0 1 1 1 1 0 0 0 1 0
[4,] 1 1 0 0 1 1 1 1 1 1 0
[5,] 0 1 1 0 1 1 1 1 0 0 1
[6,] 1 0 0 1 1 1 1 1 0 1 0
[7,] 1 1 0 1 0 1 1 1 1 0 1
[8,] 0 1 1 0 1 1 1 1 1 1 1
[9,] 1 0 1 0 1 0 1 1 1 1 1
[10,] 0 0 1 1 1 1 1 0 1 1 1
[11,] 1 1 1 0 1 1 1 0 0 1 0
[12,] 1 1 0 1 1 1 1 1 1 0 1
[13,] 1 1 1 1 1 1 1 1 1 0 1
[14,] 0 1 1 1 1 0 1 1 1 1 1
[15,] 0 1 0 1 1 1 1 1 1 1 1
[16,] 1 0 1 1 1 1 1 1 1 1 1
[17,] 0 1 0 1 0 1 1 0 1 1 1
[18,] 1 1 1 0 1 1 1 1 0 1 1
[19,] 1 1 0 1 0 1 1 0 1 1 0
[20,] 1 1 1 1 1 0 1 1 1 1 0
[21,] 1 1 1 1 1 1 0 1 0 1 1
[22,] 1 1 0 1 0 1 1 0 1 1 1
[23,] 1 1 1 0 1 1 0 1 0 1 1
[24,] 1 1 1 1 1 1 1 1 1 0 1
[25,] 1 1 1 1 0 0 1 1 1 1 0
Convert the matrix to an igraph graph object.
ig2 <- graph_from_adjacency_matrix(thismatrix, mode='undirected')
What is the average degree of the network?
summary(degree(ig2))[4]
Mean
18.64
What is the density of the network?
edge_density(ig2) # same
[1] 0.7766667
What is the global transitivity of the network?
transitivity(ig2, type='globalundirected')
[1] 0.7654172
Use an adjacency matrix to create a fake network that has 30 individuals.
newmatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0 1 0 0 1 0 0 0 0 1 1 0 1 0
[2,] 1 1 0 0 0 1 0 0 0 0 1 1 0 0
[3,] 0 0 0 0 1 1 0 0 1 0 0 0 1 1
[4,] 0 0 0 0 1 0 1 0 1 0 0 0 1 0
[5,] 1 1 0 1 1 1 0 1 0 0 0 0 1 0
[6,] 1 1 0 0 0 0 1 0 1 0 0 1 0 0
[7,] 1 1 0 1 0 0 0 1 0 0 0 0 0 1
[8,] 0 0 0 0 0 1 1 0 0 1 0 0 1 1
[9,] 1 0 0 1 0 1 1 1 0 0 1 1 0 1
[10,] 0 0 1 1 0 1 0 1 0 1 0 1 0 1
[11,] 1 0 0 1 1 0 0 1 0 0 0 1 1 1
[12,] 0 1 0 1 1 0 1 0 1 0 1 1 1 0
[13,] 1 0 0 0 0 0 1 0 0 1 1 0 1 1
[14,] 1 0 0 0 0 1 0 1 0 1 1 0 1 0
[15,] 0 1 0 0 0 0 0 1 1 1 1 0 0 0
[16,] 1 0 0 0 0 1 1 0 1 1 0 0 0 1
[17,] 1 0 1 1 1 0 0 0 0 1 0 0 0 1
[18,] 1 0 0 0 1 0 0 1 1 0 0 0 0 0
[19,] 1 1 0 0 1 1 1 1 0 1 1 1 1 0
[20,] 0 1 1 0 1 0 0 0 1 0 0 1 1 1
[21,] 0 0 0 1 1 1 1 0 1 1 0 0 1 0
[22,] 0 1 0 0 1 0 1 1 0 0 0 0 1 1
[23,] 0 1 0 1 0 1 0 1 0 1 0 0 1 0
[24,] 1 0 1 0 0 1 1 0 0 0 0 0 1 1
[25,] 0 1 0 0 1 1 1 1 1 0 1 1 1 1
[26,] 0 1 1 1 1 0 1 0 0 0 1 0 0 1
[27,] 0 1 0 0 0 1 0 1 1 0 1 0 0 0
[28,] 1 0 0 0 1 1 1 0 0 0 1 0 0 1
[29,] 0 1 1 1 0 1 1 1 1 0 1 0 0 0
[30,] 1 1 1 0 1 1 1 0 0 1 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27]
[1,] 1 1 0 0 0 0 1 1 1 1 1 0 1
[2,] 1 1 1 1 0 0 1 0 1 1 1 1 1
[3,] 1 1 1 1 0 1 0 1 0 0 1 0 0
[4,] 0 0 0 0 0 0 0 1 0 0 0 1 1
[5,] 0 1 1 1 0 1 1 0 0 0 1 1 1
[6,] 0 0 1 0 0 0 1 1 0 0 0 1 1
[7,] 1 0 1 1 0 0 0 1 1 1 1 0 1
[8,] 0 0 0 1 0 1 0 0 0 1 1 0 1
[9,] 1 1 1 1 0 0 0 0 1 1 0 1 1
[10,] 1 0 1 0 1 1 1 1 1 1 1 1 0
[11,] 0 0 0 1 0 0 1 1 0 0 1 1 0
[12,] 1 1 0 0 0 1 1 0 1 0 1 0 1
[13,] 1 0 1 1 1 1 0 1 0 1 0 0 0
[14,] 1 0 1 1 0 1 1 1 0 1 0 0 1
[15,] 1 0 1 0 0 1 1 1 0 0 0 0 0
[16,] 1 1 1 0 0 1 1 0 0 0 0 1 0
[17,] 0 0 0 0 0 1 1 0 0 0 0 1 1
[18,] 1 0 1 0 1 1 1 1 1 0 0 0 1
[19,] 0 0 1 1 0 0 0 1 0 1 1 1 1
[20,] 0 1 1 0 1 0 1 1 1 0 1 0 1
[21,] 1 1 1 1 1 1 0 1 1 0 0 0 1
[22,] 1 0 1 1 0 0 0 1 1 1 1 0 1
[23,] 1 0 0 1 0 0 0 0 0 0 1 0 1
[24,] 1 1 1 0 1 1 1 1 1 1 0 1 1
[25,] 1 0 0 0 0 0 0 0 0 1 0 0 1
[26,] 1 0 1 0 0 1 1 0 1 1 1 0 0
[27,] 0 1 0 0 1 0 1 1 1 0 0 0 0
[28,] 0 0 0 1 0 1 0 1 1 1 1 1 1
[29,] 0 1 1 0 1 0 1 0 0 1 0 1 1
[30,] 1 1 0 1 0 0 0 1 0 1 0 0 1
[,28] [,29] [,30]
[1,] 1 0 1
[2,] 1 0 1
[3,] 1 0 0
[4,] 0 1 0
[5,] 1 0 0
[6,] 0 0 0
[7,] 0 0 1
[8,] 1 0 1
[9,] 0 1 0
[10,] 1 0 0
[11,] 1 0 0
[12,] 1 1 0
[13,] 1 1 0
[14,] 1 0 1
[15,] 1 1 0
[16,] 1 1 1
[17,] 1 0 1
[18,] 0 0 0
[19,] 0 0 0
[20,] 1 1 1
[21,] 1 1 1
[22,] 0 1 0
[23,] 0 0 1
[24,] 1 0 1
[25,] 0 0 1
[26,] 1 0 0
[27,] 1 0 1
[28,] 0 0 0
[29,] 1 1 0
[30,] 1 1 1
Calculate the degree of each node using the adjacency matrix:
degree(newmatrix) # fails since not graph object
Error in degree(newmatrix) : Not a graph object
Degree for directed adjacency matrix:
colSums(newmatrix) + rowSums(newmatrix) - 2*diag(newmatrix) # No self loops
[1] 30 31 20 19 30 27 30 26 29 28 26 24 29 31 29 26 31 27 25 33 36 32 25 33 29 28 34
[28] 34 26 28
Degree for undirected adjacency matrix:
colSums(newmatrix) + rowSums(newmatrix) - diag(newmatrix) # No self loops
[1] 30 32 20 19 31 27 30 26 29 29 26 25 30 31 30 27 31 27 25 33 36 33 25 34 29 28 34
[28] 34 27 29
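A small worked example may help show why row sums and column sums are added for a directed matrix. The 3-node matrix and the names `adj`, `out_degree`, `in_degree`, and `total_degree` are made up for illustration.

```r
# Hypothetical 3-node directed network: a 1 in row i, column j means i -> j.
adj <- matrix(c(0, 1, 1,
                0, 0, 1,
                1, 0, 0), nrow = 3, byrow = TRUE)

out_degree <- rowSums(adj)  # ties each node sends
in_degree  <- colSums(adj)  # ties each node receives

# Total degree; the diagonal term is zero here because there are no self-loops.
total_degree <- out_degree + in_degree - 2 * diag(adj)
total_degree
```

Node 1 sends two ties and receives one, so its total degree is 3; nodes 2 and 3 have total degrees 2 and 3.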
Convert the adjacency matrix to an asymmetric graph object. Use igraph to calculate the degree of each node.
degree(g3)
[1] 30 33 20 19 32 27 30 26 29 30 26 26 31 31 31 28 31 27 25 33 36 34 25 35 29 28 34
[28] 34 28 30
DENSITY
What is the density of the dolphin network (two options)?
ecount(dataps2)/((vcount(dataps2) * (vcount(dataps2) - 1))/2)
[1] 0.08141321
What is the density of the directed dolphin data?
graph.density(dataps2d)
[1] 0.04070661
Why is the density this value now? How does it compare to what you found in Question 1?
The density of a directed network is half that of an undirected one because there are twice as many possible edges. In the directed network the density is 0.04070661, which is lower than the density in the undirected network (0.08141321). The edge count stays at 159 when the same file is read as directed, while the number of possible edges doubles from n(n-1)/2 to n(n-1), so the ratio is cut in half.
Karate club density, automatically and manually:
graph.density(g)
[1] 0.1390374
Let’s make what is happening clearer:
totaledges/possibleedges
[1] 0.1390374
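The definitions of `totaledges` and `possibleedges` are not shown above. A minimal sketch of how they could be computed follows, assuming igraph's built-in Zachary karate club graph as a stand-in for `g` (the name `g_karate` is hypothetical).

```r
library(igraph)

# Assumption: igraph's built-in Zachary karate club graph stands in for `g`.
g_karate <- make_graph("Zachary")

totaledges <- ecount(g_karate)
# In an undirected network without self-loops, n nodes allow n * (n - 1) / 2 edges.
possibleedges <- vcount(g_karate) * (vcount(g_karate) - 1) / 2

totaledges / possibleedges
edge_density(g_karate)  # igraph's built-in calculation, for comparison
```

Both lines should print the same value, since density is just existing edges divided by possible edges.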
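Nearly the same, counting directly from the adjacency matrix. The value is slightly lower because the denominator counts all 34 x 34 cells, including the diagonal, rather than 34 x 33: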
sum(colSums(d))/(nrow(d)*ncol(d))
[1] 0.1349481
Our directed graph:
edges/possibleedges
[1] 0.1764706
Our undirected graph:
graph.density(g_temp2b)
[1] 0.4571429
Do you notice anything about the density of the directed graph? It is lower. Why is the density of the undirected graph higher than that of the directed one even though they use the same data? Because when we went from the directed to the undirected adjacency matrix, we added a 1 to the bottom left or top right triangle if it did not exist before. So, we are adding edges even though we are not adding nodes.
Generate a 10 x 10 matrix.
number <- 10
thismatrix <- matrix(rbinom(number^2, 1, .5), nrow=number)
Calculate its directed density.
tm_d <- graph_from_adjacency_matrix(thismatrix, mode='directed')
graph.density(tm_d)
[1] 0.6111111
edges <- sum(thismatrix)
possibleedges <- nrow(thismatrix)*ncol(thismatrix)-nrow(thismatrix)
edges/possibleedges
[1] 0.6111111
Calculate its undirected density. Remember, the calculation is the same once you define the edges correctly. First, make symmetric before passing to igraph and remove self loops.
graph.density(tm_u, loops=FALSE)
[1] 0.4888889
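The construction of `tm_u` is not shown above. One way to follow the stated recipe (make the matrix symmetric, remove self-loops, then hand it to igraph) is sketched below. The seed and the names `m` and `tm_u_sketch` are assumptions, so the printed value will differ from the one above.

```r
library(igraph)
set.seed(4)  # arbitrary seed for repeatability

number <- 10
m <- matrix(rbinom(number^2, 1, .5), nrow = number)

# Define the edges: mirror the upper triangle and drop self-loops.
m[lower.tri(m)] <- t(m)[lower.tri(m)]
diag(m) <- 0

tm_u_sketch <- graph_from_adjacency_matrix(m, mode = "undirected")

# Each undirected edge appears twice in a symmetric matrix, so halve the sum.
edges <- sum(m) / 2
possibleedges <- number * (number - 1) / 2

edges / possibleedges
edge_density(tm_u_sketch, loops = FALSE)
```

The manual ratio and igraph's result should agree, which confirms that the calculation is the same once the edges are defined correctly.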
CLUSTERING
What is the global clustering of the undirected dolphin data (two options)?
transitivity(dataps2, type = "global")
[1] 0.3087757
What is the node with the highest local clustering value (two options)?
V(data)[V(data)$clustering == clusteringMax]
+ 0/62 vertices, from 294c26f:
Note that one can specify undirected but it does not change the calculation; igraph does not appear to calculate directed clustering, globally or locally:

Calculate the global transitivity for our directed graph.
transitivity(tm_d, type='global')
Calculate the local transitivity for our directed graph.
transitivity(tm_d, type='local')
Calculate the global transitivity for our undirected graph.
transitivity(tm_u, type='global')
transitivity(tm_u, type='globalundirected')
Calculate the local transitivity for our undirected graph.
transitivity(tm_u, type='local')
transitivity(tm_u, type='localundirected')
Find the node with the highest clustering coefficient for the directed network.
V(tm_d)[transitivity(tm_d, type='local') == max(transitivity(tm_d, type='local'))]
mt <- max(transitivity(tm_d, type='local'))
V(tm_d)[transitivity(tm_d, type='local') == mt]
Find the node with the lowest clustering coefficient for the undirected network.
V(tm_u)[transitivity(tm_u, type='local') == min(transitivity(tm_u, type='local'))]
mint <- min(transitivity(tm_u, type='local'))
V(tm_u)[transitivity(tm_u, type='local') == mint]
Plot the relationship between degree and clustering for the 10x10 undirected network.
plot(degree(tm_u), transitivity(tm_u, type='local'), xlab='Node Degree', ylab='Local Clustering', pch=20)
RANDOM NETWORK
Generate a random network containing 200 nodes with a probability of .2 that any two nodes are connected. Use igraph to convert it to an undirected matrix. Make a histogram of the degree distribution, label the axes, give it a title, and save it to file (two options).
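No code is shown for this first problem. A minimal sketch of one option follows, built the same way as the loop for the next problem. The output path under tempdir() and the names `m`, `net`, and `outfile` are assumptions; swap in any folder you prefer.

```r
library(igraph)
set.seed(5)  # arbitrary seed for repeatability

nodes <- 200
p <- .2

# Draw a 0/1 matrix and keep only the upper triangle as undirected edges.
m <- matrix(rbinom(nodes^2, 1, p), nrow = nodes)
diag(m) <- 0
net <- graph_from_adjacency_matrix(m, mode = "upper")

# Hypothetical output location; replace tempdir() with a folder of your choice.
outfile <- file.path(tempdir(), "degree_hist.pdf")
pdf(outfile)
hist(degree(net), xlab = "Degree", ylab = "Count",
     main = "Distribution of Degree, p = 0.2", breaks = 50)
dev.off()
```

With 200 nodes and p = .2, the histogram should center near 199 * .2, or roughly 40 ties per node.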

Repeat Problem 1, except vary the probability of ties from .1 to 1 in increments of .1. Save each histogram in a file. What do you notice as p changes?
nodes <- 200
prob <- seq(.1,1,by=.1)
for(i in 1:length(prob)){
  thenetwork <- matrix(rbinom(nodes^2, 1, p=prob[i]), nrow=nodes)
  diag(thenetwork) <- 0
  thenetwork <- graph_from_adjacency_matrix(thenetwork, mode='upper')
  pdf(paste0('~/Desktop/UCLA', prob[i], '.pdf'))
  hist(degree(thenetwork), xlab='Degree', ylab='Count', main=paste0('Distribution of Degree, p=', prob[i]), breaks=50)
  dev.off()
}
MAKING SYMMETRIC NETWORKS FROM AN ADJACENCY MATRIX
Convert an unbalanced adjacency matrix to an undirected adjacency matrix. Start by making the matrix:
probability <- .2
thisnumber <- 12
thismatrix <- matrix(rbinom(thisnumber^2, 1, probability), nrow=thisnumber)
diag(thismatrix) <- 0
Convert it to an undirected network.
t(thismatrix)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 0 0 0 1 0 1 1 0 0 1 0 0
[2,] 0 0 1 0 0 0 0 0 1 0 0 0
[3,] 0 0 0 1 0 0 0 1 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 1 0 0 0
[6,] 0 0 0 0 1 0 0 0 0 0 0 0
[7,] 0 0 0 1 1 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0 0 0
[9,] 0 1 0 0 0 1 0 1 0 0 0 1
[10,] 0 0 0 0 0 0 0 0 0 0 0 1
[11,] 0 0 0 0 0 1 0 0 0 0 0 0
[12,] 0 0 0 0 0 0 0 1 0 0 0 0
RANDOM GRAPH
First, we build random graphs from matrices. Then, we use igraph to build a random graph. In both cases, we will learn about the network via some statistics. At the end of this section, we will learn how to automatically vary parameters and view the statistics of the resulting networks. Create using matrix.
howmanyrows <- 100
temp <- matrix(rbinom(howmanyrows^2, 1, .5), nrow=howmanyrows)
diag(temp) <- 0 # Make sure there are no self-loops.
# Do not forget to make symmetric
temp[lower.tri(temp)] <- t(temp)[lower.tri(temp)]
bus720 <- graph_from_adjacency_matrix(temp, mode='undirected') # I have chosen a weird object name to emphasize that they are arbitrary. Most documentation for igraph calls graph objects g, but you do not have to.
Explore degree:
summary(degree(bus720))
Min. 1st Qu. Median Mean 3rd Qu. Max.
36.00 46.00 50.00 49.66 53.00 66.00
Explore clustering:
transitivity(bus720, type="local")[1:10]
[1] 0.5206349 0.5132979 0.5063425 0.4804878 0.4964706 0.4956140 0.5130435 0.5045455
[9] 0.5175510 0.4927536
Explore density:
graph.density(bus720)
[1] 0.5016162
Now, let us learn how to have igraph make a random graph for us. I will then show how you can vary the density of the graph using a loop; you could do the same for nodes, but I leave that exercise for the problem set. I will then show how to put the output from each variation into a dataframe; otherwise, you lose the results as each new loop runs. Create using igraph package, explore degree, clustering, density.
graph.density(ibus720)
[1] 0.4923232
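The code that created `ibus720` and the loop described above are not shown. One way to do both is sketched here with igraph's `sample_gnp()`. The choice of that generator, the grid of probabilities, and the names `ibus720_sketch`, `probs`, `results`, and `g_i` are all assumptions.

```r
library(igraph)
set.seed(6)  # arbitrary seed for repeatability

# sample_gnp() draws a random graph: n nodes, each pair tied with probability p.
ibus720_sketch <- sample_gnp(n = 100, p = .5)
summary(degree(ibus720_sketch))
transitivity(ibus720_sketch, type = "global")
edge_density(ibus720_sketch)

# Vary p in a loop and keep one row of statistics per iteration,
# so earlier results are not lost when the next graph is drawn.
probs <- seq(.1, .9, by = .2)
results <- data.frame(p = probs, mean_degree = NA, clustering = NA, density = NA)
for (i in seq_along(probs)) {
  g_i <- sample_gnp(n = 100, p = probs[i])
  results$mean_degree[i] <- mean(degree(g_i))
  results$clustering[i] <- transitivity(g_i, type = "global")
  results$density[i] <- edge_density(g_i)
}
results
```

Creating the data frame before the loop and filling it by row index is what preserves the output from every iteration.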
SMALL-WORLD NETWORKS
Recreate the small-world chart I showed in class, the one that shows how average path length and clustering change as the probability of rewiring edges increases. Make the network a ring with 1,000 nodes where each node is connected to ten neighbors. As you work on this problem, keep in mind six things:
a. You will need a baseline model by which to normalize (divide) the values of each simulation. That is, you will need a model that does not change for each iteration of the loop.
b. Make sure you have very small values of p, smaller than the class example.
c. Make sure the x-axis is the logarithm of p.
1. If one of your values of p is 0, you should add a small amount, such as .01 or .1, to each value of the rewiring probability when you plot. log(0) is undefined, but you need to see the result for that value.
2. When you do this, you may notice the x-axis values become negative. It is acceptable if you leave the values like that, but I encourage you to explore how to fix them. Subhint: look at xaxt in plot() and axis() after plot(). Subhint2: You could also look at the log argument in plot().
d. To add a second line to a graph, use lines() after plot().
e. Save the chart to your computer.
f. How will you distinguish which line corresponds to which statistic? Use the lty argument to lines() and plot(). If you are adventurous, add a legend to the plot as well.
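A possible outline for this chart is sketched below. It is one approach under stated assumptions, not the class solution: the grid of p values, the output path under tempdir(), and the names `base`, `ps`, `chart`, `sw_i`, and `outfile` are all made up for the example. It keeps p = 0 out of the plotted grid by using the unrewired ring only as the baseline, and it relies on the log argument of plot() for the x-axis.

```r
library(igraph)
set.seed(7)  # arbitrary seed for repeatability

nodes <- 1000
nei <- 5  # 5 steps in each direction gives each node ten neighbors

# Baseline: the unrewired ring (p = 0), used to normalize every simulation.
base <- sample_smallworld(dim = 1, size = nodes, nei = nei, p = 0)
base_cl <- transitivity(base, type = "global")
base_pl <- mean_distance(base)

# Rewiring probabilities spaced evenly on a log scale (an assumed grid).
ps <- 10^seq(-4, 0, length.out = 14)
chart <- data.frame(p = ps, clustering = NA, pathlength = NA)
for (i in seq_along(ps)) {
  sw_i <- sample_smallworld(dim = 1, size = nodes, nei = nei, p = ps[i])
  chart$clustering[i] <- transitivity(sw_i, type = "global") / base_cl
  chart$pathlength[i] <- mean_distance(sw_i) / base_pl
}

# Hypothetical output location; log = "x" puts p on a logarithmic axis.
outfile <- file.path(tempdir(), "smallworld.pdf")
pdf(outfile)
plot(chart$p, chart$clustering, log = "x", type = "b", lty = 1, ylim = c(0, 1),
     xlab = "Rewiring probability p (log scale)", ylab = "Value relative to p = 0")
lines(chart$p, chart$pathlength, type = "b", lty = 2)
legend("bottomleft", legend = c("Clustering", "Average path length"), lty = c(1, 2))
dev.off()
```

Dividing by the baseline puts both statistics on a common 0 to 1 scale, which is what lets the two lines share one y-axis.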

Use igraph: From here on out, we will rely on igraph’s functions to build networks. While we could build them manually from adjacency matrices, creating the proper adjacency matrix or edgelist is a bit more complicated. It is doable but not necessary for the aims of this class. The igraph function sample_smallworld can be confusing. You define the number of dimensions of the lattice; 1 means a line, 2 means a matrix, and so on. The total number of nodes is the size you give raised to the power of the dimension. It is therefore easiest to just define the number of nodes you want and assume one dimension. nei is how far (many steps) in each direction to make connections between nodes; the higher it is, the greater the average degree.
Create a small-world network using igraph package, explore degree, clustering, density:
graph.density(bus20)
[1] 0.06060606
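The call that created the 100-node `bus20` is not shown. A sketch, assuming the same arguments as the 1,000-node call later in this section (the name `bus20_sketch` and the seed are hypothetical):

```r
library(igraph)
set.seed(8)  # arbitrary seed for repeatability

# One dimension, 100 nodes, ties to 3 neighbors on each side, rewiring probability .2.
bus20_sketch <- sample_smallworld(dim = 1, size = 100, nei = 3, p = .2)

summary(degree(bus20_sketch))
transitivity(bus20_sketch, type = "global")
edge_density(bus20_sketch)  # 300 edges out of 4950 possible
```

With nei = 3 every node starts with six ties, so the ring has 100 * 6 / 2 = 300 edges, and 300 / 4950 matches the density printed above.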
Let’s make the printed output a little easier to read.
paste('The density of the graph is', graph.density(bus20))
[1] "The density of the graph is 0.0606060606060606"
Let’s do that again, changing the size of the network from 100 to 1,000.
nodes <- 1000
bus20 <- sample_smallworld(dim=1, size=nodes, nei=3, p=.2)
Let’s make the printed output a little easier to read.
summary(degree(bus20))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 5 6 6 7 12
paste('The graph\'s transitivity is', round(transitivity(bus20, type='global'), 5))
[1] "The graph's transitivity is 0.16376"
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local')))
[1] "The other way to calculate transitivity suggests the value is NaN"
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local'), na.rm=TRUE))
[1] "The other way to calculate transitivity suggests the value is 0.185631518964852"
paste('The density of the graph is', graph.density(bus20))
[1] "The density of the graph is 0.00600600600600601"
paste('There are', ecount(bus20), 'edges in the graph')
[1] "There are 3000 edges in the graph"
Now, we can vary the parameters like we did for random graphs. Since we are familiar with this process, I will vary p and save the results to a dataframe immediately. Notice that I define how many iterations of the loop differently than for random graphs; they both work the same. I am going to introduce a new way to generate sequences of numbers as well.
Vary p using sequence. There are two ways to do this.
Why does the density never change? Because density is the share of possible edges that actually exist, and rewiring moves edges without adding or removing any. Density tells you about the network overall (while degree tells you something about individual nodes).
Create a small-world ring network with 100 nodes that each start with connections to 10 neighbors. Randomly rewire with probability of .03.
sw
IGRAPH 77a5caa U--- 100 500 -- Watts-Strogatz random graph
+ attr: name (g/c), dim (g/n), size (g/n), nei (g/n), p (g/n), loops (g/l),
| multiple (g/l)
+ edges from 77a5caa:
[1] 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 7--74 7-- 8 8-- 9 10--82 10--11 11--12
[12] 12--13 13--14 14--15 15--16 16--17 17--18 18--19 19--20 20--21 21--22 22--23
[23] 23--24 24--25 25--26 26--27 27--28 28--29 29--30 30--31 14--32 32--33 33--34
[34] 34--35 35--36 36--37 24--37 38--39 39--40 40--41 41--42 42--43 43--44 44--45
[45] 45--46 46--47 47--48 48--49 49--50 50--51 51--52 52--53 53--54 54--55 55--56
[56] 56--57 57--58 58--59 59--60 60--61 61--62 62--63 63--64 64--65 65--66 66--67
[67] 67--68 68--69 69--70 70--71 71--72 72--73 73--74 74--75 18--75 77--96 77--78
+ ... omitted several edges
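The call that produced `sw` is not shown. A sketch consistent with the printed summary (100 nodes, 500 edges) follows, assuming `nei = 5` so that each node starts with five neighbors on each side; the name `sw_sketch` is hypothetical.

```r
library(igraph)
set.seed(9)  # arbitrary seed for repeatability

# nei = 5 links each node to 5 neighbors on each side, ten in total.
sw_sketch <- sample_smallworld(dim = 1, size = 100, nei = 5, p = .03)

vcount(sw_sketch)
ecount(sw_sketch)  # 100 nodes * 10 neighbors / 2
```

A rewiring probability of .03 changes only a handful of the 500 edges, which is why most pairs in the printout above are still consecutive node numbers.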
Write a loop that creates a new small-world network each iteration. Vary the rewiring probability from 0 to .2 in increments of .01. Save the results in a dataframe, tracking at least the overall clustering, diameter, and average path length.
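One way to write this loop is sketched below, assuming the same 100-node ring with ten neighbors per node as in the previous exercise. The names `ps`, `sw_results`, and `sw_i` are made up for the example.

```r
library(igraph)
set.seed(10)  # arbitrary seed for repeatability

ps <- seq(0, .2, by = .01)
sw_results <- data.frame(p = ps, clustering = NA, diameter = NA, pathlength = NA)

for (i in seq_along(ps)) {
  # A fresh small-world network for each rewiring probability.
  sw_i <- sample_smallworld(dim = 1, size = 100, nei = 5, p = ps[i])
  sw_results$clustering[i] <- transitivity(sw_i, type = "global")
  sw_results$diameter[i] <- diameter(sw_i)
  sw_results$pathlength[i] <- mean_distance(sw_i)
}

head(sw_results)
```

The first row (p = 0) is the unrewired ring, which makes it a convenient reference point when reading the later rows.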
SCALE FREE NETWORKS
Create a scale free network with 12,000 nodes where each node adds 20 edges.
a. Make a histogram of the degree distribution.
b. What is the ratio of the mean to median degree?
c. Is the density less than a small-world network? Why?
graph.density(sf)
[1] 0.003330694
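The code that built `sf` is not shown. A sketch of how the three parts could be answered with igraph's `sample_pa()` follows. The use of that generator with `directed = FALSE`, the seed, and the names `sf_sketch` and `deg` are assumptions.

```r
library(igraph)
set.seed(11)  # arbitrary seed for repeatability

# Preferential attachment: each new node adds m = 20 edges.
sf_sketch <- sample_pa(n = 12000, m = 20, directed = FALSE)
deg <- degree(sf_sketch)

# a. Histogram of the degree distribution.
hist(deg, breaks = 100, xlab = "Degree", ylab = "Count",
     main = "Degree distribution, scale-free network")

# b. Ratio of mean to median degree; a long right tail pulls the mean up.
mean(deg) / median(deg)

# c. Density.
edge_density(sf_sketch)
```

On part c: the network has roughly 20n edges out of about n^2 / 2 possible, so its density is close to 40 / n. That shrinks as n grows, which is why a 12,000-node network is far less dense than the 100-node small-world example (0.0606) even though each node has more ties.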
Now, we will explore scale-free networks, using Barabasi’s model of preferential attachment. While there are many mechanisms that can generate scale-free distributions, his is the most commonly used and therefore the one most commonly included out of the box in software. We will focus on varying the number of edges added to a network. A key feature of scale-free networks is that the mean degree is much higher than the median; we will track that ratio.
degree_distribution(sf)
[1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[7] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.615000e-01 1.225833e-01
[13] 9.866667e-02 7.875000e-02 6.375000e-02 5.725000e-02 4.841667e-02 3.783333e-02
[19] 3.300000e-02 2.925000e-02 2.550000e-02 2.066667e-02 1.966667e-02 1.608333e-02
[25] 1.325000e-02 1.308333e-02 9.833333e-03 1.033333e-02 1.000000e-02 8.416667e-03
[31] 8.416667e-03 6.750000e-03 6.083333e-03 5.666667e-03 5.916667e-03 5.916667e-03
[37] 4.750000e-03 3.833333e-03 4.000000e-03 2.666667e-03 2.250000e-03 2.166667e-03
[43] 3.250000e-03 2.416667e-03 2.916667e-03 2.333333e-03 2.166667e-03 2.666667e-03
[49] 2.083333e-03 1.750000e-03 7.500000e-04 1.250000e-03 2.250000e-03 1.916667e-03
[55] 1.750000e-03 1.416667e-03 1.000000e-03 1.250000e-03 8.333333e-04 5.000000e-04
[61] 1.500000e-03 1.083333e-03 9.166667e-04 8.333333e-04 4.166667e-04 8.333333e-04
[67] 5.000000e-04 6.666667e-04 7.500000e-04 5.000000e-04 5.833333e-04 5.000000e-04
[73] 2.500000e-04 9.166667e-04 5.833333e-04 2.500000e-04 5.833333e-04 4.166667e-04
[79] 4.166667e-04 5.000000e-04 8.333333e-04 4.166667e-04 5.000000e-04 2.500000e-04
[85] 5.000000e-04 3.333333e-04 5.000000e-04 4.166667e-04 4.166667e-04 5.000000e-04
[91] 2.500000e-04 2.500000e-04 1.666667e-04 5.000000e-04 3.333333e-04 8.333333e-05
[97] 3.333333e-04 4.166667e-04 2.500000e-04 8.333333e-05 3.333333e-04 3.333333e-04
[103] 1.666667e-04 1.666667e-04 5.833333e-04 2.500000e-04 8.333333e-05 8.333333e-05
[109] 3.333333e-04 0.000000e+00 0.000000e+00 1.666667e-04 4.166667e-04 1.666667e-04
[115] 2.500000e-04 1.666667e-04 8.333333e-05 8.333333e-05 8.333333e-05 0.000000e+00
[121] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[127] 8.333333e-05 1.666667e-04 1.666667e-04 8.333333e-05 1.666667e-04 1.666667e-04
[133] 1.666667e-04 1.666667e-04 8.333333e-05 1.666667e-04 8.333333e-05 8.333333e-05
[139] 0.000000e+00 1.666667e-04 0.000000e+00 2.500000e-04 0.000000e+00 0.000000e+00
[145] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 1.666667e-04
[151] 0.000000e+00 8.333333e-05 0.000000e+00 8.333333e-05 8.333333e-05 0.000000e+00
[157] 2.500000e-04 8.333333e-05 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[163] 0.000000e+00 0.000000e+00 8.333333e-05 8.333333e-05 8.333333e-05 0.000000e+00
[169] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[175] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 8.333333e-05 0.000000e+00
[181] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[187] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[193] 8.333333e-05 0.000000e+00 8.333333e-05 0.000000e+00 1.666667e-04 0.000000e+00
[199] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 1.666667e-04 8.333333e-05
[205] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[211] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[217] 1.666667e-04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[223] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[229] 8.333333e-05 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[235] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[241] 0.000000e+00 8.333333e-05 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[247] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[253] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[259] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[265] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[271] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[277] 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[283] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[289] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[295] 0.000000e+00 0.000000e+00 1.666667e-04 0.000000e+00 0.000000e+00 0.000000e+00
[301] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[307] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[313] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[319] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[325] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[331] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[337] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[343] 0.000000e+00 0.000000e+00 0.000000e+00 1.666667e-04 0.000000e+00 0.000000e+00
[349] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[355] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[361] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[367] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[373] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[379] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 8.333333e-05
[385] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[391] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[397] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[403] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[409] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[415] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[421] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[427] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[433] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[439] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[445] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[451] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[457] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[463] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[469] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[475] 0.000000e+00 0.000000e+00 8.333333e-05
Let’s make the printed output a little easier to read.
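One way to do this, sketched here with a short made-up vector dd standing in for the output of degree_distribution(sf), is to round the values and keep only the non-zero entries. Element i of a degree distribution is the share of nodes with degree i - 1, so the sketch also labels each entry with its degree. The names dd and readable are introduced for illustration.

```r
# Hypothetical stand-in for degree_distribution(sf); the values are made up.
dd <- c(0, 0, 0.5, 0.25, 0, 0.125, 0.125)

# Element i holds the share of nodes with degree i - 1, so label accordingly.
names(dd) <- seq_along(dd) - 1

# Keep only degrees that actually occur and round to three decimals.
readable <- round(dd[dd > 0], 3)
readable
```

With the real graph, the same two steps would be applied to degree_distribution(sf) instead of dd.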

Now, let’s use a loop to see how these measures change as the network grows. We increase the number of nodes in steps of 1,000:
nodes <- seq(from=1000,to=10000,by=1000)
remember <- data.frame() # Start empty; each pass through the loop appends one row.
for(i in seq_along(nodes)){
  temp <- sample_pa(n=nodes[i], power=1, m=10, directed=FALSE)
  print(paste('This is loop', i, 'of', length(nodes)))
  d <- data.frame(iteration=i, networkSize=nodes[i], avgDegree=mean(degree(temp)), globalClustering=transitivity(temp, type='global'), globalClustering2=mean(transitivity(temp, type='local')), density=graph.density(temp), ratio=mean(degree(temp))/median(degree(temp)), avgPathLength=average.path.length(temp))
  remember <- rbind(remember, d) # "Row bind": adds the new row, d, to whatever exists in remember.
}
remember # Not bad.
REAL NETWORKS
In this section, we will explore actual network data. The first dataset is e-mails from Enron, an energy firm that imploded when its accounting manipulations were discovered. The second is a sample of individuals from Facebook.
Load Enron:
ecount(enron)
[1] 116865
summary(degree(enron2))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 1.000 6.375 2.000 1367.000
paste('The graph\'s transitivity is', transitivity(enron2, type='global'))
[1] "The graph's transitivity is 0.0715762061104356"
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(enron2, type='local')))
[1] "The other way to calculate transitivity suggests the value is NaN"
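The NaN most likely arises because local transitivity is undefined for nodes with fewer than two neighbors, and the degree summary above shows that many nodes in this graph have degree 0 or 1. mean() returns NaN as soon as one entry is NaN. A minimal sketch with a made-up vector of local clustering values (not the Enron data) shows the effect and the usual remedy, na.rm = TRUE:

```r
# Hypothetical local clustering values; NaN marks nodes with fewer than two neighbors.
local_cc <- c(0.5, NaN, 1, 0, NaN)

mean(local_cc)                # NaN: one undefined entry is enough to spoil the mean
mean(local_cc, na.rm = TRUE)  # Averages only the defined entries

avg_defined <- mean(local_cc, na.rm = TRUE)
```

With the real graph, the equivalent call would be mean(transitivity(enron2, type='local'), na.rm=TRUE).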
paste('The density of the graph is', graph.density(enron2))
[1] "The density of the graph is 0.000173859823993874"
paste('There are', ecount(enron2), 'edges in the graph')
[1] "There are 116865 edges in the graph"
paste('The diameter is', diameter(enron2)) # Takes a while to run.
[1] "The diameter is 7"
paste('The average path length is', average.path.length(enron2))
[1] "The average path length is 3.5201731990681"
We can also take subsets of a network based on node features. Below, I will show you how to keep only nodes meeting degree thresholds.
vcount(degree10)/vcount(enron2) # 8.94% of nodes
[1] 0.08940163
ecount(degree10)/ecount(enron2) # 62.45% of edges
[1] 0.6245497
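The code that built degree10 is not shown above. A minimal sketch of one way to do it with igraph's induced_subgraph() follows. The cutoff of at least 10 connections is a guess based on the object's name, and the sketch runs on a tiny made-up graph with a cutoff of 2 so the result is easy to check by hand.

```r
library(igraph)

# Small made-up graph: A has 3 ties; B, C, and D have 2; E has 1.
g <- graph_from_literal(A-B, A-C, A-D, B-C, D-E)

# Keep only nodes whose degree meets the threshold (2 here; 10 is the assumed Enron cutoff).
keep <- which(degree(g) >= 2)
sub_g <- induced_subgraph(g, keep)

vcount(sub_g)/vcount(g) # Share of nodes kept
ecount(sub_g)/ecount(g) # Share of edges kept
```

For the Enron graph, the analogous call under that assumption would be induced_subgraph(enron2, which(degree(enron2) >= 10)).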
You can keep nodes that match any filter you want. Next, I show how to select nodes with high clustering:
table(degree(highclustering_g))
0 1 2 3 4 5 6 7 8 9
7462 364 95 54 37 10 13 2 2 2
CALCULATIONS (vectors, objects, phrases)
What is 3819 times 102?
3819 * 102
What is 3819 minus 1892?
3819 - 1892
Repeat the above, but save each number as an object.
obj1 <- 3819
obj2 <- 102
obj3 <- 1892
obj1 * obj2
obj1 - obj3
Make a vector of the numbers 10, 20, 30, 40, and 50. Multiply that vector by another of the numbers 3, 4, 5, 6, and 7.
v1 <- c(10, 20, 30, 40, 50)
v1_alt <- seq(from = 10, to = 50, by = 10)
v2 <- c(3, 4, 5, 6, 7)
v2_alt <- 3:7
v1 * v2
v1_alt * v2_alt
Repeat above, except make the second vector the numbers 5 and 10. Show the results and explain, in words, what R is doing with this shorter vector.
v2 <- c(5, 10)
v1 * v2
# Warning in v1 * v2: longer object length is not a multiple of shorter object length
# R repeats (recycles) v2 as many times as it takes to match the length of v1, so it multiplies v1 by (5, 10, 5, 10, 5).
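To see the recycling explicitly, the sketch below builds the stretched-out version of the shorter vector with rep() and confirms that it gives the same answer R computes on its own. The names short, stretched, and recycled are introduced here for illustration.

```r
v1 <- c(10, 20, 30, 40, 50)
short <- c(5, 10)

# Repeat short until it is as long as v1: 5, 10, 5, 10, 5
stretched <- rep(short, length.out = length(v1))

# R recycles short behind the scenes (and warns, because 5 is not a multiple of 2).
recycled <- suppressWarnings(v1 * short)

identical(recycled, v1 * stretched)
```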
Does the phrase, “I enjoy learning to program” contain the word “learning”? What about “learn”? What about “towards”?
phrase <- c('I enjoy learning to program')
grepl('learning', phrase)
[1] TRUE
grepl('learn', phrase)
[1] TRUE
grepl('towards', phrase)
[1] FALSE
Why does R return [1]?
# Because R labels each line of printed output with the index of its first element. Here the result is a vector of length 1, so the label is [1].
Vectors and calculation become especially useful when your vector has length greater than 1, such as when it represents a variable with multiple observations. You can then easily manipulate all entries in one vector or perform operations on multiple vectors. To make a vector of length greater than 1, you need c(). The c stands for combine.
Practice with vectors of length greater than 1:
first <- c(100,19,20,89,76,102,74,81,65)
first * 2
# This multiplies each entry by 2 (the same approach converts units, e.g., first * 2.54 turns inches into centimeters)
first - 20
# This subtracts 20 from each entry
second <- c(1,2,4,10,10,1,4,5,10) # The two vectors do not need the same number of entries, but matching lengths keeps this simple
first * second
# Multiplies each entry in first by the matching entry in second, in order, i.e., 100*1, 19*2, and so on; multiplying entry by entry is how you weight data
first / second
Example using vectors to calculate a more complicated outcome:
observed <- first # I have created a new object that has the values of first, i.e., a copy of first above.
observed == first
# Returns TRUE for as many entries as there are in each vector. (You need two equals signs: a single = assigns, like <-, while == compares. This is a good way to check data after a transformation.)
observed != first # Boolean operator for "does not equal"
Chi-square
expected <- c(90,18,30,79,80,88,80,83,70)
numerator <- (observed-expected)^2
sum(numerator/expected)
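The chi-square statistic is the sum over cells of (observed - expected)^2 / expected. As a sketch, the same calculation can be broken out per cell to see which entry drives the total. The names contribution and chi_square are introduced here for illustration.

```r
observed <- c(100,19,20,89,76,102,74,81,65)
expected <- c(90,18,30,79,80,88,80,83,70)

# Each cell's contribution to the chi-square statistic
contribution <- (observed - expected)^2 / expected
round(contribution, 2)

chi_square <- sum(contribution)
chi_square
which.max(contribution) # Position of the cell that deviates most from expectation
```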
Vectors do not have to be numbers. For example, you may have a variable that is a respondent’s favorite sports team, so the entries are likely to be words. If you have a dataframe, the column names are a string vector:
char_vec <- c(, , , )
length(char_vec)
'words' %in% char_vec
'words' == char_vec # For how often: one TRUE/FALSE per entry
'word' %in% char_vec # ^ Good for searching larger texts
sametype <- c('this', 20, 'converts to', 5, 'strings') # ^ A vector holds a single type, so R converts the numbers to strings
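A short sketch of these operations on a made-up vector of team names. The values are hypothetical, since the entries of char_vec are not shown above.

```r
teams <- c('Lakers', 'Dodgers', 'Lakers', 'Galaxy')

length(teams)
'Lakers' %in% teams     # Is there at least one exact match?
teams == 'Lakers'       # Which entries match, one TRUE/FALSE per entry
sum(teams == 'Lakers')  # How often the value appears

# Mixing numbers and strings: R stores everything as character.
mixed <- c('this', 20, 'converts to', 5, 'strings')
class(mixed)
```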
Create an object that contains five numbers. Multiply those numbers by 2. Add 82 to those numbers.
this <- c(1,2,3,4,5)
this*2
this+82
Create a second object that contains five numbers. Multiply the two objects together and save the result as a new object. Divide the two objects, saving the result as a new object.
arbitrary <- c(18,18,93,103, 45)
new <- this*arbitrary
new2 <- this/arbitrary
Does the vector contain the word “tomato”?
characters <- c('this', 'vector', 'contains', 'words', 'that', 'have', 'words', 'with', 'letters', 'and', 'numbers', 29, '30')
'tomato' %in% characters
How many times does the vector contain the word “words”?
grep('words', characters)
length(grep('words', characters))
Search the vector for the number 29.
grep('29', characters)
Search the vector for the number 100.
grep('100', characters)
# integer(0) is the way R says there is no match
It works differently if the words are not separated in a vector:
words <- c('this vector contains words that have words with letters and numbers 29 30')
'this' %in% words
# FALSE because the word 'this' does not equal the single element in the vector words
length(words)
grep('this', words)
grepl('this', words)
Notice that grep matches partial strings by default, so it finds the pattern anywhere inside an element, whereas %in% requires an exact match of the whole element. To match an exact word with grep, include the surrounding spaces in the pattern.
grepl('contain', words)
grepl(' contain ', words)
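Padding the pattern with spaces misses a word at the very start or end of the string. An alternative, shown here as a sketch, is the regular-expression word boundary \b (typed as '\\b' inside an R string), which matches whole words wherever they sit:

```r
words <- c('this vector contains words that have words with letters and numbers 29 30')

grepl('contain', words)         # TRUE: partial match inside 'contains'
grepl('\\bcontain\\b', words)   # FALSE: 'contain' is not a whole word here
grepl('\\bcontains\\b', words)  # TRUE: whole-word match
grepl('\\bthis\\b', words)      # TRUE even though 'this' has no space before it
```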
Note: You cannot index a vector with two comma-separated numbers. In scores[2,4], R looks for the second row and fourth column, which a vector does not have. To get the second and fourth elements, wrap the positions in c():
scores <- c(John=25, Marge=34, Dan=24, Emily=29)
scores[c(2,4)]
Return individuals with scores greater than 30:
scores[scores > 30]
DATA STRUCTURES (create a list, matrix, dataframe, identify variables)
Name each vector in the list:
mary_info <- list(classes=c(, , , ), friends=c(, , ), SAT=c(1450,1000,1600))
Return the second vector:
mary_info[[2]]
Return the vector named “SAT”
mary_info$SAT
Return the second element of friends vector
mary_info$friends[2]
Make a list that contains the three departments at Luskin as well as the number of enrolled students in each. Assume the numbers are 60, 80, and 100.
list0 <- list(depts = c(, , ), enrolled = c(60, 80, 100))
For these next questions, use the list created in the below code.
list1 <- list(names=c('John', 'Kate', 'Jessica', 'Carl'), grades=c(80,85,60,100), sex=c('Male', 'Female', 'Female', 'Male'))
Who has grades greater than 83?
list1$names[list1$grades > 83]
Which people are female?
list1$names[list1$sex == 'Female']
Create a 2x3 matrix, filling down columns:
a <- matrix(1:6, nrow=2)
Fill across rows:
b <- matrix(5:14, nrow=2, byrow=TRUE)
Row 2, column 3:
a[2,3]
All rows, column 2:
b[,2]
All columns, row 1:
a[1,]
Other ways to create matrices:
matrix(c(0,1,1,1,0,1,0,0,1), nrow=3)
matrix(0:1, nrow=10, ncol=10)
Create a matrix of 10 rows and 10 columns where each entry is the number 0 or 1 (multiple options).
m1 <- matrix(rbinom(100, 1, 0.5), nrow = 10)
m <- matrix(0:1, nrow=10, ncol=10)
Show me the first row of the matrix.
m1[1, ]
Show the first six rows of the matrix (two options).
head(m1)
m1[1:6, ]
Show rows 4 through 8.
m1[4:8, ]
Show rows 3, 5, 7, and 9.
m1[c(3, 5, 7, 9), ]
Show the third column of the matrix.
m1[, 3]
Show the third, fifth, and seventh columns of the matrix.
m1[, c(3, 5, 7)]
Convert the grades list from above into a data frame (several options).
d1 <- data.frame(list1)
as.data.frame(list1)
d1b <- data.frame(names=c('John', 'Kate', 'Jessica', 'Carl'), grades=c(80,85,60,100), sex=c('Male', 'Female', 'Female', 'Male'))
Who has grades greater than 90?
d1[d1$grades > 90, ]
Which people are male?
d1[d1$sex == 'Male', ]
What are the first six rows and nine columns in the SCAD data?
data[1:6, 1:9]
What are the last five rows in the SCAD data (two options)?
tail(data, 5)
data[(nrow(data)-4):nrow(data),]
Show rows 100 to 107 in the SCAD data.
data[100:107, ]
What are the names of the columns in the SCAD data?
names(data)
Using table, how many times does each country appear in the SCAD data?
table(SCADdata$countryname)
Costa Rica Cuba Dominican Republic El Salvador
35 278 214 135
Guatemala Haiti Honduras Jamaica
348 1322 214 275
Mexico Nicaragua Panama Trinidad and Tobago
1905 165 97 45
How many events started in each year?
table(data$styr)
table(data$eyr)
Write code to save only events that started in 2006 as a new object.
new <- data[data$styr == 2006, ]
How many events started in 2006?
nrow(new)
Make a table showing the number of events per country that started in 2006.
table(new$countryname)
Drop all events where the number of participants is -99.
new2 <- data[data$npart != -99, ]
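Dropping rows is one option. Another is to recode the -99 placeholder as NA so the rows stay in the data. The sketch below uses a small made-up data frame (toy, with hypothetical values) rather than the SCAD data.

```r
toy <- data.frame(countryname = c('Cuba', 'Haiti', 'Mexico', 'Haiti'),
                  npart = c(3, -99, 5, 2))

# Option 1: drop the rows where the count is the missing-data code.
dropped <- toy[toy$npart != -99, ]
nrow(dropped)

# Option 2: keep every row but mark the code as missing.
toy$npart[toy$npart == -99] <- NA
mean(toy$npart, na.rm = TRUE)
```

Option 2 keeps the row available for questions that do not involve the participant count, at the cost of needing na.rm = TRUE in later calculations.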
Create a logical vector and numeric vector of equal length:
mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE), height = c(65, 69, 71, 73), sex=c('Male', 'Female', 'Female', 'Male'), minutesExercise = c(20,30,22,80))
Identify row 3, column 2 (the height of the third person):
mydata[3,2]
Using column name, identify the first two rows:
mydata[1:2, ]
List all rows of column “diabetic”:
mydata[, 'diabetic']
Subsetting creates a numeric vector:
mydata$height[2:3]
Numeric vector:
mydata[[]]
What are the names of my variables?
colnames(mydata)
Assign column names:
colnames(mydata) <- c(, , , of Exercise per Week)
# The professor recommends alphabetizing
To change the first variable’s name, just use indexing:
colnames(mydata)[1] <- 'Diabetes'
# To verify: colnames(mydata) == 'Diabetes'
# If you do not know the index but know the name, you can do that too:
colnames(mydata)[colnames(mydata) == 'Diabetes'] <- 'Diabetic'
# ^ Where the column name is 'Diabetes', change it to 'Diabetic'.
What are the dimensions of my data frame?
dim(mydata)
What are the first five rows?
head(mydata, 5)
What are the last five rows?
tail(mydata, 5)
Get information about the dataframe, structure:
str(mydata)
Get summary statistics:
summary(mydata)
summary(mydata$height)
Get number of rows:
nrow(mydata)
# Do not select on column numbers, because they will not stay the same; select on names instead, e.g., colnames(mydata)[colnames(mydata) == 'Diabetes'] (you can also use grep)
Create a dataframe that is five rows tall and three columns wide. The first column is people’s names: Amanda, Brandon, Carl, David, and Eleanor. The second column is their birth year: 1981, 1928, 1997, 2002, 1989. The third column is their height in centimeters: 200, 180, 176, 190, 210.
d2 <- data.frame(names=c('Amanda', 'Brandon', 'Carl', 'David', 'Eleanor'), year=c(1981, 1928, 1997, 2002, 1989), height = c(200,180,176,190,210))
Select rows three through five of the dataframe.
d2[3:5,]
Rename the name column to “First_Name”. Rename the birth year column to “Year”.
colnames(d2)[colnames(d2)=='names'] <- 'First_Name'
colnames(d2)[colnames(d2)=='year'] <- 'Year'
Create a new dataframe of only the individuals who are at least 200 centimeters tall.
dtall <- d2[d2$height >= 200,]
Create a four row, four column matrix. Multiply that matrix by 3.
m <- matrix(1:4, nrow=4, ncol=4)
m2 <- matrix(1:16, nrow=4)
m2*3
From that matrix, take the sum of the first column. Take the sum of the third row.
sum(m2[,1])
m2[,1]
temp <- m2[,1]
sum(temp)
sum(m2[3,])
How many columns? ncol(data)
How many rows? nrow(data)
What are the column names? names(data)
What are the row names? rownames(data)
What are the first 6 rows? head(data)
What are the last 6 rows? tail(data)
Note that you can tell head() and tail() how many rows you want to see. 6 is the default.
head(data, 10)
head(data, 2)
tail(data, 8)
tail(data, 2)
If you want to know the set of values a variable takes, use unique()
unique(data$COUNTRY)
unique(data$BYEAR)
If you want to get a sense of how many observations there are for each value of a variable, use table()
table(data$COUNTRY)
table(data$BYEAR)
Explore how to access different data points, based on variable values. We will then modify some of these variables and save the result to a new file.
# Select by year.
temp <- data[data$BYEAR == '2010',]
# The events from 2010 are now in this dataframe
dim(temp)
nrow(temp) - nrow(data)
temp <- data[data$BYEAR == '2011',]
# Now, the dataframe is for 2011.
temp <- data[data$BYEAR < 2010,]
temp2 <- data[data$BYEAR <= 2010,]
nodeaths <- data[data$NDEATH == 0,]
january <- data[data$BMONTH == 1,]
notjanuary <- data[data$BMONTH != 1,] # != means not equal to.
notjanuary2 <- data[data$BMONTH > 1,]
Manipulate variables and then save the dataframe. Remember, each column is just a vector, so the vector operations we learned earlier are what we will do here. Note that I do some transformations, like multiplying BDAY by BMONTH, just to demonstrate capabilities.
data$newnpart <- data$NPART*10
data$newnpart2 <- ifelse(data$NPART != 99, data$NPART, NA)
data$npart <- data$NPART/1000
names(data) # Notice that the above line created a new variable.
data$NPART <- data$NPART/1000
# Creating a new variable with the same name as an existing one replaces that variable
data$deathRate <- data$NDEATH/data$NPART # What does this do?
summary(data$deathRate) # It did nothing because NDEATH is weirdly constructed.
# We will not discuss here how to fix it.
data$nonsense <- data$BDAY*data$BMONTH
data$COUNTRY[data$COUNTRY == "Cote d'Ivoire"] <- 'Ivory Coast'
# Fails because R treats country name as a factor (the default for read.csv in R versions before 4.0).
Load the dataframe from earlier. Add the argument `stringsAsFactors=FALSE`.
data <- read.csv('~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv', stringsAsFactors = FALSE)
data$COUNTRY[data$COUNTRY == "Cote d'Ivoire"] <- 'Ivory Coast'
How could you verify that the change happened?
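One possible answer, sketched on a made-up vector instead of the real COUNTRY column: check with %in% that the old spelling is gone and the new one is present, or count the values with table(). The vector country and its entries are hypothetical.

```r
# Hypothetical stand-in for data$COUNTRY
country <- c("Cote d'Ivoire", 'Ghana', "Cote d'Ivoire", 'Kenya')
country[country == "Cote d'Ivoire"] <- 'Ivory Coast'

"Cote d'Ivoire" %in% country  # Should be FALSE after the change
'Ivory Coast' %in% country    # Should be TRUE
table(country)                # Counts per country, including the new name
```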
# Save the file. Make sure to change the file path for your computer.
write.csv(data, '~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events_v3.csv')
write.table(data, '~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events_v3.txt', sep='\t')
What are the first five rows?
head(mtcars, 5)
mtcars[1:5,]
What are the last five rows?
tail(mtcars, 5)
What are rows 12 through 21?
mtcars[12:21,]
What are the column names?
names(mtcars)
sort(names(mtcars))
Using mtcars: create a new variable that is horsepower per cylinder.
mtcars$hpc <- mtcars$hp/mtcars$cyl
Using mtcars: Weight is in thousands of pounds. Create a new variable that is weight in pounds.
mtcars$wt_new <- mtcars$wt*1000
Using mtcars: Create a new dataframe of cars that get at least 20 miles per gallon. How many cars is that?
new_mtcars <- mtcars[mtcars$mpg >= 20,]
nrow(new_mtcars)
Using mtcars: Create a new dataframe of cars that get fewer than 20 miles per gallon.
new_mtcars2 <- mtcars[mtcars$mpg < 20,]
PLOTS (histograms, plots, save pdf)
Make a plot of the number of participants at each event using the SCAD data. Label the x and y axis.

Save that plot
pdf('~/Desktop/UCLA/Courses/4_Fall 2019/PP 290/Assignments/assn1plot1.pdf')
plot(data$npart[data$npart != -99], xlab = 'Index', ylab = 'Number of Participants')
dev.off()
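The pattern for saving a plot is always the same: open a file device with pdf(), draw the plot, then close the device with dev.off() so the file is written. A self-contained sketch, using a temporary file and made-up participant counts in place of the SCAD data:

```r
# Made-up participant counts; -99 is the missing-data code.
npart <- c(3, 5, -99, 2, 4, 1)

out_file <- tempfile(fileext = '.pdf')  # Hypothetical location; use your own path.
pdf(out_file)                           # Open the PDF device
plot(npart[npart != -99], xlab = 'Index', ylab = 'Number of Participants')
dev.off()                               # Close the device; the file is now complete

file.exists(out_file)
```

Forgetting dev.off() is the usual reason a saved PDF turns out empty or unreadable.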
Make a histogram of the number of participants at each event. Make sure to label the axes.

Make a barplot of the number of participants per event. Hint: you will need to use table as well.

Load data, explore it.
Now plot it:
barplot(table(SCADdata$BYEAR))
no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim, ylim, log = log, ...) :
need finite 'xlim' values

Using mtcars: Make a table of the gear variable.
table(mtcars$gear)
 3  4  5
15 12  5
Using mtcars: Make a proportion table of the gear variable.
prop.table(table(mtcars$gear))
      3       4       5
0.46875 0.37500 0.15625
Using mtcars: Make a two-way table of the number of cylinders and number of gears.
table(mtcars$gear, mtcars$cyl)
     4  6  8
  3  1  2 12
  4  8  4  0
  5  2  1  2
Using mtcars: Make a two-way proportion table of the number of cylinders and number of gears. Make one table where the cells add to 1, another where each column adds to 1, and another where each row does.
prop.table(table(mtcars$gear, mtcars$cyl), margin=1)
             4          6          8
  3 0.06666667 0.13333333 0.80000000
  4 0.66666667 0.33333333 0.00000000
  5 0.40000000 0.20000000 0.40000000
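The output above is the row version (margin=1). A sketch of all three variants the exercise asks for, with sums to confirm which dimension adds to 1. The object names are introduced here for illustration.

```r
tab <- table(mtcars$gear, mtcars$cyl)

all_cells <- prop.table(tab)              # All cells together add to 1
by_row    <- prop.table(tab, margin = 1)  # Each row (gear) adds to 1
by_col    <- prop.table(tab, margin = 2)  # Each column (cyl) adds to 1

sum(all_cells)
rowSums(by_row)
colSums(by_col)
```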
Using mtcars: Plot the miles per gallon against the number of cylinders.

Using mtcars: Make a barplot of the cars by the carb variable.

Using mtcars: Make a histogram of miles per gallon.
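The plots for these three exercises are not reproduced here. A minimal sketch of calls that would draw them, with axis labels and titles chosen for illustration:

```r
# Miles per gallon against number of cylinders
plot(mtcars$cyl, mtcars$mpg, xlab = 'Cylinders', ylab = 'Miles per Gallon')

# Barplot of cars by number of carburetors; table() does the counting
carb_counts <- table(mtcars$carb)
barplot(carb_counts, xlab = 'Carburetors', ylab = 'Number of Cars')

# Histogram of miles per gallon
hist(mtcars$mpg, xlab = 'Miles per Gallon', main = 'Distribution of MPG')
```

barplot() needs counts, which is why table() comes first; hist() does its own binning and takes the raw values.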

#Non-Network Plots
names(mtcars) # The variables
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
[12] "car"
Base Plotting

Base Plotting, with Multiple Variables: As we learned in lecture, a two-dimensional plot is often too simplistic for the relationships we want to inspect. We will now learn how to change the size, shape, and color of points.

Change the size with the cex argument
legend('topright', legend='Sized by displacement')
Error in strwidth(legend, units = "user", cex = cex, font = text.font) :
plot.new has not been called yet
Change the size with the cex argument

Why would a square root result in dots that are too large? The displacement values in mtcars are very large (roughly 71 to 472)! Because you do not want the cex numbers to be too large, it is common to take the square root or logarithm. Here, square roots will still create dots that are too large, so I divide by 100 as well as take the logarithm.
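A sketch of the comparison, assuming the points are sized by disp and the plot is horsepower against miles per gallon. The original plotting code is not shown, so the axes and the exact scaling are assumptions. range() shows why the raw and square-root values are too big for cex, where 1 is the default point size:

```r
range(mtcars$disp)        # Raw values: far too large for cex
range(sqrt(mtcars$disp))  # Square roots: still large
range(mtcars$disp/100)    # Divided by 100: a workable range of sizes

size <- mtcars$disp/100
plot(mtcars$hp, mtcars$mpg, cex = size, xlab = 'Horsepower', ylab = 'Miles per Gallon')
legend('topright', legend = 'Sized by displacement')
```

Note that legend() only works after plot() has been called, because it draws on the existing plot.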
What does sizing the dots by displacement reveal about the relationship of displacement with horsepower? What about with miles per gallon? Decreased horsepower may be associated with increased miles per gallon.
Now, let us change the shape of points. The key is to use the pch argument, which stands for “point character”. If you look at the help documentation - ?par, then search for pch - you will see the various shapes base R has. The code below first shows you how to change the shape of each entry, and it then shows you how to create a variable enumerating the shape by some other variable.
Change the shape with the pch argument

Why are some dots circles but others triangles? Because of the 1 and 2 in the pch argument: pch=1 draws a circle and pch=2 draws a triangle.
Change the shape with the pch argument

This variable is convenient because it is already coded as a number
mtcars$am
[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
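Because am is already 0 or 1, it can be turned into a shape code directly. A sketch, assuming horsepower against miles per gallon as the plot: adding 1 maps automatic cars (0) to pch 1, a circle, and manual cars (1) to pch 2, a triangle. The name shape is introduced here for illustration.

```r
shape <- mtcars$am + 1  # 0 -> 1 (circle), 1 -> 2 (triangle)

plot(mtcars$hp, mtcars$mpg, pch = shape, xlab = 'Horsepower', ylab = 'Miles per Gallon')
legend('topright', legend = c('Automatic', 'Manual'), pch = c(1, 2))

table(shape)
```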
Change shape to be Mazda or not Mazda

Now, let us color the points. The approach is the same as we have seen - pass R a vector of colors and R will make the points those colors. Notice that I have doubled the size of each point with the cex argument; I do that simply to make the chart more legible.
Change the color with the col argument
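A sketch of the idea, assuming horsepower against miles per gallon with points colored by whether a car is heavier than the median weight. The original coloring rule is not shown, so this split and the names heavy and point_col are assumptions.

```r
# One color per car: red for heavier-than-median, blue otherwise
heavy <- mtcars$wt > median(mtcars$wt)
point_col <- ifelse(heavy, 'red', 'blue')

plot(mtcars$hp, mtcars$mpg, col = point_col, cex = 2,
     xlab = 'Horsepower', ylab = 'Miles per Gallon')
legend('topright', legend = c('Above median weight', 'At or below median weight'),
       col = c('red', 'blue'), pch = 1)
```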

What do you notice about the relationship between horsepower, miles per gallon, and the weight of a car?
Plot quarter-mile time (in seconds) against miles per gallon. Change the size based on the car’s weight.

Plot quarter-mile time (in seconds) against miles per gallon. Change the shape based on whether the car has an automatic transmission.

Plot quarter-mile time (in seconds) against miles per gallon. Change the color based on whether the car has more than 4 cylinders.

Make one plot that combines the three plots above. That is, change the size, shape, and color of each point based on the car’s weight, transmission, and number of cylinders, respectively.
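A sketch of one way to combine the three, with the scaling and color choices picked for illustration: size follows weight, shape follows transmission, and color marks cars with more than 4 cylinders. Quarter-mile time is placed on the y-axis here, which is one reading of "against".

```r
size  <- mtcars$wt/2                            # Weight in 1,000s of pounds, halved to keep points modest
shape <- mtcars$am + 1                          # 1 = circle (automatic), 2 = triangle (manual)
color <- ifelse(mtcars$cyl > 4, 'red', 'blue')  # Red for more than 4 cylinders

plot(mtcars$mpg, mtcars$qsec, cex = size, pch = shape, col = color,
     xlab = 'Miles per Gallon', ylab = 'Quarter-Mile Time (seconds)')
legend('topright', legend = c('Automatic', 'Manual', 'More than 4 cylinders', '4 cylinders'),
       pch = c(1, 2, 15, 15), col = c('black', 'black', 'red', 'blue'))
```

Each argument takes one value per car, so the three vectors can be built and checked separately before they are passed to plot().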

Modifying the size, shape, and color of nodes in a network works the same way. First, let us explore the karate club dataset.

