BASICS

Type ‘q()’ to quit R.

Asking for help:

rr # When you know the function name, use one question mark. ?tail

When you sort of know, use two question marks.

??average ??tail

Install igraph, haven, and tidyverse:

install.packages('tidyverse')
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/tidyverse_1.2.1.tgz'
Content type 'application/x-gzip' length 89217 bytes (87 KB)
==================================================
downloaded 87 KB

The downloaded binary packages are in
    /var/folders/zm/zhc689mj20sf8gz6nfjt09r40000gn/T//Rtmpy32DoF/downloaded_packages
install.packages('igraph')
Error in install.packages : Updating loaded packages
install.packages("haven")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/haven_2.2.0.tgz'
Content type 'application/x-gzip' length 1077133 bytes (1.0 MB)
==================================================
downloaded 1.0 MB

The downloaded binary packages are in
    /var/folders/zm/zhc689mj20sf8gz6nfjt09r40000gn/T//Rtmpy32DoF/downloaded_packages

Set seed:

rr set.seed(09032019)

Load igraph , haven, and tidyverse OR attach data already loaded in R:

library(igraph)

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
  method           from
  print.rowwise_df     
── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.2
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
✖ purrr::compose()       masks igraph::compose()
✖ tidyr::crossing()      masks igraph::crossing()
✖ dplyr::filter()        masks stats::filter()
✖ dplyr::groups()        masks igraph::groups()
✖ dplyr::lag()           masks stats::lag()
✖ purrr::simplify()      masks igraph::simplify()
library(haven)

library(readxl) 

attach(mtcars)
The following object is masked from package:ggplot2:

    mpg

Read in data from a desktop:

data2 <- read.delim('~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv', sep=',')
cannot open file '/Users/Al/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv': No such file or directoryError in file(file, "rt") : cannot open the connection

Read in data from a website:

# This step only needed to read excel files from the internet:
download.file("https://stats.idre.ucla.edu/stat/data/hsb2.xls", "~/Downloads/myfile.xls", mode="wb")
trying URL 'https://stats.idre.ucla.edu/stat/data/hsb2.xls'
Content type 'application/vnd.ms-excel' length 41984 bytes (41 KB)
==================================================
downloaded 41 KB

Use read_graph to load network data. Create a network from an edgelist. Load data at https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins. Open that link first in your web browser so you know what the data look like. The data are undirected (two options):

data <- graph_from_edgelist(as.matrix(data), directed = FALSE)
Error in as.vector(x, mode) : 
  cannot coerce type 'closure' to vector of type 'any'

Get the adjacency matrix from the graph above. Tell the function not to return a sparse matrix.

rr amps2 <- as_adjacency_matrix(dataps2, sparse = FALSE)

Load the dolphin data again, this time telling R the network is directed.

rr dataps2d <- read_graph(://raw.githubusercontent.com/ZacharyST/Data/master/dolphins, directed = TRUE)

Load the SCAD data from my Github Data repository. The link is https://raw.githubusercontent.com/ZacharyST/Data/master/scad_latiname rica.

rr data <- read.csv(://raw.githubusercontent.com/ZacharyST/Data/master/scad_latinamerica)

INSPECTING A NETWORK (edges, nodes, components, size, degree)

How many nodes does the network have (two options)?

length(V(dataps2))
[1] 63

How many nodes does the network have (two options) using an adjacency matrix?

ncol(amps2)
[1] 63

How many edges does the network have (three options)?

sum(colSums(amps2))/2
[1] 159

Does the network have any components?

count_components(dataps2)
[1] 2

Save the results from above as a new object. Call the entry of that object that tells you have many components there are.

c$no
[1] 2

What is the size of the component(s)?

c$csize
[1]  1 62

What is the degree of each node? Use the object created when you read in the file.

degree(dataps2)
 [1]  0  6  8  4  3  1  4  6  5  6  7  5  1  1  8 12  7  6  9  7  4  9  6  1  3  6  3
[28]  3  5  5  9  5  1  3 10  5  1  7 11  8  2  8  5  6  7  4 11  2  6  1  2  7 10  4
[55]  2  7  2  2  9  1  5  1  3

What is the degree of each node? Use the adjacency matrix?

rowSums(amps2)
 [1]  0  6  8  4  3  1  4  6  5  6  7  5  1  1  8 12  7  6  9  7  4  9  6  1  3  6  3
[28]  3  5  5  9  5  1  3 10  5  1  7 11  8  2  8  5  6  7  4 11  2  6  1  2  7 10  4
[55]  2  7  2  2  9  1  5  1  3

What is the degree of the most social dolphin?

max(degree(dataps2))
[1] 12

Which dolphin is the most social?

V(dataps2)[degree(dataps2) == max(degree(dataps2))]
+ 1/63 vertex, from c1bd9b9:
[1] 16

Repeat the last two queries above, but save the result of the most social dolphin as an object.

V(dataps2)[degree(dataps2) == social]
+ 1/63 vertex, from c1bd9b9:
[1] 16

Load and inspect the KARATE data.

Make column names match the row numbers:

Now, let us count how many nodes there are, the number of components, and the degree of each node:

rr library(igraph) # To load igraph functions g <- graph(d) # Fails. Requires edgelist. g <- graph_from_adjacency_matrix(d) # Fails because d is a dataframe. g <- graph_from_adjacency_matrix(as.matrix(d), mode=) # Works. Do not forget to tell it the kind of network you have.

To just get the count of components:

count_components(g)
[1] 1

The most connections and its node or nodes (two options):

V(g)[degree(g) == maximum]
+ 1/34 vertex, named, from b64e902:
[1] 34

Loop, check work, correct:

# Using the apply function:  
# MARGIN=2 means columns.
apply(d, MARGIN=2, FUN=sum)  # You have to capitalize the argument names.
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 
16  9 10  6  3  4  4  4  5  2  3  1  2  5  2  2  2  2  2  3  2  2  2  5  3  3  2  4 
29 30 31 32 33 34 
 3  4  4  6 12 17 

Find the nodes in the karate network that have the minimum degree.

V(g)[which.min(degree(g))]
+ 1/34 vertex, named, from b64e902:
[1] 12

RANDOM NUMBER GENERATOR (igraph, adjacency matrix)

A binomial distribution is one that produces a series of numbers according to parameters you pass it.We can easily make it produces 1s and 0s and then populate an adjacency matrix with them. The last argument controls the ratio of 1s and 0s. So, half the output will be 1, half will be 0, on average.

rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 1
rbinom(1,1,.5)
[1] 0

Let’s speed that up. Why does the below line produce more numbers? 10 replaces 1.

rbinom(10,1,.5)
 [1] 0 0 1 0 1 1 0 1 1 1

Let’s understand rbinom a little more. Second number is options.

rbinom(10,3,.5)
 [1] 2 1 2 0 1 1 2 1 0 3

We can now make a matrix of arbitrary size, so long as rbinom() creates a square amount of numbers.

temp
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    1    0    0    0    0    0    0    1    1
 [2,]    0    1    0    1    1    1    1    0    1
 [3,]    0    0    1    0    0    0    1    1    1
 [4,]    0    0    0    0    0    0    1    1    1
 [5,]    1    0    0    1    1    1    1    1    0
 [6,]    0    0    1    1    0    0    0    1    0
 [7,]    1    1    1    1    0    0    1    0    1
 [8,]    1    1    0    0    0    0    1    1    1
 [9,]    0    0    0    1    1    0    0    1    0

Maybe this approach is a little more intuitive:

temp
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    0    0    0    0    0    1    0    0    0     1     1     1     0     1
 [2,]    0    0    1    0    1    1    0    1    0     0     0     1     1     1
 [3,]    0    1    0    1    0    0    0    1    0     0     0     1     0     0
 [4,]    0    1    1    0    1    0    0    0    0     1     1     1     1     0
 [5,]    0    0    0    0    0    0    1    0    1     0     0     1     1     1
 [6,]    0    0    0    0    1    0    0    0    1     1     0     0     1     1
 [7,]    0    0    1    1    1    1    0    1    0     0     0     1     0     1
 [8,]    1    1    0    0    1    0    0    0    1     0     0     1     1     0
 [9,]    0    0    1    0    0    1    1    1    0     0     0     1     0     0
[10,]    1    0    1    1    1    1    0    0    1     0     1     1     1     0
[11,]    1    0    0    1    1    1    1    0    0     1     0     0     1     1
[12,]    1    0    0    0    0    1    1    1    0     1     0     0     0     1
[13,]    0    1    1    0    1    1    0    1    0     0     1     0     0     0
[14,]    0    1    0    0    0    1    0    0    0     1     0     0     1     0
[15,]    1    1    1    1    0    1    1    1    1     1     0     0     0     0
      [,15]
 [1,]     1
 [2,]     0
 [3,]     0
 [4,]     0
 [5,]     1
 [6,]     1
 [7,]     0
 [8,]     0
 [9,]     0
[10,]     1
[11,]     0
[12,]     1
[13,]     0
[14,]     1
[15,]     0

Convert to igraph object.

rr g_temp <- graph_from_adjacency_matrix(temp, mode=)

Note that this approach requires using a directed network.

(colSums(temp) + rowSums(temp)) == degree(g_temp)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

You can also make an undirected graph from the adjacency matrix. igraph will handle the construction of edges by assuming any edge is symmetric.

degree(g_temp2) # Gives the same as the above for graph
 [1]  7  9  9  9 12 11 10  9  8 11  8 12  9 10 12

But calculation of other statistics will be tricky. It is better to just make the matrix symmetric right away, which requires some massaging. For the bottom left half of the matrix, make it the same as the top right.

temp2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    0    0    0    0    0    1    0    0    0     1     1     1     0     1
 [2,]    0    0    1    0    1    1    0    1    0     0     0     1     1     1
 [3,]    0    1    0    1    0    0    0    1    0     0     0     1     0     0
 [4,]    0    0    1    0    1    0    0    0    0     1     1     1     1     0
 [5,]    0    1    0    1    0    0    1    0    1     0     0     1     1     1
 [6,]    1    1    0    0    0    0    0    0    1     1     0     0     1     1
 [7,]    0    0    0    0    1    0    0    1    0     0     0     1     0     1
 [8,]    0    1    1    0    0    0    1    0    1     0     0     1     1     0
 [9,]    0    0    0    0    1    1    0    1    0     0     0     1     0     0
[10,]    1    0    0    1    0    1    0    0    0     0     1     1     1     0
[11,]    1    0    0    1    0    0    0    0    0     1     0     0     1     1
[12,]    1    1    1    1    1    0    1    1    1     1     0     0     0     1
[13,]    0    1    0    1    1    1    0    1    0     1     1     0     0     0
[14,]    1    1    0    0    1    1    1    0    0     0     1     1     0     0
[15,]    1    0    0    0    1    1    0    0    0     1     0     1     0     1
      [,15]
 [1,]     1
 [2,]     0
 [3,]     0
 [4,]     0
 [5,]     1
 [6,]     1
 [7,]     0
 [8,]     0
 [9,]     0
[10,]     1
[11,]     0
[12,]     1
[13,]     0
[14,]     1
[15,]     0

ADJACENCY MATRIX EXAMPLE

Create an adjacency matrix that has 25 rows and columns, where each cell has a .8 probability of containing a 1. Set the diagonal to zero.

probability <- .8
thisnumber <- 25
thismatrix <- matrix(rbinom(thisnumber^2, 1, probability), nrow=thisnumber)
diag(thismatrix) <- 0
thismatrix
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    0    1    1    0    0    0    1    1    0     1     1     1     1     0
 [2,]    1    0    1    1    1    1    0    1    0     1     1     0     1     1
 [3,]    0    1    0    1    1    1    0    1    1     1     1     1     1     1
 [4,]    1    1    1    0    1    1    1    1    1     1     1     1     1     0
 [5,]    1    1    1    1    0    1    1    1    1     0     1     1     1     0
 [6,]    1    1    1    1    1    0    1    0    1     1     1     0     1     1
 [7,]    1    1    0    0    1    0    0    1    1     1     1     1     1     1
 [8,]    0    0    1    0    1    1    1    0    1     0     1     0     0     1
 [9,]    1    1    1    1    1    1    1    0    0     1     1     1     1     1
[10,]    0    1    0    0    1    1    1    1    1     0     1     0     1     1
[11,]    1    1    1    1    1    1    1    1    1     1     0     1     1     1
[12,]    1    1    1    1    1    1    1    0    1     1     1     0     1     1
[13,]    1    1    1    1    1    1    1    1    1     1     1     0     0     1
[14,]    1    1    1    1    1    1    1    1    1     1     0     1     1     0
[15,]    1    1    1    1    1    1    1    1    1     1     1     1     0     1
[16,]    1    1    1    1    1    1    0    1    1     0     1     0     1     1
[17,]    1    1    1    1    0    1    0    1    0     1     0     1     0     0
[18,]    0    1    0    0    1    1    1    1    0     1     0     1     1     0
[19,]    1    1    1    1    1    1    1    1    1     1     0     1     1     1
[20,]    1    0    1    1    1    0    0    0    0     1     1     1     1     1
[21,]    1    1    1    1    1    1    1    1    1     0     1     0     0     1
[22,]    0    1    1    1    1    1    1    0    1     0     1     1     1     1
[23,]    1    1    1    1    1    1    1    1    1     1     1     1     0     1
[24,]    1    1    1    1    1    1    1    1    1     1     1     0     1     1
[25,]    1    0    1    0    0    1    1    1    1     1     1     1     1     1
      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
 [1,]     0     1     1     0     1     1     1     1     1     1     0
 [2,]     1     1     1     1     1     0     0     1     1     1     1
 [3,]     1     0     1     1     1     1     0     0     0     1     0
 [4,]     1     1     0     0     1     1     1     1     1     1     0
 [5,]     0     1     1     0     1     1     1     1     0     0     1
 [6,]     1     0     0     1     1     1     1     1     0     1     0
 [7,]     1     1     0     1     0     1     1     1     1     0     1
 [8,]     0     1     1     0     1     1     1     1     1     1     1
 [9,]     1     0     1     0     1     0     1     1     1     1     1
[10,]     0     0     1     1     1     1     1     0     1     1     1
[11,]     1     1     1     0     1     1     1     0     0     1     0
[12,]     1     1     0     1     1     1     1     1     1     0     1
[13,]     1     1     1     1     1     1     1     1     1     0     1
[14,]     0     1     1     1     1     0     1     1     1     1     1
[15,]     0     1     0     1     1     1     1     1     1     1     1
[16,]     0     0     1     1     1     1     1     1     1     1     1
[17,]     1     1     0     1     0     1     1     0     1     1     1
[18,]     0     0     1     0     1     1     1     1     0     1     1
[19,]     1     0     1     1     0     1     1     0     1     1     0
[20,]     1     1     1     0     1     0     1     1     1     1     0
[21,]     1     0     0     0     1     1     0     1     0     1     1
[22,]     1     1     1     1     1     0     1     0     1     1     1
[23,]     1     0     0     1     1     1     1     1     0     1     1
[24,]     1     0     1     1     1     0     1     1     1     0     1
[25,]     1     1     1     0     1     1     1     0     1     1     0

Convert it to an undirected network.

thismatrix
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    0    1    1    0    0    0    1    1    0     1     1     1     1     0
 [2,]    1    0    1    1    1    1    0    1    0     1     1     0     1     1
 [3,]    1    1    0    1    1    1    0    1    1     1     1     1     1     1
 [4,]    0    1    1    0    1    1    1    1    1     1     1     1     1     0
 [5,]    0    1    1    1    0    1    1    1    1     0     1     1     1     0
 [6,]    0    1    1    1    1    0    1    0    1     1     1     0     1     1
 [7,]    1    0    0    1    1    1    0    1    1     1     1     1     1     1
 [8,]    1    1    1    1    1    0    1    0    1     0     1     0     0     1
 [9,]    0    0    1    1    1    1    1    1    0     1     1     1     1     1
[10,]    1    1    1    1    0    1    1    0    1     0     1     0     1     1
[11,]    1    1    1    1    1    1    1    1    1     1     0     1     1     1
[12,]    1    0    1    1    1    0    1    0    1     0     1     0     1     1
[13,]    1    1    1    1    1    1    1    0    1     1     1     1     0     1
[14,]    0    1    1    0    0    1    1    1    1     1     1     1     1     0
[15,]    0    1    1    1    0    1    1    0    1     0     1     1     1     0
[16,]    1    1    0    1    1    0    1    1    0     0     1     1     1     1
[17,]    1    1    1    0    1    0    0    1    1     1     1     0     1     1
[18,]    0    1    1    0    0    1    1    0    0     1     0     1     1     1
[19,]    1    1    1    1    1    1    0    1    1     1     1     1     1     1
[20,]    1    0    1    1    1    1    1    1    0     1     1     1     1     0
[21,]    1    0    0    1    1    1    1    1    1     1     1     1     1     1
[22,]    1    1    0    1    1    1    1    1    1     0     0     1     1     1
[23,]    1    1    0    1    0    0    1    1    1     1     0     1     1     1
[24,]    1    1    1    1    0    1    0    1    1     1     1     0     0     1
[25,]    0    1    0    0    1    0    1    1    1     1     0     1     1     1
      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
 [1,]     0     1     1     0     1     1     1     1     1     1     0
 [2,]     1     1     1     1     1     0     0     1     1     1     1
 [3,]     1     0     1     1     1     1     0     0     0     1     0
 [4,]     1     1     0     0     1     1     1     1     1     1     0
 [5,]     0     1     1     0     1     1     1     1     0     0     1
 [6,]     1     0     0     1     1     1     1     1     0     1     0
 [7,]     1     1     0     1     0     1     1     1     1     0     1
 [8,]     0     1     1     0     1     1     1     1     1     1     1
 [9,]     1     0     1     0     1     0     1     1     1     1     1
[10,]     0     0     1     1     1     1     1     0     1     1     1
[11,]     1     1     1     0     1     1     1     0     0     1     0
[12,]     1     1     0     1     1     1     1     1     1     0     1
[13,]     1     1     1     1     1     1     1     1     1     0     1
[14,]     0     1     1     1     1     0     1     1     1     1     1
[15,]     0     1     0     1     1     1     1     1     1     1     1
[16,]     1     0     1     1     1     1     1     1     1     1     1
[17,]     0     1     0     1     0     1     1     0     1     1     1
[18,]     1     1     1     0     1     1     1     1     0     1     1
[19,]     1     1     0     1     0     1     1     0     1     1     0
[20,]     1     1     1     1     1     0     1     1     1     1     0
[21,]     1     1     1     1     1     1     0     1     0     1     1
[22,]     1     1     0     1     0     1     1     0     1     1     1
[23,]     1     1     1     0     1     1     0     1     0     1     1
[24,]     1     1     1     1     1     1     1     1     1     0     1
[25,]     1     1     1     1     0     0     1     1     1     1     0

Convert the matrix to an igraph graph object.

rr ig2 <- graph_from_adjacency_matrix(thismatrix, mode=‘undirected’)

What is the average degree of the network?

summary(degree(ig2))[4]
 Mean 
18.64 

What is the density of the network?

edge_density(ig2)  # same
[1] 0.7766667

What is the global transitivity of the network?

transitivity(ig2, type='globalundirected')
[1] 0.7654172

Use an adjacency matrix to create a fake network that has 30 individuals.

newmatrix
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,]    0    1    0    0    1    0    0    0    0     1     1     0     1     0
 [2,]    1    1    0    0    0    1    0    0    0     0     1     1     0     0
 [3,]    0    0    0    0    1    1    0    0    1     0     0     0     1     1
 [4,]    0    0    0    0    1    0    1    0    1     0     0     0     1     0
 [5,]    1    1    0    1    1    1    0    1    0     0     0     0     1     0
 [6,]    1    1    0    0    0    0    1    0    1     0     0     1     0     0
 [7,]    1    1    0    1    0    0    0    1    0     0     0     0     0     1
 [8,]    0    0    0    0    0    1    1    0    0     1     0     0     1     1
 [9,]    1    0    0    1    0    1    1    1    0     0     1     1     0     1
[10,]    0    0    1    1    0    1    0    1    0     1     0     1     0     1
[11,]    1    0    0    1    1    0    0    1    0     0     0     1     1     1
[12,]    0    1    0    1    1    0    1    0    1     0     1     1     1     0
[13,]    1    0    0    0    0    0    1    0    0     1     1     0     1     1
[14,]    1    0    0    0    0    1    0    1    0     1     1     0     1     0
[15,]    0    1    0    0    0    0    0    1    1     1     1     0     0     0
[16,]    1    0    0    0    0    1    1    0    1     1     0     0     0     1
[17,]    1    0    1    1    1    0    0    0    0     1     0     0     0     1
[18,]    1    0    0    0    1    0    0    1    1     0     0     0     0     0
[19,]    1    1    0    0    1    1    1    1    0     1     1     1     1     0
[20,]    0    1    1    0    1    0    0    0    1     0     0     1     1     1
[21,]    0    0    0    1    1    1    1    0    1     1     0     0     1     0
[22,]    0    1    0    0    1    0    1    1    0     0     0     0     1     1
[23,]    0    1    0    1    0    1    0    1    0     1     0     0     1     0
[24,]    1    0    1    0    0    1    1    0    0     0     0     0     1     1
[25,]    0    1    0    0    1    1    1    1    1     0     1     1     1     1
[26,]    0    1    1    1    1    0    1    0    0     0     1     0     0     1
[27,]    0    1    0    0    0    1    0    1    1     0     1     0     0     0
[28,]    1    0    0    0    1    1    1    0    0     0     1     0     0     1
[29,]    0    1    1    1    0    1    1    1    1     0     1     0     0     0
[30,]    1    1    1    0    1    1    1    0    0     1     0     0     0     0
      [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27]
 [1,]     1     1     0     0     0     0     1     1     1     1     1     0     1
 [2,]     1     1     1     1     0     0     1     0     1     1     1     1     1
 [3,]     1     1     1     1     0     1     0     1     0     0     1     0     0
 [4,]     0     0     0     0     0     0     0     1     0     0     0     1     1
 [5,]     0     1     1     1     0     1     1     0     0     0     1     1     1
 [6,]     0     0     1     0     0     0     1     1     0     0     0     1     1
 [7,]     1     0     1     1     0     0     0     1     1     1     1     0     1
 [8,]     0     0     0     1     0     1     0     0     0     1     1     0     1
 [9,]     1     1     1     1     0     0     0     0     1     1     0     1     1
[10,]     1     0     1     0     1     1     1     1     1     1     1     1     0
[11,]     0     0     0     1     0     0     1     1     0     0     1     1     0
[12,]     1     1     0     0     0     1     1     0     1     0     1     0     1
[13,]     1     0     1     1     1     1     0     1     0     1     0     0     0
[14,]     1     0     1     1     0     1     1     1     0     1     0     0     1
[15,]     1     0     1     0     0     1     1     1     0     0     0     0     0
[16,]     1     1     1     0     0     1     1     0     0     0     0     1     0
[17,]     0     0     0     0     0     1     1     0     0     0     0     1     1
[18,]     1     0     1     0     1     1     1     1     1     0     0     0     1
[19,]     0     0     1     1     0     0     0     1     0     1     1     1     1
[20,]     0     1     1     0     1     0     1     1     1     0     1     0     1
[21,]     1     1     1     1     1     1     0     1     1     0     0     0     1
[22,]     1     0     1     1     0     0     0     1     1     1     1     0     1
[23,]     1     0     0     1     0     0     0     0     0     0     1     0     1
[24,]     1     1     1     0     1     1     1     1     1     1     0     1     1
[25,]     1     0     0     0     0     0     0     0     0     1     0     0     1
[26,]     1     0     1     0     0     1     1     0     1     1     1     0     0
[27,]     0     1     0     0     1     0     1     1     1     0     0     0     0
[28,]     0     0     0     1     0     1     0     1     1     1     1     1     1
[29,]     0     1     1     0     1     0     1     0     0     1     0     1     1
[30,]     1     1     0     1     0     0     0     1     0     1     0     0     1
      [,28] [,29] [,30]
 [1,]     1     0     1
 [2,]     1     0     1
 [3,]     1     0     0
 [4,]     0     1     0
 [5,]     1     0     0
 [6,]     0     0     0
 [7,]     0     0     1
 [8,]     1     0     1
 [9,]     0     1     0
[10,]     1     0     0
[11,]     1     0     0
[12,]     1     1     0
[13,]     1     1     0
[14,]     1     0     1
[15,]     1     1     0
[16,]     1     1     1
[17,]     1     0     1
[18,]     0     0     0
[19,]     0     0     0
[20,]     1     1     1
[21,]     1     1     1
[22,]     0     1     0
[23,]     0     0     1
[24,]     1     0     1
[25,]     0     0     1
[26,]     1     0     0
[27,]     1     0     1
[28,]     0     0     0
[29,]     1     1     0
[30,]     1     1     1

Calculate the degree of each node using the adjacency matrix:

degree(newmatrix)  # fails since not graph object
Error in degree(newmatrix) : Not a graph object

Degree for directed adjacency matrix:

colSums(newmatrix) + rowSums(newmatrix) - 2*diag(newmatrix) # No self loops
 [1] 30 31 20 19 30 27 30 26 29 28 26 24 29 31 29 26 31 27 25 33 36 32 25 33 29 28 34
[28] 34 26 28

Degree for undirected

colSums(newmatrix) + rowSums(newmatrix) - diag(newmatrix) # No self loops
 [1] 30 32 20 19 31 27 30 26 29 29 26 25 30 31 30 27 31 27 25 33 36 33 25 34 29 28 34
[28] 34 27 29

Convert the adjacency matrix to an asymmetric graph object. Use igraph to calculate the degree of each node.

degree(g3)
 [1] 30 33 20 19 32 27 30 26 29 30 26 26 31 31 31 28 31 27 25 33 36 34 25 35 29 28 34
[28] 34 28 30
DENSITY

What is the density of the dolphin network (two options)?

ecount(dataps2)/((vcount(dataps2) * (vcount(dataps2) - 1))/2)
[1] 0.08141321

What is the density of the directed dolphin data?

graph.density(dataps2d)
[1] 0.04070661

Why is the density this value now? How does it compare to what you found in Question 1?

rr #The density of a directed network is half that of an undirected one because there are twice as many possible edges. In the directed network the density is 0.04070661, which is lower than the density in the undirected network (0.08141321). The directed network density is lower, because when we went from an undirected to a directed network, we are adding edges even though we did not add nodes.

Karate club density, automatically and manually:

graph.density(g)
[1] 0.1390374

Let’s make this clearier to understand what’s happening:

totaledges/possibleedges
[1] 0.1390374

Same, counting directly from the adjacency matrix:

sum(colSums(d))/(nrow(d)*ncol(d))
[1] 0.1349481

Our directed graph:

edges/possibleedges  
[1] 0.1764706

Our undirected graph:

graph.density(g_temp2b)
[1] 0.4571429

Do you notice anything about the density of the directed graph? It is lower. Why is the density of the undirected graph higher than that of the directed one even though they use the same data? Because when we went from undirected to directed adjacency matrices, we added a 1 to the bottom left or top right triangle if it did not exist before. So, we are adding edges even though we are not adding nodes.

Generate a 10 x 10 matrix.

rr number <- 10 thismatrix <- matrix(rbinom(number^2, 1, .5), nrow=number)

Calculate its directed density.

tm_d <- graph_from_adjacency_matrix(thismatrix, mode='directed')
tm_d <- graph_from_adjacency_matrix(thismatrix, mode='directed')
graph.density(tm_d)
[1] 0.6111111
edges <- sum(thismatrix)
possibleedges <- nrow(thismatrix)*ncol(thismatrix)-nrow(thismatrix)
edges/possibleedges 
[1] 0.6111111

Calculate its undirected density. Remember, the calculation is the same once you define the edges correctly. First, make symmetric before passing to igraph and remove self loops.

graph.density(tm_u, loops=FALSE)
[1] 0.4888889

CLUSTERING

What is the global clustering of the undirected dolphin data (two options)?

transitivity(dataps2, type = "global")
[1] 0.3087757

What is the node with the highest local clustering value (two options)?

V(data)[V(data)$clustering == clusteringMax]
+ 0/62 vertices, from 294c26f:

Note that one can specificy undirected but it does not change the calculation; igraph does not appear to calculate directed clustering, globally or locally:

Calculate the global transitivity for our directed graph. transitivity(tm_d, type=‘global’)

Calculate the local transitivity for our directed graph. transitivity(tm_d, type=‘local’)

Calculate the global transitivity for our undirected graph. transitivity(tm_u, type=‘global’) transitivity(tm_u, type=‘globalundirected’)

Calculate the local transitivity for our undirected graph. transitivity(tm_u, type=‘local’) transitivity(tm_u, type=‘localundirected’)

Find the node with the highest clustering coefficient for the directed network. V(tm_d)[transitivity(tm_d, type=‘local’) == max(transitivity(tm_d, type=‘local’))]

mt <- max(transitivity(tm_d, type=‘local’)) V(tm_d)[transitivity(tm_d, type=‘local’) == mt]

Find the node with the lowest clustering coefficient undirected network. V(tm_u)[transitivity(tm_u, type=‘local’) == min(transitivity(tm_u, type=‘local’))]

mint <- min(transitivity(tm_u, type=‘local’)) V(tm_u)[transitivity(tm_u, type=‘local’) == mint]

Plot the relationship between degree and clustering for the 10x10 undirected network. plot(degree(tm_u), transitivity(tm_u, type=‘local’), xlab=‘Node Degree’, ylab=‘Local Clustering’, pch=20)

RANDOM NETWORK

Generate a random network containing 200 nodes with a probability of .2 that any two nodes are connected. Use igraph to convert it to an undirected matrix. Make a histogram of the degree distribution, label the axes, give it a title, and save it to file (two options).

Repeat Problem 1, except vary the probability of ties from .1 to 1 in increments of .1. Save each histogram in a file. What do you notice as p changes?

rr nodes <- 200 prob <- seq(.1,1,by=.1)

for(i in 1:length(prob)){ thenetwork <- matrix(rbinom(nodes^2, 1, p=prob[i]), nrow=nodes) diag(thenetwork) <- 0 thenetwork <- graph_from_adjacency_matrix(thenetwork, mode=‘upper’)

pdf(paste0(‘~/Desktop/UCLA’, prob[i], ‘.pdf’)) hist(degree(thenetwork), xlab=‘Degree’, ylab=‘Count’, main=paste0(‘Distribution of Degree, p=’, prob[i]), breaks=50) dev.off() }

#Making Symmetric Networks From an Adjacency Matrix

Convert an unbalaned adjacency matrix to an undirected adjacency matrix. Start by making the matrix:

rr probability <- .2 thisnumber <- 12 thismatrix <- matrix(rbinom(thisnumber^2, 1, probability), nrow=thisnumber) diag(thismatrix) <- 0

Convert it to an undirected network.

t(thismatrix) 
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    0    1    0    1    1    0    0     1     0     0
 [2,]    0    0    1    0    0    0    0    0    1     0     0     0
 [3,]    0    0    0    1    0    0    0    1    0     0     0     0
 [4,]    0    0    0    0    0    0    0    0    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     0     0     0
 [6,]    0    0    0    0    1    0    0    0    0     0     0     0
 [7,]    0    0    0    1    1    0    0    0    0     0     0     0
 [8,]    0    0    0    0    0    0    0    0    0     0     0     0
 [9,]    0    1    0    0    0    1    0    1    0     0     0     1
[10,]    0    0    0    0    0    0    0    0    0     0     0     1
[11,]    0    0    0    0    0    1    0    0    0     0     0     0
[12,]    0    0    0    0    0    0    0    1    0     0     0     0

RANDOM GRAPH

Build random graphs from matrices. Then, we use igraph to build a random graph. In both cases, we will learn about the network via some statistics. At the end of this section, we will learn how to automatically vary parameters and view the statistics of the resulting networks. Create using matrix.

howmanyrows <- 100
temp <- matrix(rbinom(howmanyrows^2, 1, .5), nrow=howmanyrows)
diag(temp) <- 0  # Make sure there are no self-loops.

# Do not forget to make symmetric
temp[lower.tri(temp)] <- t(temp)[lower.tri(temp)]

bus720 <- graph_from_adjacency_matrix(temp, mode='undirected')  # I have chosen a weird object name to emphasize that they are arbitrary.  Most documentation for igraph calls graph objects g, but you do not have to.

Explore degree:

summary(degree(bus720))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  36.00   46.00   50.00   49.66   53.00   66.00 

Explore clustering:

transitivity(bus720, type="local")[1:10]
 [1] 0.5206349 0.5132979 0.5063425 0.4804878 0.4964706 0.4956140 0.5130435 0.5045455
 [9] 0.5175510 0.4927536

Explore density:

graph.density(bus720)
[1] 0.5016162

Now, let us learn how to have igraph make a random graph for us. I will then show how you can vary the density of the graph using a loop; you could do the same for nodes, but I leave that exercise for the problem set. I will then show how to put the output from each variation into a dataframe; otherwise, you lose the results as each new loop runs. Create using igraph package, explore degree, clustering, density.

graph.density(ibus720)
[1] 0.4923232

SMALL-WORLD NETWORKS

Recreate the small-world chart I showed in class, the one that shows how average path length and clustering change as the probability of rewiring edges increases. Make the network a ring with 1,000 nodes where each node is connected to ten neighbors. As you work on this problem, keep in mind six things: a. You will need a baseline model by which to normalize (divide) the values of each simulation. That is, you will need a model that does not change for each iteration of the loop. b. Make sure you have very small values of p, smaller than the class example. c. Make sure the x-axis is the logarithm of p. 1. If one of your values of p is 0, you should add a small amount, such as .01 or .1, to each value of the rewiring probability when you plot. log(0) is undefined, but you need to see the result for that value. 2. When you do this, you may notice the x-axis values become negative. It is acceptable if you leave the values like that, but I encourage you to explore how to fix them. Subhint: look at xaxt in plot() and axis() after plot(). Subhint2: You could also look at the log argument in plot(). d. To add a second line to a graph, use lines() after plot(). e. Save the chart to your computer. f. How will you distinguish which line corresponds to which statistic? Use the lty argument to lines() and plot(). If you are adventurous, add a legend to the plot as well.

Use igraph: From here on out, we will rely on igraph’s functions to build networks. While we could build them manually from adjacency matrices, creating the proper adjacency matrix or edgelist is a bit more complicated. It is doable but not necessary for the aims of this class. The igraph function sample_smallworld can be confusing. You define the number of dimensions of the lattice; 1 means a line, 2 means a matrix, and so on. The dimension will raise the number of nodes you want to the power of the dimension. It is therefore easiest to just define the number of nodes you want and assume one dimension. nei is how far (many steps) in each direction to make connections between nodes; the higher it is, the greater the average degree.

Create a small-worl network using igraph package, explore degree, clustering, density:

graph.density(bus20)
[1] 0.06060606

Let’s make the printed output a little easier to read.

paste('The density of the graph is', graph.density(bus20))
[1] "The density of the graph is 0.0606060606060606"

Let’s do that again, changing the size of the network fro 100 to 1,000.

rr nodes <- 1000

bus20 <- sample_smallworld(dim=1, size=nodes, nei=3, p=.2)

Let’s make the printed output a little easier to read.

summary(degree(bus20))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       5       6       6       7      12 
paste('The graph\'s transitivity is', round(transitivity(bus20, type='global'), 5))
[1] "The graph's transitivity is 0.16376"
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local')))
[1] "The other way to calculate transitivity suggests the value is NaN"
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local'), na.rm=TRUE))
[1] "The other way to calculate transitivity suggests the value is 0.185631518964852"
paste('The density of the graph is', graph.density(bus20))
[1] "The density of the graph is 0.00600600600600601"
paste('There are', ecount(bus20), 'edges in the graph')
[1] "There are 3000 edges in the graph"

Now, we can vary the parameters like we did for random graphs. Since we are familiar with this process, I will vary p and save the results to a dataframe immediately. Notice that I define how many iterations of the loop differently than for random graphs; they both work the same. I am going to introduce a new way to generate sequences of numbers as well.

Vary p using sequence. There are two ways to do this.

Why does the density never change? Because it is the percentage of possible edges that actually exist; density tells you about the network overall (while degree tells you something about individual nodes)

Create a small-world ring network with 100 nodes that each start with connections to 10 neighbors. Randomly rewire with probability of .03.

sw
IGRAPH 77a5caa U--- 100 500 -- Watts-Strogatz random graph
+ attr: name (g/c), dim (g/n), size (g/n), nei (g/n), p (g/n), loops (g/l),
| multiple (g/l)
+ edges from 77a5caa:
 [1]  1-- 2  2-- 3  3-- 4  4-- 5  5-- 6  7--74  7-- 8  8-- 9 10--82 10--11 11--12
[12] 12--13 13--14 14--15 15--16 16--17 17--18 18--19 19--20 20--21 21--22 22--23
[23] 23--24 24--25 25--26 26--27 27--28 28--29 29--30 30--31 14--32 32--33 33--34
[34] 34--35 35--36 36--37 24--37 38--39 39--40 40--41 41--42 42--43 43--44 44--45
[45] 45--46 46--47 47--48 48--49 49--50 50--51 51--52 52--53 53--54 54--55 55--56
[56] 56--57 57--58 58--59 59--60 60--61 61--62 62--63 63--64 64--65 65--66 66--67
[67] 67--68 68--69 69--70 70--71 71--72 72--73 73--74 74--75 18--75 77--96 77--78
+ ... omitted several edges

Write a loop that creates a new small-world network each iteration. Vary the rewiring probability from 0 to .2 in increments of .01. Save the results in a dataframe, tracking at least the overall clustering, diameter, and average path length.

SCALE FREE NETWORKS

Create a scale free network with 12,000 nodes where each node adds 20 edges.
a. Make a histogram of the degree distribution. b. What is the ratio of the mean to median degree? c. Is the density less than a small-world network? Why?

graph.density(sf)
[1] 0.003330694

Now, we will explore scale-free networks, using Barabasi’s model of preferential attachment. While there are many mechanisms that can generate scale-free distributions, his is the most commonly used and therefore the one most commonly included out of the box in software. We will focus on varying the number of edges added to a network. A key feature of scale-free networks is that the mean degree is much higher than the median; we will track that ratio.

degree_distribution(sf)
  [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
  [7] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.615000e-01 1.225833e-01
 [13] 9.866667e-02 7.875000e-02 6.375000e-02 5.725000e-02 4.841667e-02 3.783333e-02
 [19] 3.300000e-02 2.925000e-02 2.550000e-02 2.066667e-02 1.966667e-02 1.608333e-02
 [25] 1.325000e-02 1.308333e-02 9.833333e-03 1.033333e-02 1.000000e-02 8.416667e-03
 [31] 8.416667e-03 6.750000e-03 6.083333e-03 5.666667e-03 5.916667e-03 5.916667e-03
 [37] 4.750000e-03 3.833333e-03 4.000000e-03 2.666667e-03 2.250000e-03 2.166667e-03
 [43] 3.250000e-03 2.416667e-03 2.916667e-03 2.333333e-03 2.166667e-03 2.666667e-03
 [49] 2.083333e-03 1.750000e-03 7.500000e-04 1.250000e-03 2.250000e-03 1.916667e-03
 [55] 1.750000e-03 1.416667e-03 1.000000e-03 1.250000e-03 8.333333e-04 5.000000e-04
 [61] 1.500000e-03 1.083333e-03 9.166667e-04 8.333333e-04 4.166667e-04 8.333333e-04
 [67] 5.000000e-04 6.666667e-04 7.500000e-04 5.000000e-04 5.833333e-04 5.000000e-04
 [73] 2.500000e-04 9.166667e-04 5.833333e-04 2.500000e-04 5.833333e-04 4.166667e-04
 [79] 4.166667e-04 5.000000e-04 8.333333e-04 4.166667e-04 5.000000e-04 2.500000e-04
 [85] 5.000000e-04 3.333333e-04 5.000000e-04 4.166667e-04 4.166667e-04 5.000000e-04
 [91] 2.500000e-04 2.500000e-04 1.666667e-04 5.000000e-04 3.333333e-04 8.333333e-05
 [97] 3.333333e-04 4.166667e-04 2.500000e-04 8.333333e-05 3.333333e-04 3.333333e-04
[103] 1.666667e-04 1.666667e-04 5.833333e-04 2.500000e-04 8.333333e-05 8.333333e-05
[109] 3.333333e-04 0.000000e+00 0.000000e+00 1.666667e-04 4.166667e-04 1.666667e-04
[115] 2.500000e-04 1.666667e-04 8.333333e-05 8.333333e-05 8.333333e-05 0.000000e+00
[121] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[127] 8.333333e-05 1.666667e-04 1.666667e-04 8.333333e-05 1.666667e-04 1.666667e-04
[133] 1.666667e-04 1.666667e-04 8.333333e-05 1.666667e-04 8.333333e-05 8.333333e-05
[139] 0.000000e+00 1.666667e-04 0.000000e+00 2.500000e-04 0.000000e+00 0.000000e+00
[145] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 1.666667e-04
[151] 0.000000e+00 8.333333e-05 0.000000e+00 8.333333e-05 8.333333e-05 0.000000e+00
[157] 2.500000e-04 8.333333e-05 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[163] 0.000000e+00 0.000000e+00 8.333333e-05 8.333333e-05 8.333333e-05 0.000000e+00
[169] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[175] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 8.333333e-05 0.000000e+00
[181] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[187] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[193] 8.333333e-05 0.000000e+00 8.333333e-05 0.000000e+00 1.666667e-04 0.000000e+00
[199] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 1.666667e-04 8.333333e-05
[205] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[211] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[217] 1.666667e-04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[223] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[229] 8.333333e-05 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[235] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[241] 0.000000e+00 8.333333e-05 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[247] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[253] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[259] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[265] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[271] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[277] 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[283] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[289] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[295] 0.000000e+00 0.000000e+00 1.666667e-04 0.000000e+00 0.000000e+00 0.000000e+00
[301] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[307] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[313] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[319] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[325] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[331] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00
[337] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[343] 0.000000e+00 0.000000e+00 0.000000e+00 1.666667e-04 0.000000e+00 0.000000e+00
[349] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[355] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[361] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[367] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[373] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[379] 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 8.333333e-05
[385] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[391] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[397] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[403] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05
[409] 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[415] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[421] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[427] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[433] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[439] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[445] 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00 0.000000e+00
[451] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[457] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[463] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 8.333333e-05 0.000000e+00
[469] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[475] 0.000000e+00 0.000000e+00 8.333333e-05

Let’s make the printed output a little easier to read.

Now, let’s use a loop to see how these measures change as we increase the number of nodes in the network. Let us increase it in units of 1,000. Vary network size using a loop:

nodes <- seq(from=1000,to=10000,by=1000)

remember <- data.frame(iteration=NULL, networkSize=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL, ratio=NULL, avgPathLength=NULL)

for(i in 1:length(nodes)){
  temp <- sample_pa(n=nodes[i], power=1, m=10, directed=FALSE)
  
  print(paste('This is loop', i, 'of', length(nodes)))
  
  d <- data.frame(iteration=i, size=nodes[i], avgDegree=mean(degree(temp)), globalClustering=transitivity(temp, type='global'), globalClustering2=mean(transitivity(temp, type='local')), density=graph.density(temp), ratio=mean(degree(temp))/median(degree(temp)), avgPathLength=average.path.length(temp))

  remember <- rbind(remember, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
}

remember # Not bad.

REAL NETWORKS

In this section, we will explore actual network data. The first dataset is e-mails from Enron, an energy firm that imploded when its accounting manipulations were discovered. The second is a sample of individuals from Facebook.

Load Enron:

ecount(enron)
[1] 116865
summary(degree(enron2))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   0.000    0.000    1.000    6.375    2.000 1367.000 
paste('The graph\'s transitivity is', transitivity(enron2, type='global'))
[1] "The graph's transitivity is 0.0715762061104356"
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(enron2, type='local')))
[1] "The other way to calculate transitivity suggests the value is NaN"
paste('The density of the graph is', graph.density(enron2))
[1] "The density of the graph is 0.000173859823993874"
paste('There are', ecount(enron2), 'edges in the graph')
[1] "There are 116865 edges in the graph"
paste('The diameter is', diameter(enron2))  # Takes awhile to run.
[1] "The diameter is 7"
paste('The average path length is', average.path.length(enron2))  
[1] "The average path length is 3.5201731990681"

We can also take subsets of a network based on node features. Below, I will show you how to keep only nodes meeting degree thresholds.

vcount(degree10)/vcount(enron2)  # 8.94% of nodes
[1] 0.08940163
ecount(degree10)/ecount(enron2)  # 62.45% of edges
[1] 0.6245497

You can keep nodes that match any sort of filter you want. Next, I show based on clustering. Show how to select nodes with high clustering

table(degree(highclustering_g))  

   0    1    2    3    4    5    6    7    8    9 
7462  364   95   54   37   10   13    2    2    2 

CALCULATIONS (vectors, objects, phrases)

What is 3819 times 102?

rr 3819 * 102

What is 3819 minus 1892?

rr 3819 - 1892

Repeat the above, but save each number as an object.

rr obj1 <- 3819 obj2 <- 1892 obj1 * obj2 obj1 - obj2

Make a vector of the numbers 10, 20, 30, 40, and 50. Multiply that vector # by another of the numbers 3, 4, 5, 6, and 7.

rr v1 <- c(10, 20, 30, 40, 50) v1_alt <- seq(from = 10, to = 50, by = 10) v2 <- c(3, 4, 5, 6, 7) v2_alt <- 3:7

v1 * v2 v1_alt * v2_alt

Repeat above, except make the second vector the numbers 5 and 10. Show the results and explain, in words, what R is doing with this shorter vector.

rr v2 <- c(5, 10) v1 * v2

#(Warning in v1 * v2: longer object length is not a multiple of shorter object length): R repeats the v2 vector as many times as it takes to match the length of v1. So, it creates a vector that is (5,10,5,10,5)

Does the phrase, “I enjoy learning to program” contain the word “learning”? What about “learn”? What about “towards”?

rr > phrase <- c(‘I enjoy learning to program’) > grepl(‘learning’, phrase) [1] TRUE > grepl(‘learn’, phrase) [1] TRUE > grepl(‘towards’, phrase) [1] FALSE

Why does R return [1]?

rr #Because it is telling the result is a vector of length 1. Vectors and calculation become especially useful when your vector has length greater than 1, such as when it represents a variable with multiple observations. You can then easily manipulate all entries in one vector or perform operations on multiple vectors. To make a vector of length greater than 1, you need c(). The stands for .

Practice with vectors of length greater than 1:

rr first <- c(100,19,20,89,76,102,74,81,65) first * 2
#This multiplies each entry by 2 (great to convert inches to centimeters) first - 20
#This subtracts each entry by 20

second <- c(1,2,4,10,10,1,4,5,10) #Do not need to have the exact same number of entries again, but made it more simple)

first * second
#Multiplies each entry in first by second in order, ie 1*100; multiplying each entry is weighting data first / second

Example using vectors to calculate a more complicated outcome:

rr observed <- first # I have created a new object that has the values of first, i.e., is mapped to first above.

observed == first
# Returns TRUE for as many entries as there are in each vector. (Need two equal marks, one equal will just make a copy like <-, while this compares, good way to check data after doing a transformation)

observed != first #(booleian operation for does not equal)

Chi-square

rr expected <- c(90,18,30,79,80,88,80,83,70) numerator <- (observed-expected)^2 sum(numerator/expected)

Vectors do not have to be numbers. For example, you may have a variable that is a respondent’s favorite sports team, so the entries are likely to be words. If you have a dataframe, the column names are a string vector:

rr char_vec <- c(, , , ) length(char_vec)

‘words’ %in% char_vec ‘words’ == char_vec #for how often ‘word’ %in% char_vec #^good for searching larger texts

sametype <- c(‘this’, 20, ‘converts to’, 5, ‘strings’) #^if R is unsure of a thing it converts it into a string

Create an object that contains five numbers. Multiply those numbers by 2. Add 82 to those numbers.

rr this <- c(1,2,3,4,5) this*2 this+82

Create a second object that contains five numbers. Multiply the two objects together and save the result as a new object. Divide the two objects, saving the result as a new object.

rr arbitrary <- c(18,18,93,103, 45) new <- this*arbitrary new <- this/arbitrary

Does the vector contain the word “tomato”?

rr characters <- c(‘this’, ‘vector’, ‘contains’, ‘words’, ‘that’, ‘have’, ‘words’, ‘with’, ‘letters’, ‘and’, ‘numbers’, 29, ‘30’ )

‘tomato’ %in% characters

How many times does the vector contain the word “words”?

rr grep(‘words’, characters) length(grep(‘words’, characters))

Search the vector for the number 29.

rr grep(‘29’, characters)

Search the vector for the number 100.

rr grep(‘100’, characters)

#integer(0) –> way R says no

It works differently if the words are not separated in a vector:

rr words <- c(‘this vector contains words that have words with letters and numbers 29 30’) ‘this’ %in% words

#FALSE because the word this does not = the first unit in vector words

length(words)

grep(‘this’, words) grepl(‘this’, words)

Notice that grep partial matches by default. To get exact word, use spaces; grep will tell me anywhere vs %in% which is an exact match

grepl(‘contain’, words) grepl(‘contain’, words)

Note: You cannot give a vector a number, e.g., in scores[2,4], R is looking for the second row and fourth column; would need to write:

rr scores <- c(John=25, Marge=34, Dan=24, Emily=29) scores[c(2,4)]

Return individuals with scores greater than 30:

rr scores[scores > 30]

DATA STRUCTURES (create a list, matrix, dataframe, identify variables)

Name each vector in the list:

rr mary_info <- list(classes=c(, , , ), friends=c(, , ), SAT=c(1450,1000,1600))

Return the second vector:

rr mary_info[[2]]

Return the vector named “SAT”

rr mary_info$SAT

Return the second element of friends vector

rr mary_info$friends[2]

Make a list that contains the three departments at Luskin as well as the number of enrolled students in each. Assume the numbers are 60, 80, and 100.

rr list0 <- list(depts = c(, , ), enrolled = c(60 80, 100))

For these next questions, use the list created in the below code.

rr list1 <- list(names=c(‘John’, ‘Kate’, ‘Jessica’, ‘Carl’), grades=c(80,85,60,100), sex=c(‘Male’, ‘Female’, ‘Female’, ‘Male’))

Who has grades greater than 83?

rr list1\(names[list1\)grades > 83]

Which people are female?

rr list1\(names[list1\)sex == ]

Create a 2x3 matrix, filling down columns:

rr a <- matrix(1:6, nrow=2)

Fill across rows:

rr b <- matrix(5:14, nrow=2, byrow=TRUE)

Row 2, column 3:

rr a[2,3]

All rows, column 2:

rr b[,2]

All columns, row 1:

rr a[1,]

Other ways to create matrices:

rr matrix(c(0,1,1,1,0,1,0,0,1), nrow=3)

matrix(0:1, nrow=10, ncol=10)

Create a matrix of 10 rows and 10 columns where each entry is the number 0 or 1 (multiple options).

rr m1 <- matrix(rbinom(100, 1, 0.5), nrow = 10) m <- matrix(0:1, nrow=10, ncol=10)

Show me the first row of the matrix.

rr m1[1, ]

Show the first six rows of the matrix (two options).

rr head(m1) m1[1:6, ]

Show rows 4 through 8.

rr m1[4:8, ]

Show rows 3, 5, 7, and 9.

rr m1[c(3, 5, 7, 9), ]

Show the third column of the matrix.

rr m1[, 3]

Show the third, fifth, and seventh columns of the matrix.

rr m1[, c(3, 5, 7)]

Convert the grades list from above into a data frame (four options).

rr d1 <- data.frame(list1)

as.data.frame(list1)

d1b <- data.frame(names=c(‘John’, ‘Kate’, ‘Jessica’, ‘Carl’), grades=c(80,85,60,100), sex=c(‘Male’, ‘Female’, ‘Female’, ‘Male’))

data.frame(list1)

Who has grades greater than 90?

rr d1[d1$grades > 90, ]

Which people are male?

rr d1[d1$sex == , ]

What are the first six rows and nine columns in the SCAD data?

rr data[1:6, 1:9]

What are the last five rows in the SCAD data (two options)?

rr tail(data, 5) data[(nrow(data)-5):nrow(data),]

Show rows 100 to 107 in the SCAD data.

rr data[100:107, ]

What are the names of the columns in the SCAD data?

rr names(data)

How many times does each country appear in the dataset using ‘table’ in the SCAD data?

table(SCADdata$countryname)

         Costa Rica                Cuba  Dominican Republic         El Salvador 
                 35                 278                 214                 135 
          Guatemala               Haiti            Honduras             Jamaica 
                348                1322                 214                 275 
             Mexico           Nicaragua              Panama Trinidad and Tobago 
               1905                 165                  97                  45 

How many events started in each year?

rr table(data\(styr) table(data\)eyr)

Write code to save only events that started in 2006 as a new object.

rr new <- data[data$styr == 2006, ]

How many events started in 2006?

rr nrow(new)

Make a table showing the number of events per country that started in 2006.

rr table(new$countryname)

Drop all events where the number of participants is -99.

rr new2 <- data[data$npart != -99, ]

Create a logical vector and numeric vector of equal length:

rr mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE), height = c(65, 69, 71, 73), sex=c(‘Male’, ‘Female’, ‘Female’, ‘Male’), minutesExercise = c(20,30,22,80))

Identify row 3, column 2 height of third male:

rr mydata[3,2]

Using column name, identify the first two rows:

rr mydata[1:2, ]

List all rows of column “diabetic”:

rr mydata[,]

Subsetting creates a numeric vector:

rr mydata$height[2:3]

Numeric vector:

rr mydata[[]]

What are the names of my variables?

rr colnames(mydata)

Assign column names:

rr colnames(mydata) <- c(, , , of Exercise per Week)

#Prof recs alphabetizing

To change the first variable’s name, just use indexing:

rr colnames(mydata)[1] <-

#to verify: colnames(mydata) == ‘Diabetes’

#If you do not now the index but know the name, you can do that too: colnames(mydata)[colnames(mydata) == ‘Diabetes’] <- ‘Diabetic’
#^Where the column name is , change it to .

What are the dimensions of my data frame?

rr dim(mydata)

What are the first five rows?

rr head(mydata)

What are the last five rows?

rr tail(mydata)

Get information about the dataframe, structure:

rr str(mydata)

Get summary statistics:

rr summary(mydata)

summary(mydata$height)

Get number of rows:

rr nrow(mydata)

#Don’t select on col numbers, they won’t stay the same; use colnames(mydata)[colnames(mydata) == ‘Diabetes’] (can also use grep)

Create a dataframe that is five rows tall and three columns wide. The first column is people’s names: Amanda, Brandon, Carl, David, and Eleanor. The second column is their birth year: 1981, 1928, 1997, 2002, 1989. The third column is their height in centimeters: 200, 180, 176, 190, 210.

rr mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE), height = c(65, 69, 71, 73), sex=c(‘Male’, ‘Female’, ‘Female’, ‘Male’), minutesExercise = c(20,30,22,80))

d2 <- data.frame(names=c(‘Amanda’, ‘Brandon’, ‘Carl’, ‘David’, ‘Eleanor’), year=c(1981, 1928, 1997, 2002, 1989), height = c(200,180,176,190,210))

Select rows three through five of the dataframe.

rr d2[3:5,]

Rename the name column to “First_Name”. Rename the birthyear column to “Year”.

rr colnames(d2)[colnames(d2)==‘names’] <- ‘First_Name’ colnames(d2)[colnames(d2)==‘year’] <- ‘Year’

Create a new dataframe of only the individuals who are at least 200 centimeters tall.

rr dtall <- d2[d2$height > 200,]

Create a four row, four column matrix. Multiple that matrix by 3.

rr m <- matrix(1:3, nrow=4, ncol=4) m2 <- matrix(1:16, nrow=4)

m2*3

From that matrix, take the sum of the first column. Take the sum of the third row.

rr sum(m2[,1]) m2[,1] temp <- m2[,1] sum(temp)

sum(m2[3,])

How many columns? ncol(data)

How many rows? nrow(data)

What are the column names? names(data)

What are the row names? rownames(data)

What are the first 6 rows? head(data)

What are the last 6 rows? tail(data)

Note that you can tell head() and tail() how many rows you want to see. 5 is the default. head(data, 10) head(data, 2)

tail(data, 8) tail(data, 2)

If you want to know the set of values a variable takes, use unique() unique(data\(COUNTRY) unique(data\)BYEAR)

If you want to get a sense of how many observations there are for each value of a variable, use table() table(data\(COUNTRY) table(data\)BYEAR)

Explore how to access different data points, based on variable values. We will then modify some of these variables and save the result to a new file.

rr # Select by year.
temp <- data[data$BYEAR == ‘2010’,]
# The events from 2010 are now in this dataframe dim(temp) nrow(temp) - nrow(data)

temp <- data[data$BYEAR == ‘2011’,]
# Now, the  dataframe is for 2011.

temp <- data[data$BYEAR < 2010,]
temp2 <- data[data$BYEAR <= 2010,]

nodeaths <- data[data$NDEATH == 0,] january <- data[data$BMONTH == 1,] notjanuary <- data[data$BMONTH != 1,] # != means equal to. notjanuary2 <- data[data$BMONTH > 1,]

Manipulate variables and then save the dataframe. Remember, each column is just a vector, so the vector operations we learned earlier are what we will do here. Note that I do some transformations, like multiple BDAY by BMONTH, just to demonstrate capabilities.

rr data\(newnpart <- data\)NPART*10 data\(newnpart2 <- ifelse(data\)NPART != 99, data\(NPART, NA) data\)npart <- data$NPART/1000
names(data) # Notice that the above line created a new variable.

data\(NPART <- data\)NPART/1000
# Creating a new variable with the same name as an existing one replaces that variable

data\(deathRate <- data\)NDEATH/data\(NPART # What does this do? summary(data\)deathRate) # It did nothing because NDEATH is weirdly constructed.
# We will not discuss here how to fix it.

data\(nonsense <- data\)BDAY*data\(BMONTH data\)COUNTRY[data$COUNTRY == ‘Cote d'Ivoire’] <- ‘Ivory Coast’
# Fails because R treats country name as a factor.

Load the dataframe from earlier. Add the argument `stringsAsFactors=FALSE’.

rr data <- read.csv(‘~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv’, stringsAsFactors = FALSE) data\(COUNTRY[data\)COUNTRY == ‘Cote d'Ivoire’] <- ‘Ivory Coast’

How could you verify that the change happened?

rr # Save the file. Make sure to change the file path for your computer. write.csv(data, ‘~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events_v3.csv’)

write.table(data, ‘~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events_v3.txt’, sep=’)

What are the first five rows?

rr head(mtcars, 5) mtcars[1:5,]

What are the last five rows?

rr tail(mtcars, 5)

What are rows 12 through 21?

rr mtcars[12:21,]

What are the column names?

rr names(mtcars) sort(names(mtcars))

Using mtcars: create a new variable that is horsepower per cylinder.

rr mtcars\(hpc <- mtcars\)hp/mtcars$cyl

Using mtcars: Weight is in thousands of pounds. Create a new variable that is weight in pounds.

rr mtcars\(wt_new <- mtcars\)wt*1000

Using mtcars: Create a new dataframe of cars that get at least 20 miles per gallon. How many cars is that?

rr new_mtcars <- mtcars[mtcars$mpg >= 20,]

Using mtcars: Create a new dataframe of cars that get fewer than 20 miles per gallon.

rr new_mtcars2 <- mtcars[mtcars$mpg < 20,]

PLOTS (histograms, plots, save pdf)

Make a plot of the number of participants at each event using the SCAD data. Label the x and y axis.

Save that plot

rr pdf(~/Desktop/UCLA/Courses/4_Fall 2019/PP 290/Assignments/assn1plot1.pdf) plot(data\(npart[data\)npart != -99], xlab = Index, ylab = of Participants) dev.off()

Make a histogram of the number of participants at each event. Make sure to label the axes.

Make a barplot of the number of participants per event. Hint: you will need to use table as well.

Load data, explore it.

Now plot it:

barplot(table(SCADdata$BYEAR))
no non-missing arguments to min; returning Infno non-missing arguments to max; returning -Infno non-missing arguments to min; returning Infno non-missing arguments to max; returning -InfError in plot.window(xlim, ylim, log = log, ...) : 
  need finite 'xlim' values

Using mcars: Make a table of the gear variable.

table(mtcars$gear)

 3  4  5 
15 12  5 

Using mcars: Make a proportion table of the gear variable.

prop.table(table(mtcars$gear))

      3       4       5 
0.46875 0.37500 0.15625 

Using mcars: Make a two-way table of the number of cylinders and number of gears.

table(mtcars$gear, mtcars$cyl)
   
     4  6  8
  3  1  2 12
  4  8  4  0
  5  2  1  2

Using mcars: Make a two-way proportion table of the number of cylinders and number of gears. Make one table where the cells add to 1, another where each column adds to 1, and another where each row does.

prop.table(table(mtcars$gear, mtcars$cyl), margin=1)
   
             4          6          8
  3 0.06666667 0.13333333 0.80000000
  4 0.66666667 0.33333333 0.00000000
  5 0.40000000 0.20000000 0.40000000

Using mcars: Plot the miles per gallon against the number of cylinders.

Using mcars: Make a barplot of the cars by the carb variable.

Using mcars: Make a histogram of miles per gallon.

#Non-Network Plots

names(mtcars)  # The variables
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
[12] "car" 

Base Plotting

Base Plotting, with Multiple Variables: As we learned in lecture, a two-dimensional plot is often too simplistic for the relationships we want to inspect. We will now learn how to change the size, shape, and color of points.

Change the size with the cex argument

legend('topright', legend='Sized by displacement')
Error in strwidth(legend, units = "user", cex = cex, font = text.font) : 
  plot.new has not been called yet

Change the size with the cex argument

Why would a square root result in dots that are too large? mtcars is very large! Because you do not want the cex numbers to be too large, it is common to take the square root or logarithm. Here, square roots will still create dots that are too large, so I divide by 100 as well as take the logarithm.

What does sizing the dots by displacement reveal about the relationship of displacement with horsepower? What about with miles per gallon? decreased horsepower may be associated with increased miles per gallon.

Now, let us change the shape of points. The key is to use the pch argument, which stands for “point character”. If you look at the help documentation - ?par, then search for pch - you will see the various shapes base R has. The code below first shows you how to change the shape of each entry, and it then shows you how to create a variable enumerating the shape by some other variable.

Change the shape with the pch argument

Why are some dots circles but others triangles? the 1 and 2 in the pch line

Change the shape with the pch argument

This variable is convenient because it is already coded as a number

mtcars$am
 [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

Change shape to be Mazda or not Mazda

Now, let us color the points. The approach is the same as we have seen - pass R a vector of colors and R will make the points those colors. Notice that I have double the size of each point with the cex argument; I do that simply to make the chart more legible.

Change the color with the col argument

What do you notice about the relationship between horsepower, miles per gallon, and the weight of a car?

Plot quarter mile second time against miles per gallon. Change the size based on the car’s weight.

Plot quarter mile second time against miles per gallon. Change the shape based on whether car has an automatic transmission.

Plot quarter mile second time against miles per gallon. Change the color based on whether the car has more than 4 cylinders.

Make one plot that combines the three plots above. That is, change the size, shape, and color of each point based on the car’s weight, transmission, and number of cylinders, respectively.

Modiying the size, shape, and color of nodes is identical. First, let us explore the karate club dataset.

NETWORK LAYOUT

Finally, let us look at how our networks look different depending on the layout we use. Frankly, igraph is very frustrating here, as the arguments to layout do not possess the same structure. If you want to try a different layout, I suggest googling.

Below is another way to generate coordinates. You generate the coordinates before calling plot, save that as an object, then pass that object to plot. This approach is what you would do if you want to manually change coordinates that igraph generates. You can also use this approach to generate your own layout.

CENTRALITY

Degree Centrality: Let us use a directed dataset, UKfaculty to explore different measures of degree centrality.

# The argument total, all is the same
degreeCentrality == dC2
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[17] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[33] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[65] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[81] TRUE

Now, let us size nodes by their centrality:

It is often difficult to see how different centrality measures covary, since it is unclear what the exact number for each size is. It is therefore a good idea to visualize the network in other ways.

We will use the USairports data dataset, which represents flights in December 2010. Each edge is an airline route between two airports that uses the same aircraft type; there can be multiple edges between each node. You will notice that the edges have attributes as well.

Neighbor Cumulative Degree Centrality

In this section, we will calculate neighbor cumulative degree centrality. There is no default function in igraph to do this, so we will have to do it manually. We will make a directed network undirected and then loop over every node.

V(g)$ncdc == V(g)$ncdc2
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[17] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[33] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[65] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[81] TRUE

Where do I create the vertex attribute for the NCDC measure? For each node, we have to get its neighbors and for each neighbor, get is degree. We then sum the degree of each neighbor and make that the vertex attribute so we can easily call on it for the NCDC measure.

Why do I reset total <- 0 for each new loop? We have to make the directed network undirected and then loop over every node.

Now, let us make plots.

Load the karate club data:

All Steps Centrality

In this section, we will examine the betweenness and eigenvector centrality for the karate club network. See page 95 of Luke 2015 for a list of the igraph centrality commands.

# Correlation of multiple variables
cor(data.frame(V(karate)$bc, V(karate)$ec, degree(karate)))
               V.karate..bc V.karate..ec degree.karate.
V.karate..bc      1.0000000    0.5180063      0.7882018
V.karate..ec      0.5180063    1.0000000      0.8783299
degree.karate.    0.7882018    0.8783299      1.0000000

Now, we will make plots:

Which nodes have high betweenness and eigenvector centrality?

Which nodes have low betweenness but high eigenvector centrality?

Which nodes have high betweeness but low eigenvector ventrality?

The following questions use the UKfaculty data. Pretend like you have not loaded it already. Note that you could make the graph undirected or you can tell each function to assume the graph is not directed:

LOOPS

Write a loop that prints the phrase “Hello World” ten times.

for (index in 1:10) { 
  print("Hello World")
}
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"
[1] "Hello World"

Write a loop that prints the number of the loop multiplied by 10. For example, the first iteration prints 10, the second 20, and so on. Do this procedure for 20 iterations.

for (thisiteration in 1:20) {
print(thisiteration * 10) 
  }
[1] 10
[1] 20
[1] 30
[1] 40
[1] 50
[1] 60
[1] 70
[1] 80
[1] 90
[1] 100
[1] 110
[1] 120
[1] 130
[1] 140
[1] 150
[1] 160
[1] 170
[1] 180
[1] 190
[1] 200

for(i in 1:12){
  print('hello world')
  print(2*i)
}
[1] "hello world"
[1] 2
[1] "hello world"
[1] 4
[1] "hello world"
[1] 6
[1] "hello world"
[1] 8
[1] "hello world"
[1] 10
[1] "hello world"
[1] 12
[1] "hello world"
[1] 14
[1] "hello world"
[1] 16
[1] "hello world"
[1] 18
[1] "hello world"
[1] 20
[1] "hello world"
[1] 22
[1] "hello world"
[1] 24

How would you change the above loop to print a statement 20 times?

for(i in 1:20){  # For each of twenty iterations
  print('Hello world')  # print the phrase, "Hello world"
  print(2*i)
} 
[1] "Hello world"
[1] 2
[1] "Hello world"
[1] 4
[1] "Hello world"
[1] 6
[1] "Hello world"
[1] 8
[1] "Hello world"
[1] 10
[1] "Hello world"
[1] 12
[1] "Hello world"
[1] 14
[1] "Hello world"
[1] 16
[1] "Hello world"
[1] 18
[1] "Hello world"
[1] 20
[1] "Hello world"
[1] 22
[1] "Hello world"
[1] 24
[1] "Hello world"
[1] 26
[1] "Hello world"
[1] 28
[1] "Hello world"
[1] 30
[1] "Hello world"
[1] 32
[1] "Hello world"
[1] 34
[1] "Hello world"
[1] 36
[1] "Hello world"
[1] 38
[1] "Hello world"
[1] 40

Vectors are faster loops. If you are going to create a vector during a loop, it needs to be defined before the loop.

for(i in 1:length(two)){  # Notice how I dynamically changed
  print(result[i] <- one[i]*two[i])
}
[1] 10
[1] 100
[1] 60
[1] 162
[1] 1000
[1] 300

i is used to index place by convention, but you can use anything:

for(waytoolongindex in 1:5){
  print(waytoolongindex)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

You could even use it to make a lot of graphs. Do not forget to change the filepath to be your computer.

d <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbraw.csv", stringsAsFactors = FALSE)

ycolumns <- c('write', 'math', 'science', 'socst')


for(i in 1:length(ycolumns)){
  filename <- paste0('~/Desktop/loopplotexample_', ycolumns[i],'.pdf')
  pdf(filename)
  plot(d$read, d[[ycolumns[i]]], pch=20, xlab='Reading Score', ylab=ycolumns[i])
  dev.off()
}

q()

It is hard to keep each value straight, however. And if we want to return to these results later, we need to rerun the loop. Doing so is annoying and will create slightly different answers because of the random number generator. Let us learn how to save the data as they are created.

In the loop, I show two ways to create the data. The commented one, with # in front of it, does not run but may be easier to read. To use it, remove the # from each line and put # in front of lines that are currently executed. To add or remove comments to multiple lines, highlight the ones you want to change and type cmd+c; on a Windows machine, you probably will replace cmd with the flag key.

Use a loop and igraph to create random networks. Vary the probability of forming an edge 10 times. Put some resulting statistics in a dataframe.

remember2 <- data.frame(iteration=NULL, tieProbability=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL)

theps <- c(.01, .1, .2, .3, .4, .5, .6, .7, .8, .9)
iterations <- length(theps)

for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  
  # iteration <- i
  # tieProbability <- randomness[i]
  # avgDegree <- mean(degree(temp))
  # globalClustering <- transitivity(temp, type='global')
  # globalClustering2 <- mean(transitivity(temp, type='local'))
  # density <- graph.density(temp)
  # 
  d <- data.frame(iteration=i, tieProbability=randomness[i], avgDegree=mean(degree(temp)), globalClustering=transitivity(temp, type='global'), globalClustering2=mean(transitivity(temp, type='local')), density=graph.density(temp))
  
  # d <- data.frame(iteration=iteration, tieProbability=tieProbability, avgDegree=avgDegree, globalClustering=globalClustering, globalClustering2=globalClustering2, density=density)
  remember2 <- rbind(remember2, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
  #print(remember)
}

remember2

nrow(remember2)
nrow(remember)

Vary parameters using a loop:

randomness <- c(.01, .1, .3, .5, .7, .9, 1)
iterations <- length(randomness)  # Why did I define iterations this way?
howmanyrows <- 100

# Will not show anything
for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  summary(degree(temp))
  transitivity(temp, type='global')
  mean(transitivity(temp, type='local'))
  graph.density(temp)
}

# Need to print
for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  print(paste0('On trial ', i))  # This line lets me know which iteration I am on
  print('Summary Statistics:')
  print(summary(degree(temp)))
  print(paste0('Global clustering is: ', transitivity(temp, type='global')))
  print('Average of the each node\'s clustering is: ')  # Maybe you find it easier to read the output this way
  print(mean(transitivity(temp, type='local'), na.rm=TRUE))
  print(paste0('The graph\'s density is: ', graph.density(temp)))
}

KEYWORDS

  1. Script - The text document that contains the commands. It can be in any format, though usually you will see .txt or .R.

  2. Console - Where you see the code run when it is executed. In R Studio, the console is the top right window by default.

  3. Vector - The basic building block of R, a vector is a sequence of items that you will manipulate. A vector can contain as few as one item

  4. Object - R is what is known as an “object-oriented language”. This means there are abstract entitites - the objects - that contain the data on which you work. An entity could be a dataset, a list of numbers, a database, or anything else.

  5. Dataframe - The primary type of object used to represent data in R. Think of a dataframe as a spreadsheet: it has rows and columns, where each row represents a unit of analysis with records for each of the columns (the measured variables). A dataframe is a collection of vectors. A dataframe is a matrix whose vectors can be different data types, such as strings and numbers.

  6. String - If something is a “string”, that means it is a word. A string is in contrast to numeric data.

  7. Package - A set of code that extends the functionality of base R. Each package will have a different purpose, and the clarity of the documentation, as well as functionality of the code, is at the whim of the package’s creator. All packages are free to use.

  8. Debug, Refactor - These two words are fancy ways of saying you are finding and correcting errors in your code.

  9. List - A list is a collection of vectors. Whereas a vector must contain data of the same type, a list can contain vectors of different types (each vector still must only contain one data type).

  10. Matrix - A matrix is a series of vectors of numeric data. A matrix has to contain numeric data. They are important because they can represent relationships in a network, so constructing and manipulating them is important for network analysis.Since we now have an object with two dimensions (rows and columns), we access them differently than we did vectors. A whole row is object[rownumber,], a whole column is object[,columnumber], and a specific cell is object[row, column].

  11. tidyverse - is actually a series of packages created by Hadley Wickham, the two most common of which are dplyr for data merging and aggregation and ggplot for generating graphics. To keep from overloading you with information, I will not actually use the tidyverse in this class, but it is quite common in the R world. igraph is the network package that we will use throughout the quarter.

In addition to using code, you can also use a graphical interface to install packages. In base R, click Packages & Data, then Package Installer. You can then search for the package by name. In R Studio, navigate to Tools, then Install Packages, and search for the package by name. Below’s code shows how to do this process in your script.1

  1. CSV - The main kind of data you will read is .csv, which stands for comma-separated values. It is a text file where each column is separated by a comma. Make sure you change the file path to be to where your data are. Note that read read. line works; the point is that read.csv is a special case of read.delim. This distinction matters because you could have data that are separated by a feature that is not a comma, like a tab or |.

  2. Loops - You use loops when you want to repeat a piece of code multiple times. For example, later in the course we will run network simulations where we want to see how an outcome diffuses over time; each unit of time will correspond to one iteration of the loop. Or you may have a list of files to read, so you will loop over that list in order to load them one at a time. While it is best to use vector operations whenever possible, loops can be more intuitive.


  1. If you look at the R Markdown file, you will notice the option “eval=FALSE”, whereas the other code chunks use “eval=TRUE”. eval tells R Markdown whether or not to execute the code in that chunk. Since I do not want to install these packages every time I compile the document, I set it to FALSE here.

---
title: Midterm Prep
output: html_notebook
---









##### BASICS 


Type 'q()' to quit R.

Asking for help:
```{r}
# When you know the function name, use one question mark.
?tail

# When you sort of know, use two question marks.
??average
??tail
```

Install igraph, haven, and tidyverse:
```{r}
install.packages('tidyverse')

install.packages('igraph')

install.packages("haven")
```

Set seed:
```{r}
set.seed(09032019)
```

Load igraph , haven, and tidyverse OR attach data already loaded in R:
```{r}
library(igraph)

library(tidyverse)

library(haven)

library(readxl) 

attach(mtcars)
```

Read in data from a desktop:
```{r}
data2 <- read.delim('~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv', sep=',')

data <- read.csv('~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv')
```

Read in data from a website:
```{r}
dat_csv <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")

dat_tab <- read.delim("https://stats.idre.ucla.edu/stat/data/hsb2.txt", sep="\t")

# You can also read proprietary data formats (`.xls`, `.dta`, and so on) using various R packages.  The most powerful is `haven`. 

# SPSS
dat_spss <- read_spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav")

# Stata
dat_dta <- read_stata("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")

# This step only needed to read excel files from the internet:
download.file("https://stats.idre.ucla.edu/stat/data/hsb2.xls", "~/Downloads/myfile.xls", mode="wb")

dat_xls <- read_excel("~/Downloads/myfile.xls")
```

Use `read_graph` to load network data. Create a network from an edgelist. Load data at https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins. Open that link first in your web browser so you know what the data look like. The data are undirected (two options):
```{r}
dataps2 <- read_graph('https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins', directed=FALSE, format = c("edgelist"))

data <- read.table("https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins",
header = FALSE)
data <- graph_from_edgelist(as.matrix(data), directed = FALSE)
```

Get the adjacency matrix from the graph above. Tell the function not to return a sparse matrix. 
```{r}
amps2 <- as_adjacency_matrix(dataps2, sparse = FALSE)
```

Load the dolphin data again, this time telling R the network is directed.
```{r}
dataps2d <- read_graph("https://raw.githubusercontent.com/ZacharyST/Data/master/dolphins", directed = TRUE) 
```

Load the SCAD data from my Github Data repository. The link is https://raw.githubusercontent.com/ZacharyST/Data/master/scad_latiname rica.
```{r}
SCADdata <- read.csv("https://raw.githubusercontent.com/ZacharyST/Data/master/scad_latinamerica")
```








  



##### INSPECTING A NETWORK (edges, nodes, components, size, degree)


How many nodes does the network have (two options)?
```{r}
vcount(dataps2)
length(V(dataps2))
```

How many nodes does the network have (two options) using an adjacency matrix?
```{r}
nrow(amps2)
ncol(amps2)
```

How many edges does the network have (three options)?
```{r}
ecount(dataps2)
sum(rowSums(amps2))/2 
sum(colSums(amps2))/2
```

Does the network have any components?
```{r}
components(dataps2)
count_components(dataps2)
```

Save the results from above as a new object. Call the entry of that object that tells you have many components there are.
```{r}
c <- components(dataps2)
c$no
```

What is the size of the component(s)?
```{r}
c$csize
```

What is the degree of each node?  Use the object created when you read in the file.
```{r}
degree(dataps2)
```

What is the degree of each node?  Use the adjacency matrix?
```{r}
rowSums(amps2)
```

What is the degree of the most social dolphin?
```{r}
max(degree(dataps2))
```

Which dolphin is the most social?
```{r}
V(dataps2)[degree(dataps2) == max(degree(dataps2))]
```

Repeat the last two queries above, but save the result of the most social dolphin as an object.
```{r}
social <- max(degree(dataps2)) 
V(dataps2)[degree(dataps2) == social]
```

Load and inspect the KARATE data. 
```{r}
d <- read.table('https://raw.githubusercontent.com/ZacharyST/Data/master/zachary_karate', header=FALSE)
dim(d)  
head(d)  # First six rows look fine, though I do not like the column names.
```

Make column names match the row numbers:
```{r}
names(d) <- as.character(1:nrow(d))  
head(d)
# For the numbers 1 to however many rows there are, make that the column name.  Must be a string.
```

Now, let us count how many nodes there are, the number of components, and the degree of each node:
```{r}
library(igraph)  # To load igraph functions
g <- graph(d)  # Fails.  Requires edgelist.
g <- graph_from_adjacency_matrix(d)  # Fails because d is a dataframe.
g <- graph_from_adjacency_matrix(as.matrix(d), mode="undirected")  # Works.  Do not forget to tell it the kind of network you have.
```

To just get the count of components:
```{r}
count_components(g)
```

The most connections and its node or nodes (two options):
```{r}
max(degree(g))
V(g)[degree(g) == max(degree(g))]

maximum <- max(degree(g))
maximum
V(g)[degree(g) == maximum]
```

Loop, check work, correct:
```{r}
degrees <- NULL
for(i in 1:ncol(d)){
  degrees[i] <- sum(d[,i])
}

# Using the apply function:  
# MARGIN=2 means columns.
apply(d, MARGIN=2, FUN=sum)  # You have to capitalize the argument names.
```

Find the nodes in the karate network that have the minimum degree.
```{r}
themin <- min(degree(g))
themin
V(g)[degree(g) == themin]

V(g)[which.min(degree(g))]
```












#### RANDOM NUMBER GENERATOR (igraph, adjacency matrix)


```{r}
set.seed(01082017)
```

A binomial distribution is one that produces a series of numbers according to parameters you pass it.We can easily make it produces 1s and 0s and then populate an adjacency matrix with them. The last argument controls the ratio of 1s and 0s.  So, half the output will be 1, half will be 0, on average.
```{r}
rbinom(1,1,.5)
rbinom(1,1,.5)
rbinom(1,1,.5)
rbinom(1,1,.5)
rbinom(1,1,.5)
```

Let's speed that up.  Why does the below line produce more numbers? 10 replaces 1.
```{r}
rbinom(10,1,.5)
```

Let's understand rbinom a little more. Second number is options. 
```{r}
rbinom(10,10,.5)
rbinom(10,3,.5)
```

We can now make a matrix of arbitrary size, so long as rbinom() creates a square amount of numbers.
```{r}
temp <- matrix(rbinom(81, 1, .5), nrow=9)
temp
```

Maybe this approach is a little more intuitive:
```{r}
howmanyrows <- 15
temp <- matrix(rbinom(howmanyrows^2, 1, .5), nrow=howmanyrows)
diag(temp) <- 0  # Make sure there are no self-loops.
temp
```

Convert to igraph object.
```{r}
g_temp <- graph_from_adjacency_matrix(temp, mode="directed")  
```

Note that this approach requires using a directed network.
```{r}
colSums(temp) + rowSums(temp)
degree(g_temp)
(colSums(temp) + rowSums(temp)) == degree(g_temp)
```

You can also make an undirected graph from the adjacency matrix. igraph will handle the construction of edges by assuming any edge is symmetric.
```{r}
g_temp2 <- graph_from_adjacency_matrix(temp, mode="undirected")  

colSums(temp) + rowSums(temp) - colSums(temp*t(temp))  # The last part gives you the number of bidirectional connections.
degree(g_temp2) # Gives the same as the above for graph
```

But calculation of other statistics will be tricky.  **It is better to just make the matrix symmetric right away**, which requires some massaging. For the bottom left half of the matrix, make it the same as the top right.
```{r}
temp2 <- temp
temp2[lower.tri(temp2)] <- t(temp2)[lower.tri(temp2)]
temp2
```














#### ADJACENCY MATRIX EXAMPLE


Create an adjacency matrix that has 25 rows and columns, where each cell has a .8 probability of containing a 1.  Set the diagonal to zero.
```{r}
probability <- .8
thisnumber <- 25
thismatrix <- matrix(rbinom(thisnumber^2, 1, probability), nrow=thisnumber)
diag(thismatrix) <- 0
thismatrix
```

Convert it to an undirected network.
```{r}
thismatrix[lower.tri(thismatrix)] <- t(thismatrix)[lower.tri(thismatrix)]
thismatrix
```

Convert the matrix to an igraph graph object.
```{r}
ig2 <- graph_from_adjacency_matrix(thismatrix, mode='undirected')
```

What is the average degree of the network?
```{r}
mean(degree(ig2))
summary(degree(ig2))
summary(degree(ig2))[4]
```

What is the density of the network?
```{r}
density(ig2) # fails
graph.density(ig2)  # works
edge_density(ig2)  # same
```

What is the global transitivity of the network?
```{r}
transitivity(ig2, type='globalundirected')
```

Use an adjacency matrix to create a fake network that has 30 individuals. 
```{r}
people <- 30
newmatrix <- matrix(rbinom(people^2, 1, .5), nrow=people)
newmatrix
```

Calculate the degree of each node using the adjacency matrix:
```{r}
degree(newmatrix)  # fails since not graph object
```

Degree for directed adjacency matrix:
```{r}
colSums(newmatrix) + rowSums(newmatrix)
colSums(newmatrix) + rowSums(newmatrix) - 2*diag(newmatrix) # No self loops
```

Degree for undirected
```{r}
colSums(newmatrix) + rowSums(newmatrix) - diag(newmatrix) # No self loops
```

Convert the adjacency matrix to an asymmetric graph object.  Use igraph to calculate the degree of each node.
```{r}
g3 <- graph_from_adjacency_matrix(newmatrix, mode='directed')
degree(g3)
```














##### DENSITY


What is the density of the dolphin network (two options)?
```{r}
graph.density(dataps2)
ecount(dataps2)/((vcount(dataps2) * (vcount(dataps2) - 1))/2)
```

What is the density of the directed dolphin data?
```{r}
graph.density(dataps2d)
```

Why is the density this value now?  How does it compare to what you found in Question 1?
```{r}
#The density of a directed network is half that of an undirected one because there are twice as many possible edges. In the directed network the density is 0.04070661, which is lower than the density in the undirected network (0.08141321). The directed network density is lower, because when we went from an undirected to a directed network, we are adding edges even though we did not add nodes.
```

Karate club density, automatically and manually:
```{r}
graph.density(g)

ecount(g)/((vcount(g)*(vcount(g)-1))/2)
```

Let's make this clearier to understand what's happening:
```{r}
totaledges <- ecount(g)
possibleedges <- (vcount(g)*(vcount(g)-1))/2
ecount(g)/possibleedges

totaledges/possibleedges
```

Same, counting directly from the adjacency matrix:
```{r}
sum(colSums(d))/(nrow(d)*ncol(d))

edges <- sum(colSums(d))  # no -diag(d) because the diagnoal is already 0
possibleedges2 <- nrow(d)*ncol(d) 
edges/possibleedges2
```

Our directed graph:
```{r}
graph.density(g_temp)

edges <- sum(temp)
possibleedges <- nrow(temp)*ncol(temp)-nrow(temp)
edges/possibleedges  
```

Our undirected graph:
```{r}
## Need a new graph because updated temp2 above to be undirected
g_temp2b <- graph_from_adjacency_matrix(temp2, mode='undirected')

graph.density(g_temp2b)

edges <- sum(temp2)/2
possibleedges <- (nrow(temp2)*nrow(temp2)-nrow(temp2))/2
edges/possibleedges
```

Do you notice anything about the density of the directed graph? It is lower. **Why is the density of the undirected graph higher than that of the directed one even though they use the same data?** Because when we went from undirected to directed adjacency matrices, we added a 1 to the bottom left or top right triangle if it did not exist before.  So, we are adding edges even though we are not adding nodes.

Generate a 10 x 10 matrix.
```{r}
number <- 10
thismatrix <- matrix(rbinom(number^2, 1, .5), nrow=number)
```

Calculate its directed density.
```{r}
tm_d <- graph_from_adjacency_matrix(thismatrix, mode='directed')
graph.density(tm_d)

edges <- sum(thismatrix)
possibleedges <- nrow(thismatrix)*ncol(thismatrix)-nrow(thismatrix)
edges/possibleedges 
```

Calculate its undirected density.  Remember, the calculation is the same once you define the edges correctly. First, make symmetric before passing to igraph and remove self loops.
```{r}
thismatrix2 <- thismatrix
thismatrix2[lower.tri(thismatrix2)] <- t(thismatrix2)[lower.tri(thismatrix2)]  
diag(thismatrix2) <- 0
  
tm_u <- graph_from_adjacency_matrix(thismatrix2, mode='undirected')
graph.density(tm_u, loops=FALSE)

edges <- sum(thismatrix2)/2
possibleedges <- (nrow(thismatrix2)*nrow(thismatrix2)-nrow(thismatrix2))/2
edges/possibleedges
```














#### CLUSTERING


What is the global clustering of the undirected dolphin data (two options)?
```{r}
transitivity(dataps2, type="globalundirected")
transitivity(dataps2, type = "global")
```

What is the node with the highest local clustering value (two options)?
```{r}
T <- transitivity(dataps2, type='local')
T1 <- replace(T, is.na(T), FALSE)
V(dataps2)[T1 == max(T1)]


clustering <- transitivity(dataps2, type = "local") 
clustering[is.na(clustering)] <- 0
dataps2 <- set_vertex_attr(dataps2, "clustering", value = clustering) 
clusteringMax <- max(clustering, na.rm = TRUE)
tokeep <- V(dataps2)$clustering == clusteringMax 
tokeep[is.na(tokeep)] <- FALSE 
V(dataps2)[tokeep]

# ~Below fails because of NA in the Boolean sequence~
V(data)[V(data)$clustering == clusteringMax]
```

Note that one can specificy undirected but it does not change the calculation; `igraph` does not appear to calculate directed clustering, globally or locally:
```{r}
transitivity(g, type="global")
transitivity(g, type="local")

transitivity(g, type="globalundirected")
transitivity(g, type="localundirected")

plot(degree(g), transitivity(g, type='local'), xlab='Degree of Node', ylab='Local Clustering')

this <- transitivity(g, type='local')
plot(degree(g), this, xlab='Degree of Node', ylab='Local Clustering')
```

Calculate the global transitivity for our directed graph.
transitivity(tm_d, type='global')

Calculate the local transitivity for our directed graph.
transitivity(tm_d, type='local')

Calculate the global transitivity for our undirected graph.
transitivity(tm_u, type='global')
transitivity(tm_u, type='globalundirected')

Calculate the local transitivity for our undirected graph.
transitivity(tm_u, type='local')
transitivity(tm_u, type='localundirected')

Find the node with the highest clustering coefficient for the directed network.
V(tm_d)[transitivity(tm_d, type='local') == max(transitivity(tm_d, type='local'))]

mt <- max(transitivity(tm_d, type='local'))
V(tm_d)[transitivity(tm_d, type='local') == mt]

Find the node with the lowest clustering coefficient undirected network.
V(tm_u)[transitivity(tm_u, type='local') == min(transitivity(tm_u, type='local'))]

mint <- min(transitivity(tm_u, type='local'))
V(tm_u)[transitivity(tm_u, type='local') == mint]

Plot the relationship between degree and clustering for the 10x10 undirected network.
plot(degree(tm_u), transitivity(tm_u, type='local'), xlab='Node Degree', ylab='Local Clustering', pch=20)















#### RANDOM NETWORK


Generate a random network containing 200 nodes with a probability of .2 that any two nodes are connected.  Use `igraph` to convert it to an undirected matrix. Make a histogram of the degree distribution, label the axes, give it a title, and save it to file (two options).
```{r}
nodes <- 200
randnet <- matrix(rbinom(nodes^2, 1, .2), nrow=nodes)
randnet[lower.tri(randnet)] <- t(randnet)[lower.tri(randnet)]

randnet_u <- graph_from_adjacency_matrix(randnet, mode='upper') 


nodes <- 200
prob <- 0.2
thenetwork <- matrix(rbinom(nodes^2, 1, p = prob), nrow = nodes)
diag(thenetwork) <- 0

thenetwork <- graph_from_adjacency_matrix(thenetwork, mode = "upper")


pdf('~/Desktop/UCLA/Courses/4_Fall 2019/PP 290/Assignments/assn3plot1.pdf')
hist(degree(thenetwork), xlab = 'Degree', ylab = 'Count', breaks = 50, main = 'Distribution of Degree') 
dev.off()
```

Repeat Problem 1, except vary the probability of ties from .1 to 1 in increments of .1.  Save each histogram in a file.  What do you notice as p changes?
```{r}
nodes <- 200
prob <- seq(.1,1,by=.1)

for(i in 1:length(prob)){
  thenetwork <- matrix(rbinom(nodes^2, 1, p=prob[i]), nrow=nodes)
  diag(thenetwork) <- 0
  thenetwork <- graph_from_adjacency_matrix(thenetwork, mode='upper')
  
  pdf(paste0('~/Desktop/UCLA', prob[i], '.pdf'))
  hist(degree(thenetwork), xlab='Degree', ylab='Count', main=paste0('Distribution of Degree, p=', prob[i]), breaks=50)
  dev.off()
}

```

#Making Symmetric Networks From an Adjacency Matrix

Convert an unbalaned adjacency matrix to an undirected adjacency matrix. Start by making the matrix:
```{r}
probability <- .2
thisnumber <- 12
thismatrix <- matrix(rbinom(thisnumber^2, 1, probability), nrow=thisnumber)
diag(thismatrix) <- 0
```

Convert it to an undirected network.
```{r}
t(thismatrix) 
# transposes this matrix, i.e. cell2,1 -> cell1,2; cell3,1->cell1,3, and so on.  

# I decide to make the lower triangle equal the upper triangle.  So the transpose does that:
thismatrix[lower.tri(thismatrix)] <- t(thismatrix)[lower.tri(thismatrix)]
```
















#### RANDOM GRAPH


Build random graphs from matrices. Then, we use igraph to build a random graph.  In both cases, we will learn about the network via some statistics.  At the end of this section, we will learn how to automatically vary parameters and view the statistics of the resulting networks. Create using matrix.
```{r}
howmanyrows <- 100
temp <- matrix(rbinom(howmanyrows^2, 1, .5), nrow=howmanyrows)
diag(temp) <- 0  # Make sure there are no self-loops.

# Do not forget to make symmetric
temp[lower.tri(temp)] <- t(temp)[lower.tri(temp)]

bus720 <- graph_from_adjacency_matrix(temp, mode='undirected')  # I have chosen a weird object name to emphasize that they are arbitrary.  Most documentation for igraph calls graph objects g, but you do not have to.

```

Explore degree:
```{r}
adj_degrees <- colSums(temp)
adj_degrees[1:10]
degree(bus720)[1:10]

summary(adj_degrees)
summary(degree(bus720))

```

Explore clustering:
```{r}
transitivity(bus720, type="global")
transitivity(bus720, type="local")[1:10]
```

Explore density:
```{r}
graph.density(bus720)
```


Now, let us learn how to have `igraph` make a random graph for us.  I will then show how you can vary the density of the graph using a loop; you could do the same for nodes, but I leave that exercise for the problem set.  I will then show how to put the output from each variation into a dataframe; otherwise, you lose the results as each new loop runs. Create using igraph package, explore degree, clustering, density.
```{r}
ibus720 <- erdos.renyi.game(howmanyrows,p=.5)

summary(degree(ibus720))
transitivity(ibus720, type='global')
transitivity(ibus720, type='local')[1:10]
graph.density(ibus720)
```
















#### SMALL-WORLD NETWORKS


Recreate the small-world chart I showed in class, the one that shows how average path length and clustering change as the probability of rewiring edges increases. Make the network a ring with 1,000 nodes where each node is connected to ten neighbors. As you work on this problem, keep in mind six things: 
    a. You will need a baseline model by which to normalize (divide) the
    values of each simulation.  That is, you will need a model that does
    not change for each iteration of the loop.
    b. Make sure you have very small values of p, smaller than the class
    example.
    c. Make sure the x-axis is the logarithm of p.
      1. If one of your values of p is 0, you should add a small amount,
      such as .01 or .1, to each value of the rewiring probability when
      you plot.  `log(0)` is undefined, but you need to see the result
      for that value.
      2. When you do this, you may notice the x-axis values become
      negative.  It is acceptable if you leave the values like that, but
      I encourage you to explore how to fix them.  Subhint: look at
      `xaxt` in `plot()` and `axis()` after `plot()`.  Subhint2: You
      could also look at the `log` argument in `plot()`.
    d. To add a second line to a graph, use `lines()` after `plot()`.
    e. Save the chart to your computer.
    f. How will you distinguish which line corresponds to which
    statistic?  Use the `lty` argument to lines() and plot().  If you
    are adventurous, add a legend to the plot as well. 
```{r}
probs <- seq(0, 1, by=.001)
nodes <- 1000
neighbors <- 10

baseline <- sample_smallworld(dim=1, size=nodes, nei=neighbors, p=0)
b_gC <- mean(transitivity(baseline, type='local'))
g_aPL <- average.path.length(baseline)
  
remember <- data.frame(iteration=NULL, rewireProbability=NULL, globalClustering=NULL, avgPathLength=NULL)

for(i in 1:length(probs)){
  print(paste('On iteration', i))
  
  temp <- sample_smallworld(dim=1, size=nodes, nei=neighbors, p=probs[i])

  globalClustering <- mean(transitivity(temp, type='local'))/b_gC  
  avgPathLength <- average.path.length(temp)/g_aPL
  
  d <- data.frame(iteration=i, rewireProbability=probs[i], globalClustering=globalClustering, avgPathLength=avgPathLength)
  
  remember <- rbind(remember, d)
}

# The below line adds a little bit to each value because there is a 0 that I want to keep, but log(0) is undefined.

remember$rewireProbability <- remember$rewireProbability+.00001

pdf('~/Desktop/UCLA.pdf')
plot(x=remember$rewireProbability, y=remember$avgPathLength, xlab='Rewire Probability', ylab='Change in Path Length, Clustering', main='Small-world Networks', type='l', log='x')
lines(x=remember$rewireProbability, y=remember$globalClustering, lty=3)
legend('topright', legend=c('Average Path Length', 'Clustering'), lty=c(1,3))
dev.off()
```


Use igraph: From here on out, we will rely on `igraph`'s functions to build networks.  While we could build them manually from adjacency matrices, creating the proper adjacency matrix or edgelist is a bit more complicated.  It is doable but not necessary for the aims of this class. The `igraph` function `sample_smallworld` can be confusing.  You define the number of dimensions of the lattice; 1 means a line, 2 means a matrix, and so on.  The dimension will raise the number of nodes you want to the power of the dimension.  It is therefore easiest to just define the number of nodes you want and assume one dimension.  `nei` is how far (many steps) in each direction to make connections between nodes; the higher it is, the greater the average degree.


Create a small-worl network using igraph package, explore degree, clustering, density:
```{r}
nodes <- 100

bus20 <- sample_smallworld(dim=1, size=nodes, nei=3, p=.2)
summary(degree(bus20))
transitivity(bus20, type='global')
mean(transitivity(bus20, type='local'))
graph.density(bus20)

```

Let's make the printed output a little easier to read.
```{r}
summary(degree(bus20))
paste('The graph\'s transitivity is', round(transitivity(bus20, type='global'),2))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local')))
paste('The density of the graph is', graph.density(bus20))
paste('There are', ecount(bus20), 'edges in the graph')

```

Let's do that again, changing the size of the network fro 100 to 1,000.
```{r}
nodes <- 1000

bus20 <- sample_smallworld(dim=1, size=nodes, nei=3, p=.2)
```

Let's make the printed output a little easier to read.
```{r}
summary(degree(bus20))
paste('The graph\'s transitivity is', round(transitivity(bus20, type='global'), 5))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local')))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(bus20, type='local'), na.rm=TRUE))
paste('The density of the graph is', graph.density(bus20))
paste('There are', ecount(bus20), 'edges in the graph')
```

Now, we can vary the parameters like we did for random graphs.  Since we are familiar with this process, I will **vary `p` and save the results to a dataframe** immediately.  Notice that I define how many iterations of the loop differently than for random graphs; they both work the same.  I am going to introduce a new way to generate sequences of numbers as well.

Vary p using sequence.  There are two ways to do this.
```{r}
p <- seq(from=0, to=1, by=.1)
p2 <- seq(from=0, to=1, length.out=11)
p == p2  # Great

nodes <- 250

remember <- data.frame(iteration=NULL, rewireProbability=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL, diameter=NULL, avgPathLength=NULL)

for(i in 1:length(p)){
  temp <- sample_smallworld(dim=1, size=nodes, nei=3, p=p[i])
  
  print(paste('We are on loop', i))
  iteration <- i
  rewireProbability <- p[i]
  avgDegree <- mean(degree(temp))
  globalClustering <- transitivity(temp, type='global')
  globalClustering2 <- mean(transitivity(temp, type='local'), na.rm=TRUE)
  density <- graph.density(temp)
  diameter <- diameter(temp)
  avgPathLength <- average.path.length(temp)

  print('Just finished calculations.  Making dataframe.')
  
  d <- data.frame(iteration=iteration, rewireProbability=rewireProbability, avgDegree=avgDegree, globalClustering=globalClustering, globalClustering2=globalClustering2, density=density, diameter=diameter, avgPathLength=avgPathLength)
  
  remember <- rbind(remember, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
}

remember  # Woo!
```

**Why does the density never change?** Because it is the percentage of possible edges that actually exist; density tells you about the network overall (while degree tells you something about individual nodes)

Create a small-world ring network with 100 nodes that each start with connections to 10 neighbors.  Randomly rewire with probability of .03.
```{r}
sw <- sample_smallworld(dim=1, size=100, nei=5, p=.03)
sw
```

Write a loop that creates a new small-world network each iteration.  Vary the rewiring probability from 0 to .2 in increments of .01.  Save the results in a dataframe, tracking at least the overall clustering, diameter, and average path length.
```{r}
theps <- seq(0, .2, by=.01)

remember3 <- data.frame(iteration=NULL, rewireProbability=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL, diameter=NULL, avgPathLength=NULL)

nodes <- 100
for(i in 1:length(theps)){
  temp <- sample_smallworld(dim=1, size=nodes, nei=3, p=theps[i])
  
  print(paste('We are on loop', i))
  iteration <- i
  rewireProbability <- theps[i]
  avgDegree <- mean(degree(temp))
  globalClustering <- transitivity(temp, type='global')
  globalClustering2 <- mean(transitivity(temp, type='local'), na.rm=TRUE)
  density <- graph.density(temp)
  diameter <- diameter(temp)
  avgPathLength <- average.path.length(temp)

  #print('Just finished calculations.  Making dataframe.')
  
  d <- data.frame(iteration=iteration, rewireProbability=rewireProbability, avgDegree=avgDegree, globalClustering=globalClustering, globalClustering2=globalClustering2, density=density, diameter=diameter, avgPathLength=avgPathLength)
  
  remember3 <- rbind(remember3, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
}

remember3
```














#### SCALE FREE NETWORKS


Create a scale free network with 12,000 nodes where each node adds 20 edges.  
    a. Make a histogram of the degree distribution.
    b. What is the ratio of the mean to median degree?
    c. Is the density less than a small-world network?  Why?
```{r}
nodes <- 12000
newedges <- 20
sf <- sample_pa(n=12000, power=1, m=newedges, directed=FALSE)

pdf('Figures/PS3_scalefree.pdf')
hist(degree(sf), breaks=100, xlab='Degree', ylab='Count', main='Scale-free Network')
dev.off()

mean(degree(sf))/median(degree(sf))

graph.density(sf)

# Density is less in a scale-free network because edges are added randomly, whereas small-world networks start from networks that have clustering values of 1. The density is less in a scale-free network than in a small-world network, because edges are added at random, i.e., small-world networks start from networks that have a clustering value of 1 and will therefore always have a greater density.

#The rule above applies to global clustering as well, i.e., global clustering is less in a scale-free network than in a small-world network.

```

Now, we will explore scale-free networks, using Barabasi's model of preferential attachment.  While there are many mechanisms that can generate scale-free distributions, his is the most commonly used and therefore the one most commonly included out of the box in software. We will focus on varying the number of edges added to a network.  A key feature of scale-free networks is that the mean degree is much higher than the median; we will track that ratio.
```{r}
Nodes <- 1000
sf <- sample_pa(n=nodes, power=1, m=10, directed=FALSE)

degree_distribution(sf)  # As long as max(degree(sf)).  Gives probability of node having that degree.
```

Let's make the printed output a little easier to read.
```{r}
summary(degree(sf))
paste('The graph\'s transitivity is', transitivity(sf, type='global'))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(sf, type='local')))
paste('The density of the graph is', graph.density(sf))
paste('There are', ecount(sf), 'edges in the graph')
paste('The ratio of mean to median degree is', mean(degree(sf))/median(degree(sf)))

# Let's change m
sf <- sample_pa(n=nodes, power=1, m=1, directed=FALSE)

degree_distribution(sf)  # As long as max(degree(sf)).  Gives probability of node having that degree.

# Let's make the printed output a little easier to read.
summary(degree(sf))
paste('The graph\'s transitivity is', transitivity(sf, type='global'))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(sf, type='local')))
paste('The density of the graph is', graph.density(sf))
paste('There are', ecount(sf), 'edges in the graph')
paste('The ratio of mean to median degree is', mean(degree(sf))/median(degree(sf)))

# Let's change m AGAIN
sf <- sample_pa(n=nodes, power=1, m=20, directed=FALSE)

degree_distribution(sf)  # As long as max(degree(sf)).  Gives probability of node having that degree.

# Let's make the printed output a little easier to read.
summary(degree(sf))
paste('The graph\'s transitivity is', transitivity(sf, type='global'))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(sf, type='local')))
paste('The density of the graph is', graph.density(sf))
paste('There are', ecount(sf), 'edges in the graph')
paste('The ratio of mean to median degree is', mean(degree(sf))/median(degree(sf)))

# Let's make a histogram of the degree distribution
#pdf(/change/this/filename/to/save/out.pdf)
hist(degree(sf), xlab='Node Degree', ylab='Count', breaks=50, main='Distribution of Degree in Scale-Free Graph')
#dev.off()
```

Now, let's use a loop to see how these measures change as we increase the number of nodes in the network.  Let us increase it in units of 1,000. Vary network size using a loop:
```{r}
nodes <- seq(from=1000,to=10000,by=1000)

remember <- data.frame(iteration=NULL, networkSize=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL, ratio=NULL, avgPathLength=NULL)

for(i in 1:length(nodes)){
  temp <- sample_pa(n=nodes[i], power=1, m=10, directed=FALSE)
  
  print(paste('This is loop', i, 'of', length(nodes)))
  
  d <- data.frame(iteration=i, size=nodes[i], avgDegree=mean(degree(temp)), globalClustering=transitivity(temp, type='global'), globalClustering2=mean(transitivity(temp, type='local')), density=graph.density(temp), ratio=mean(degree(temp))/median(degree(temp)), avgPathLength=average.path.length(temp))

  remember <- rbind(remember, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
}

remember # Not bad.
```












#### REAL NETWORKS

In this section, we will explore actual network data.  The first dataset is e-mails from Enron, an energy firm that imploded when its accounting manipulations were discovered.  The second is a sample of individuals from Facebook.

Load Enron:
```{r}
enron <- read_graph('https://raw.githubusercontent.com/ZacharyST/Data/master/enron_email', format=c('edgelist'))
#Another way: load Enron data using read.table, convert to igraph object.

# Explore
is.directed(enron)
enron <- as.undirected(enron) #make it undirected
is.directed(enron)

length(V(enron)) #nodes
vcount(enron) #same as above

ecount(enron) #edges
```

```{r}
# Are there actually negative values?
summary(enron)  # No negative values.

# So, add one to each vertex id.  This is fine because they are just ID numbers.
# NB: I create a new object, enron2, to show you the difference.  If I was not teaching, I would just have written enron <- enron + 1.
enron2 <- enron + 1
head(enron2)
head(enron)

enron2 <- graph_from_edgelist(as.matrix(enron2), directed=FALSE)
is.directed(enron2)
vcount(enron2)
ecount(enron2)

# Let's look at summary statisics.
summary(degree(enron2))
paste('The graph\'s transitivity is', transitivity(enron2, type='global'))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(enron2, type='local')))
paste('The density of the graph is', graph.density(enron2))
paste('There are', ecount(enron2), 'edges in the graph')
paste('The diameter is', diameter(enron2))  # Takes awhile to run.
paste('The average path length is', average.path.length(enron2))  # Takes awhile to run.
```

We can also take subsets of a network based on node features.  Below, I will show you how to keep only nodes meeting degree thresholds.
```{r}
# Show how to select only high-degree nodes
degree10 <- induced_subgraph(enron2, degree(enron2)>=10)
vcount(degree10)
ecount(degree10)

vcount(degree10)/vcount(enron2)  # 8.94% of nodes
ecount(degree10)/ecount(enron2)  # 62.45% of edges

summary(degree(degree10))
paste('The graph\'s transitivity is', transitivity(degree10, type='global'))
paste('The other way to calculate transitivity suggests the value is', mean(transitivity(degree10, type='local'), na.rm=TRUE))
paste('The density of the graph is', graph.density(degree10))
paste('There are', ecount(degree10), 'edges in the graph')
paste('The diameter is', diameter(degree10))  # Takes awhile to run.
paste('The average path length is', average.path.length(degree10))  # Takes awhile to run.
```

You can keep nodes that match any sort of filter you want.  Next, I show based on clustering. Show how to select nodes with high clustering
```{r}
highclustering <- max(transitivity(enron2, type='local'))
highclustering # Hm?  
transitivity(enron2, type="local")[1:100] #Ok, the code works.  Sometimes, this means there is a NA value, and many R functions need help with that.

highclustering <- max(transitivity(enron2, type='local'), na.rm=TRUE)

# Take nodes with half the max clustering value or more.
# I have to add the extra steps because of the NA values the first line returns.
keep <- transitivity(enron2, type='local') > highclustering*.5
keep[is.na(keep)] <- FALSE  # R will not know what to do with the NA values.
keep[1:3000] # So we know what keep looks like.
highclustering_g <- induced_subgraph(enron2, V(enron2)[keep])

vcount(highclustering_g)
ecount(highclustering_g)  # Very low.  Why?

table(degree(highclustering_g))  # This is why.
```











#### CALCULATIONS (vectors, objects, phrases)


What is 3819 times 102?
```{r}
3819 * 102
```

What is 3819 minus 1892?
```{r}
3819 - 1892
```

Repeat the above, but save each number as an object.
```{r}
obj1 <- 3819 
obj2 <- 1892 
obj1 * obj2
obj1 - obj2
```

Make a vector of the numbers 10, 20, 30, 40, and 50. Multiply that vector # by another of the numbers 3, 4, 5, 6, and 7.
```{r}
v1 <- c(10, 20, 30, 40, 50)
v1_alt <- seq(from = 10, to = 50, by = 10)
v2 <- c(3, 4, 5, 6, 7) v2_alt <- 3:7

v1 * v2
v1_alt * v2_alt
```

Repeat above, except make the second vector the numbers 5 and 10. Show the results and explain, in words, what R is doing with this shorter vector.
```{r}
v2 <- c(5, 10)
v1 * v2

#(Warning in v1 * v2: longer object length is not a multiple of shorter object length): R repeats the v2 vector as many times as it takes to match the length of v1. So, it creates a vector that is (5,10,5,10,5)
```

Does the phrase, "I enjoy learning to program" contain the word "learning"?  What about "learn"?  What about "towards"? 
```{r}
> phrase <- c('I enjoy learning to program')
> grepl('learning', phrase)
[1] TRUE 
> grepl('learn', phrase)
[1] TRUE
> grepl('towards', phrase)
[1] FALSE
```

Why does R return `[1]`?
```{r}
#Because it is telling the result is a vector of length 1. Vectors and calculation become especially useful when your vector has length greater than 1, such as when it represents a variable with multiple observations.  You can then easily manipulate all entries in one vector or perform operations on multiple vectors.  To make a vector of length greater than 1, you need `c()`.  The "c" stands for "concatenate".
```

Practice with vectors of length greater than 1:
```{r}
first <- c(100,19,20,89,76,102,74,81,65)
first * 2  
#This multiplies each entry by 2 (great to convert inches to centimeters)
first - 20  
#This subtracts each entry by 20

second <- c(1,2,4,10,10,1,4,5,10) 
#Do not need to have the exact same number of entries again, but made it more simple)

first * second  
#Multiplies each entry in first by second in order, ie 1*100; multiplying each entry is weighting data
first / second
```

Example using vectors to calculate a more complicated outcome:
```{r}
observed <- first 
# I have created a new object that has the values of first, i.e., is mapped to first above.

observed == first  
# Returns TRUE for as many entries as there are in each vector. (Need two equal marks, one equal will just make a copy like <-, while this compares, good way to check data after doing a transformation)

observed != first 
#(booleian operation for does not equal)
```

Chi-square
```{r}
expected <- c(90,18,30,79,80,88,80,83,70) 
numerator <- (observed-expected)^2
sum(numerator/expected)  
```

Vectors do not have to be numbers. For example, you may have a variable that is a respondent's favorite sports team, so the entries are likely to be words.  If you have a dataframe, the column names are a string vector:
```{r}
char_vec <- c("these", "are", "some", "words")
length(char_vec)

'words' %in% char_vec
'words' == char_vec
#for how often
'word' %in%  char_vec
#^good for searching larger texts

sametype <- c('this', 20, 'converts to', 5, 'strings')
#^if R is unsure of a thing it converts it into a string
```

Create an object that contains five numbers. Multiply those numbers by 2.  Add 82 to those numbers.
```{r}
this <- c(1,2,3,4,5)
this*2
this+82
```

Create a second object that contains five numbers. Multiply the two objects together and save the result as a new object. Divide the two objects, saving the result as a new object.
```{r}
arbitrary <- c(18,18,93,103, 45)
new <- this*arbitrary
new <- this/arbitrary
```

Does the vector contain the word "tomato"?
```{r}
characters <- c('this', 'vector', 'contains', 'words', 'that', 'have', 'words', 'with', 'letters', 'and', 'numbers', 29, '30' )

'tomato' %in% characters
```

How many times does the vector contain the word "words"?
```{r}
grep('words', characters)
length(grep('words', characters))
```

Search the vector for the number 29. 
```{r}
grep('29', characters)
```

Search the vector for the number 100.
```{r}
grep('100', characters)

#integer(0) --> way R says no
```

It works differently if the words are not separated in a vector:
```{r}
words <- c('this vector contains words that have words with letters and numbers 29 30')
'this' %in% words

#FALSE because the word this does not = the first unit in vector words 

length(words)

grep('this', words)
grepl('this', words)

# Notice that grep partial matches by default. To get exact word, use spaces; grep will tell me anywhere vs %in% which is an exact match

grepl('contain', words)
grepl('contain ', words)
```

Note: You cannot give a vector a number, e.g., in scores[2,4], R is looking for the second row and fourth column; would need to write:
```{r}
scores <- c(John=25, Marge=34, Dan=24, Emily=29)
scores[c(2,4)]
```

Return individuals with scores greater than 30:
```{r}
scores[scores > 30]  
```










#### DATA STRUCTURES (create a list, matrix, dataframe, identify variables)


Name each vector in the list:
```{r}
mary_info <- list(classes=c("Biology", "Math", "Music",
                            "Physics"),
                  friends=c("John", "Dan", "Emily"),
                  SAT=c(1450,1000,1600))
```

Return the second vector:
```{r}
mary_info[[2]]
```

Return the vector named "SAT"
```{r}
mary_info$SAT 
```

Return the second element of friends vector
```{r}
mary_info$friends[2]
```

Make a list that contains the three departments at Luskin as well as the number of enrolled students in each.  Assume the numbers are 60, 80, and 100.
```{r}
list0 <- list(depts = c("PublicPolicy", "SocialWelfare", "UrbanPlanning"), enrolled = c(60 80, 100))
```

For these next questions, use the list created in the below code.
```{r}
list1 <- list(names=c('John', 'Kate', 'Jessica', 'Carl'), grades=c(80,85,60,100), sex=c('Male', 'Female', 'Female', 'Male'))
```

Who has grades greater than 83?
```{r}
list1$names[list1$grades > 83]
```

Which people are female?
```{r}
list1$names[list1$sex == "Female"]
```

Create a 2x3 matrix, filling down columns:
```{r}
a <- matrix(1:6, nrow=2)
```

Fill across rows:
```{r}
b <- matrix(5:14, nrow=2, byrow=TRUE)
```

Row 2, column 3:
```{r}
a[2,3]
```

All rows, column 2:
```{r}
b[,2]
```

All columns, row 1:
```{r}
a[1,]
```

Other ways to create matrices:
```{r}
matrix(c(0,1,1,1,0,1,0,0,1), nrow=3)

matrix(0:1, nrow=10, ncol=10)
```

Create a matrix of 10 rows and 10 columns where each entry is the number 0 or 1 (multiple options).
```{r}
m1 <- matrix(rbinom(100, 1, 0.5), nrow = 10)
m <- matrix(0:1, nrow=10, ncol=10)
```

Show me the first row of the matrix.
```{r}
m1[1, ]
```

Show the first six rows of the matrix (two options).
```{r}
head(m1)
m1[1:6, ]
```

Show rows 4 through 8.
```{r}
m1[4:8, ]
```

Show rows 3, 5, 7, and 9.
```{r}
m1[c(3, 5, 7, 9), ]
```

Show the third column of the matrix.
```{r}
m1[, 3]
```

Show the third, fifth, and seventh columns of the matrix.
```{r}
m1[, c(3, 5, 7)]
```

Convert the grades list from above into a data frame (four options).
```{r}
d1 <- data.frame(list1)

as.data.frame(list1)

d1b <- data.frame(names=c('John', 'Kate', 'Jessica', 'Carl'), grades=c(80,85,60,100), sex=c('Male', 'Female', 'Female', 'Male'))

data.frame(list1)
```

Who has grades greater than 90?
```{r}
d1[d1$grades > 90, ]
```

Which people are male?
```{r}
d1[d1$sex == "Male", ]
```

What are the first six rows and nine columns in the SCAD data?
```{r}
SCADdata[1:6, 1:9]
```

What are the last five rows in the SCAD data (two options)?
```{r}
tail(SCADdata, 5)
SCADdata[(nrow(SCADdata)-5):nrow(SCADdata),]
```

Show rows 100 to 107 in the SCAD data.
```{r}
SCADdata[100:107, ]
```

What are the names of the columns in the SCAD data?
```{r}
names(SCADdata)
```

How many times does each country appear in the dataset using 'table' in the SCAD data?
```{r}
table(SCADdata$countryname)
```

How many events started in each year?
```{r}
table(SCADdata$styr)
table(SCADdata$eyr)
```

Write code to save only events that started in 2006 as a new object.
```{r}
new <- SCADdata[SCADdata$styr == 2006, ]
```

How many events started in 2006?
```{r}
nrow(new)
```

Make a table showing the number of events per country that started in 2006.
```{r}
table(new$countryname)
```

Drop all events where the number of participants is `-99`.
```{r}
new2 <- SCADdata[SCADdata$npart != -99, ]
```

Create a logical vector and numeric vector of equal length:
```{r}
mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE),
                     height = c(65, 69, 71, 73), sex=c('Male', 'Female', 'Female', 'Male'), minutesExercise = c(20,30,22,80))
```

Identify row 3, column 2 height of third male:
```{r}
mydata[3,2]
```

Using column name, identify the first two rows:
```{r}
mydata[1:2, "height"]
```

List all rows of column "diabetic":
```{r}
mydata[,"diabetic"]
```

Subsetting creates a numeric vector:
```{r}
mydata$height[2:3]
```

Numeric vector:
```{r}
mydata[["height"]]
```

What are the names of my variables?
```{r}
colnames(mydata)
```

Assign column names:
```{r}
colnames(mydata) <- c("Diabetic", "Height", "Sex", "Minutes of Exercise per Week")

#Prof recs alphabetizing
```

To change the first variable's name, just use indexing:
```{r}
colnames(mydata)[1] <- "Diabetes"

#to verify:
colnames(mydata) == 'Diabetes'

#If you do not now the index but know the name, you can do that too:
colnames(mydata)[colnames(mydata) == 'Diabetes'] <- 'Diabetic'  
#^Where the column name is "Diabetes", change it to "Diabetic".
```

What are the dimensions of my data frame?
```{r}
dim(mydata)
```

What are the first five rows?
```{r}
head(mydata)
```

What are the last five rows?
```{r}
tail(mydata)
```

Get information about the dataframe, structure:
```{r}
str(mydata)
```

Get summary statistics:
```{r}
summary(mydata)

summary(mydata$height)
```

Get number of rows:
```{r}
nrow(mydata)

#Don't select on col numbers, they won't stay the same; use colnames(mydata)[colnames(mydata) == 'Diabetes'] (can also use grep)
```

Create a dataframe that is five rows tall and three columns wide.  The first column is people's names: Amanda, Brandon, Carl, David, and Eleanor. The second column is their birth year: 1981, 1928, 1997, 2002, 1989.  The third column is their height in centimeters: 200, 180, 176, 190, 210.

```{r}
mydata <- data.frame(diabetic = c(TRUE, FALSE, TRUE, FALSE), 
                     height = c(65, 69, 71, 73), sex=c('Male', 'Female', 'Female', 'Male'), 
                     minutesExercise = c(20,30,22,80))

d2 <- data.frame(names=c('Amanda', 'Brandon', 'Carl', 'David', 'Eleanor'), year=c(1981, 1928, 1997, 2002, 1989), height = c(200,180,176,190,210))
```

Select rows three through five of the dataframe.
```{r}
d2[3:5,]
```

Rename the name column to "First_Name".  Rename the birthyear column to "Year".
```{r}
colnames(d2)[colnames(d2)=='names'] <- 'First_Name'
colnames(d2)[colnames(d2)=='year'] <- 'Year'
```

Create a new dataframe of only the individuals who are at least 200 centimeters tall.
```{r}
dtall <- d2[d2$height > 200,]
```

Create a four row, four column matrix.  Multiple that matrix by 3.
```{r}
m <- matrix(1:3, nrow=4, ncol=4)
m2 <- matrix(1:16, nrow=4)

m2*3
```

From that matrix, take the sum of the first column.  Take the sum of the third row. 
```{r}
sum(m2[,1])
m2[,1]
temp <- m2[,1]
sum(temp)

sum(m2[3,])
```

How many columns?
ncol(data)

How many rows?
nrow(data)

What are the column names?
names(data)  

What are the row names?
rownames(data)

What are the first 6 rows?
head(data)

What are the last 6 rows?
tail(data)

Note that you can tell head() and tail() how many rows you want to see.  5 is the default.
head(data, 10)
head(data, 2)

tail(data, 8)
tail(data, 2)

If you want to know the set of values a variable takes, use unique()
unique(data$COUNTRY)
unique(data$BYEAR)

If you want to get a sense of how many observations there are for each value of a variable, use table()
table(data$COUNTRY)
table(data$BYEAR)

Explore how to access different data points, based on variable values.  We will then modify some of these variables and save the result to a new file.
```{r}
# Select by year.  
temp <- data[data$BYEAR == '2010',]  
# The events from 2010 are now in this dataframe
dim(temp)
nrow(temp) - nrow(data)

temp <- data[data$BYEAR == '2011',]  
# Now, the "temp" dataframe is for 2011.

temp <- data[data$BYEAR < 2010,]  
temp2 <- data[data$BYEAR <= 2010,]

nodeaths <- data[data$NDEATH == 0,]
january <- data[data$BMONTH == 1,]
notjanuary <- data[data$BMONTH != 1,]  # != means "not equal to".
notjanuary2 <- data[data$BMONTH > 1,]
```

Manipulate variables and then save the dataframe. Remember, each column is just a vector, so the vector operations we learned earlier are what we will do here.  Note that I do some transformations, like multiple `BDAY` by `BMONTH`, just to demonstrate capabilities.
```{r}
data$newnpart <- data$NPART*10
data$newnpart2 <- ifelse(data$NPART != 99, data$NPART, NA)
data$npart <- data$NPART/1000  
names(data)  # Notice that the above line created a new variable.

data$NPART <- data$NPART/1000  
# Creating a new variable with the same name as an existing one replaces that variable

data$deathRate <- data$NDEATH/data$NPART  # What does this do?  
summary(data$deathRate)  # It did nothing because NDEATH is weirdly constructed.  
# We will not discuss here how to fix it.

data$nonsense <- data$BDAY*data$BMONTH
data$COUNTRY[data$COUNTRY == 'Cote d\'Ivoire'] <- 'Ivory Coast'  
# Fails because R treats country name as a factor.  

```

Load the dataframe from earlier.  Add the argument `stringsAsFactors=FALSE'.
```{r}
data <- read.csv('~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events.csv', stringsAsFactors = FALSE)
data$COUNTRY[data$COUNTRY == 'Cote d\'Ivoire'] <- 'Ivory Coast'   
```

How could you verify that the change happened?  
```{r}
# Save the file. Make sure to change the file path for your computer.
write.csv(data, '~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events_v3.csv')

write.table(data, '~/Documents/UCLA/Data/PRIO_UrbanSocialDisorder/events_v3.txt', sep='\t') 
```

What are the first five rows? 
```{r}
head(mtcars, 5)
mtcars[1:5,]
```

What are the last five rows?
```{r}
tail(mtcars, 5)
```

What are rows 12 through 21?
```{r}
mtcars[12:21,]
```

What are the column names?
```{r}
names(mtcars)
sort(names(mtcars))
```

Using mtcars: create a new variable that is horsepower per cylinder.
```{r}
mtcars$hpc <- mtcars$hp/mtcars$cyl
```

Using mtcars: Weight is in thousands of pounds. Create a new variable that is weight in pounds. 
```{r}
mtcars$wt_new <- mtcars$wt*1000
```

Using mtcars: Create a new dataframe of cars that get at least 20 miles per gallon.  How many cars is that?
```{r}
new_mtcars <- mtcars[mtcars$mpg >= 20,]
```

Using mtcars: Create a new dataframe of cars that get fewer than 20 miles per gallon.
```{r}
new_mtcars2 <- mtcars[mtcars$mpg < 20,]
```













#### PLOTS (histograms, plots, save pdf)

Make a plot of the number of participants at each event using the SCAD data. Label the x and y axis.
```{r}
plot(SCADdata$npart[SCADdata$npart != -99], xlab = "Event Index", ylab = "Number of Participants")
```

Save that plot
```{r}
pdf("~/Desktop/UCLA/Courses/4_Fall 2019/PP 290/Assignments/assn1plot1.pdf")
plot(SCADdata$npart[SCADdata$npart != -99], xlab = "Event Index", ylab = "Number of Participants") dev.off()
```

Make a histogram of the number of participants at each event. Make sure to label the axes.
```{r}
hist(SCADdata$npart[SCADdata$npart != -99], xlab = "Number of Participants", main = "Histogram of Participants", ylab = "Probability", freq = FALSE)
```

Make a barplot of the number of participants per event.  Hint: you will need to use `table` as well.
```{r}
barplot(table(SCADdata$npart), xlab = "Number of Participants", ylab = "Frequency")
```

Load data, explore it.
```{r}
d <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbraw.csv", stringsAsFactors = FALSE)
d[1:10,] #

mean(d$read)
median(d$read)
var(d$read)
summary(d$read)

# table() produces counts
table(d$female)
table(d$ses)

# For proportions, use output of table() as input to prop.table()
prop.table(table(d$female))
prop.table(table(d$ses))

# You could do this using objects
t1 <- table(d$female)
prop.table(t1)

# Two-way tables
t1 <- table(d$female, d$ses)
t1
prop.table(t1)  # Percentages by cell
prop.table(t1, margin=2)  # Percentages by column
prop.table(t1, margin=1)  # Percentages by row
```

Now plot it:
```{r}
plot(d$read)  # Ew.  By default, R uses open circles.  
plot(d$read, type='l')  # The x-axis is just the row index.
plot(d$read, d$write, type='l')  # This looks weird because R is trying to draw lines 
# between the points in each row.
plot(d$read, d$write)  
plot(d$read, d$write, pch=20)  # I find the open circles unappealing.

# The `pch` argument lets you change the style.
plot(d$read, d$write, pch=16, xlab='Reading Score', ylab='Writing Score', main='Correlation of Reading and Writing')
plot(as.factor(d$female), d$write)  

hist(d$read, xlab='Reading Score', main='')  
barplot(table(d$female))  # Bar chart
barplot(table(data$BYEAR))
# Saving.  Do not forget to change the path to reflect your computer.
```

Using mcars: Make a table of the gear variable.
```{r}
table(mtcars$gear)
```

Using mcars: Make a proportion table of the gear variable.
```{r}
prop.table(table(mtcars$gear))
```

Using mcars: Make a two-way table of the number of cylinders and number of gears.
```{r}
table(mtcars$gear, mtcars$cyl)
```

Using mcars: Make a two-way proportion table of the number of cylinders and number of gears. Make one table where the cells add to 1, another where each column adds to 1, and another where each row does.
```{r}
prop.table(table(mtcars$gear, mtcars$cyl))
prop.table(table(mtcars$gear, mtcars$cyl), margin=2)
prop.table(table(mtcars$gear, mtcars$cyl), margin=1)
```

Using mcars: Plot the miles per gallon against the number of cylinders. 
```{r}
plot(mtcars$cyl, mtcars$mpg)
plot(as.factor(mtcars$cyl), mtcars$mpg) #box and whiskers
```

Using mcars: Make a barplot of the cars by the carb variable.
```{r}
barplot(table(mtcars$carb))
```

Using mcars: Make a histogram of miles per gallon.
```{r}
hist(mtcars$mpg)
```

#Non-Network Plots
```{r}
attach(mtcars)  # Use the mtcars dataset

names(mtcars)  # The variables
dim(mtcars) # Rows and columns
mtcars[1:7,] # The first seven rows

summary(mtcars) # Summary statistics of the variables

# We may want the full car name later
mtcars$car <- rownames(mtcars)  

# We may want the car brand
mtcars$brand <- sapply(mtcars$car, function(x) strsplit(x, split=' ')[[1]][1]) 

#What does the last line of the above code chunk do?  It takes the `mtcars$car` vector and splits each observation using `strsplit`. `strsplit` returns a list of lists, where each entry is the number of words in the car variable.  `[[1]]` accesses the list, and `[1]` returns the first value in that list.  The first value is always the car brand, which I noticed just from visually inspecting the data.  `sapply` then performs that operation on each element of `mtcars$car`; it has the functionality of a loop, in one line.
```

Base Plotting
```{r}
plot(mtcars$hp, mtcars$mpg, pch=20, xlab='Horsepower', ylab='Miles per Gallon')

plot(mtcars$hp, mtcars$qsec, pch=16, xlab='Horsepower', ylab='Quarter Mile Speed')

# Remember, we can save a plot in the following way.
pdf('Week5_horsepowerMPG.pdf')
plot(mtcars$disp, mtcars$mpg, pch=10, xlab='Engine Size', ylab='Miles per Gallon')
dev.off()
```

Base Plotting, with Multiple Variables: As we learned in lecture, a two-dimensional plot is often too simplistic for the relationships we want to inspect.  We will now learn how to change the size, shape, and color of points.
```{r}
# Change the size with the cex argument
plot(mtcars$hp, mtcars$mpg, pch=20, xlab='Horsepower', ylab='Miles per Gallon', cex=mtcars$disp)

#Why does that code create a black plot?  Because `cex` multiplies the default size of the point by the argument passed to it.  (`cex` stands for ``character expansion'')  Since `mtcars$disp has very large values, `R` is showing us very large dots; they are so large, in fact, that no white space remains.  

#The code below fixes that issue.  How to size points is more art than size.  Because you do not want the `cex` numbers to be too large, it is common to take the square root or logarithm.  Here, square roots will still create dots that are too large, so I divide by 100 as well as take the logarithm.
```

Change the size with the cex argument
```{r}
plot(mtcars$hp, mtcars$mpg, pch=20, xlab='Horsepower', ylab='Miles per Gallon', cex=mtcars$disp/100)
legend('topright', legend='Sized by displacement')
```

Change the size with the cex argument
```{r}
plot(mtcars$hp, mtcars$mpg, pch=20, xlab='Horsepower', ylab='Miles per Gallon', cex=log(mtcars$disp, base=10))
legend('topright', legend='Sized by displacement')
```

**Why would a square root result in dots that are too large?** mtcars is very large! Because you do not want the `cex` numbers to be too large, it is common to take the square root or logarithm.  Here, square roots will still create dots that are too large, so I divide by 100 as well as take the logarithm.

**What does sizing the dots by displacement reveal about the relationship of displacement with horsepower?  What about with miles per gallon?** decreased horsepower may be associated with increased miles per gallon.

Now, let us change the shape of points.  The key is to use the `pch` argument, which stands for "point character".  If you look at the help documentation - `?par`, then search for `pch` - you will see the various shapes base R has.  The code below first shows you how to change the shape of each entry, and it then shows you how to create a variable enumerating the shape by some other variable.

Change the shape with the pch argument
```{r}
plot(mtcars$hp, mtcars$mpg, xlab='Horsepower', ylab='Miles per Gallon', pch=c(1,2))
```

**Why are some dots circles but others triangles?** the 1 and 2 in the pch line

Change the shape with the pch argument
```{r}
plot(mtcars$hp, mtcars$mpg, xlab='Horsepower', ylab='Miles per Gallon', pch=mtcars$am)
```

This variable is convenient because it is already coded as a number
```{r}
summary(mtcars$am)  
mtcars$am
```

Change shape to be Mazda or not Mazda
```{r}
mtcars$mazda <- ifelse(mtcars$brand == 'Mazda', 5, 18)
plot(mtcars$hp, mtcars$mpg, xlab='Horsepower', ylab='Miles per Gallon', pch=mtcars$mazda)
```

Now, let us color the points.  The approach is the same as we have seen - pass R a vector of colors and R will make the points those colors.  Notice that I have double the size of each point with the `cex` argument; I do that simply to make the chart more legible.

Change the color with the col argument
```{r}
plot(mtcars$hp, mtcars$mpg, xlab='Horsepower', ylab='Miles per Gallon', pch=20, col='blue', cex=2)

## Change the color with the col argument
plot(mtcars$hp, mtcars$mpg, xlab='Horsepower', ylab='Miles per Gallon', pch=20, col=c('blue', 'purple'), cex=2)

## Make the color argument make sense
colors <- ifelse(mtcars$wt<3, 'green', 'red')
plot(mtcars$hp, mtcars$mpg, xlab='Horsepower', ylab='Miles per Gallon', pch=20, col=colors, cex=2)
legend('topright', legend='Green: Weight < 3,000 lbs')
```

**What do you notice about the relationship between horsepower, miles per gallon, and the weight of a car?**

Plot quarter mile second time against miles per gallon.  Change the size based on the car's weight.
```{r}
plot(mtcars$qsec, mtcars$mpg, pch=15, cex=mtcars$wt)
plot(mtcars$qsec, mtcars$mpg, pch=15, cex=mtcars$wt/2)  #/2 to make size a bit smaller
```

Plot quarter mile second time against miles per gallon.  Change the shape based on whether car has an automatic transmission.
```{r}
shape <- ifelse(mtcars$am == 1, 20, 5)
plot(mtcars$qsec, mtcars$mpg, pch=shape)
```

Plot quarter mile second time against miles per gallon.  Change the color based on whether the car has more than 4 cylinders.
```{r}
newcol <- ifelse(mtcars$cyl > 4, 'blue', 'orange')
plot(mtcars$qsec, mtcars$mpg, col=newcol)
```

Make one plot that combines the three plots above.  That is, change the size, shape, and color of each point based on the car's weight, transmission, and number of cylinders, respectively.
```{r}
plot(mtcars$qsec, mtcars$mpg, col=newcol, pch=shape, cex=mtcars$wt/2)
```

Modiying the size, shape, and color of nodes is identical.  First, let us explore the karate club dataset.
```{r}
install.packages('igraphdata')
library(igraphdata)  

data(karate)  # Let us use the karate club data.
vcount(karate)
ecount(karate)
head(degree(karate))

#Now, let us modify the size.  We do that by making node attributes and calling those attributes when plotting.  Separately, notice how the layout of each graph changes, which emphasizes the arbitrariness of (x,y) coordinates for network graphs.

set.seed(01082017)

size <- 1:vcount(karate)
V(karate)$size <- sqrt(1:vcount(karate))
V(karate)$size

plot(karate, vertex.size=V(karate)$size)
plot(karate, size=size)
# The labels get in the way for now
plot(karate, vertex.size=V(karate)$size, vertex.label="") 

# The size vector can be separate from node attributers
size <- 1:vcount(karate)
plot(karate, vertex.size=size, vertex.label="")

# It is common to size by degree
plot(karate, vertex.size=degree(karate), vertex.label="")

#We will now modify the shape and color of nodes.  See <http://igraph.org/r/doc/plot.common.html>[link](this documentation) for valid shapes.  I will first show how to arbitrarily change the shape, then I will show how to have the shape reflect a variable you may care about.  I will randomly assign gender to each node and change shape by gender.

V(karate)$gender <- c('male','female')
shape <- ifelse(V(karate)$gender == 'male', 'square', 'vrectangle')
pdf('test.pdf')
plot.igraph(karate, vertex.shape=shape, vertex.label="")

# I have changed shape for the fun of it.
plot(karate, vertex.shape='square', vertex.label="", vertex.size=20)  


You may have noticed that the nodes already have colors.  Let us figure out why that is.
# So, there is already a color attribute. plot will use that by default
get.vertex.attribute(karate)  

# Let us change the color.
plot(karate, vertex.color='black', vertex.label="", vertex.size=10)  

genderColor <- ifelse(V(karate)$gender=='male', 'pink', 'blue')
plot(karate, vertex.color=genderColor, vertex.label="", vertex.size=10)

# The experience with `color` shows a shortcut that can sometimes be confusing.  If you have node attributes that are called `label`, `color`, or `size`, `plot` will use those unless overridden.

# Plot the karate network with orange rectangles.
plot(karate, vertex.color='orange', vertex.size=20, vertex.shape='crectangle')
plot(karate, vertex.color='orange', vertex.size=20, vertex.shape='crectangle', vertex.label="")
plot(karate, vertex.color='orange', vertex.size=20, vertex.shape='crectangle', vertex.label="arbitrary")

# Plot the karate network.  Size the nodes by their degree.  Color the nodes by their faction.  Give "Mr Hi" and "John A" different shapes from everyone else; there will be three shapes.
newshape <- ifelse(karate$name == 'Mr Hi', 'circle', ifelse(karate$name == "John A", 'crectangle', 'pie'))  # Why did that not work?
newshape <- ifelse(V(karate)$name == 'Mr Hi', 'circle', ifelse(V(karate)$name == "John A", 'crectangle', 'pie'))  # Called vertex attribute incorrectly

# Below is a second way
newshape2 <- rep('pie', vcount(karate))
newshape2[V(karate)$name == 'Mr Hi'] <- 'circle'
newshape2[V(karate)$name == 'John A'] <- 'crectangle'

plot(karate, vertex.size=degree(karate), vertex.color=karate$faction, vertex.shape=newshape)

pdf('test2.pdf')
plot(karate, vertex.size=degree(karate), vertex.color=karate$faction, vertex.shape=newshape, vertex.label="")
dev.off()
```














# NETWORK LAYOUT


Finally, let us look at how our networks look different depending on the layout we use.  Frankly, `igraph` is very frustrating here, as the arguments to layout do not possess the same structure.  If you want to try a different layout, I suggest googling. 
```{r}
plot(karate, layout=layout_randomly, vertex.label="", main="Random", vertex.color='black', vertex.size=10)

plot(karate, layout=layout_in_circle, vertex.label="", main="Ring", vertex.color='black', vertex.size=10)

plot(karate, layout=layout_as_tree, vertex.label="", main="Tree", vertex.color='black', vertex.size=10)

plot(karate, layout=layout_as_tree(karate, circular=TRUE), vertex.label="", main="Circular Tree", vertex.color='black', vertex.size=10)

plot(karate, layout=layout.fruchterman.reingold(karate), vertex.label="", main="Fruchterman-Reingold", vertex.color='black', vertex.size=10)
```


Below is another way to generate coordinates.  You generate the coordinates before calling plot, save that as an object, then pass that object to plot.  This approach is what you would do if you want to manually change coordinates that `igraph` generates.  You can also use this approach to generate your own layout.
```{r}
coordinates <- layout.random(karate)
plot(karate, layout=coordinates, vertex.label="", main="Random", vertex.color='black', vertex.size=10)

coordinates <- layout.circle(karate)
plot(karate, layout=coordinates, vertex.label="", main="Ring", vertex.color='black', vertex.size=10)

## Let us figure out how to manually change coordinates.  
coordinates  #Ok, each row is (x,y) coordinates

# Compress the tops and bottoms
coordinates[,2][coordinates[,2] > .8] <- .8
coordinates[,2][coordinates[,2] < -.8] <- -.8

plot(karate, layout=coordinates, vertex.label="", main="Adjusted Ring", vertex.color='black', vertex.size=10)

# Let's pull out the two nodes with names
get.vertex.attribute(karate)
coordinates <- layout.circle(karate)

coordinates[grep('Mr Hi', get.vertex.attribute(karate)$name),] <- c(1,1)
coordinates[grep('John A', get.vertex.attribute(karate)$name),] <- c(-1,-1)

plot(karate, layout=coordinates, vertex.label="", main="Adjusted Ring", vertex.color='black', vertex.size=10)

# Make your own coordinates
x <- seq(-1, 1, length=vcount(karate))
y <- seq(1, -1, length=vcount(karate))

coordinates <- matrix(c(x,y),ncol=2)
                      
plot(karate, layout=coordinates, main="User Supplied Coordinates")


#The following questions use the `UKfaculty` data.  Go ahead and make it an undirected graph first, using `as.undirected` with `mode=c('collapse')`.


# Load the UKfaculty data and convert it to an undirected graph.
data(UKfaculty)
g <- as.undirected(UKfaculty, mode="collapse")

# What are the vertex attributes?
get.vertex.attribute(g)

# Plot the graph using a random layout.
plot(g, layout=layout_randomly)

# Plot the graph using a circle layout.
plot(g, layout=layout_in_circle)

# Plot the graph using a tree layout.
plot(g, layout=layout_as_tree)

# Plot the graph using a Fruchterman-Reingold layout.
plot(g, layout=layout_randomly)

# Color the nodes according to their group.  Change the node size based on the node degree.  Randomly determine if a node is a man or woman and change its shape accordingly.  Use the Fruchterman-Reingold layout. Save the file as a .pdf.

V(g)$gender <- c('male', 'female')
V(g)$shape <- ifelse(V(g)$gender == 'male', 'pie', 'crectangle')

pdf('finalplot.pdf')
plot(g, vertex.color=V(g)$Group, vertex.size=degree(g), layout=layout.fruchterman.reingold(g), vertex.label="")  
# shape is plotted because igraph recognizes the shape attribute
dev.off()
```
















#### CENTRALITY

Degree Centrality: Let us use a directed dataset, `UKfaculty` to explore different measures of degree centrality.
```{r}
library(igraphdata)
library(igraph)

# Load data
data(UKfaculty)

# Explore node attributes
get.vertex.attribute(UKfaculty)

# Make vectors of degrees
indegree <- degree(UKfaculty, mode=c('in'))
outdegree <- degree(UKfaculty, mode=c('out'))
degreeC <- indegree+outdegree
degreeCentrality <- degree(UKfaculty, mode=c('total'))
dC2 <- degree(UKfaculty, mode=c('all'))

# The argument total, all is the same
degreeCentrality == dC2
```

Now, let us size nodes by their centrality:
```{r}
set.seed(01082018)

# Saving the layout as an object ensures that the graphs will have the nodes in the same place.
# With a force-directed layout, you could get very different results if the measures of centrality differ noticably.  
l <- layout.fruchterman.reingold(UKfaculty)
plot(UKfaculty, vertex.size=indegree, main="Size by Indegree", layout=l)
plot(UKfaculty, vertex.size=outdegree, main="Size by Outdegree", layout=l)
plot(UKfaculty, vertex.size=degreeCentrality, main="Size by Degree", layout=l)

#You may have noticed that there are arrowheads on the edges now.  That is because the graph is directed, so the arrow points in the direction of the relationship.  We can change arrow attributes or not print the arrowheads.  For more detail, use `?plot.igraph`.  I will also show how to change the edge color or type.  Remember, you can vary all of these parameters by node by creating a vector with different values for edge type or color; I will show only how to make them all the same type or color.

# No arrow
plot(UKfaculty, vertex.size=indegree, main="No Arrow", edge.arrow.width=0)

# Small arrow
plot(UKfaculty, vertex.size=indegree, main="Small Arrow", layout=l, edge.arrow.width=.25)

# Dotted line
plot(UKfaculty, vertex.size=indegree, main="Dotted Edges", layout=l, edge.arrow.width=0, edge.lty=3)

# Colored edges
plot(UKfaculty, vertex.size=indegree, main="Colored Edges", layout=l, edge.arrow.width=0, edge.color='red')
```

It is often difficult to see how different centrality measures covary, since it is unclear what the exact number for each size is.  It is therefore a good idea to visualize the network in other ways.
```{r}
set.seed(01082018)
plot(indegree, outdegree, xlab='Indegree', ylab='Outdegree', pch=20)
# Let us replace the dots with vertex labels, making it easier to see who is which point.
# We do not have true vertex labels, so I have just used their names.  The key is the vector passed to the labels argument of text().
plot(indegree, outdegree, xlab='Indegree', ylab='Outdegree', pch='')
text(x=indegree, y=outdegree, labels=V(UKfaculty))
```

We will use the `USairports` data dataset, which represents flights in December 2010.  Each edge is an airline route between two airports that uses the same aircraft type; there can be multiple edges between each node.  You will notice that the edges have attributes as well.
```{r}
# Load the USairports data
data(USairports)

# How many airports are in this data?
vcount(USairports)

# How many aircraft-type-routes are in this data?
ecount(USairports)

# Make a network plot using a ring layout.
r <- layout_in_circle(USairports)
plot(USairports, layout=r)

# Make a network plot using a Fruchterman-Reingold layout, sized by the total degree of each node.  Do not label nodes.
f <- layout.fruchterman.reingold(USairports)
plot(USairports, layout=f)

# Plot the correlation between indegree and outdegree for each network.  
indegree <- degree(USairports, mode=c('in'))
outdegree <- degree(USairports, mode=c('out'))

plot(indegree, outdegree, pch=20, xlab='Indegree', ylab='Outdegree')

# Repeat the previous step, labeling each point with its vertex name.  You may need to make the size of the text smaller using the cex argument in text().
plot(indegree, outdegree, pch='', xlab='Indegree', ylab='Outdegree')
text(x=indegree, y=outdegree, label=V(USairports)$name, cex=.5)

# Let's add some jittering to make easier to read
i2 <- indegree+rnorm(n=length(indegree), mean=0, sd=20)
o2 <- outdegree + rnorm(n=length(outdegree), mean=0, sd=20)

plot(i2, o2, pch='', xlab='Indegree', ylab='Outdegree')
text(x=i2, y=o2, label=V(USairports)$name, cex=.5)
```

# Neighbor Cumulative Degree Centrality
In this section, we will calculate neighbor cumulative degree centrality.  There is no default function in `igraph` to do this, so we will have to do it manually.  We will make a directed network undirected and then loop over every node.
```{r}
# Make the graph undirected
g <- as.undirected(UKfaculty, mode="collapse")

# For each node, get its neighbors.
# For each neighbor, get is degree.
# Sum the degree of each neighbor.
# Set that as the vertex attribute.
V(g)$ncdc <- NA
for(i in 1:vcount(g)){
  total <- 0
  n <- neighbors(g, V(g)[i])
  for(j in 1:length(n)){
    d <- degree(g, v=n[j])
    total <- total+d
  }
  V(g)$ncdc[i] <- total
}

## In fact, there is an easier way.
# Get adjacency matrix
a <- matrix(get.adjacency(g), nrow=vcount(g))
# Get sum of neighbors' degree. %*% is matrix multiplication.
a2 <- a%*%matrix(degree(g), nrow=vcount(g))
# NCDC is the column sum of that matrix
V(g)$ncdc2 <- a2

#Confirm
V(g)$ncdc == a2

V(g)$ncdc == V(g)$ncdc2
```

**Where do I create the vertex attribute for the NCDC measure?** For each node, we have to get its neighbors and for each neighbor, get is degree. We then sum the degree of each neighbor and make that the vertex attribute so we can easily call on it for the NCDC measure.

**Why do I reset `total <- 0` for each new loop?** We have to make the directed network undirected and then loop over every node.

Now, let us make plots. 
```{r}
l <- layout.fruchterman.reingold(g)

# Below graph will be ugly.
plot(g, vertex.size=V(g)$ncdc, main="Size by NCDC", layout=l)

# Below graph will be better.
plot(g, vertex.size=V(g)$ncdc/50, main="Size by NCDC/50", layout=l)

# Plot NCDC against Degree
# Notice how to label points
plot(degreeCentrality, V(g)$ncdc, xlab='Degree', ylab='NCDC', pch='')
text(x=degreeCentrality, y=V(g)$ncdc, labels=V(g))

# Plot rank ordering.
rdC <- rank(degreeCentrality)
rNCDC <- rank(V(g)$ncdc)

plot(rdC, rNCDC, xlab='Degree Rank', ylab='NCDC Rank', pch='')
text(x=rdC, y=rNCDC, labels=V(g))
```

Load the karate club data:
```{r}
data(karate)
k <- as.undirected(karate, mode='collapse')
  
# Calculate NCDC for each node.
a <- matrix(get.adjacency(k), nrow=vcount(k))
degrees <- matrix(degree(k), nrow=vcount(k))

a2 <- a%*%degrees

V(k)$ncdc <- a2

# Same as above, with a loop
V(karate)$ncdc2 <- NA
for(i in 1:vcount(k)){
  total <- 0
  n <- neighbors(k, V(k)[i])
  for(j in 1:length(n)){
    d <- degree(k, v=n[j])
    total <- total+d
  }
  V(k)$ncdc2[i] <- total
}

# Confirm
V(k)$ncdc2 == V(k)$ncdc

# Make a network plot using a ring layout where each node is sized by NCDC.
r <- layout_in_circle(k)
plot(k, layout=r, vertex.size=V(k)$ncdc/5)

# Plot the correlation of NCDC and degree centrality.
plot(degree(k), V(k)$ncdc, pch='', xlab='Degree', ylab='NCDC')
text(degree(k), V(k)$ncdc, label=V(k)$name)
```


# All Steps Centrality
In this section, we will examine the betweenness and eigenvector centrality for the karate club network.  See page 95 of Luke 2015 for a list of the `igraph` centrality commands.
```{r}
data(karate)

# Get betweenness centrality
V(karate)$bc <- betweenness(karate, directed=FALSE)

# Get betweenness centrality, normalized
V(karate)$bc_norm <- betweenness(karate, directed=FALSE, normalized=TRUE)

# Get eigenvector centrality
V(karate)$ec <- evcent(karate)

# Hm, it fails.  Why?
evcent(karate)

# Ok, it returns a list, and one of the list entries is the eigenvector centrality.
V(karate)$ec <- evcent(karate)$vector

# Correlation
cor(V(karate)$bc, V(karate)$ec)

## Why does that fail? Let's inspect.
# Looks good
V(karate)$bc
# Still saved as a list.
V(karate)$ec

# Fix.  Have to remove the attribute so that R does not preserve the list structure.  That is, earlier I had not removed the list, so R just added an entry to that list.
karate <- remove.vertex.attribute(karate, 'ec')

#unlist() will flatten the list you give it.
V(karate)$ec <- unlist(evcent(karate)$vector)

# Correlation
cor(V(karate)$bc, V(karate)$ec)

# Correlation of multiple variables
cor(data.frame(V(karate)$bc, V(karate)$ec, degree(karate)))
```

Now, we will make plots:
```{r}
l <- layout.fruchterman.reingold(karate)

# Betweenness.  Nodes will be too big
plot(karate, vertex.size=V(karate)$bc, main="Size by Betweenness", layout=l)

# Betweenness.  Downsize nodes
plot(karate, vertex.size=V(karate)$bc/10, main="Size by Betweenness, Rescale", layout=l)

# Betweenness, normalized.
# Add one so that they are not too small.
plot(karate, vertex.size=V(karate)$bc_norm + 1, main="Size by Normalized Betweenness", layout=l)

# Eigenvector
plot(karate, vertex.size=V(karate)$ec*10, main="Size by Eigenvector*10", layout=l)

# Plot Betweenness Versus Eigenvector
# I change the xlimit so that the vertex labels are not cut off.
plot(V(karate)$bc, V(karate)$ec, xlab='Betweenness Centrality', ylab='Eigenvector Centrality', pch='', xlim=c(-20, max(V(karate)$bc)*1.1))
text(x=V(karate)$bc, y=V(karate)$ec, labels=V(karate)$name)

# What happens if I do not change the x limit?  We see the labels get cut off.
plot(V(karate)$bc, V(karate)$ec, xlab='Betweenness Centrality', ylab='Eigenvector Centrality', pch='')
text(x=V(karate)$bc, y=V(karate)$ec, labels=V(karate)$name)

# We can color the text labels
colors <- ifelse(V(karate)$color==1, 'red', 'black')

# Same plot as above, but let us also color by group
plot(V(karate)$bc, V(karate)$ec, xlab='Betweenness Centrality', ylab='Eigenvector Centrality', pch='', xlim=c(-20, max(V(karate)$bc)*1.1))
text(x=V(karate)$bc, y=V(karate)$ec, labels=V(karate)$name, col=colors)

```


**Which nodes have high betweenness and eigenvector centrality?**

**Which nodes have low betweenness but high eigenvector centrality?**

**Which nodes have high betweeness but low eigenvector ventrality?**


The following questions use the `UKfaculty` data.  Pretend like you have not loaded it already.  Note that you could make the graph undirected or you can tell each function to assume the graph is not directed:
```{r}
## Load the UKfaculty data
data(UKfaculty)
g <- as.undirected(UKfaculty, mode=c('collapse'))

## Calculate the betweenness centrality of each node.
V(g)$bc <- betweenness(g, directed=FALSE)
V(g)$bc_norm <- betweenness(g, directed=FALSE, normalized=TRUE)

## Calculate the eigenvector centrality of each node.
V(g)$ec <- evcent(g)$vector

## Draw a network graph, sizing each node by betweenness centrality.
f <- layout.fruchterman.reingold(g)
plot(g, layout=f, vertex.size=V(g)$bc/25)

## Draw a network graph, sizing each node by eigenvector centrality.
plot(g, layout=f, vertex.size=V(g)$ec*50)

## Make a scatterplot of eigenvector and betweenness centrality.  Label the axes.  Do not label the points.
plot(V(g)$bc, V(g)$ec, xlab='Betweenness Centraliy', ylab='Eigenvector Centrality')

## Remake the previous scatterplot, labeling the points.  Split them into 2 groups and color them by group.
colors <- ifelse(V(g)$Group <= 2, 'blue', 'orange')
plot(V(g)$bc, V(g)$ec, xlab='Betweenness Centraliy', ylab='Eigenvector Centrality', pch='')
text(V(g)$bc, V(g)$ec, label=V(g)$Group, col=colors)
```















#### LOOPS

Write a loop that prints the phrase "Hello World" ten times.
```{r}
for (index in 1:10) { 
  print("Hello World")
}
```

Write a loop that prints the number of the loop multiplied by 10. For example, the first iteration prints `10`, the second `20`, and so on.  Do this procedure for 20 iterations.
```{r}
for (thisiteration in 1:20) {
print(thisiteration * 10) 
  }
```

```{r}
for(i in 1:10){  # For each of ten iterations
  print('Hello world')  # print the phrase, "Hello world"
  print(2*i)
} 

for(i in 1:12){
  print('hello world')
  print(2*i)
}
```

How would you change the above loop to print a statement 20 times?
```{r}
for(i in 1:20){  # For each of twenty iterations
  print('Hello world')  # print the phrase, "Hello world"
  print(2*i)
} 
```

Vectors are faster loops. If you are going to create a vector during a loop, it needs to be defined before the loop.
```{r}
one <- c(10,20,20,18, 20, 30)
two <- c(1,5,3,9, 50, 10)
result <- NULL 

for(i in 1:length(two)){  # Notice how I dynamically changed
  print(result[i] <- one[i]*two[i])
}

asvectors <- one*two

result == asvectors
```

i is used to index place by convention, but you can use anything:
```{r}
for(z in 1:5){
  print(z)
}

for(waytoolongindex in 1:5){
  print(waytoolongindex)
}
```

You could even use it to make a lot of graphs. Do not forget to change the filepath to be your computer.
```{r}
d <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbraw.csv", stringsAsFactors = FALSE)

ycolumns <- c('write', 'math', 'science', 'socst')


for(i in 1:length(ycolumns)){
  filename <- paste0('~/Desktop/loopplotexample_', ycolumns[i],'.pdf')
  pdf(filename)
  plot(d$read, d[[ycolumns[i]]], pch=20, xlab='Reading Score', ylab=ycolumns[i])
  dev.off()
}

q()
```

It is hard to keep each value straight, however.  And if we want to return to these results later, we need to rerun the loop.  Doing so is annoying and will create slightly different answers because of the random number generator.  Let us learn how to save the data as they are created.

In the loop, I show two ways to create the data.  The commented one, with `#` in front of it, does not run but may be easier to read.  To use it, remove the `#` from each line and put `#` in front of lines that are currently executed.  To add or remove comments to multiple lines, highlight the ones you want to change and type `cmd`+`c`; on a Windows machine, you probably will replace `cmd` with the flag key.
```{r}
randomness <- c(.01, .1, .3, .5, .7, .9, 1)
iterations <- length(randomness)  # Why did I define iterations this way?
howmanyrows <- 100

remember <- data.frame(iteration=NULL, tieProbability=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL)

for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  
  iteration <- i
  tieProbability <- randomness[i]
  avgDegree <- mean(degree(temp))
  globalClustering <- transitivity(temp, type='global')
  globalClustering2 <- mean(transitivity(temp, type='local'))
  density <- graph.density(temp)
  # 
  # d <- data.frame(iteration=i, tieProbability=randomness[i], avgDegree=mean(degree(temp)), globalClustering=transitivity(temp, type='global'), globalClustering2=mean(transitivity(temp, type='local')), density=graph.density(temp))
  
  d <- data.frame(iteration=iteration, tieProbability=tieProbability, avgDegree=avgDegree, globalClustering=globalClustering, globalClustering2=globalClustering2, density=density)
  remember <- rbind(remember, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
  #print(remember)
}

remember # Cool.

```

Use a loop and igraph to create random networks.  Vary the probability of forming an edge 10 times.  Put some resulting statistics in a dataframe. 
```{r}
remember2 <- data.frame(iteration=NULL, tieProbability=NULL, avgDegree=NULL, globalClustering=NULL, glbalClustering2=NULL, density=NULL)

theps <- c(.01, .1, .2, .3, .4, .5, .6, .7, .8, .9)
iterations <- length(theps)

for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  
  # iteration <- i
  # tieProbability <- randomness[i]
  # avgDegree <- mean(degree(temp))
  # globalClustering <- transitivity(temp, type='global')
  # globalClustering2 <- mean(transitivity(temp, type='local'))
  # density <- graph.density(temp)
  # 
  d <- data.frame(iteration=i, tieProbability=randomness[i], avgDegree=mean(degree(temp)), globalClustering=transitivity(temp, type='global'), globalClustering2=mean(transitivity(temp, type='local')), density=graph.density(temp))
  
  # d <- data.frame(iteration=iteration, tieProbability=tieProbability, avgDegree=avgDegree, globalClustering=globalClustering, globalClustering2=globalClustering2, density=density)
  remember2 <- rbind(remember2, d)  # "rowbind", adds a new row, d, to whatever exists in remember.
  #print(remember)
}

remember2

nrow(remember2)
nrow(remember)
```

Vary parameters using a loop:
```{r}
randomness <- c(.01, .1, .3, .5, .7, .9, 1)
iterations <- length(randomness)  # Why did I define iterations this way?
howmanyrows <- 100

# Will not show anything
for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  summary(degree(temp))
  transitivity(temp, type='global')
  mean(transitivity(temp, type='local'))
  graph.density(temp)
}

# Need to print
for(i in 1:iterations){
  temp <- erdos.renyi.game(howmanyrows, p=randomness[i])
  print(paste0('On trial ', i))  # This line lets me know which iteration I am on
  print('Summary Statistics:')
  print(summary(degree(temp)))
  print(paste0('Global clustering is: ', transitivity(temp, type='global')))
  print('Average of the each node\'s clustering is: ')  # Maybe you find it easier to read the output this way
  print(mean(transitivity(temp, type='local'), na.rm=TRUE))
  print(paste0('The graph\'s density is: ', graph.density(temp)))
}
```













#### KEYWORDS


1. **Script** - The text document that contains the commands.  It can be in any format, though usually you will see .txt or .R.

2. **Console** - Where you see the code run when it is executed.  In R Studio, the console is the top right window by default.

3. **Vector** - The basic building block of R, a vector is a sequence of items that you will manipulate.  A vector can contain as few as one item

4. **Object** - R is what is known as an "object-oriented language".  This means there are abstract entitites - the objects - that contain the data on which you work.  An entity could be a dataset, a list of numbers, a database, or anything else.

5. **Dataframe** - The primary type of object used to represent data in R.  Think of a dataframe as a spreadsheet: it has rows and columns, where each row represents a unit of analysis with records for each of the columns (the measured variables).  A dataframe is a collection of vectors. A dataframe is a matrix whose vectors can be different data types, such as strings and numbers. 

6. **String** - If something is a "string", that means it is a word.  A string is in contrast to numeric data.

7. **Package** - A set of code that extends the functionality of base R.  Each package will have a different purpose, and the clarity of the documentation, as well as functionality of the code, is at the whim of the package's creator.  All packages are free to use.

8. **Debug**, **Refactor** - These two words are fancy ways of saying you are finding and correcting errors in your code.

9. **List** - A list is a collection of vectors.  Whereas a vector must contain data of the same type, a list can contain vectors of different types (each vector still must only contain one data type).

10. **Matrix** - A matrix is a series of vectors of numeric data.  A matrix has to contain numeric data.  They are important because they can represent relationships in a network, so constructing and manipulating them is important for network analysis.Since we now have an object with two dimensions (rows and columns), we access them differently than we did vectors. A whole row is `object[rownumber,]`, a whole column is `object[,columnumber]`, and a specific cell is `object[row, column]`.

11. **tidyverse** - is actually a series of packages created by Hadley Wickham, the two most common of which are `dplyr` for data merging and aggregation and `ggplot` for generating graphics.  To keep from overloading you with information, I will not actually use the `tidyverse` in this class, but it is quite common in the R world.  `igraph` is the network package that we will use throughout the quarter.

In addition to using code, you can also use a graphical interface to install packages.  In base R, click `Packages & Data`, then `Package Installer`.  You can then search for the package by name.  In R Studio, navigate to `Tools`, then `Install Packages`, and search for the package by name.  Below's code shows how to do this process in your script.[^1]

[^1]: If you look at the R Markdown file, you will notice the option "eval=FALSE", whereas the other code chunks use "eval=TRUE".  `eval` tells R Markdown whether or not to execute the code in that chunk.  Since I do not want to install these packages every time I compile the document, I set it to `FALSE` here.

12. **CSV** - The main kind of data you will read is `.csv`, which stands for comma-separated values.  It is a text file where each column is separated by a comma.  Make sure you change the file path to be to where your data are.  Note that read `read.` line works; the point is that `read.csv` is a special case of `read.delim`.  This distinction matters because you could have data that are separated by a feature that is not a comma, like a tab or |.

13. **Loops** - You use loops when you want to repeat a piece of code multiple times.  For example, later in the course we will run network simulations where we want to see how an outcome diffuses over time; each unit of time will correspond to one iteration of the loop.  Or you may have a list of files to read, so you will loop over that list in order to load them one at a time.  While it is best to use vector operations whenever possible, loops can be more intuitive.  

