Description

In this assignment, you will analyze a social network of bottlenose dolphins. Each node in the network represents a member of a bottlenose dolphin community living off Doubtful Sound in New Zealand. An edge exists between two nodes if the dolphins they represent were frequently associated. The observations were gathered between 1994 and 2001.

Load Packages

Begin by loading the igraph, dplyr, and RColorBrewer packages.

    require(igraph)
    require(dplyr)
    require(RColorBrewer)

Load Contents

The file nodes.txt contains the names of the dolphins, as well as a numerical code that has been assigned to each dolphin. The file edges.txt contains a list of edges.

Run the code below to read in and prepare the data for our analysis.

Perform the following steps in the cell below:

  1. Use the graph_from_data_frame() function to create an undirected graph from the edges.
  2. For the sake of reproducibility, set the seed to 1 using set.seed().
  3. Plot the graph without vertex labels. Select an appropriate vertex size for your graph.

    g <- graph_from_data_frame(edges, directed = FALSE)
    set.seed(1)
    plot(g, vertex.size = 4, vertex.label = NA)

Calculate Centrality Measures

Perform the following steps in the cell below:

  1. Calculate the degree centrality of the nodes in the graph. Store the results.
  2. Calculate the betweenness centrality of the nodes in the graph. Store the results.
  3. Calculate the closeness centrality of the nodes in the graph. Store the results.
  4. Create a data frame called nodes with five columns: Node (the number assigned to the dolphin), Name (the name of the dolphin), dCent (degree centrality), bCent (betweenness centrality), and cCent (closeness centrality).
  5. Print a summary of this data frame.

    dC <- degree(g)
    bC <- betweenness(g)
    cC <- closeness(g)
    centDF <- data.frame(Node = names(dC), dCent = dC, bCent = bC, cCent = cC, stringsAsFactors = FALSE)
    nodes <- left_join(nodes, centDF, by='Node')
    summary(nodes)

     Node               Name               dCent            bCent             cCent         
 Length:62          Length:62          Min.   : 1.000   Min.   :  0.000   Min.   :0.002924  
 Class :character   Class :character   1st Qu.: 3.000   1st Qu.:  5.641   1st Qu.:0.004288  
 Mode  :character   Mode  :character   Median : 5.000   Median : 39.583   Median :0.005181  
                                       Mean   : 5.129   Mean   : 71.887   Mean   :0.005037  
                                       3rd Qu.: 7.000   3rd Qu.:102.638   3rd Qu.:0.005556  
                                       Max.   :12.000   Max.   :454.274   Max.   :0.006849  

Print the contents of nodes in descending order of degree centrality.

    arrange(nodes, desc(dCent))

Print the contents of nodes in descending order of betweenness centrality.

    arrange(nodes, desc(bCent))

Print the contents of nodes in descending order of closeness centrality.

    arrange(nodes, desc(cCent))

List the names of any dolphins that appear in the top 10 for all three centrality measures.

SN4, Kringel, and Beescratch.

Visualizing Centrality

In the cell below, complete the following steps:

  1. Set the seed equal to 1.
  2. Create a cut of the vector dC. Set the cuts to roughly correspond to the quartiles of the degree centrality (refer to the summary above). Set labels = FALSE.
  3. Create an RColorBrewer palette with 4 colors.
  4. Plot the graph with the size and color of the vertices each determined by degree centrality. Use the cut and palette you defined to set the color. Then set the size to be equal to 2 + the value of the cut. Do not display the labels.

In the cell below, complete the following steps:

  1. Set the seed equal to 1.
  2. Create a cut of the vector bC. Use the following cut levels: -1, 25, 75, 150, 450, and 500. Set labels = FALSE.
  3. Create an RColorBrewer palette with 5 colors.
  4. Plot the graph with the size and color of the vertices each determined by betweenness centrality. Use the cut and palette you defined to set the color. Then set the size to be equal to 2 + the value of the cut. Do not display the labels.

Community Detection

In the cell below, complete the following steps:

  1. Set the seed equal to 1.
  2. Use cluster_edge_betweenness() to detect communities within the graph.
  3. Plot the graph, displaying the communities that have been detected. Do not display the vertex labels.

    set.seed(1)
    ceb <- cluster_edge_betweenness(g)
    plot(ceb, g, vertex.label = NA, vertex.size = 4)

In the cell below, complete the following steps:

  1. Create a data frame called commDF with two columns: Node and Comm. The Comm column should indicate the community to which each dolphin has been assigned.
  2. Add the community information to the nodes data frame.
  3. Print a table with two columns: Comm and n. The Comm column should list the labels for the communities that have been detected. The n column should list the number of dolphins that have been assigned to each community. The table should be sorted in descending order by n.

    mem <- membership(ceb)
    commDF <- data.frame(Node = names(mem), Comm = as.vector(unname(mem)), stringsAsFactors = FALSE)
    nodes <- left_join(nodes, commDF, by='Node')
    nodes %>% group_by(Comm) %>% count() %>% arrange(desc(n))

Use induced.subgraph() to create a subgraph consisting of nodes associated with dolphins that have been assigned to the largest of the detected communities. Name the graph g2.

    comm2 <- filter(nodes, Comm == 2)
    g2 <- induced.subgraph(g, vids = comm2$Node)

Run the cell below to plot the graph for the largest community with each node labeled by the name of the associated dolphin.

    sel <- V(g2)$name
    selDF <- data.frame(Node = sel, stringsAsFactors = FALSE)
    selDF <- left_join(selDF, nodes, by='Node')
    set.seed(1)
    plot(g2, vertex.label = selDF$Name, vertex.label.cex = 1.5)

Clique Detection

Find the largest cliques in the network. Print a list of nodes contained in each clique.

    lc <- largest_cliques(g)
    lc

[[1]]
+ 5/62 vertices, named, from d6c212d:
[1] 57 13 9  6  17

[[2]]
+ 5/62 vertices, named, from d6c212d:
[1] 51 45 18 29 24

[[3]]
+ 5/62 vertices, named, from d6c212d:
[1] 51 45 18 29 21

Print the names of the dolphins contained in each of the largest cliques. You will need a separate code chunk for each clique.

    c1 <- lc[[1]]$name
    filter(nodes, Node %in% c1)

    c2 <- lc[[2]]$name
    filter(nodes, Node %in% c2)

    c3 <- lc[[3]]$name
    filter(nodes, Node %in% c3)

A larger clique could be created by adding a single edge between two dolphins. Which two dolphins are they?

MN83 and MN105.

---
title: "HW 07: Dolphin Social Network Analysis"
output: 
    html_notebook:
        theme: flatly
        toc: true
        toc_float: true
---

### Description

In this assignment, you will analyze a social network of bottlenose dolphins. Each node in the network represents a member of a bottlenose dolphin community living off Doubtful Sound in New Zealand. An edge exists between two nodes if the dolphins they represent were frequently associated. The observations were gathered between 1994 and 2001.

### Load Packages

Begin by loading the igraph, dplyr, and RColorBrewer packages. 

```{r, warning=FALSE, message=FALSE}
require(igraph)
require(dplyr)
require(RColorBrewer)
```

### Load Contents

The file `nodes.txt` contains the names of the dolphins, as well as a numerical code that has been assigned to each dolphin. The file `edges.txt` contains a list of edges. 

Run the code below to read in and prepare the data for our analysis.

```{r}
nodes <- read.table("nodes.txt", sep="\t", header = TRUE, stringsAsFactors = FALSE)
edges <- read.table("edges.txt", sep="\t", header = TRUE)
nodes$Node <- as.character(nodes$Node)
```

Perform the following steps in the cell below:

1. Use the `graph_from_data_frame()` function to create an undirected graph from the edges. 
2. For the sake of reproducibility, set the seed to 1 using `set.seed()`.
3. Plot the graph without vertex labels. Select an appropriate vertex size for your graph. 

```{r, fig.width = 20}

g <- graph_from_data_frame(edges, directed = FALSE)
set.seed(1)
plot(g, vertex.size = 4, vertex.label = NA)
```
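
As a minimal sketch of what `graph_from_data_frame()` does with an edge list, the toy example below builds a small graph from a hypothetical data frame (`toy_edges` and its column names are illustrative and are not part of the assignment data). It also shows that vertex names end up stored as character strings, which is presumably why `nodes$Node` is converted with `as.character()` before any joins.

```r
library(igraph)

# Hypothetical edge list: each row is one undirected association.
toy_edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))

# The first two columns are read as the endpoints of each edge.
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

vcount(toy_graph)        # number of vertices
ecount(toy_graph)        # number of edges
V(toy_graph)$name        # vertex names, stored as character strings
is_directed(toy_graph)   # FALSE for an undirected graph
```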

### Calculate Centrality Measures

Perform the following steps in the cell below:

1. Calculate the degree centrality of the nodes in the graph. Store the results. 
2. Calculate the betweenness centrality of the nodes in the graph. Store the results. 
3. Calculate the closeness centrality of the nodes in the graph. Store the results. 
4. Create a data frame called `nodes` with five columns: Node (the number assigned to the dolphin), Name (the name of the dolphin), dCent (degree centrality), bCent (betweenness centrality), and cCent (closeness centrality).
5. Print a summary of this data frame. 

```{r}
dC <- degree(g)
bC <- betweenness(g)
cC <- closeness(g)

centDF <- data.frame(Node = names(dC), dCent = dC, bCent = bC, cCent = cC, stringsAsFactors = FALSE)

nodes <- left_join(nodes, centDF, by='Node')
summary(nodes)
```
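
To get a feel for what the three measures report, it can help to try them on a graph small enough to check by hand. The sketch below uses a five-vertex star (an illustrative toy, not the dolphin network; all `toy_` names are assumptions). The hub has degree 4, lies on the shortest path between each of the 6 pairs of leaves, and is at distance 1 from every other vertex, so its unnormalized closeness is 1/4. Each leaf has degree 1, betweenness 0, and closeness 1/7.

```r
library(igraph)

# A star with one hub (vertex 1) and four leaves.
toy_star <- make_star(5, mode = "undirected")

toy_dC <- degree(toy_star)        # number of neighbours
toy_bC <- betweenness(toy_star)   # shortest paths passing through a vertex
toy_cC <- closeness(toy_star)     # 1 / (sum of distances to the other vertices)

data.frame(vertex = 1:5, dCent = toy_dC, bCent = toy_bC, cCent = toy_cC)
```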

Print the contents of `nodes` in descending order of degree centrality. 

```{r}
arrange(nodes, desc(dCent))
```

Print the contents of `nodes` in descending order of betweenness centrality. 

```{r}
arrange(nodes, desc(bCent))
```

Print the contents of `nodes` in descending order of closeness centrality. 

```{r}
arrange(nodes, desc(cCent))
```

List the names of any dolphins that appear in the top 10 for all three centrality measures. 

**SN4, Kringel, and Beescratch.**
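
The overlap can also be checked programmatically rather than by eye. The following is a minimal sketch on made-up scores (the data frame `toy_nodes`, the helper `top_k()`, and the cutoff `toy_k` are hypothetical, introduced only for illustration). Ties are broken by row order here, which may or may not be what you want on real data.

```r
# Hypothetical scores for five dolphins; the values are made up for illustration.
toy_nodes <- data.frame(
  Name  = c("A", "B", "C", "D", "E"),
  dCent = c(5, 4, 3, 2, 1),
  bCent = c(10, 50, 40, 30, 20),
  cCent = c(0.5, 0.4, 0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Names of the k rows with the largest values in one column.
top_k <- function(df, column, k) {
  df$Name[order(df[[column]], decreasing = TRUE)][seq_len(k)]
}

# Intersect the three top-k lists.
toy_k <- 3
in_all_three <- Reduce(intersect, list(
  top_k(toy_nodes, "dCent", toy_k),
  top_k(toy_nodes, "bCent", toy_k),
  top_k(toy_nodes, "cCent", toy_k)
))
in_all_three
```

With the real data, the same idea would apply to `nodes` with a cutoff of 10.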


### Visualizing Centrality

In the cell below, complete the following steps:

1. Set the seed equal to 1. 
2. Create a cut of the vector `dC`. Set the cuts to roughly correspond to the quartiles of the degree centrality (refer to the summary above). Set `labels = FALSE`. 
3. Create a `RColorBrewer` palette with 4 colors. 
4. Plot the graph with the size and color of the vertices each determined by degree centrality. Use the cut and palette you defined to set the color. Then set the size to be equal to 2 + the value of the cut. Do not display the labels. 


```{r, fig.width = 20}
set.seed(1)
ct <- cut(dC, c(-1, 3, 5, 7, 13), labels=FALSE)
myPal <- brewer.pal(4, "RdPu")
plot(g, vertex.label = NA, vertex.size = 2 + ct, vertex.color = myPal[ct])
```
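
The role of `cut(..., labels = FALSE)` in the chunk above may be easier to see on a handful of values. In this sketch (the vectors `toy_values` and `toy_pal` are illustrative assumptions), each value is mapped to the index of the half-open interval it falls in, and that index is then reused to pick both a size and a color.

```r
# Example degree values spanning the four bins used above.
toy_values <- c(1, 3, 4, 5, 6, 7, 12)

# Intervals are (-1, 3], (3, 5], (5, 7], (7, 13]; labels = FALSE returns bin indices.
toy_bin <- cut(toy_values, c(-1, 3, 5, 7, 13), labels = FALSE)
toy_bin

# The bin index can then select a size and a color for each vertex.
toy_pal <- c("grey90", "grey60", "grey30", "black")   # stand-in for a 4-color palette
data.frame(degree = toy_values, bin = toy_bin,
           size = 2 + toy_bin, color = toy_pal[toy_bin])
```

Because the intervals are closed on the right, a value equal to a break point (such as 3) falls in the lower bin.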

In the cell below, complete the following steps:

1. Set the seed equal to 1. 
2. Create a cut of the vector `bC`. Use the following cut levels: -1, 25, 75, 150, 450, and 500. Set `labels = FALSE`. 
3. Create a `RColorBrewer` palette with 5 colors. 
4. Plot the graph with the size and color of the vertices each determined by betweenness centrality. Use the cut and palette you defined to set the color. Then set the size to be equal to 2 + the value of the cut. Do not display the labels. 

```{r, fig.width = 20}
set.seed(1)
ct <- cut(bC, c(-1, 25, 75, 150, 450, 500), labels=FALSE)
myPal <- brewer.pal(5, "RdPu")
plot(g, vertex.label = NA, vertex.size = 2 + ct, vertex.color = myPal[ct])
```

### Community Detection

In the cell below, complete the following steps:

1. Set the seed equal to 1.
2. Use `cluster_edge_betweenness()` to detect communities within the graph. 
3. Plot the graph, displaying the communities that have been detected. Do not display the vertex labels. 



```{r, fig.width = 20}
set.seed(1)
ceb <- cluster_edge_betweenness(g)
plot(ceb, g, vertex.label = NA, vertex.size = 4)
```
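
As a rough illustration of what edge-betweenness clustering looks for, consider a toy graph (not the dolphin data; all `toy_` names are assumptions) made of two triangles joined by a single bridge. The bridge carries every shortest path between the two triangles, so it has the highest edge betweenness and is removed first, leaving the two triangles as communities.

```r
library(igraph)

# Two triangles (A-B-C and D-E-F) joined by a single bridge edge C-D.
toy_net <- graph_from_literal(A-B, B-C, A-C, C-D, D-E, E-F, D-F)

toy_ceb <- cluster_edge_betweenness(toy_net)

length(toy_ceb)       # number of communities found
sizes(toy_ceb)        # number of vertices per community
membership(toy_ceb)   # community label for each vertex
```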

In the cell below, complete the following steps:

1. Create a data frame called `commDF` with two columns: Node and Comm. The Comm column should indicate the community to which each dolphin has been assigned. 
2. Add the community information to the `nodes` data frame. 
3. Print a table with two columns: Comm and n. The Comm column should list the labels for the communities that have been detected. The n column should list the number of dolphins that have been assigned to each community. The table should be sorted in descending order by n. 

```{r}
mem <- membership(ceb)
commDF <- data.frame(Node = names(mem), Comm = as.vector(unname(mem)), stringsAsFactors = FALSE)

nodes <- left_join(nodes, commDF, by='Node')

nodes %>% group_by(Comm) %>% count() %>% arrange(desc(n))

```

Use `induced.subgraph()` to create a subgraph consisting of nodes associated with dolphins that have been assigned to the largest of the detected communities. Name the graph `g2`. 

```{r}
# Community 2 is taken to be the largest community in the table printed above.
comm2 <- filter(nodes, Comm == 2)
g2 <- induced.subgraph(g, vids = comm2$Node) 
```
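
The label of the largest community is hard-coded above. One possible way to look it up instead is sketched below on a toy graph (a triangle bridged to a 4-clique; every `toy_` name, `largest_lab`, and `keep` is hypothetical). The sketch uses `induced_subgraph()`, which is the underscore spelling of the same function in current igraph releases.

```r
library(igraph)

# Toy graph: a triangle (A, B, C) bridged to a 4-clique (D, E, F, G).
toy_net2 <- graph_from_literal(A-B, B-C, A-C, C-D,
                               D-E, D-F, D-G, E-F, E-G, F-G)
toy_ceb2 <- cluster_edge_betweenness(toy_net2)

# Label of the community with the most members (the first one if there is a tie).
toy_sizes   <- sizes(toy_ceb2)
largest_lab <- as.integer(names(toy_sizes)[which.max(toy_sizes)])

# Keep only the vertices whose membership matches that label.
keep    <- V(toy_net2)$name[as.vector(membership(toy_ceb2)) == largest_lab]
toy_sub <- induced_subgraph(toy_net2, vids = keep)

V(toy_sub)$name
```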

Run the cell below to plot the graph for the largest community with each node labeled by the name of the associated dolphin. 


```{r, fig.width = 20}
sel <- V(g2)$name

selDF <- data.frame(Node = sel, stringsAsFactors = FALSE)
selDF <- left_join(selDF, nodes, by='Node')

set.seed(1)
plot(g2, vertex.label = selDF$Name, vertex.label.cex = 1.5)

```

### Clique Detection

Find the largest cliques in the network. Print a list of nodes contained in each clique. 

```{r}
lc <- largest_cliques(g)
lc
```

Print the names of the dolphins contained in each of the largest cliques. You will need a separate code chunk for each clique. 

```{r}
c1 <- lc[[1]]$name
filter(nodes, Node %in% c1)
```

```{r}
c2 <- lc[[2]]$name
filter(nodes, Node %in% c2)
```

```{r}
c3 <- lc[[3]]$name
filter(nodes, Node %in% c3)
```

A larger clique could be created by adding a single edge between two dolphins. Which two dolphins are they?

**MN83 and MN105.**
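
The reasoning behind this kind of answer can be sketched in code. When two largest cliques share all but one vertex each, as cliques 2 and 3 do in the printed output (both contain 51, 45, 18, and 29, and differ only in 24 versus 21), the two unshared vertices are the only pair missing an edge. The toy graph below reproduces that situation on a smaller scale; its vertex names and all variable names are illustrative assumptions.

```r
library(igraph)

# Toy graph: {A, B, C, D} and {A, B, C, E} are both 4-cliques; D and E are not linked.
toy_cl <- graph_from_literal(A-B, A-C, B-C,
                             A-D, B-D, C-D,
                             A-E, B-E, C-E)

toy_lc <- largest_cliques(toy_cl)
m1 <- as_ids(toy_lc[[1]])
m2 <- as_ids(toy_lc[[2]])

# Vertices that belong to exactly one of the two cliques.
missing_pair <- sort(c(setdiff(m1, m2), setdiff(m2, m1)))
missing_pair

# Linking that pair merges the two cliques into one larger clique.
toy_cl_plus <- add_edges(toy_cl, missing_pair)
clique_num(toy_cl)        # largest clique size before adding the edge
clique_num(toy_cl_plus)   # largest clique size after adding the edge
```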



