Importing Data from the Movie Network

Now that you have invested the time to code the various ties within the movie you just watched, it is time to learn how you can load it into R to analyze the movie network further using the tools we are covering in the course.

Understanding the type of data you have created

Because everyone has recorded ties using 1 to signify the presence of a tie and either a blank cell or 0 to indicate the absence of a tie, we have, together, created an adjacency matrix. An adjacency matrix describes which nodes are adjacent (connected) and which are not.
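
To make the idea concrete, here is a minimal sketch of a tiny adjacency matrix built by hand. The three characters ("A", "B", "C") and the object names toy_names and toy_adj are made up for illustration; they are not part of the class data.

```r
# Toy example: three hypothetical characters, not the class data.
toy_names <- c("A", "B", "C")
toy_adj <- matrix(0, nrow = 3, ncol = 3,
                  dimnames = list(toy_names, toy_names))

# Record a tie with 1: A is tied to B, and B is tied to C.
toy_adj["A", "B"] <- 1
toy_adj["B", "C"] <- 1

# For undirected ties, mirror each entry so the matrix is symmetric.
toy_adj <- pmax(toy_adj, t(toy_adj))

toy_adj
```

Each row and column stands for one character, and a 1 in a cell means the row character and the column character are connected.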

Now that the matrix has been completed, please take a couple of minutes to make sure that the character you coded has the ties that you observed. If the information in the network is accurate to the best of your knowledge, then you are ready to download the matrix and begin your analyses.

The easiest means of getting the data into R involves downloading a sheet from the Google spreadsheet we created. When you download each sheet, save it as a csv file. This will simplify your task.

In Google Sheets: File > Download as > Comma-separated values (.csv, current sheet)

Converting the data into an igraph object

Now that you have the data, load igraph.

library(igraph)

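Now, import the data. The read.csv() function reads the file in as a data frame, with file.choose() opening a window where you can pick the csv file you downloaded. Next, convert the data frame to a matrix, which is the form igraph needs in order to create a network data object.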

network <- read.csv(file.choose(), 
                    header=TRUE, 
                    row.names=1, 
                    check.names=FALSE)

m <- as.matrix(network) # coerce the imported data frame to a matrix

Once the matrix has been imported and converted, it is time to take care of the empty cells. We coded the ties that were present, and some people entered zeroes for the ties that were not present. But we still need zeroes in all the remaining empty cells. To fill them in, use the following procedure.

For more options on this, see: https://www.programmingr.com/examples/r-replace-na-values-with-0

First, take a look at what you have. Here, we are just looking at the first five rows and the first five columns to get an idea of how well it imported.

When they are directly adjacent to a data object, square brackets are used to tell which part of the data object you would like to use. Inside the square brackets, we reference first rows, then columns. m["rows", "columns"] In the example below, we are referencing rows one through five using 1:5. We could also use c(1:5), c(1,2,3,4,5), or a subset of that like c(1, 3, 5). The same also goes for columns.
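
As a quick sketch of those indexing forms, here they are on a small made-up matrix. The object name toy and its row and column names are hypothetical, chosen so that nothing here overwrites m.

```r
# Toy matrix with hypothetical row and column names (not the class data).
toy <- matrix(1:9, nrow = 3,
              dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

toy[1:2, 1:2]      # rows 1-2 and columns 1-2
toy[c(1, 3), ]     # rows 1 and 3; a blank after the comma means all columns
toy["r2", "c3"]    # a single cell, selected by row name and column name
```

The same patterns apply to the movie matrix.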

m[1:5, 1:5]

##                            King Arthur Second Swallow-Savvy Guard Bedivere
## King Arthur                         NA                          1        1
## Second Swallow-Savvy Guard           1                         NA       NA
## Bedivere                             1                         NA       NA
## Lancelot                             1                         NA        1
## Galahad                              1                         NA        1
##                            Lancelot Galahad
## King Arthur                       1       1
## Second Swallow-Savvy Guard        1      NA
## Bedivere                          1       1
## Lancelot                         NA       1
## Galahad                           1      NA

Clearly, there are missing values, as indicated by the NAs. That is because there was no need to identify where the ties aren’t. We can, instead, replace missing values with zeroes using the following.

m[is.na(m)] <- 0 # Replace missings with "0"

Take another look to verify.

m[1:5, 1:5]

##                            King Arthur Second Swallow-Savvy Guard Bedivere
## King Arthur                          0                          1        1
## Second Swallow-Savvy Guard           1                          0        0
## Bedivere                             1                          0        0
## Lancelot                             1                          0        1
## Galahad                              1                          0        1
##                            Lancelot Galahad
## King Arthur                       1       1
## Second Swallow-Savvy Guard        1       0
## Bedivere                          1       1
## Lancelot                          0       1
## Galahad                           1       0

Just to be sure, you are advised to look at the entire matrix and confirm that no missing values or blanks remain.

We won’t run that here, to save space. All you need to do is type the name of the matrix (m) into the console.
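
If the matrix is too large to check by eye, a programmatic check can supplement the visual one. The sketch below uses a small made-up matrix (check_m is a hypothetical name) so that it runs on its own; the same calls should work on m.

```r
# Toy matrix with one deliberately missing cell (hypothetical data).
check_m <- matrix(c(0, 1, NA,
                    1, 0, 1,
                    0, 1, 0),
                  nrow = 3, byrow = TRUE)

anyNA(check_m)                  # TRUE while any NA remains
check_m[is.na(check_m)] <- 0    # same replacement step used on m
anyNA(check_m)                  # FALSE once every NA has been replaced
all(check_m %in% c(0, 1))       # TRUE if the matrix holds only 0s and 1s
```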

At this point, you are almost finished. You just need to use igraph to create a network object. If the ties in the network are supposed to be directed, use the following:

# graph_from_adjacency_matrix() is the current name for graph.adjacency()
g <- graph_from_adjacency_matrix(m, 
                                 mode="directed", 
                                 weighted=NULL) 
# Create a directed network of class 'igraph object'.

If the network should be undirected use the following:

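# graph_from_adjacency_matrix() is the current name for graph.adjacency()
g <- graph_from_adjacency_matrix(m, 
                                 mode="undirected", 
                                 weighted=NULL) 
# Create an undirected network of class 'igraph object'.
# If m is not symmetric, recent igraph versions may warn here; mode="max"
# counts a tie as present when it was recorded in either direction.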

Once you have loaded the network, check to be sure that it loaded correctly. You can do this by simply typing the network’s name (in this example, the name of the network is g), or by plotting the network (plot(g)).
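
Beyond printing or plotting, a few quick counts can confirm that the network matches the matrix. This is a minimal sketch on a made-up three-node matrix; toy_m and toy_g are hypothetical names, used so that your own m and g are left alone.

```r
library(igraph)

# Toy symmetric matrix with hypothetical node names (not the class data).
toy_m <- matrix(c(0, 1, 1,
                  1, 0, 0,
                  1, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

toy_g <- graph_from_adjacency_matrix(toy_m, mode = "undirected")

vcount(toy_g)        # number of nodes; should equal nrow(toy_m)
ecount(toy_g)        # number of ties; here, half the 1s in the symmetric matrix
is_directed(toy_g)   # FALSE for an undirected network
V(toy_g)$name        # node names taken from the matrix's row and column names
```

If the node count or the names look wrong for your own network, the usual suspect is the import step (for example, the row.names or header arguments).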

Adding Attributes

To get the attributes, switch to the “Attributes” tab in the Google Spreadsheet and download it as a csv file, as you did above. Once you have the csv file, you can import it into R.

Atts <- read.csv(file.choose(), header=TRUE)

# Take a look to make sure it came in okay

Next, you can add vertex attributes to the network. Check out your choices by looking at the attribute data again. To do this, use the head() function to show the first few rows of the data frame. With a small data set like this one, though, you are probably just as well off typing its name (Atts) into the console to see the whole thing at once.

head(Atts)

##                   Characters Swallow.Knowledge Gatekeeper
## 1                King Arthur                 1         NA
## 2 Second Swallow-Savvy Guard                 1          1
## 3                   Bedivere                 1          1
## 4                   Lancelot                 0          0
## 5                    Galahad                NA         NA
## 6                      Robin                NA         NA
##   Rank..Knight.or.higher   Rank
## 1                      1  noble
## 2                     NA common
## 3                      1  noble
## 4                      1  noble
## 5                      1  noble
## 6                      1  noble

There are four attributes here (Swallow.Knowledge, Gatekeeper, Rank..Knight.or.higher, and Rank) as well as the list of characters that we’ll use to map the attributes onto the network.

In this case, the first three potential attributes have substantial amounts of missing data. Leave those out this time and only add the attribute, “Rank”.
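
One way to see how much is missing is to count the NAs in each column. The sketch below retypes the six rows printed by head(Atts) above into a stand-in data frame (atts_head and na_counts are hypothetical names), so the counts cover only those six rows, not the full sheet.

```r
# The six rows printed by head(Atts) above, retyped as a stand-in data frame.
atts_head <- data.frame(
  Characters = c("King Arthur", "Second Swallow-Savvy Guard", "Bedivere",
                 "Lancelot", "Galahad", "Robin"),
  Swallow.Knowledge = c(1, 1, 1, 0, NA, NA),
  Gatekeeper = c(NA, 1, 1, 0, NA, NA),
  Rank..Knight.or.higher = c(1, NA, 1, 1, 1, 1),
  Rank = c("noble", "common", "noble", "noble", "noble", "noble"),
  stringsAsFactors = FALSE
)

# Count the missing values in each column.
na_counts <- colSums(is.na(atts_head))
na_counts
```

Running colSums(is.na(Atts)) on your full attribute data should give the same kind of summary for every character.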

To add an attribute, we need to identify that it is a vertex attribute. We do that by using V(g), which singles out the vertices (V) of the network named “g”, and we will assign the attribute “Rank” to the vertices (V(g)$Rank).

It does get just a little more complicated when we consider that we cannot be sure that the network vertex names are in the same order as we have the attributes. So, just in case, we’ll match the names in the attribute sheet to the names of the nodes in the network using the match() function.

V(g)$Rank <- as.character(Atts$Rank[match(V(g)$name, Atts$Characters)])
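
If match() is new to you, this small sketch shows what it is doing. The names and ranks here are made up (vertex_names, sheet, idx, and reordered are hypothetical objects); the point is only that match() returns, for each vertex name, the row of the attribute sheet where that name appears.

```r
# Hypothetical vertex order in the network.
vertex_names <- c("C", "A", "B")

# Hypothetical attribute sheet, listed in a different order.
sheet <- data.frame(Characters = c("A", "B", "C"),
                    Rank = c("noble", "noble", "common"),
                    stringsAsFactors = FALSE)

# For each vertex name, find its row number in the sheet.
idx <- match(vertex_names, sheet$Characters)
idx

# Use those row numbers to put the attribute in vertex order.
reordered <- sheet$Rank[idx]
reordered

# A name that is absent from the sheet produces NA rather than an error.
match("D", sheet$Characters)
```

That last behavior is worth remembering: a character whose name is spelled differently in the two sheets will quietly end up with a missing attribute.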

As usual, check to be sure that the attribute is included in the network data. Typing the network’s name (g) prints a summary, and Rank should now appear in the attr line.

## IGRAPH 7528a0b UN-- 30 102 -- 
## + attr: name (v/c), Rank (v/c)
## + edges from 7528a0b (vertex names):
##  [1] King Arthur--Second Swallow-Savvy Guard 
##  [2] King Arthur--Bedivere                   
##  [3] King Arthur--Lancelot                   
##  [4] King Arthur--Galahad                    
##  [5] King Arthur--Robin                      
##  [6] King Arthur--Brother Maynard            
##  [7] King Arthur--Killer Rabbit of Caerbannog
##  [8] King Arthur--Gawain                     
## + ... omitted several edges

Once the attribute has been imported, we can color the network according to the attribute and plot it.

V(g)$color <- V(g)$Rank #assign the "Rank" attribute as the vertex color
V(g)$color <- gsub("noble","purple",V(g)$color) #Nobles will be purple
V(g)$color <- gsub("common","brown",V(g)$color) #Commoners will be brown

plot(g,
     vertex.size=6,
     vertex.label.cex=0.65,
     edge.arrow.size=0,
     layout=layout_with_kk) # Kamada-Kawai layout (formerly layout.kamada.kawai)
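
As an alternative to chaining gsub() calls, the same recoding can be sketched with a named vector used as a lookup table. The objects toy_rank, palette_by_rank, and node_colors are hypothetical names, and the ranks are made up for illustration.

```r
# Hypothetical ranks for five nodes.
toy_rank <- c("noble", "common", "noble", "common", "noble")

# Named vector acting as a lookup table: names are ranks, values are colors.
palette_by_rank <- c(noble = "purple", common = "brown")

# Index the lookup table by rank to get one color per node.
node_colors <- unname(palette_by_rank[toy_rank])
node_colors
```

On the real network this would look something like V(g)$color <- unname(palette_by_rank[V(g)$Rank]), assuming Rank contains only the two values in the lookup table; any other value would come back as NA.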

Check out Brendan’s visualization cookbook to make your rendering of the network look better than this one.

Saving Your Work

Once you have finished importing a network object into R and you are satisfied with your work, it is a good idea to save the network so that you can use it again in the future without having to run through all these procedures once more.

Saving data in R is super easy. The command is save() and the arguments are the object you are saving, and what you want to call it when you save it to your computer. When you use it, it will look something like: save(object, file="object_name.rda")

Following that logic, imagine we have just loaded a network of characters that have met one another. If we named the network “g” when we created it, as in the example above, then we would save it as:

save(g, file="Has_Met.rda")

In these examples, we tend to use the same name for a network over and over again. In our case, the networks are generally named “g” when working with igraph and they are named “net” when working with statnet. But, naming conventions of this sort are done for convenience only in the examples and can be a bad habit in practice.

Something to keep in mind when you save networks is that, no matter what you call them when you save them on your computer, R will remember the name of the original object that you were saving. This means that, if you named all of your networks “g” when you created them, they will overwrite one another when you reload them.

That is because, as far as R is concerned, the file name on your computer is just a container where your data are stored. R really only pays attention to the stuff inside the container. So if there are a bunch of containers that hold data named “g”, then R is only going to keep up with the latest version of “g” that was loaded, since it does not know the difference between them.
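
Here is a small sketch of that collision. It uses plain character strings in place of networks, temporary files, and a scratch environment so that nothing in your own workspace is touched. The helper save_toy() and the names file_a, file_b, and scratch are all hypothetical.

```r
# Hypothetical helper: saves a value under the object name "g".
save_toy <- function(value, file) {
  g <- value
  save(g, file = file)   # the object is stored inside the file as "g"
}

file_a <- tempfile(fileext = ".rda")
file_b <- tempfile(fileext = ".rda")
save_toy("first network", file_a)
save_toy("second network", file_b)

# Load both files into a scratch environment instead of the workspace.
scratch <- new.env()
load(file_a, envir = scratch)
after_first <- scratch$g
load(file_b, envir = scratch)
after_second <- scratch$g

ls(scratch)     # only one object, "g", despite loading two files
after_first     # what g held after the first load
after_second    # what g holds now; the first version has been replaced
```

Two different file names went in, but only one object came out, because both files contained something called g.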

We can prevent this sort of confusion by renaming the network before saving it.

Has_Met <- g # Change the name of the network from "g" to "Has_Met"

save(Has_Met, file="Has_Met.rda") # Save "Has_Met"

In either case, the network object you are saving will be written to whatever folder you set R to work in at the beginning of the exercise.
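
If you are not sure which folder that is, R can tell you. A minimal sketch, where target is a hypothetical name for the full path the file would have:

```r
# The folder R is currently working in.
getwd()

# The full path a file called "Has_Met.rda" has (or would have) in that folder.
target <- file.path(getwd(), "Has_Met.rda")
target

# TRUE once the file has actually been saved there.
file.exists(target)
```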

Phil Murphy
