Importing Edgelist Data from the Movie Network

Understanding the type of data you have created

Converting the data into an igraph object

Adding Attributes

Saving Your Work

Just to reiterate…

Now that you have invested the time to code the various ties within the movie you just watched, it is time to load that network into R so you can analyze it further using the tools we are covering in the course.

Understanding the type of data you have created

Because everyone recorded each tie as a pair of names (who the tie comes from and who it goes to), together we have created an edgelist. An edgelist lists the pairs of nodes that are adjacent (connected) and simply omits the pairs that are not.
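To make the idea concrete, here is a tiny hypothetical edgelist in R. The character names are placeholders, not characters from our movie. Each row is one tie, with the first column holding who the tie comes from and the second who it goes to; any pair that never appears in a row is, by omission, not connected.

```r
# Hypothetical edgelist: the names are placeholders, not characters from our movie.
edges <- data.frame(
  from = c("Ann", "Ann", "Bob"),
  to   = c("Bob", "Dan", "Cat"),
  stringsAsFactors = FALSE
)
edges

# Ann and Cat never appear together in a row, so the edgelist records
# no tie between them: they are not adjacent.
any((edges$from == "Ann" & edges$to == "Cat") |
    (edges$from == "Cat" & edges$to == "Ann"))   # FALSE
```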

Once the edgelist has been completed, please take a couple of minutes to make sure that the character you coded has the ties that you observed. If the information in the network is accurate to the best of your knowledge, then you are ready to download the edgelist and begin your analyses.

The easiest way to get the data into R is to download each sheet from the Google spreadsheet we created and save it as a CSV (comma-separated values) file, which R can read directly with read.csv().

In Google Sheets: File > Download as > Comma-separated values (.csv, current sheet)

Converting the data into an igraph object

Now that you have the data, load igraph.

library(igraph)

Now import the edgelist. read.csv() reads the file in as a data frame, so next convert it to a matrix, which is the format graph_from_edgelist() expects when it creates the network data object.

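network <- read.csv(file.choose(),      # opens a dialog so you can pick the downloaded CSV
                    header=FALSE,       # assumes the sheet has no header row; use TRUE if it does
                    strip.white=TRUE)   # removes stray spaces around names

m <- as.matrix(network) # coerce the data frame to a character matrix, one row per tie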
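file.choose() is convenient interactively, but it is worth checking the result before building the network. A minimal sketch, assuming your downloaded file is a two-column CSV with no header row: the CSV contents are supplied inline through read.csv(text = ...) so the example runs on its own, and the names and the file name edgelist.csv in the comment are hypothetical.

```r
# Inline stand-in for the downloaded sheet (hypothetical names, no header row).
# With a real file you could instead write something like:
#   network <- read.csv("edgelist.csv", header = FALSE, strip.white = TRUE)
csv_text <- "Ann,Bob
Ann, Dan
Bob,Cat"
network <- read.csv(text = csv_text, header = FALSE,
                    strip.white = TRUE,        # " Dan" becomes "Dan"
                    stringsAsFactors = FALSE)  # keep names as plain text

m <- as.matrix(network)  # n x 2 character matrix, one row per tie

# Sanity checks before handing the matrix to igraph
stopifnot(ncol(m) == 2)     # exactly two columns: from, to
stopifnot(!any(is.na(m)))   # no missing names
stopifnot(all(nzchar(m)))   # no blank cells (e.g. empty trailing rows)
dim(m)                      # 3 2
```

Blank rows exported from a spreadsheet usually show up as empty strings, which the last check catches.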

Once the edgelist has been imported and converted to a two-column matrix (one row per tie), it is time to create the network object with igraph. If the ties in the network are directed, use the following:

g <- graph_from_edgelist(m, directed = TRUE)  # Create a directed network (an object of class "igraph").

If the network should be undirected, use the following:

g <- graph_from_edgelist(m, directed = FALSE) # Create an undirected network (an object of class "igraph").
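The choice matters for any measure that cares about direction. A small sketch on a hypothetical edgelist: the same matrix produces the same number of ties either way, but only the directed version separates ties sent (out-degree) from ties received (in-degree).

```r
library(igraph)

# Hypothetical edgelist: Ann -> Bob, Ann -> Dan, Bob -> Cat
m <- matrix(c("Ann", "Bob",
              "Ann", "Dan",
              "Bob", "Cat"),
            ncol = 2, byrow = TRUE)

g_dir   <- graph_from_edgelist(m, directed = TRUE)
g_undir <- graph_from_edgelist(m, directed = FALSE)

is_directed(g_dir)    # TRUE
is_directed(g_undir)  # FALSE

degree(g_dir, mode = "out")  # ties each character sends
degree(g_dir, mode = "in")   # ties each character receives
degree(g_undir)              # every tie counts toward both endpoints
```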

Once you have loaded the network, check to be sure that it loaded correctly. You can do this by simply typing the network’s name (in this case, g), or by plotting the network (plot(g)).
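Beyond printing the network, a few targeted checks can catch common coding slips. A minimal sketch on a hypothetical edgelist containing a deliberate typo: the number of edges should equal the number of rows in your sheet, and a name spelled two ways becomes two separate characters, which inflates the vertex count.

```r
library(igraph)

# Hypothetical edgelist with a typo: "bob" in the last row should be "Bob"
m <- matrix(c("Ann", "Bob",
              "Ann", "Dan",
              "bob", "Cat"),
            ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(m, directed = TRUE)

ecount(g) == nrow(m)   # TRUE: one edge per row of the sheet
vcount(g)              # 5, one more character than intended

# List names that differ only in capitalization
nm   <- V(g)$name
dups <- nm[duplicated(tolower(nm)) | duplicated(tolower(nm), fromLast = TRUE)]
dups                   # "Bob" and "bob"
```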

Please, please name your network something other than “g”. We use “g” for convenience's sake in these scripts so that things will generally run cleanly and easily. For your own work, choose a short name that still describes what the network represents.

Renaming the network is just a matter of assigning “g” to a new name.

cool_name <- g

In your case, you should name the network something like:

talked_to <- g
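Assignment copies the network rather than moving it, so “g” still exists afterwards. If you want to be sure a later step cannot reuse the generic name by accident, remove it with rm(). A small sketch with a hypothetical one-tie network:

```r
library(igraph)

g <- graph_from_edgelist(matrix(c("Ann", "Bob"), ncol = 2), directed = TRUE)

talked_to <- g   # copy the network under a descriptive name
rm(g)            # drop the generic name so it cannot be reused by mistake

exists("g", inherits = FALSE)  # FALSE: only talked_to remains
```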

Adding Attributes

To get the attributes, switch to the “Attributes” tab in the Google Spreadsheet and download it as a csv file, as you did above. Once you have the csv file, you can import it into R.

Attributes <- read.csv(file.choose(), header=TRUE)

Attributes # Take a look to make sure it came in okay

Next, you can add the vertex attribute to the network. Here, we’ll focus on the “potential_successor” attribute.

We cannot be sure that the vertices in the network are in the same order as the rows in the attribute sheet. So, just in case, we’ll use match() to look up each vertex name of the “talked_to” network in the Characters column of the attribute sheet and take the potential_successor value from the matching row. Once you complete this for the “talked_to” network, do the same for the “orders” network.

V(talked_to)$potential_successor <- as.character(Attributes$potential_successor[match(V(talked_to)$name, Attributes$Characters)])

As usual, check to be sure that the attribute is included in the network data.

talked_to
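match() returns NA for any vertex whose name does not appear in the Characters column, so a spelling difference between the two sheets silently leaves that character without an attribute. A minimal sketch with a hypothetical network and attribute table (names and values made up), showing that the rows may be in any order and how to list vertices that failed to match:

```r
library(igraph)

# Hypothetical network and attribute sheet; rows deliberately out of order
talked_to <- graph_from_edgelist(matrix(c("Ann", "Bob",
                                          "Bob", "Cat"),
                                        ncol = 2, byrow = TRUE))
Attributes <- data.frame(
  Characters          = c("Cat", "Ann", "Bobby"),  # "Bobby" does not match "Bob"
  potential_successor = c(1, 0, 1)
)

idx <- match(V(talked_to)$name, Attributes$Characters)  # row of each vertex in Attributes
V(talked_to)$potential_successor <- as.character(Attributes$potential_successor[idx])

V(talked_to)$potential_successor  # "0" NA "1" for Ann, Bob, Cat
V(talked_to)$name[is.na(idx)]     # vertices with no matching row: "Bob"
```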

Once the attribute has been imported, we can color the network according to the attribute and plot it.

To do this, we add a new vertex attribute named “color.” When you plot the network, igraph will use this attribute automatically, as long as you name it “color” (all lower case).

Keep in mind that we coded the potential_successor attribute as “1” for a potential successor and “0” for those who were not. To add the colors, use the following code:

V(talked_to)$color <- V(talked_to)$potential_successor # copy the 0/1 codes into a new "color" attribute
V(talked_to)$color <- gsub("1", "red", V(talked_to)$color) # potential successors (coded 1) will be red
V(talked_to)$color <- gsub("0", "blue", V(talked_to)$color) # everyone else (coded 0) will be blue

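plot.igraph(talked_to,
            vertex.size=6,            # node size
            vertex.label.cex=0.65,    # shrink the character labels
            edge.arrow.size=0,        # hide arrowheads
            layout=layout_with_kk)    # Kamada-Kawai layout (layout.kamada.kawai in older igraph code)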
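The color code above uses gsub(), which performs text substitution; it works here only because every value is exactly "0" or "1". A slightly more direct alternative maps the codes to colors with ifelse() and gives characters with a missing code (for example, a name that did not match) a neutral color so they are easy to spot. This is a sketch on hypothetical codes, not the course's method; on the real network the first assignment would read V(talked_to)$color <- ifelse(V(talked_to)$potential_successor == "1", "red", "blue").

```r
# Hypothetical 0/1 codes as stored by the match() step, including one missing value
codes <- c("1", "0", NA, "0")

colors <- ifelse(codes == "1", "red", "blue")  # 1 -> red, 0 -> blue; NA stays NA
colors[is.na(colors)] <- "gray"                # flag characters with no attribute

colors  # "red" "blue" "gray" "blue"
```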

Check out Brendan’s visualization cookbook to make your rendering of the network look better than this one.

Saving Your Work

Once you have finished importing a network object into R and you are satisfied with your work, it is a good idea to save the network so that you can use it again in the future without having to run through all these procedures once more.

Saving data in R is super easy. The function is save(): its first argument is the object you are saving, and its file argument is the name the file will have on your computer. When you use it, it will look something like: save(object, file="object_name.rda")

Following that logic, suppose we have created the “talked_to” and “orders” networks and renamed them from “g”, as in the examples above. We would then save them as:

save(talked_to, file="talked_to.rda")
save(orders, file="orders.rda")
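To get a saved network back in a later session, use load() with the same file name; it recreates the object under the name it had when it was saved and returns that name. A minimal sketch of the round trip with a hypothetical network, writing to a temporary folder (tempdir()) so it does not clutter your working directory:

```r
library(igraph)

talked_to <- graph_from_edgelist(matrix(c("Ann", "Bob"), ncol = 2))
path <- file.path(tempdir(), "talked_to.rda")

save(talked_to, file = path)   # write the object (and its name) to disk
rm(talked_to)                  # simulate starting a fresh session

loaded <- load(path)           # recreates talked_to and returns the restored names
loaded                         # "talked_to"
```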

Just to reiterate…

In these examples, we tend to use the same name for a network over and over again. In our case, the networks are generally named “g” when working with igraph and they are named “net” when working with statnet. But, naming conventions of this sort are done for convenience only in the examples and can be a bad habit in practice.

Something to keep in mind when you save networks is that, no matter what you call them when you save them on your computer, R will remember the name of the original object that you were saving. This means that, if you named all of your networks “g” when you created them, they will overwrite one another when you reload them.

That is because, as far as R is concerned, the file name on your computer is just a container where your data are stored. R really only pays attention to the stuff inside the container. So if several containers each hold an object named “g”, every load() replaces the previous “g”, and R keeps only the version that was loaded last, since it cannot tell them apart.
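A small sketch of the problem, using two hypothetical networks that were both named “g” when saved. The files have different names, but both contain an object called g, so the second load() silently replaces the first:

```r
library(igraph)

dir <- tempdir()

g <- graph_from_edgelist(matrix(c("Ann", "Bob"), ncol = 2))  # hypothetical "talked to" network
save(g, file = file.path(dir, "talked_to.rda"))

g <- graph_from_edgelist(matrix(c("Bob", "Cat",
                                  "Cat", "Dan"),
                                ncol = 2, byrow = TRUE))     # hypothetical "orders" network
save(g, file = file.path(dir, "orders.rda"))

rm(g)
load(file.path(dir, "talked_to.rda"))  # creates g (1 tie)
load(file.path(dir, "orders.rda"))     # replaces g (2 ties)
ecount(g)                              # 2: the talked_to network is gone
```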

We can prevent this sort of confusion by renaming the network before saving it.

orders <- g # Change the name of the network from "g" to "orders"

save(orders, file="orders.rda") # Save "orders"

In either case, the network object you are saving will be written to R's working directory, that is, whatever folder you set R to work in at the beginning of the exercise. You can check which folder that is with getwd().
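If you would rather not depend on object names at all, base R's saveRDS() and readRDS() store a single object without its name, so you choose the name when you read it back. This is an alternative to save() and load(), not part of the workflow above; a minimal sketch with a hypothetical network, written to a temporary folder:

```r
library(igraph)

g <- graph_from_edgelist(matrix(c("Ann", "Bob"), ncol = 2))
path <- file.path(tempdir(), "talked_to.rds")

saveRDS(g, file = path)       # stores only the object, not the name "g"
talked_to <- readRDS(path)    # you pick the name when reading it back

ecount(talked_to)
```

Because the name is chosen at read time, two files that both started life as “g” can be read into two different objects without overwriting each other.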