2023-08-10

Class Plan

  • Data activity (10 min)
  • Introduction to Networks (10 min)
  • Building our first networks (20 min)
  • Measures of Centrality (15 min)
  • Break (5 min)
  • IPCC Report Authorship (25 min)
  • Final project time (Remainder)

Week 7 Groups!

print.data.frame(groups)
##                   group 1             group 2               group 3
## 1      Jun, Ernest Ng Wei       Ning, Zhi Yan       Tan, Zheng Yang
## 2            Shah, Jainam           Su, Barry           Tian, Zerui
## 3         Gnanam, Akash Y        Gupta, Umang        Somyurek, Ecem
## 4 Alsayegh, Aisha E H M I Andrew Yu Ming Xin, Leong, Wen Hou Lester
##                     group 4                          group 5
## 1   Spindler, Laine Addison             Dotson, Bianca Ciara
## 2             Cai, Qingyuan Ramos, Jessica Andria Potestades
## 3 Saccone, Alexander Connor           Cortez, Hugo Alexander
## 4          Wan Rosli, Nadia                                 
##                 group 6                  group 7
## 1   Premkrishna, Shrish Huynh Le Hue Tam, Vivian
## 2       Knutson, Blue C                         
## 3         Lim, Fang Jan             Ng, Michelle
## 4 Widodo, Ignazio Marco      Albertini, Federico

Data Activity

  • Task: communicate a message about the data using a visual approach of your choosing.
  • Consider: which information is most important? how much can you display graphically?
  • Visualization does not need to be complete!

Learning Goals

  • Understand network structure (node vs. edge, directed vs. undirected)
  • Learn how to plot networks using igraph
  • Understand different types of centrality and what these mean
  • Motivate examination of IPCC authorship network

What Are Networks?

  • We can modify the nodes and edges according to various characteristics to make our visuals more interesting

Data Visualization

  • Nodes are users
  • Edges are Twitter interactions
  • Colors based on opinions

Data Visualization

  • Nodes are actors
  • Edges are information shared
  • Color based on agreement with ‘There should be an international binding commitment on all nations to reduce GHG emissions’.

Data Visualization

  • Special case: two-mode network
  • Small nodes are actors, large nodes are organizations

Data Visualization

  • Can represent this as one-mode network
  • Now ties represent shared actors
  • Colors represent funding from Koch/Exxon (green) or not (red)
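
The move from a two-mode to a one-mode network can be sketched as a small matrix product. The actors and organizations below are made up for illustration and are not taken from the study on the slide.

```r
# hypothetical two-mode data: rows are actors, columns are organizations
# a 1 means the actor is affiliated with that organization
affil <- matrix(c(1, 1, 0,
                  1, 0, 1,
                  0, 1, 1,
                  1, 1, 0),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("actor_a", "actor_b", "actor_c", "actor_d"),
                                c("org_x", "org_y", "org_z")))

# one-mode projection: organizations tied by the number of actors they share
org_ties <- t(affil) %*% affil

# the diagonal counts each organization's own actors, so set it to 0
diag(org_ties) <- 0

org_ties
```

Here org_x and org_y share two actors, so their tie has value 2; the other pairs share one actor each.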

Building Our First Networks

Building Our First Networks

  • Let’s create a network!
  • We’ll represent friendship ties among 7 people
  • We’ll call it el for “edge list”
  • Notice how the data structure is different
  • igraph needs the edge list as a two-column matrix
library(dplyr)  # provides the %>% pipe
el <- data.frame(person_1 = c("Lionel", "Frederica", "Garrett", "Lina",
                              "Sampson", "Garrett", "Frederica", "Lionel",
                              "Frederica", "Frederica", "Ahmad"),
                 person_2 = c("Lina", "Charlotte", "Lina", "Sampson",
                              "Charlotte", "Lionel", "Lina", "Sampson",
                              "Sampson", "Garrett", "Garrett")) %>%
  as.matrix()
# take a look
el
##       person_1    person_2   
##  [1,] "Lionel"    "Lina"     
##  [2,] "Frederica" "Charlotte"
##  [3,] "Garrett"   "Lina"     
##  [4,] "Lina"      "Sampson"  
##  [5,] "Sampson"   "Charlotte"
##  [6,] "Garrett"   "Lionel"   
##  [7,] "Frederica" "Lina"     
##  [8,] "Lionel"    "Sampson"  
##  [9,] "Frederica" "Sampson"  
## [10,] "Frederica" "Garrett"  
## [11,] "Ahmad"     "Garrett"
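
The same ties can also be stored as an adjacency matrix, which is the format the IPCC data uses later on. A minimal sketch, using only the first four rows of the edge list so the matrix stays readable (el_small, net_small, and adj are names chosen for this example):

```r
library(igraph)

# first four rows of the friendship edge list
el_small <- matrix(c("Lionel",    "Lina",
                     "Frederica", "Charlotte",
                     "Garrett",   "Lina",
                     "Lina",      "Sampson"),
                   ncol = 2, byrow = TRUE)

# build an undirected network, then convert it to an adjacency matrix
net_small <- graph_from_edgelist(el_small, directed = FALSE)
adj <- as_adjacency_matrix(net_small, sparse = FALSE)

# one row and one column per person; a 1 marks a friendship
adj
```

An edge list has one row per tie, while an adjacency matrix has one row and one column per node.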

Building Our First Networks

  • We can use igraph to create a network object
  • Should it be directed?
library(igraph)

# create network from edge list
net <- graph_from_edgelist(el)

Building Our First Networks

  • Let’s assume friendships are bidirectional (undirected)
library(igraph)

# create network from edge list
net <- graph_from_edgelist(el, directed = FALSE)
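
To see what the directed argument changes, here is a small sketch on a two-tie edge list (el_pair is a made-up example, not the full friendship data). In a directed network, ties sent and ties received are counted separately.

```r
library(igraph)

# two ties: Ahmad names Garrett, and Garrett names Lina
el_pair <- matrix(c("Ahmad",   "Garrett",
                    "Garrett", "Lina"),
                  ncol = 2, byrow = TRUE)

net_dir   <- graph_from_edgelist(el_pair)                    # directed (the default)
net_undir <- graph_from_edgelist(el_pair, directed = FALSE)  # undirected

is_directed(net_dir)
is_directed(net_undir)

# directed: ties received ("in") and ties sent ("out")
degree(net_dir, mode = "in")
degree(net_dir, mode = "out")

# undirected: a tie counts for both people
degree(net_undir)
```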

Building Our First Networks

  • Let’s try plotting our network!
# plot network
plot(net)

Building Our First Networks

  • We can adjust the graph using arguments that start with vertex. and edge.
# customize network plot
plot(net, vertex.size = 30, 
     vertex.color = "lavender", 
     vertex.frame.color = NA, 
     vertex.label.cex = .7, 
     edge.curved = .05, 
     edge.arrow.size = .3, 
     edge.width = .7, 
     edge.color = "gray1")

Measures of Centrality

Measures of Centrality

  • Degree Centrality
  • Based on number of edges, node with most edges is most central
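
As a quick sketch, degree centrality for the friendship network can be computed with igraph's degree(). The network is rebuilt here with graph_from_literal() so the block runs on its own.

```r
library(igraph)

# the friendship network, written as undirected ties
net <- graph_from_literal(
  Lionel - Lina, Frederica - Charlotte, Garrett - Lina, Lina - Sampson,
  Sampson - Charlotte, Garrett - Lionel, Frederica - Lina, Lionel - Sampson,
  Frederica - Sampson, Frederica - Garrett, Ahmad - Garrett
)

# degree centrality: number of edges attached to each node
deg <- degree(net)
sort(deg, decreasing = TRUE)
```

Four people tie for the highest degree (4 friendships each), and Ahmad has the lowest (1).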

Measures of Centrality

  • Betweenness Centrality
  • Based on “shortest paths” between nodes, node involved in most “shortest paths” is most central
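
A node can have few edges and still score high on betweenness. The sketch below uses a made-up network of two triangles joined only through a node called X.

```r
library(igraph)

# hypothetical network: two triangles connected only through X
g <- graph_from_literal(A - B, B - C, A - C,
                        C - X, X - D,
                        D - E, E - F, D - F)

# X has only two edges
degree(g)[["X"]]

# betweenness counts the shortest paths that pass through each node
btw <- betweenness(g)
sort(btw, decreasing = TRUE)
```

X has the highest betweenness because all nine pairs that span the two triangles must pass through it.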

Measures of Centrality

  • Closeness Centrality
  • Based on “shortest paths” between nodes, node with lowest average “shortest path” to the others is most central
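
A minimal sketch of closeness on the friendship network, rebuilt with graph_from_literal() so the block runs on its own. With normalized = TRUE, igraph reports the number of other nodes divided by the sum of shortest-path distances to them, which the last lines reproduce by hand.

```r
library(igraph)

# the friendship network, written as undirected ties
net <- graph_from_literal(
  Lionel - Lina, Frederica - Charlotte, Garrett - Lina, Lina - Sampson,
  Sampson - Charlotte, Garrett - Lionel, Frederica - Lina, Lionel - Sampson,
  Frederica - Sampson, Frederica - Garrett, Ahmad - Garrett
)

# closeness centrality, scaled so that 1 means "one step from everyone"
close_norm <- closeness(net, normalized = TRUE)
round(sort(close_norm, decreasing = TRUE), 2)

# the same numbers by hand: (n - 1) / total distance to the other nodes
by_hand <- (vcount(net) - 1) / rowSums(distances(net))
round(sort(by_hand, decreasing = TRUE), 2)
```

Garrett, Lina, and Frederica tie for the highest closeness (0.75), since each is within two steps of everyone else.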

Measures of Centrality

  • Eigenvector Centrality
  • Based on the centrality of adjacent nodes, node with “friends in high places” is most central
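
This measure is usually called eigenvector centrality, and igraph computes it with eigen_centrality(). The sketch below rebuilds the friendship network and compares two people with the same number of friends.

```r
library(igraph)

# the friendship network, written as undirected ties
net <- graph_from_literal(
  Lionel - Lina, Frederica - Charlotte, Garrett - Lina, Lina - Sampson,
  Sampson - Charlotte, Garrett - Lionel, Frederica - Lina, Lionel - Sampson,
  Frederica - Sampson, Frederica - Garrett, Ahmad - Garrett
)

# scores are scaled so the most central node gets 1
eig <- eigen_centrality(net)$vector
round(sort(eig, decreasing = TRUE), 2)

# Lina and Garrett both have 4 friends...
degree(net)[c("Lina", "Garrett")]

# ...but one of Garrett's friends is Ahmad, who has no other ties
round(eig[c("Lina", "Garrett")], 2)
```

Garrett scores lower than Lina even though their degrees match, because a node's score depends on how central its neighbors are.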

Measures of Centrality

  • Group activity:
  • Which point(s) have the highest degree centrality? (most edges)
  • Betweenness centrality? (involved in most “shortest paths”)
  • Closeness centrality? (lowest average “shortest path” to others)
  • Eigenvector Centrality? (“friends in high places”)

Measures of Centrality

IPCC Report Authorship

IPCC Report Authorship

  • IPCC has written 6 climate assessments since 1990 (FAR - AR6)
  • Three working groups:
  • WGI: physical and scientific basis of climate change
  • WGII: vulnerability of natural and social systems to climate change
  • WGIII: mitigation strategies and policy recommendations
  • All of these influence global policies and mitigation strategies
  • Geographic bias? Most authors come from “Global North” countries

IPCC Report Authorship

  • Let’s take a look at the WGIII authors
library(readr)  # provides read_csv()
ipcc_authors <- read_csv("IPCC_co_authorship.csv",
                         skip = 1)

IPCC Report Authorship

  • Need to clean up data
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
# reorder column names to match rows
ipcc_authors %<>%
  select(order(colnames(.)))


# turn dataframe into matrix
ipcc_authors %<>%
  select(-Author) %>%
  as.matrix()

IPCC Report Coauthorship

  • Now we can turn it into a network!
ipcc_net <- graph_from_adjacency_matrix(ipcc_authors,
                                        mode = "undirected")

IPCC Report Coauthorship

  • Try plotting
plot(ipcc_net, vertex.size = 2, 
     vertex.color = "lavender", 
     vertex.frame.color = NA, 
     vertex.label.cex = .5, 
     edge.curved = .05, 
     edge.width = .1, 
     edge.color = "gray1")

IPCC Report Coauthorship

  • Let’s only look at \(\ge2\) collaborations
# replace NAs with 0s
ipcc_authors[is.na(ipcc_authors)] <- 0

# replace 1s with 0s (keeps only pairs with 2+ collaborations)
ipcc_authors[ipcc_authors == 1] <- 0
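
A toy version of this thresholding step, on a made-up matrix of co-authorship counts for four authors (the values are hypothetical, not from the IPCC file):

```r
# hypothetical co-authorship counts; NA marks missing entries
m <- matrix(c(NA,  1,  3,  0,
               1, NA,  2, NA,
               3,  2, NA,  1,
               0, NA,  1, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

m[is.na(m)] <- 0   # treat missing values as no collaboration
m[m == 1]   <- 0   # drop pairs with only a single collaboration

m
```

Only the A-C tie (3 collaborations) and the B-C tie (2 collaborations) remain.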

IPCC Report Coauthorship

  • Let’s remove authors who did not collaborate with anyone
# vector of authors who do not collaborate with others
no_colab <- which(apply(ipcc_authors, 1, sum, na.rm = TRUE) == 0)


# then we can remove these from ipcc_authors
ipcc_authors <- ipcc_authors[-no_colab,
                             -no_colab]
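
One caveat with negative indexing: if no author is isolated, which() returns an empty vector and the subset drops everything. A sketch of an alternative that uses a logical index, on a made-up matrix (m, has_ties, and m_connected are names chosen for this example):

```r
# hypothetical thresholded matrix: author D has no remaining ties
m <- matrix(c(0, 0, 3, 0,
              0, 0, 2, 0,
              3, 2, 0, 0,
              0, 0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# logical index: TRUE for authors with at least one tie
has_ties <- rowSums(m, na.rm = TRUE) > 0

# keep the same authors in rows and columns
m_connected <- m[has_ties, has_ties]
m_connected

# the pitfall: with nothing to remove, negative indexing removes everything
nothing_to_drop <- which(rowSums(m_connected) == 0)  # integer(0)
dim(m_connected[-nothing_to_drop, -nothing_to_drop])
```

The logical index gives the same result as the negative index when isolates exist, and leaves the matrix unchanged when there are none.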

IPCC Report Coauthorship

  • Now let’s try again
# create network, again
ipcc_net <- graph_from_adjacency_matrix(ipcc_authors,
                                        mode = "undirected")

plot(ipcc_net, vertex.size = 2, 
     vertex.color = "lavender", 
     vertex.frame.color = NA, 
     vertex.label.cex = .5, 
     edge.curved = .05, 
     edge.width = .1, 
     edge.color = "gray1")

IPCC Report Coauthorship

  • We can set network characteristics using V()
# add betweenness centrality
V(ipcc_net)$betweenness <- betweenness(ipcc_net)

IPCC Report Coauthorship

  • We can use network characteristics to format plot
plot(ipcc_net, 
     vertex.size = V(ipcc_net)$betweenness/max(V(ipcc_net)$betweenness) * 20, 
     vertex.color = "lavender", 
     vertex.frame.color = NA, 
     vertex.label.cex = .5, 
     edge.curved = .05, 
     edge.width = .1, 
     edge.color = "gray1")

Moving Forward

  • Touch base with me on final project if you have not yet!
  • Grading rubrics up on Canvas

Class Plan

  • Data activity (10 min)
  • IPCC Report Authorship (25 min)
  • Discussion on networks (10 min)
  • Break (5 min)
  • Interactive network plots (15 min)
  • Migration and chord diagrams (5 min)
  • Final Reflections (10 min)
  • Final project time (Remainder)

Learning Goals

  • Plot networks with node/edge attributes
  • Consider cases of network data in issues relating to climate change and society
  • Consider uses of interactive networks and chord diagrams
  • Reflect on class

Measures of Centrality

IPCC Report Coauthorship

IPCC Report Coauthorship

  • Review of what we’ve done:
  • (no need to run again if you have ipcc_authors in your environment)
  • First, read in data
ipcc_authors <- read_csv("IPCC_co_authorship.csv",
                         skip = 1)

IPCC Report Coauthorship

  • Clean it up!
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
# reorder column names to match rows
ipcc_authors %<>%
  select(order(colnames(.)))


# turn dataframe into matrix
ipcc_authors %<>%
  select(-Author) %>%
  as.matrix()

IPCC Report Coauthorship

  • Let’s only look at \(\ge2\) collaborations
# replace NAs with 0s
ipcc_authors[is.na(ipcc_authors)] <- 0

# replace 1s with 0s (keeps only pairs with 2+ collaborations)
ipcc_authors[ipcc_authors == 1] <- 0

IPCC Report Coauthorship

  • We removed authors who did not collaborate with anyone
# vector of authors who do not collaborate with others
no_colab <- which(apply(ipcc_authors, 1, sum, na.rm = TRUE) == 0)


# then we can remove these from ipcc_authors
ipcc_authors <- ipcc_authors[-no_colab,
                             -no_colab]

IPCC Report Coauthorship

  • Create new igraph object
  • Can set weighted = TRUE to weight edges by number of collaborations
# create network, again
ipcc_net <- graph_from_adjacency_matrix(ipcc_authors,
                                        mode = "undirected",
                                        weighted = TRUE)
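
To see what weighted = TRUE changes, here is a sketch on a made-up 3-author matrix. Without it, igraph reads each count as that many parallel edges; with it, each pair gets one edge carrying a weight attribute.

```r
library(igraph)

# hypothetical collaboration counts among three authors
m <- matrix(c(0, 3, 0,
              3, 0, 2,
              0, 2, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# unweighted: counts become repeated edges between the same pair
g_multi <- graph_from_adjacency_matrix(m, mode = "undirected")
ecount(g_multi)

# weighted: one edge per pair, with the count stored in E()$weight
g_weighted <- graph_from_adjacency_matrix(m, mode = "undirected",
                                          weighted = TRUE)
ecount(g_weighted)
E(g_weighted)$weight
```

One thing to keep in mind: igraph's path-based measures, including betweenness(), treat a weight attribute as a distance by default, so heavily weighted ties count as longer paths rather than stronger ones.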

IPCC Report Coauthorship

  • We set network characteristics using V()
# add betweenness centrality
V(ipcc_net)$betweenness <- betweenness(ipcc_net)

IPCC Report Coauthorship

  • We can examine the “most central” authors, according to betweenness scores
# show authors in descending order
V(ipcc_net)[order(V(ipcc_net)$betweenness, decreasing = TRUE)]
## + 148/148 vertices, named, from cd68747:
##   [1] Shukla             Schaeffer          Smith             
##   [4] Edenhofer          Price              Carraro           
##   [7] Uerge-Vorsatz      Krey               Riahi             
##  [10] Clarke             Rogner             Lutz              
##  [13] Kahn-Ribeiro       Nakicenovic        Winkler           
##  [16] Hoehne             Paltsev            Michaelowa        
##  [19] Strachan           Sathaye            Faaij             
##  [22] Hourcade           Ravindranath       Halsnaes          
##  [25] Bosetti            Zhang              Skea              
##  [28] Den-Elzen          Lecocq             Fischedick        
## + ... omitted several vertices

IPCC Report Coauthorship

  • We can specify that we want edge width to be proportional to weights:
# plot network
plot(ipcc_net, 
     edge.width = E(ipcc_net)$weight / 5,
     vertex.size = V(ipcc_net)$betweenness / max(V(ipcc_net)$betweenness) * 20, 
     # transparency is applied through the color itself
     vertex.color = adjustcolor("lavender", alpha.f = 0.5), 
     vertex.frame.color = NA, 
     vertex.label.cex = .5, 
     edge.curved = .05, 
     edge.color = "gray1")

IPCC Report Coauthorship

  • What else could we add to the plot?

IPCC Report Coauthorship

  • We could add region to examine differences in network centrality
# read in cv data
ipcc_cvs <- read_csv("IPCC_cv.csv")

# define global regions
ipcc_cvs %<>%
  mutate(region = case_match(`IPCC representing country (i.e. Country of residence from IPCC doc)`,
                             # Europe will be green
                             c("Austria", "Belgium", "Denmark", "Finland", 
                               "France", "Germany", "Greece", "Hungary", "Italy",
                               "Netherlands", "Norway", "Spain", "Sweden", 
                               "Switzerland", "UK") ~ "green4", 
                             # North America will be red
                             c("Canada", "USA", "Mexico") ~ "tomato",
                             # BRICS will be blue
                             c("Brazil", "Russia", "India", "China", 
                               "South Africa") ~ "steelblue", 
                             # Other countries will be gold
                             .default = "gold"))

IPCC Report Coauthorship

  • Define region variable for the vertices
# define region variable
V(ipcc_net)$region <- ipcc_cvs$region
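
Assigning a column directly assumes the table's rows are in the same order as the network's vertices. If that is not guaranteed, a lookup by name is safer. A sketch with made-up names and a hypothetical author column (the real column name in IPCC_cv.csv may differ):

```r
library(igraph)

# small made-up network; vertex order is Ana, Ben, Cai
g <- graph_from_literal(Ana - Ben, Ben - Cai)

# hypothetical attribute table, deliberately in a different row order
cvs <- data.frame(author = c("Cai", "Ana", "Ben"),
                  region = c("green4", "steelblue", "tomato"),
                  stringsAsFactors = FALSE)

# find each vertex name in the table, then assign in vertex order
idx <- match(V(g)$name, cvs$author)
V(g)$region <- cvs$region[idx]

data.frame(name = V(g)$name, region = V(g)$region)
```

Any vertex missing from the table would get NA from match(), which makes mismatches easy to spot.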

IPCC Report Coauthorship

  • Can change vertex color using vertex.color
# plot network
plot(ipcc_net, 
     edge.width = E(ipcc_net)$weight / 5,
     vertex.size = V(ipcc_net)$betweenness / max(V(ipcc_net)$betweenness) * 20, 
     # transparency is applied through the colors themselves
     vertex.color = adjustcolor(V(ipcc_net)$region, alpha.f = 0.5), 
     vertex.frame.color = NA, 
     vertex.label.cex = .5, 
     edge.curved = .05, 
     edge.color = "gray1")

IPCC Report Coauthorship

  • Finally, we can add a legend manually
# add legend
legend(
  "bottomright",
  legend = c("Europe", "North America", "BRICS", "Other"),
  pt.bg  = c("green4", "tomato", "steelblue", "gold"),
  pch    = 21,
  cex    = 1,
  title  = "Region"
  )

Networks and Society

Networks and Society

  • Many social dimensions of how people react to and form opinions on climate topics have network structures
  • We’ll discuss three of these, and how they relate to inequality in resources/beliefs

Networks and Society

  • First, networks matter for climate resilience

Networks and Society

  • Second, networks matter for post-disaster inequalities

“local network capacities of Lower Ninth Ward residents relative to those of the more affluent Lakeview neighborhood dissipated before, during, and after the disaster to erode the life chances of individual residents and the neighborhood they once constituted.” (Elliott, Haney, & Sams-Abiodun, 2010:624)

“For example, if a translocal tie can help one find housing during evacuation, this assistance may be useful only for those who also possess the financial capital needed to travel there and pay market rent; otherwise, such translocal assistance may be much less useful and therefore go unutilized.” (Elliott et al., 2010:630)

Networks and Society

  • Third, networks matter for how we form our beliefs around issues like climate change

Networks and Society

  • In groups discuss:
  1. What is an example of a network structure related to climate change and society?
  2. What are the nodes? What are the edges?
  3. Is centrality in this network important? Which measure(s) best capture(s) this?

Interactive Networks with networkD3

Interactive Networks with networkD3

  • First, transform igraph object to networkD3 object
library(networkD3)

# create dataframe for networkD3
ipcc_d3 <- igraph_to_networkD3(ipcc_net, group = V(ipcc_net)$region)

Interactive Networks with networkD3

  • Next, set color palette (optional)
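# set color scale
# the domain lists the values stored in V(ipcc_net)$region,
# in the order Europe, North America, BRICS, Other
ColourScale <- 'd3.scaleOrdinal()
            .domain(["green4", "tomato", "steelblue", "gold"])
            .range(["#07C343", "#D2140C", "#0C4AD2", "#C7D20C"]);'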

Interactive Networks with networkD3

  • Then use forceNetwork() to build interactive network
p <- forceNetwork(Links = ipcc_d3$links,
                  Nodes = ipcc_d3$nodes,
                  Source = "source",   # column names created by igraph_to_networkD3()
                  Target = "target",
                  Group = "group",
                  height = "400px", width = "800px",
                  NodeID = "name",
                  fontSize = 14,
                  colourScale = JS(ColourScale),
                  zoom = TRUE)

Interactive Networks with networkD3

  • Plot network!
# plot network!
p

Migration and Chord Diagrams

Migration and Chord Diagrams

  • First, read in migration data
# read migration data
ca_migration <- read_csv("ca_migration.csv")
## Rows: 28 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): from, to
## dbl (1): value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Migration and Chord Diagrams

library(circlize)
library(stringr)
library(viridis)

# create chord diagram
chordDiagram(ca_migration,
             transparency = 0.25,
             grid.col = inferno(7),
             directional = 1,
             direction.type = c("arrows", "diffHeight"), 
             diffHeight = -0.04,
             annotationTrack = "grid", 
             annotationTrackHeight = c(0.05, 0.1),
             link.arr.type = "big.arrow", 
             link.sort = TRUE, 
             link.largest.ontop = TRUE)
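
The migration data is in long format: one row per origin-destination pair with a value. The same flows can be laid out as an origin-by-destination matrix, which can help when checking what the chord diagram should show. A sketch on made-up flows (the places and values are hypothetical, not from ca_migration.csv):

```r
# hypothetical flows in the same from / to / value layout
flows <- data.frame(from  = c("A", "A", "B", "C"),
                    to    = c("B", "C", "A", "A"),
                    value = c(10, 5, 3, 8))

# rows are origins, columns are destinations
flow_mat <- xtabs(value ~ from + to, data = flows)
flow_mat

# net flow for each place: arrivals minus departures
net_flow <- colSums(flow_mat) - rowSums(flow_mat)
net_flow
```

In this toy example, B gains people on net while A and C lose them, which is the kind of asymmetry the directional arrows in a chord diagram are meant to reveal.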

Migration and Chord Diagrams

  • Add stylistic choices
# Add text and axis
circos.trackPlotRegion(
  track.index = 1, 
  bg.border = NA, 
  panel.fun = function(x, y) {
    
    xlim = get.cell.meta.data("xlim")
    sector.index = get.cell.meta.data("sector.index")
    
    # Add names to the sector. 
    circos.text(
      x = mean(xlim), 
      y = 2.2, 
      labels = sector.index, 
      facing = "bending", 
      cex = 0.8
      )

  }
)

Migration and Chord Diagrams

  • When can we use chord diagrams?

Final Reflections

Final Reflections

  • On the first day, we defined climate change and data science
  • Think back to how you defined these!
  • In groups: have your definitions changed during this class?
  • What has led your conceptions of these topics to change or remain stable?
  • What are the possibilities for using data science tools to think about climate change? What are the challenges?

Final Problem Set

  • Fixed typo in question 3 (betweenness and eigenvalue scores switched)
  • Bookdown is correct now
  • Won’t take points off if you’ve already turned it in and misinterpreted this

Presentations

  • 12 minutes of slides (max!)
  • 5-8 minutes Q&A
  • We will start promptly at 1:30! Recommend getting here early so we can leave on time
  • Can present on your own computer (HDMI cord w/ adapter) or email slides to me