Ignacio et al. (2022); Banyard, Hamby, and Grych (2017); Cai et al. (2024); Cano et al. (2020); Cromer et al. (2019); Hong et al. (2018); Ferber and Weller (2022); Callaghan et al. (2019); Chan et al. (2023); Weidacker et al. (2022); Hirshfeld-Becker et al. (2019); Gur et al. (2020); Bram, Gottschalk, and Leeds (2018); Wymbs et al. (2020); Bernstein and McNally (2018); Gildawie, Honeycutt, and Brenhouse (2020); Kuhlman et al. (2023); Kayser et al. (2019); Dvorsky et al. (2019); Kirby et al. (2022); Murphy et al. (2017); Carter, Powers, and Bradley (2020); Ramaiya et al. (2018); Sendzik et al. (2017); Wu, Slesnick, and Murnan (2018); Cornwell et al. (2024); Skinner et al. (2020); Gibbons and Bouldin (2019); Rodman et al. (2019); Higheagle Strong et al. (2020); McRae et al. (2017); Martinez Jr et al. (2022); Vannucci et al. (2019); Noroña-Zhou and Tung (2021); Motsan, Yirmiya, and Feldman (2022); Rich et al. (2019); Krause et al. (2018); Sui et al. (2020); Grych et al. (2020); Vega-Torres et al. (2020); Kliewer and Parham (2019); Griffith, Farrell-Rosen, and Hankin (2023); Siciliano et al. (2023); White et al. (2021); Bettis et al. (2019); Rudolph et al. (2024); Linke et al. (2020); Cornwell et al. (2023); Criss et al. (2017); Koban et al. (2017); Kashdan et al. (2020); Priel et al. (2020); Hannan et al. (2017); Malberg (2023); Tang, Tang, and Gross (2019); Caceres et al. (2024); Gupta, Dickey, and Kujawa (2022); Gee (2022); Khahra et al. (2024); Szoko et al. (2023); Barzilay et al. (2020); Jiang, Paley, and Shi (2022); Xiao et al. (2019); Finkelstein-Fox, Park, and Riley (2018); Sevinc et al. (2019); Blair et al. (2018); Buthmann et al. (2024); Schäfer et al. (2017); and Smith and Jen’nan (2024) studied a range of populations and settings to examine how factors such as life experience, environment, and physical and mental health relate to emotion regulation and the development of resilience.
# Clean the data, split the author IDs for each publication, and build the two-mode graph g2 and its one-mode projection
library(splitstackshape)
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
source <- read.csv("/Users/jinzeyang/Desktop/Fall 2024/Social Networks Analysis/A_Delieverables/US emotion regulation.csv")
# clean the data, keep only authors and articles
source <- source[source$Author.s..ID!="[No author id available]",]
### Create an author-author matrix by transforming a two-mode edgelist
authors <- as.data.frame(source[,3])
colnames(authors) <- "AU"
# Split the semicolon-separated author IDs into one column per author (wide format), the first step from the adjacency-style table to an edgelist
source_split <- cSplit(authors, splitCols = "AU", sep = ";", direction = "wide", drop = TRUE)
dim(source_split)
## [1] 70 16
# EID is the publication ID; it becomes the first column of the publication-author matrix below
# Not needed, but kept just in case: trim whitespace around the split author IDs
# source_split <- data.frame(lapply(source_split, trimws), stringsAsFactors = FALSE)
mat_source_split <- as.matrix(source_split)
#add EID to each publication
combined <- cbind(source$EID, mat_source_split)
dim(combined)
## [1] 70 17
# The matrix includes both the EID and the author IDs
mat_combined_eid_source_split <- as.matrix(combined)
# Two-mode edgelist: pair each EID (publication) with each of its author IDs.
# c() flattens the author columns column by column; cbind() recycles the EID column to match.
edgelist_two_mode <- cbind(mat_combined_eid_source_split[, 1], c(mat_combined_eid_source_split[, -1]))
# Remove rows where the author ID (column 2) is missing
edgelist_two_mode <- edgelist_two_mode[!is.na(edgelist_two_mode[,2]), ]
dim(edgelist_two_mode)
## [1] 407 2
# 407 publication-author ties (one row per author of each publication)
head(edgelist_two_mode)
## [,1] [,2]
## [1,] "2-s2.0-85124618928" "57211550278"
## [2,] "2-s2.0-85010380267" "57192659150"
## [3,] "2-s2.0-85195804072" "59169224100"
## [4,] "2-s2.0-85078064575" "36141884200"
## [5,] "2-s2.0-85066870128" "57070244400"
## [6,] "2-s2.0-85054193257" "57196352348"
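The wide-to-long step above can be sketched on toy data. This is a minimal base-R alternative, not the `cSplit()` route used in this report; the data frame `papers` and its IDs (`p1`, `a1`, ...) are hypothetical. Unlike the code above, it lists the pairs paper by paper rather than column by column, but the set of ties is the same.

```r
# Toy data (hypothetical IDs): one row per paper, author IDs separated by ";"
papers <- data.frame(
  EID = c("p1", "p2", "p3"),
  AU  = c("a1; a2; a3", "a2", "a1; a4"),
  stringsAsFactors = FALSE
)

# Split each author string on ";" and trim the surrounding whitespace
author_list <- lapply(strsplit(papers$AU, ";", fixed = TRUE), trimws)

# Repeat each EID once per author so that every row is one (paper, author) tie
edgelist <- cbind(
  EID = rep(papers$EID, times = lengths(author_list)),
  AU  = unlist(author_list)
)

edgelist
nrow(edgelist)  # number of paper-author ties in the toy data
```

Because the split never creates padding cells, this version has no `NA` rows to remove afterwards.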
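# Directed version (author -> publication), shown for comparison; g2 is reassigned as an undirected graph afterwards
g2 <- graph_from_edgelist(edgelist_two_mode[, 2:1], directed = TRUE)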
g2
## IGRAPH 95641b6 DN-- 440 407 --
## + attr: name (v/c)
## + edges from 95641b6 (vertex names):
## [1] 57211550278->2-s2.0-85124618928 57192659150->2-s2.0-85010380267
## [3] 59169224100->2-s2.0-85195804072 36141884200->2-s2.0-85078064575
## [5] 57070244400->2-s2.0-85066870128 57196352348->2-s2.0-85054193257
## [7] 55667386400->2-s2.0-85126246871 36989432800->2-s2.0-85067840050
## [9] 57213151477->2-s2.0-85166593483 36769684300->2-s2.0-85095849567
## [11] 6603295453 ->2-s2.0-85067115477 7103065698 ->2-s2.0-85091775089
## [13] 6602196777 ->2-s2.0-85053861063 6507860353 ->2-s2.0-85079268079
## [15] 55882231500->2-s2.0-85050859465 57184398400->2-s2.0-85077753482
## + ... omitted several edges
# Undirected version, used for the rest of the analysis
g2 <- graph_from_edgelist(edgelist_two_mode[, 2:1], directed = FALSE)
g2
## IGRAPH 0e5be0f UN-- 440 407 --
## + attr: name (v/c)
## + edges from 0e5be0f (vertex names):
## [1] 57211550278--2-s2.0-85124618928 57192659150--2-s2.0-85010380267
## [3] 59169224100--2-s2.0-85195804072 36141884200--2-s2.0-85078064575
## [5] 57070244400--2-s2.0-85066870128 57196352348--2-s2.0-85054193257
## [7] 55667386400--2-s2.0-85126246871 36989432800--2-s2.0-85067840050
## [9] 57213151477--2-s2.0-85166593483 36769684300--2-s2.0-85095849567
## [11] 6603295453 --2-s2.0-85067115477 7103065698 --2-s2.0-85091775089
## [13] 6602196777 --2-s2.0-85053861063 6507860353 --2-s2.0-85079268079
## [15] 55882231500--2-s2.0-85050859465 57184398400--2-s2.0-85077753482
## + ... omitted several edges
# Mark the node type: TRUE = author, FALSE = publication
V(g2)$type <- V(g2)$name %in% edgelist_two_mode[ , 2]
table(V(g2)$type)
##
## FALSE TRUE
## 70 370
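# The network contains 370 authors (TRUE) and 70 publications (FALSE); i stores the author count
i <- table(V(g2)$type)[2]
# Transpose the incidence matrix so that authors are rows and publications are columns
mat_g2_incidence <- t(get.incidence(g2))
# Check the dimensions: authors x publications
dim(mat_g2_incidence)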
## [1] 370 70
# First mode (authors): the row sums of the author-by-publication matrix give each author's publication count
dta <- data.frame(id = rownames(mat_g2_incidence), count = rowSums(mat_g2_incidence))
head(dta)
## id count
## 57211550278 57211550278 1
## 57192659150 57192659150 2
## 59169224100 59169224100 1
## 36141884200 36141884200 1
## 57070244400 57070244400 1
## 57196352348 57196352348 1
# Sort authors from most to fewest publications
dta <- dta[order(dta$count, decreasing = TRUE), ]
head(dta)
## id count
## 7004847215 7004847215 3
## 57192659150 57192659150 2
## 7103065698 7103065698 2
## 57194620691 57194620691 2
## 6603834101 6603834101 2
## 57208803309 57208803309 2
# Question 4: the column sums give the number of authors on each publication
dta1 <- data.frame(id = colnames(mat_g2_incidence), count = colSums(mat_g2_incidence))
head(dta1)
## id count
## 2-s2.0-85124618928 2-s2.0-85124618928 6
## 2-s2.0-85010380267 2-s2.0-85010380267 3
## 2-s2.0-85195804072 2-s2.0-85195804072 8
## 2-s2.0-85078064575 2-s2.0-85078064575 14
## 2-s2.0-85066870128 2-s2.0-85066870128 5
## 2-s2.0-85054193257 2-s2.0-85054193257 6
# Sort publications from most to fewest authors
dta1 <- dta1[order(dta1$count, decreasing = TRUE), ]
head(dta1)
## id count
## 2-s2.0-85091775089 2-s2.0-85091775089 16
## 2-s2.0-85184577079 2-s2.0-85184577079 16
## 2-s2.0-85165091093 2-s2.0-85165091093 15
## 2-s2.0-85078064575 2-s2.0-85078064575 14
## 2-s2.0-85067840050 2-s2.0-85067840050 12
## 2-s2.0-85079268079 2-s2.0-85079268079 11
# One-mode projection: multiplying the incidence matrix by its transpose gives a 370 x 370 author-by-author matrix
mat_g2_incidence_to_1 <- mat_g2_incidence%*%t(mat_g2_incidence)
# The diagonal holds each author's publication count, so this summarizes publications per author
summary(diag(mat_g2_incidence_to_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 1.0 1.1 1.0 3.0
diag1mode <- data.frame(id = rownames(mat_g2_incidence_to_1), count = diag(mat_g2_incidence_to_1))
diag1mode <- diag1mode[order(diag1mode$count, decreasing = TRUE), ]
head(diag1mode) # publication counts in decreasing order, taken from the matrix diagonal
## id count
## 7004847215 7004847215 3
## 57192659150 57192659150 2
## 7103065698 7103065698 2
## 57194620691 57194620691 2
## 6603834101 6603834101 2
## 57208803309 57208803309 2
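The relationship between the row sums, the column sums, and the diagonal of the projected matrix can be checked on a small example. This is a sketch on a hypothetical 4-author, 3-publication incidence matrix (`inc` and its labels are made up), assuming binary entries as in the matrix above.

```r
# Toy author-by-publication incidence matrix (hypothetical IDs)
inc <- matrix(
  c(1, 0, 1,   # a1 wrote p1 and p3
    1, 1, 0,   # a2 wrote p1 and p2
    1, 0, 0,   # a3 wrote p1
    0, 0, 1),  # a4 wrote p3
  nrow = 4, byrow = TRUE,
  dimnames = list(c("a1", "a2", "a3", "a4"), c("p1", "p2", "p3"))
)

pubs_per_author <- rowSums(inc)  # publication count for each author
authors_per_pub <- colSums(inc)  # author count for each publication

# One-mode projection: author-by-author matrix
co <- inc %*% t(inc)

# For a binary matrix, the diagonal equals the row sums (publications per author)
all(diag(co) == pubs_per_author)

# Off-diagonal cells count the publications two authors share
co["a1", "a2"]  # a1 and a2 share p1
co["a2", "a4"]  # a2 and a4 share no publication
```

This is why `head(dta)` and `head(diag1mode)` above show the same counts: both are the publication count per author, obtained in two different ways.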
Answer: 70
Answer: 407 connections
Answer:
Similarity: both rankings are computed from the same two-mode (author-by-publication) matrix, and both identify the units with the highest count: the row sums give the most prolific authors, and the column sums give the publications with the most co-authors.
Difference: they focus on different units, authors in one case and publications in the other. The author ranking indicates each author's contribution to the field, whereas the publication ranking indicates the expertise, diversity, and collaboration that large-scale research requires.
Answer:
The number of actors differs because the two deliverables use different types of networks, each with its own requirement for including an actor. In Deliverable 1, we used the one-mode network, which includes every author who co-authored at least one paper, along with authors connected indirectly through shared collaborations.
In Deliverable 3, however, we use the two-mode network, which includes only authors with a valid connection to a publication. The network therefore drops indirect connections and keeps only authors tied directly to publications, so the actor count differs.
Additionally, in Deliverable 1 we excluded solo authors with solo publications from the one-mode network. In Deliverable 3, the two-mode network includes authors with solo-authored publications as well as co-authored ones. As a result, some solo authors excluded in Deliverable 1 are added back in Deliverable 3 because they also have at least one co-authored publication. This may explain the difference in the count of human actors between Deliverables 1 and 3.
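The effect of the inclusion rule on the actor count can be illustrated with a toy matrix. This sketch assumes that the one-mode network in Deliverable 1 kept only authors with at least one co-author; that rule is described in the text but its code is not shown here, and the matrix `inc` and its IDs are hypothetical.

```r
# Toy author-by-publication incidence matrix (hypothetical IDs)
inc <- matrix(
  c(1, 0, 0,   # a1: co-authored p1
    1, 1, 0,   # a2: co-authored p1 and solo-authored p2
    0, 0, 1),  # a3: solo-authored p3 only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a1", "a2", "a3"), c("p1", "p2", "p3"))
)

# Two-mode network: every author tied to at least one publication is an actor
n_two_mode <- nrow(inc)

# One-mode projection with the diagonal zeroed, so only co-authorship ties remain
co <- inc %*% t(inc)
diag(co) <- 0

# Assumed Deliverable 1 rule: keep only authors with at least one co-author
has_coauthor <- rowSums(co) > 0
n_one_mode <- sum(has_coauthor)

n_two_mode                           # actors in the two-mode network
n_one_mode                           # actors in the one-mode network under the assumed rule
names(has_coauthor)[!has_coauthor]   # authors present only in the two-mode network
```

Under this assumption, an author such as `a3` with only solo-authored work appears in the two-mode count but not in the one-mode count, which is one way the two totals can diverge.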