Data Cleanse…

As in any data problem, a crucial first step is making sure our data is clean before we analyze it.

We will:

  1. Load our data into a data frame called df, which holds the relationships between all nodes. Next, we load the attributes for these nodes (found on a separate sheet within the same file) and call them attrs. Lastly, we drop the first column of df, which is not part of the matrix, so that only the relationship columns remain.

  2. Network adjacency matrix: the network adjacency matrix is a square matrix whose entries express the relationship between two nodes. We will call this nwmat (simply df converted to a matrix). Depending on how the sheet is read, the entries may not be stored as integers, so we convert them with strtoi().

  3. We label the rows and columns of nwmat with the names of the students in the dataset, taken from colnames(df). We will replace NAs with 0 using an ifelse() statement. Finally, we will dichotomize nwmat so that only ties with a value of 3 or more are coded as 1, and all others as 0.

## STEP 1:
# read sheet 1 as df
df = read_excel(filename, sheet='Sheet1')
# read attributes for every entry in excel (sheet2)
attrs = read_excel(filename, sheet = 'Sheet2')
# drop first column
df = df[-c(1)]

## STEP 2:
# make network adjacency matrix
nwmat = as.matrix(df)
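n = dim(nwmat) # dimensions of the data (rows, columns)
# convert to integers; strtoi() returns a flat vector, so rebuild the matrix shape
nwmat = matrix(strtoi(nwmat), nrow = n[1], ncol = n[2])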

## STEP 3:
# label the rows and columns with the student names
colnames(nwmat) = colnames(df)
rownames(nwmat) = colnames(df)

# Replace NAs with 0
nwmat = ifelse(is.na(nwmat),0,nwmat)
#dichotomize the adjacency matrix
nwmat = ifelse(nwmat>=3, 1,0)

head(nwmat, 1)
##                Alate Abhishek An Siye Chen Kewei Ding Yuzhou Gao Peipei
## Alate Abhishek              0       0          0           0          0
##                Hu Yucong Huang Ansheng Huang Hao Ling Hailey Liu Carl
## Alate Abhishek         0             0         0           0        0
##                Liu Lingxuan Nguyen Jennifer Ni Hao Peng Haoyun Shi Boyu
## Alate Abhishek            0               0      0           0        0
##                Song Ci Sun Yizhu Tan Lechen Taubes Terrance Wang Elianna
## Alate Abhishek       0         0          0               0            0
##                Wang Yongheng Xu Fengshu Xu Bella Yaghmmour Meriem
## Alate Abhishek             0          0        0                0
##                Yan Xinyu Yang Mingchuan Ye Jacob You Guandong Zhao Yuyang
## Alate Abhishek         0              0        0            0           0
##                Zheng Yue Zhou Gabrielle Zhu Qianying
## Alate Abhishek         0              0            0
# nwmat is our 'clean' adjacency matrix
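
To make the NA replacement and the dichotomization concrete, here is a minimal sketch on a made-up 3x3 matrix. The names A, B, C and all the values are hypothetical and not taken from the dataset; the entries are stored as text on the assumption that an Excel import might deliver them that way.

```r
# Hypothetical tie strengths between three made-up nodes, stored as text
raw = matrix(c("0", "4", NA,
               "4", "0", "2",
               NA,  "2", "0"),
             nrow = 3, byrow = TRUE)

# strtoi() returns a flat vector, so we rebuild the 3x3 shape afterwards
toy = matrix(strtoi(raw), nrow = nrow(raw), ncol = ncol(raw))
dimnames(toy) = list(c("A", "B", "C"), c("A", "B", "C"))

# Replace NAs with 0, then keep only ties of strength 3 or more
toy = ifelse(is.na(toy), 0, toy)
toy = ifelse(toy >= 3, 1, 0)
toy
```

Only the A-B tie (strength 4) survives the threshold; the B-C tie (strength 2) and the missing A-C entry both end up as 0.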

Plotting!

Now that we have the network information, we are able to visualize this badboy.

We will use the igraph library, loaded in the dependencies “chunk” that is hidden at the beginning of the R Markdown file.

We will:

  1. Create a network object that reads nwmat.

  2. Visualize the network using plot().

  3. Remember that every node in the network can carry attributes that are not visible by default. We can set and read those attributes and change the plot based on them, so we attach the sex and degree values from attrs to the nodes.

  4. Show further iterations of the same plot below, with names removed via vertex.label="" for clearer visualizations.

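#1. create a network object
# graph_from_adjacency_matrix() is the current igraph name for graph.adjacency()
g = graph_from_adjacency_matrix(nwmat, mode='undirected')
#2. plot the network
# layout_with_fr is the current igraph name for layout.fruchterman.reingold
plot(g, layout=layout_with_fr)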

#3. add the attributes (assumes the rows of attrs are in the same order as the nodes)
V(g)$sex = as.character(attrs$Sex)
V(g)$degree = as.character(attrs$Degree)
#4. plot based on those attributes
V(g)$color = ifelse(V(g)$sex == "Male", "lightblue", ifelse(V(g)$sex == "Female", "tomato", "white"))
#iteration i
plot(g, layout=layout_nicely)

#iteration ii
plot(g, layout=layout_with_fr, vertex.size=10, vertex.label="")

#iteration iii (same call; the layout starts from random positions, so the picture can differ between runs)
plot(g, layout=layout_with_fr, vertex.size=10, vertex.label="")
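
Because the Fruchterman-Reingold layout starts from random positions, two iterations of the same plot can place the nodes differently. One way to keep the iterations comparable is to fix the random seed and compute the layout once, then reuse the coordinates. The sketch below does this on a made-up three-node network; the node names, the sex values, and the seed are all hypothetical. It uses graph_from_adjacency_matrix() and layout_with_fr(), the current igraph names for graph.adjacency() and layout.fruchterman.reingold.

```r
library(igraph)

# Hypothetical 0/1 adjacency matrix for three made-up nodes
toy = matrix(c(0, 1, 1,
               1, 0, 0,
               1, 0, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
g_toy = graph_from_adjacency_matrix(toy, mode = "undirected")

# Attach a made-up attribute and colour the nodes by it
V(g_toy)$sex = c("Male", "Female", "Other")
V(g_toy)$color = ifelse(V(g_toy)$sex == "Male", "lightblue",
                 ifelse(V(g_toy)$sex == "Female", "tomato", "white"))

# Fix the seed and compute the layout once so every plot uses the same positions
set.seed(42)
coords = layout_with_fr(g_toy)

plot(g_toy, layout = coords)                                      # with names
plot(g_toy, layout = coords, vertex.size = 10, vertex.label = "") # names removed
```

Passing the stored coords matrix instead of the layout function means only the styling changes between plots, not the node positions.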