Prerequisites

Intro

Below are the files we obtained using MapReduce and Hadoop. One main mapper and one reducer were enough. We focused on emails sent and received between May 1999 and July 2001. Enron declared bankruptcy in December 2001; the scandal had started that November. We would like to observe the Enron email network up to the point where Enron's internal community started suffering from fraudulent practices.

Look at the data

[1] "conns-inbox.txt"       "inbox-emails-long.txt" "n-conns-inbox.txt"
[4] "sent-emails-long.txt"

                                            sender                date                reciever email_id
1                  elizabeth sager sara shackleton 1998-10-30 07:56:00 mark - ect legal taylor    39502
2                                   alison smythe  1998-10-30 08:02:00 mark - ect legal taylor    39587
3 brent hendry at enron_development@ccmail @ enron 1998-10-30 09:06:00 mark - ect legal taylor    39870
4                                janette elbertson 1998-11-13 04:07:00             mark taylor    39094
5                                       tana jones 1998-11-13 08:09:00             mark taylor    39182
6                                        per sekse 1998-11-13 14:57:52 mark - ect legal taylor    39183

    sender              date             reciever           email_id
 Length:40865       Length:40865       Length:40865       Length:40865
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
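
A listing like the one above could be produced roughly as follows. This is only a sketch: the tab delimiter, the absence of a header row, the UTC time zone and the sample rows (names borrowed from the listing, dates made up) are all assumptions, and `period` is a helper column we introduce to number the months of the study window.

```r
# Hypothetical sample rows in an ASSUMED tab-separated layout; the real
# delimiter of sent-emails-long.txt is not stated in the text.
sample_lines <- c(
  "tana jones\t1999-05-03 08:09:00\tmark taylor\t39182",
  "per sekse\t2000-01-13 14:57:52\tmark - ect legal taylor\t39183",
  "alison smythe\t1998-10-30 08:02:00\tmark - ect legal taylor\t39587"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Read every column as character, matching the summary shown above.
emails <- read.delim(
  path, header = FALSE, quote = "",
  col.names = c("sender", "date", "reciever", "email_id"),
  colClasses = "character"
)

# Parse the timestamp; UTC is an assumption, the source gives no time zone.
emails$date <- as.POSIXct(emails$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Keep only emails from May 1999 through July 2001.
in_window <- emails$date >= as.POSIXct("1999-05-01", tz = "UTC") &
  emails$date < as.POSIXct("2001-08-01", tz = "UTC")
emails <- emails[in_window, ]

# Month index: 0 for May 1999, 1 for June 1999, ..., 26 for July 2001.
lt <- as.POSIXlt(emails$date)
emails$period <- (lt$year + 1900 - 1999) * 12 + (lt$mon + 1 - 5)

head(emails)
summary(emails)
```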

Interactive graphs

Dynamic social networks

Edge activity in base.net was ignored
Created net.obs.period to describe network
 Network observation period info:
  Number of observation spells: 1
  Maximal time range observed: 0 until 27
  Temporal mode: continuous
  Time unit: unknown
  Suggested time increment: NA

Enron network in strips

Enron Network over time

There are several ways to display a social network over time. In this instance we keep each edge until the end of the observation period instead of terminating it when there was no contact in a given period.
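
The "keep the edge until the end" rule can be sketched in base R as a table of edge spells. The contact list, the actor names and `end_of_observation` are hypothetical; only the rule itself comes from the text.

```r
# Hypothetical contact list: one row per email, period = month index.
contacts <- data.frame(
  sender   = c("a", "a", "b", "a", "c"),
  reciever = c("b", "b", "c", "c", "a"),
  period   = c(0, 3, 1, 2, 2),
  stringsAsFactors = FALSE
)
end_of_observation <- 4  # assumed last time point of the study window

# Each directed pair gets one spell that starts at its first contact ...
first_contact <- aggregate(period ~ sender + reciever, data = contacts, FUN = min)

# ... and runs to the end of the window, however long the pair stays silent.
actors <- sort(unique(c(contacts$sender, contacts$reciever)))
edge_spells <- data.frame(
  onset    = first_contact$period,
  terminus = end_of_observation,
  tail     = match(first_contact$sender, actors),    # sender's vertex id
  head     = match(first_contact$reciever, actors)   # receiver's vertex id
)
edge_spells <- edge_spells[order(edge_spells$onset), ]
edge_spells
```

The column order onset, terminus, tail, head is chosen to match what we understand the `edge.spells` argument of `networkDynamic()` to expect. Under the alternative rule, `terminus` would instead be set shortly after the last contact of each pair.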

It is evident that the Enron network became highly grouped as it drew closer to the point of the scandal. Below is the graph of the number of new edges that appeared in each period.
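
Counting new edges per period amounts to finding the first period in which each sender and receiver pair appears. A minimal base R sketch on a hypothetical contact list (the data and the five-period window are made up for illustration):

```r
# Hypothetical contact list: one row per email, period = month index.
contacts <- data.frame(
  sender   = c("a", "a", "b", "a", "c"),
  reciever = c("b", "b", "c", "c", "a"),
  period   = c(0, 3, 1, 2, 2),
  stringsAsFactors = FALSE
)
periods <- 0:4  # assumed observation window

# First period in which each directed pair exchanged an email.
pair <- paste(contacts$sender, contacts$reciever, sep = " -> ")
first_seen <- tapply(contacts$period, pair, min)

# New edges per period; periods with no new edge are kept as zeros.
new_edges <- as.vector(table(factor(first_seen, levels = periods)))
names(new_edges) <- periods

# Because edges are never terminated, the running total is the edge count.
cumulative_edges <- cumsum(new_edges)

new_edges
cumulative_edges
```

Plotting `new_edges` against `periods`, for example with `barplot(new_edges)`, gives a graph of the kind described above.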

It would be helpful to know the number of cliques (groups) in the network. Unfortunately, the documentation for the R package sna carries this warning:

The computational cost of calculating cliques grows very sharply in size and network density. It is possible that the expected completion time for your calculation may exceed your life expectancy (and those of subsequent generations).
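
To see where that cost comes from, here is a small sketch of the textbook Bron-Kerbosch recursion for enumerating maximal cliques. It is not the sna implementation, and the four-vertex graph is a hypothetical toy example. The recursion branches on every candidate vertex, and a graph with n vertices can contain up to 3^(n/3) maximal cliques, so the work can grow exponentially with network size and density.

```r
# Textbook Bron-Kerbosch recursion (no pivoting); NOT the sna implementation.
# adj: symmetric logical adjacency matrix with FALSE on the diagonal.
# R: current clique, P: candidates that can extend it, X: vertices already tried.
maximal_cliques <- function(adj, R = integer(0),
                            P = seq_len(nrow(adj)), X = integer(0)) {
  if (length(P) == 0 && length(X) == 0) {
    return(list(R))  # nothing can extend R, so R is a maximal clique
  }
  found <- list()
  for (v in P) {  # the loop sequence is fixed here; P is updated below
    nb <- which(adj[v, ])  # neighbours of v
    found <- c(found, maximal_cliques(adj, c(R, v),
                                      intersect(P, nb), intersect(X, nb)))
    P <- setdiff(P, v)  # v is done as a candidate ...
    X <- union(X, v)    # ... and is excluded from later cliques at this level
  }
  found
}

# Hypothetical toy graph: a triangle 1-2-3 plus a pendant edge 3-4.
adj <- matrix(FALSE, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE  # mirror the entries to make the matrix symmetric

cliques <- maximal_cliques(adj)
length(cliques)  # number of maximal cliques in the toy graph
```

On a network with tens of thousands of emails this naive version would be far too slow, which is the point of the warning.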