As a first step, we came up with a strategy for working on the assignment together. First, we picked the three datasets we were interested in analyzing. After discussing some basic strategies, we each picked one dataset to take the lead on. We worked on those individually, collaborating and reaching out to each other with questions, which gave us an opportunity to sharpen our data wrangling and collaboration skills. Then we worked on the third dataset together.

#install.packages("tidyverse")
#install.packages("dplyr")
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.1
library(tidyr)

The data I am analyzing is available at http://bit.ly/tardistravels. I found the reference to this dataset in the following article: https://www.theguardian.com/news/datablog/2010/aug/20/doctor-who-time-travel-information-is-beautiful.

This dataset details time travel dates on the TV show Doctor Who. The classic era of the show aired from 1963 to 1989, and the new era started in 2005. The data includes episodes through 2011 only and covers 11 incarnations of the Doctor. I am going to compare the time travel span between classic and new era episodes. My theory is that, due to the technology limitations at the time the classic episodes were aired, they will show limited time travel. I will also determine the average number of years each Doctor travels through during their tenure and identify which Doctor time travels the most and the least.

Step 1. Reading in the data:

DWData <- read.csv(file="https://raw.githubusercontent.com/che10vek/Data-607-Assignments/master/Copy%20of%20Doctor%20Who%20Time%20Travel%20Journeys%20-%20Copy%20of%20DM_mastersheet%201.csv", header=TRUE, sep=",")
head(DWData)
##   Doctor.Who.season doctor.actor    ep..no           episode.title  from
## 1                10      Tennant xmas 2006       the Runaway Bride  2007
## 2                 1     Hartnell         1      An Unearthly Child  1963
## 3                 1     Hartnell        21 The Daleks' Master Plan -2500
## 4                 3      Pertwee        64        The Time Monster  1972
## 5                 1     Hartnell        20         The Myth Makers -1200
## 6                 1     Hartnell        12              The Romans    64
##   estimated.from          to estimated.          planet sub.location
## 1             NA -5000000000                      Earth      England
## 2             NA     -100000                                     n/a
## 3           4000       -2500                      Earth             
## 4           2900       -2000            Earth/ Atlantis             
## 5           3999       -1200          y           Earth             
## 6           2493          64                      Earth        Italy
##          location
## 1          London
## 2             n/a
## 3           Egypt
## 4                
## 5 Asia Minor/Troy
## 6            Rome
##                                                                 notes
## 1                                                                    
## 2                                                                    
## 3                                                                    
## 4  'ancient' Atlantis. Krasis is brought forward in time from 2000 BC
## 5                                                                    
## 6                                                                    
##                                            source X X.1 X.2 X.3 X.4 X.5
## 1             http://who-transcripts.atspace.com/        NA  NA  NA  NA
## 2 http://tardis.wikia.com/wiki/An_Unearthly_Child        NA  NA  NA  NA
## 3       http://tardis.wikia.com/wiki/Season_Three        NA  NA  NA  NA
## 4             http://tardis.wikia.com/wiki/Krasis        NA  NA  NA  NA
## 5       http://tardis.wikia.com/wiki/Season_Three        NA  NA  NA  NA
## 6         http://tardis.wikia.com/wiki/Season_Two        NA  NA  NA  NA
##   X.6
## 1  NA
## 2  NA
## 3  NA
## 4  NA
## 5  NA
## 6  NA
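As a hedged aside, the import step could also treat blanks and "n/a" as missing values and keep text columns as character rather than factor. The sketch below runs on a tiny inline CSV with made-up rows in the style of the data (not the real file); `toy_csv` and `toy` are hypothetical names used only for illustration.

```r
# Toy CSV text imitating the layout of the real file (made-up rows).
toy_csv <- "doctor.actor,from,to,location
Hartnell,1963,-100000,n/a
Tennant,2007,\"-5,000,000,000\",London
Pertwee,1972,-2000,
"

toy <- read.csv(
  text = toy_csv,
  header = TRUE,
  na.strings = c("", "n/a"),   # treat blanks and "n/a" as NA
  stringsAsFactors = FALSE     # keep text columns as character
)

str(toy)
```

With these options the later `as.character()` round trips would not be needed for text columns, although numbers containing commas would still have to be cleaned before conversion.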

Step 2. Cleaning up the data and preparing it for analysis.

Since none of my questions concern location information, I am going to drop that data and keep only the time travel details.

DWTime <- subset(DWData, select=Doctor.Who.season:to)
head(DWTime)
##   Doctor.Who.season doctor.actor    ep..no           episode.title  from
## 1                10      Tennant xmas 2006       the Runaway Bride  2007
## 2                 1     Hartnell         1      An Unearthly Child  1963
## 3                 1     Hartnell        21 The Daleks' Master Plan -2500
## 4                 3      Pertwee        64        The Time Monster  1972
## 5                 1     Hartnell        20         The Myth Makers -1200
## 6                 1     Hartnell        12              The Romans    64
##   estimated.from          to
## 1             NA -5000000000
## 2             NA     -100000
## 3           4000       -2500
## 4           2900       -2000
## 5           3999       -1200
## 6           2493          64

After visually inspecting the data, I am going to assume that “Estimated From” is the more accurate variable for the start of a time travel journey. The original “From” column seems to repeat the “to” value when the starting time is not explicitly stated. I will therefore use “Estimated From” as the starting point, falling back to “From” only where “Estimated From” is missing.

DWTime$estimated.from[is.na(DWTime$estimated.from)] <- as.character(DWTime$from[is.na(DWTime$estimated.from)])

# The code below removes the commas that are present in certain fields.
DWTime[,'estimated.from'] <- gsub(",","",DWTime[,'estimated.from'])
head(DWTime)
##   Doctor.Who.season doctor.actor    ep..no           episode.title  from
## 1                10      Tennant xmas 2006       the Runaway Bride  2007
## 2                 1     Hartnell         1      An Unearthly Child  1963
## 3                 1     Hartnell        21 The Daleks' Master Plan -2500
## 4                 3      Pertwee        64        The Time Monster  1972
## 5                 1     Hartnell        20         The Myth Makers -1200
## 6                 1     Hartnell        12              The Romans    64
##   estimated.from          to
## 1           2007 -5000000000
## 2           1963     -100000
## 3           4000       -2500
## 4           2900       -2000
## 5           3999       -1200
## 6           2493          64
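The fallback-and-clean step above can be tried on a small example. This is a minimal sketch on made-up values, assuming a character column of estimates; `toy`, `est`, and `missing_est` are hypothetical names, not part of the original analysis.

```r
# Toy columns (made-up values): 'est' is the estimated start year as text,
# 'from' is the stated start year.
toy <- data.frame(
  from = c(2007, 1963, -2500),
  est  = c(NA, "1,963", "4000"),
  stringsAsFactors = FALSE
)

# Fall back to 'from' wherever the estimate is missing.
missing_est <- is.na(toy$est)
toy$est[missing_est] <- as.character(toy$from[missing_est])

# Drop thousands separators, then convert to numeric.
toy$est <- as.numeric(gsub(",", "", toy$est, fixed = TRUE))

toy$est
```

Computing the `is.na()` mask once keeps both sides of the assignment aligned, which is the same idea as the two matching `is.na()` calls in the original line.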

Now we can remove the “from” variable.

DWTime$from <- NULL
head(DWTime)
##   Doctor.Who.season doctor.actor    ep..no           episode.title
## 1                10      Tennant xmas 2006       the Runaway Bride
## 2                 1     Hartnell         1      An Unearthly Child
## 3                 1     Hartnell        21 The Daleks' Master Plan
## 4                 3      Pertwee        64        The Time Monster
## 5                 1     Hartnell        20         The Myth Makers
## 6                 1     Hartnell        12              The Romans
##   estimated.from          to
## 1           2007 -5000000000
## 2           1963     -100000
## 3           4000       -2500
## 4           2900       -2000
## 5           3999       -1200
## 6           2493          64

Let’s calculate time travel span.

DWTime$Span <- as.numeric(as.character(DWTime$to)) - as.numeric(as.character(DWTime$estimated.from))
## Warning: NAs introduced by coercion
# Since we don't care whether the travel is back or forward in time, we take absolute values.

DWTime$Span<-abs(DWTime$Span)

head(DWTime)
##   Doctor.Who.season doctor.actor    ep..no           episode.title
## 1                10      Tennant xmas 2006       the Runaway Bride
## 2                 1     Hartnell         1      An Unearthly Child
## 3                 1     Hartnell        21 The Daleks' Master Plan
## 4                 3      Pertwee        64        The Time Monster
## 5                 1     Hartnell        20         The Myth Makers
## 6                 1     Hartnell        12              The Romans
##   estimated.from          to       Span
## 1           2007 -5000000000 5000002007
## 2           1963     -100000     101963
## 3           4000       -2500       6500
## 4           2900       -2000       4900
## 5           3999       -1200       5199
## 6           2493          64       2429
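The `as.numeric(as.character(...))` pattern used for the span matters because `to` was read in as a factor. The sketch below, on made-up labels, shows what goes wrong without the `as.character()` step and where a coercion warning like the one above typically comes from; the start year of 2000 is an arbitrary value chosen for illustration.

```r
# A factor of year labels, similar to what read.csv can produce.
f <- factor(c("-2500", "64", "1963", "unknown"))

# Wrong: returns the internal level codes, not the years.
codes <- as.numeric(f)

# Right: go through character first; non-numeric labels become NA
# (this is what triggers "NAs introduced by coercion").
years <- suppressWarnings(as.numeric(as.character(f)))

# Span as an absolute difference from a hypothetical start year of 2000.
span <- abs(years - 2000)
span
```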

Now let’s make our dataset wide.

DWWide<-spread(DWTime, doctor.actor, Span)
head(DWWide)
##   Doctor.Who.season ep..no               episode.title estimated.from
## 1                                                                2007
## 2                                                                2007
## 3                      164             The Empty Child           2005
## 4                 1      1          An Unearthly Child           1963
## 5                 1     10 The Dalek Invasion of Earth           1963
## 6                 1     12                  The Romans           2493
##        to V1 Baker Colin Baker Davison Eccleston Hartnell McCoy McGann
## 1    1969 NA    NA          NA      NA        NA       NA    NA     NA
## 2    2008 NA    NA          NA      NA        NA       NA    NA     NA
## 3    1941 NA    NA          NA      NA        64       NA    NA     NA
## 4 -100000 NA    NA          NA      NA        NA   101963    NA     NA
## 5    2164 NA    NA          NA      NA        NA      201    NA     NA
## 6      64 NA    NA          NA      NA        NA     2429    NA     NA
##   Pertwee Smith Tennant Troughton
## 1      NA    NA      38        NA
## 2      NA    NA       1        NA
## 3      NA    NA      NA        NA
## 4      NA    NA      NA        NA
## 5      NA    NA      NA        NA
## 6      NA    NA      NA        NA
DWWide$V1 <- NULL
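For readers on tidyr 1.0.0 or later, where `spread()` is superseded by `pivot_wider()`, the same reshaping idea can be sketched as follows. The data is made up and the names `toy_long` and `toy_wide` are hypothetical; this illustrates the technique and is not a drop-in replacement tested against the real file.

```r
library(tidyr)

# Toy long-format data (made-up values): one row per journey.
toy_long <- data.frame(
  episode = c("A", "B", "C"),
  doctor  = c("Hartnell", "Tennant", "Hartnell"),
  span    = c(201, 38, 2429),
  stringsAsFactors = FALSE
)

# One column per doctor; journeys by other doctors show up as NA.
toy_wide <- pivot_wider(toy_long, names_from = doctor, values_from = span)
toy_wide
```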

# The season cell for this row is empty, so I am filling in the correct number because I happen to know it. If the error were not visible, I could not make this correction and the row would simply be excluded from the calculation.

DWWide[3, 1] <- 9
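A positional fix like the one above depends on the current row order. A more defensive variant, sketched here on made-up data with hypothetical column names `season` and `title`, finds the row by its episode title and checks that exactly one blank cell matches before overwriting it.

```r
# Toy data (made-up layout): the season for one episode is blank.
toy <- data.frame(
  season = c("", "1", "1"),
  title  = c("The Empty Child", "An Unearthly Child", "The Romans"),
  stringsAsFactors = FALSE
)

# Find the row by its title and confirm exactly one blank season matches
# before overwriting it.
target <- toy$title == "The Empty Child" & toy$season == ""
stopifnot(sum(target) == 1)
toy$season[target] <- "9"

toy
```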

Now I am removing the rows with a blank episode title.

DWWide <- subset(DWWide, episode.title != "")
head(DWWide)
##   Doctor.Who.season ep..no               episode.title estimated.from
## 3                 9    164             The Empty Child           2005
## 4                 1      1          An Unearthly Child           1963
## 5                 1     10 The Dalek Invasion of Earth           1963
## 6                 1     12                  The Romans           2493
## 7                 1     14                 The Crusade           1965
## 8                 1     16                   The Chase           1872
##        to Baker Colin Baker Davison Eccleston Hartnell McCoy McGann
## 3    1941    NA          NA      NA        64       NA    NA     NA
## 4 -100000    NA          NA      NA        NA   101963    NA     NA
## 5    2164    NA          NA      NA        NA      201    NA     NA
## 6      64    NA          NA      NA        NA     2429    NA     NA
## 7    1200    NA          NA      NA        NA      765    NA     NA
## 8    1996    NA          NA      NA        NA      124    NA     NA
##   Pertwee Smith Tennant Troughton
## 3      NA    NA      NA        NA
## 4      NA    NA      NA        NA
## 5      NA    NA      NA        NA
## 6      NA    NA      NA        NA
## 7      NA    NA      NA        NA
## 8      NA    NA      NA        NA

Step 3. Analyzing results.

The summary below gives a glimpse of several things: the most frequently visited year (2010), the seasons with the most time travel (11 and 10), and the episodes with the most time jumps.

summary(DWWide)
##  Doctor.Who.season     ep..no                      episode.title
##  11     :27        12     :  6   The Big Bang             :  4  
##  10     :25        1      :  5   the Girl in the Fireplace:  4  
##  1      :21        13     :  5   The Pandorica Opens      :  4  
##  5      :12        204    :  4   The Daleks' Master Plan  :  3  
##  2      :11        Special:  4   The Eleventh Hour        :  3  
##  3      : 9        21     :  3   The Chase                :  2  
##  (Other):27        (Other):105   (Other)                  :112  
##  estimated.from           to          Baker            Colin Baker   
##  Length:132         2010   :  8   Min.   :6.900e+01   Min.   :  0.0  
##  Class :character   2008   :  5   1st Qu.:1.140e+02   1st Qu.: 50.0  
##  Mode  :character   1983   :  4   Median :4.880e+02   Median :100.0  
##                     1941   :  3   Mean   :5.714e+11   Mean   :215.0  
##                     1966   :  3   3rd Qu.:1.914e+04   3rd Qu.:322.5  
##                     1970   :  3   Max.   :4.000e+12   Max.   :545.0  
##                     (Other):106   NA's   :125         NA's   :129    
##     Davison            Eccleston           Hartnell        
##  Min.   :5.000e+00   Min.   :1.00e+00   Min.   :4.500e+01  
##  1st Qu.:2.200e+02   1st Qu.:1.50e+01   1st Qu.:1.690e+02  
##  Median :8.420e+02   Median :9.30e+01   Median :8.990e+02  
##  Mean   :2.296e+09   Mean   :1.25e+09   Mean   :6.195e+08  
##  3rd Qu.:4.250e+07   3rd Qu.:1.25e+09   3rd Qu.:2.933e+03  
##  Max.   :1.370e+10   Max.   :5.00e+09   Max.   :1.300e+10  
##  NA's   :120         NA's   :124        NA's   :111        
##      McCoy            McGann       Pertwee         Smith     
##  Min.   :  40.0   Min.   :12    Min.   :  20   Min.   :   0  
##  1st Qu.:  86.0   1st Qu.:12    1st Qu.: 128   1st Qu.:  11  
##  Median : 252.5   Median :12    Median : 567   Median : 271  
##  Mean   : 921.5   Mean   :12    Mean   :1298   Mean   :1313  
##  3rd Qu.:1581.0   3rd Qu.:12    3rd Qu.: 774   3rd Qu.:2990  
##  Max.   :2759.0   Max.   :12    Max.   :4900   Max.   :5043  
##  NA's   :124      NA's   :131   NA's   :123    NA's   :105   
##     Tennant            Troughton    
##  Min.   :0.000e+00   Min.   :  3.0  
##  1st Qu.:1.500e+01   1st Qu.: 38.0  
##  Median :1.110e+02   Median : 69.0  
##  Mean   :4.167e+12   Mean   :100.6  
##  3rd Qu.:4.800e+02   3rd Qu.:162.0  
##  Max.   :1.000e+14   Max.   :240.0  
##  NA's   :108         NA's   :121

Let’s compare the average and total travel years for the classic and new eras.

# Seasons 1-8 are treated as the classic era, seasons 9-11 as the new era.
#Classic Era Average
mean(DWTime$Span[DWTime$Doctor.Who.season %in% 1:8], na.rm = TRUE)
## [1] 56118891661
#New Era Average
mean(DWTime$Span[DWTime$Doctor.Who.season %in% 9:11], na.rm = TRUE)
## [1] 1.724483e+12
#Classic Era Sum
sum(DWTime$Span[DWTime$Doctor.Who.season %in% 1:8], na.rm = TRUE)
## [1] 4.04056e+12
#New Era Sum
sum(DWTime$Span[DWTime$Doctor.Who.season %in% 9:11], na.rm = TRUE)
## [1] 1.0002e+14
## [1] 1.0002e+14

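My initial assumption was correct: the new era has more time travel than the classic episodes, both on average (about 1.7e12 vs. 5.6e10 years per journey) and in total (about 1.0e14 vs. 4.0e12 years). These figures are dominated by a few extremely long jumps, such as Tennant's maximum of 1e14 years in the summary above, so they say more about outliers than about typical episodes.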
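The era comparison can also be written as a grouped summary, which makes it easy to add the median next to the mean. This is a minimal sketch on made-up spans, assuming the same split as above (1-8 classic, 9-11 new); `toy`, `era`, and `era_summary` are hypothetical names.

```r
library(dplyr)

# Toy data (made-up spans): 'season' follows the split used in this
# analysis, where 1-8 are classic era and 9-11 are new era.
toy <- data.frame(
  season = c(1, 3, 8, 9, 10, 11),
  span   = c(200, 4900, 12, 64, 5e9, 271)
)

era_summary <- toy %>%
  mutate(era = if_else(season <= 8, "classic", "new")) %>%
  group_by(era) %>%
  summarise(
    mean_span   = mean(span, na.rm = TRUE),
    median_span = median(span, na.rm = TRUE),
    total_span  = sum(span, na.rm = TRUE)
  )

era_summary
```

In this toy example the new-era mean is driven almost entirely by one very long jump, while the two medians are close to each other.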

Now, we can see which incarnations of the Doctor traveled in time the most.

DWDoctors <- DWWide %>%
  select("Baker", "Colin Baker", "Davison", "Hartnell", "McCoy", "McGann", "Pertwee", "Troughton", "Eccleston", "Smith", "Tennant")

#Average
sort(apply(DWDoctors, 2, mean, na.rm=TRUE), decreasing = TRUE)
##      Tennant        Baker      Davison    Eccleston     Hartnell 
## 4.167083e+12 5.714286e+11 2.295834e+09 1.250000e+09 6.195299e+08 
##        Smith      Pertwee        McCoy  Colin Baker    Troughton 
## 1.312963e+03 1.297667e+03 9.215000e+02 2.150000e+02 1.006364e+02 
##       McGann 
## 1.200000e+01
#Sum of all time travel years
sort(apply(DWDoctors, 2, sum, na.rm=TRUE), decreasing = TRUE)
##      Tennant        Baker      Davison     Hartnell    Eccleston 
## 1.000100e+14 4.000000e+12 2.755001e+10 1.301013e+10 9.999996e+09 
##        Smith      Pertwee        McCoy    Troughton  Colin Baker 
## 3.545000e+04 1.167900e+04 7.372000e+03 1.107000e+03 6.450000e+02 
##       McGann 
## 1.200000e+01

1st Place: Tennant (New Era), 2nd Place: Baker (Classic Episodes), 3rd Place: Davison (Classic Episodes)
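Because a ranking by mean or sum can be decided by a single enormous jump, it may be worth ranking by median as well. The sketch below uses made-up spans in the same wide layout (one column per doctor); it illustrates the technique and is not a result from the real data.

```r
# Toy wide table (made-up spans): one column per doctor, NA where a journey
# belongs to someone else.
toy <- data.frame(
  Hartnell = c(201, 2429, 765, NA, NA, NA),
  Tennant  = c(NA, NA, NA, 38, 1, 1e14)
)

# Mean and median per doctor, ignoring NAs.
means   <- sapply(toy, mean, na.rm = TRUE)
medians <- sapply(toy, median, na.rm = TRUE)

# Rank by each statistic; a single huge jump can dominate the mean.
sort(means, decreasing = TRUE)
sort(medians, decreasing = TRUE)
```

In this toy table the two rankings disagree, which is the kind of effect worth checking for in the real per-doctor columns.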

DWClassic <- DWWide %>%
  select("Baker", "Colin Baker", "Davison", "Hartnell", "McCoy", "McGann", "Pertwee", "Troughton")

DWNew <- DWWide %>%
  select("Eccleston", "Smith", "Tennant")