Loading and preliminary analysis of Twitter data

Load the file

tw <- read.csv("results.csv", stringsAsFactors = TRUE)  # load; keep text columns as factors, as R < 4.0 did by default
names(tw)  # column names
##  [1] "word"                    "tweet_id"               
##  [3] "user_id"                 "X_tweet_location"       
##  [5] "Y_tweet_location"        "day"                    
##  [7] "month"                   "year"                   
##  [9] "hour"                    "minutes"                
## [11] "seconds"                 "day_of_week"            
## [13] "X_favoured_location"     "Y_favoured_location"    
## [15] "tweet_count_at_location" "text"
nrow(tw)  # number of rows of data
## [1] 1553
plot(tw$X_tweet_location, tw$Y_tweet_location)  # scatter plot of tweet locations

[Figure: tweet locations, Y_tweet_location against X_tweet_location]

summary(tw$word)  # number of tweets matched by each search word
##    exhibit exhibition    gallery     museum 
##         27        207        352        967

Filtering

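This project is about museums, so it is important to estimate what proportion of the tweets in the dataset are actually about museums. The code below first cuts each tweet at its first link ("http"), keeping only the text before it, and removes tweets whose remaining text duplicates an earlier one. Twenty of the remaining tweets are then selected at random and classified manually, by reading them, as museum-related or not.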

tw$text <- as.character(tw$text)
ss <- strsplit(tw$text, split = "http")  # split each tweet at every "http"
sapply(ss, "[", 2)[1:10]  # URL remainder after the first "http" (first 10 tweets)
##  [1] NA                 "://t.co/MEAOoWW6" "://t.co/69Pt2pnj"
##  [4] NA                 NA                 NA                
##  [7] NA                 NA                 NA                
## [10] NA
sapply(ss, "[", 1)[1:10]  # text before the first URL (first 10 tweets)
##  [1] "@RachGreaux @lawrenx exhibit 1 here_ tarnishing me with the same old brush #AGuyCanChange"                                                   
##  [2] "Remember the case I made the other day of deleting my Facebook because of the absolute dross? Observe_ exhibit two... "                      
##  [3] "Exhibit A : a guy who has just passed me in Leeds. 13 degrees is not shorts and tshirt weather. Crazy fool! "                                
##  [4] "I must exhibit helplessness too well. Woman who changed my coin_ spotted me from a bus stop in front of the car park. Thank you anyway"      
##  [5] "@rache_elizdakin get some kip. Leeds beerfest starts tmoro if your north n really good postwar painters n sculpturs exhibit at Hepworth"     
##  [6] "Had a day out planned for me & Stu this Sat_afternoon looking at new Joan Miró exhibit at YSP dinner there_then concert in the chapel..."  
##  [7] "@mdhendry Anatomy of an Angel is brilliant. Went to the exhibit a month or two ago."                                                         
##  [8] "I see that life cannot exhibit all to me_ as day cannot_ I see that I am to wait for what will be exhibited by death."                       
##  [9] "@helendaykin @lordlangley73 u want to swop and take stage and we'll have your exhibit #whathaveiagreedto : )"                                
## [10] "#8outof10cats repeats on 4music Sean Lock is hilarious #jedwardbanter  your best chance of getting into Uni is as an exhibit in a jar!  haha"
tw$ttext <- sapply(ss, "[", 1)  # keep only the text before the first URL
length(unique(tw$text))/nrow(tw)  # share of unique tweets, full text
## [1] 0.727
length(unique(tw$ttext))/nrow(tw)  # share of unique tweets once URLs are cut off
## [1] 0.5976
head(unique(tw$ttext))
## [1] "@RachGreaux @lawrenx exhibit 1 here_ tarnishing me with the same old brush #AGuyCanChange"                                                 
## [2] "Remember the case I made the other day of deleting my Facebook because of the absolute dross? Observe_ exhibit two... "                    
## [3] "Exhibit A : a guy who has just passed me in Leeds. 13 degrees is not shorts and tshirt weather. Crazy fool! "                              
## [4] "I must exhibit helplessness too well. Woman who changed my coin_ spotted me from a bus stop in front of the car park. Thank you anyway"    
## [5] "@rache_elizdakin get some kip. Leeds beerfest starts tmoro if your north n really good postwar painters n sculpturs exhibit at Hepworth"   
## [6] "Had a day out planned for me & Stu this Sat_afternoon looking at new Joan Miró exhibit at YSP dinner there_then concert in the chapel..."
head(which(duplicated(tw$ttext)))
## [1] 18 19 20 21 22 23
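du <- duplicated(tw$ttext)  # TRUE for each repeat of an earlier tweet
mean(du)  # share of rows that are duplicates
## [1] 0.4024
tw <- tw[!du, ]  # keep only the first copy of each tweet (safe even if there are no duplicates)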
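Splitting on "http" keeps only the text before the first link, so any words that follow a URL are dropped before the duplicate check. A minimal sketch of an alternative, assuming links always start with `http://` or `https://` and run until the next whitespace, removes every URL with a regular expression and keeps the surrounding text. The helper name `strip_urls` and the toy tweets are hypothetical, not part of the original analysis.

```r
# Hypothetical helper: remove every http(s) URL from each tweet and tidy spacing.
# ASSUMPTION: a URL is "http://" or "https://" followed by non-space characters.
strip_urls <- function(text) {
  no_urls <- gsub("https?://\\S+", "", text, perl = TRUE)
  # collapse runs of whitespace left behind, then trim both ends
  trimws(gsub("\\s+", " ", no_urls, perl = TRUE))
}

# Made-up tweets: the first two differ only in their shortened link
tweets <- c("Furry  @ National Media Museum http://t.co/abc123",
            "Furry  @ National Media Museum http://t.co/xyz789",
            "Link first http://t.co/q1 then more text")
clean <- strip_urls(tweets)
clean
dedup <- tweets[!duplicated(clean)]  # keep the first copy of each cleaned text
dedup
```

Applied to this dataset, the equivalent would be `tw$ttext <- strip_urls(tw$text)` followed by `tw <- tw[!duplicated(tw$ttext), ]`. Because text after links is retained, the share of duplicates could differ somewhat from the 0.4024 reported above; collapsing whitespace also means tweets that differ only in spacing are treated as duplicates.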

From the 20 randomly selected tweets, it seems that two are unrelated to museums (entries 17 and 19 of the sample, one about Amy Winehouse and the other about snooker), and many others are not about visiting museums per se but about, for example, preparing for museum interviews. Still, a substantial number (roughly half) seem to be directly related to museum appreciation or visits.
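The random draw itself is not shown in the code. A minimal sketch of how it could be made reproducible, and of how much uncertainty a 20-tweet sample carries, follows. The helper `sample_tweets` and its seed are hypothetical, so this will not reproduce the sample referred to above; the count of 10 museum-related tweets out of 20 is an assumed stand-in for "roughly half", not a reported result. `binom.test()` from base R's stats package returns an exact (Clopper-Pearson) confidence interval.

```r
# Hypothetical helper: reproducible random sample of tweets for manual reading.
# Note: set.seed() changes the global random number generator state.
sample_tweets <- function(text, n = 20, seed = 1) {
  set.seed(seed)                 # arbitrary seed, for reproducibility only
  text[sample(length(text), n)]  # n distinct tweets, drawn without replacement
}

# Toy stand-in for tw$ttext after de-duplication
toy_text <- paste("tweet", 1:100)
to_read <- sample_tweets(toy_text, n = 20)

# After reading, record how many were judged museum-related.
# ASSUMPTION: 10 of 20, standing in for "roughly half".
n_museum <- 10
ci <- binom.test(n_museum, 20)$conf.int  # exact 95% interval for the proportion
round(ci, 3)
```

Under that assumption the interval runs from about 0.27 to 0.73, so an estimate based on 20 tweets is coarse; reading a larger random sample would narrow it.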

Analysis of frequency of tweets by user

It is important to know how often each user tweets, to judge whether there is enough information to analyse the behaviour of individual agents, which is only possible if people tweet many times. If most people tweet about museums or exhibitions only once, however, the dataset is far less useful for understanding how individuals behave over time, as it provides only single snapshots of different people.
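The two users examined below (IDs 19825421 and 16199914) are, according to the per-user summary printed later, the most prolific in the de-duplicated data. Rather than hard-coding their IDs, a minimal sketch of ranking users by tweet count is shown here; `top_users` is a hypothetical helper and the toy IDs are made up.

```r
# Hypothetical helper: the n most frequent user IDs with their tweet counts.
top_users <- function(user_id, n = 2) {
  counts <- sort(table(user_id), decreasing = TRUE)  # tweets per user, largest first
  head(counts, n)
}

# Toy data: user "a" tweets three times, "b" twice, "c" once
toy_ids <- c("a", "b", "a", "c", "b", "a")
top <- top_users(toy_ids, n = 2)
top
names(top)  # IDs that could then be passed to which(tw$user_id == ...)
```

On the real data this would be, for example, `prolific <- names(top_users(tw$user_id, 2))` and then `tw$text[tw$user_id == prolific[1]]`. Ties in tweet counts are broken by `sort()`'s ordering, so IDs with equal counts may appear in either order.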

# analysis of the most prolific tweeters
summary(tw$user_id)  # IDs are still numeric here, so this summary of ID values is not meaningful
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 1.36e+04 1.96e+07 3.78e+07 1.19e+08 1.77e+08 7.38e+08
tw$user_id <- as.factor(tw$user_id)  # convert id to factor class
prolif1 <- which(tw$user_id == 19825421)  # most prolific user (25 tweets)
tw$text[prolif1]
##  [1] "OK_ following the 4 nations game on November 5th_ We're off to the Nat. Army museum to see the War Horse exhibition. #lovevisitinglondon"    
##  [2] "Looks like a trip to Huddersfield is coming up soon... http://t.co/p0DxV5p4 #artgeek Good to have a decent exhibition close by."             
##  [3] "@RodRhino hoping to get down to see the new exhibition before it ends."                                                                      
##  [4] "Other 1/2 has taken her father shopping_ before we go to Leeds City Museum. Not been before as I prefer the art gallery. should be nice."    
##  [5] "@chrisandbensmum I came out just after 1st Gulf ended & am now working for Regimental Museum at Duxford."                                    
##  [6] "Tomorrow I will be interviewing former members of my old regiment who served in Aden for the museum. A job I am honored to do. #vikings"     
##  [7] "And if your ever near Cambridge_ pop into the IWM Duxford &amp; look at our regimental museum. We have a French eagle taken in battle!"      
##  [8] "Just catching up with the Elgin Marbles doc from last week on BBC. Saw them for 1st time_ last month at British Museum. Wonderful place."    
##  [9] "Looks like I will be recording the memories of former members of my old regiment for our museum at Duxford in the near future. A real Honor" 
## [10] "@AwesomeCT LMAO. Cant make it to the next show at Electric_ we've just moved & I'm busy with museum work."                                   
## [11] "OK_ loads of writing to do and editing of audio interviews for museum. May be away for some time..."                                         
## [12] "The chance to do something for the #royalanglian museum (http://t.co/A1ijkXbN) that will preserve these memories forever is superb."         
## [13] "@penelopedavo going to Oldham to house sit until thursday with youngest. Planning trip to science &amp; industry museum weds."               
## [14] "In 2 weeks time_ I am privileged to meet with former members of the Royal Anglians & talk about there service in Aden for our museum."       
## [15] "@andybolton @ajhmurray There must be a shoe museum..."                                                                                       
## [16] "Dont have time to hang about on here_ I have interviews to edit for the IWM Land Warfare museum..."                                          
## [17] "@ajhmurray Pop into the excellent Royal Anglian Museum whilst there. As an Archivist there_ I am biased_ but we do a good job..."            
## [18] "@augustharvest Former member of the British Army. Rejoned civvy life 1992. Now work at Regimental museum."                                   
## [19] "As a former Anglian_ to get to chat to old hands for the museum is a real honour. Just hope we do their tales justice."                      
## [20] "What a LOOOOOONG but great day. Loads of interviews to edit for Museum and a few reunions to visit in the next few months."                  
## [21] "@RodRhino I used to be in 6pl_ B Coy_ 1 Royal Anglain. Now got job at Regiment level interviewing lads of all eras for museum."              
## [22] "@ajhmurray go there often to visit my Regimental memorial & museum. !st Bn Royal Anglian. Did you see it? Near Land warfare rooms."          
## [23] "It appears to have worked. I'm getting up feeling almost refreshed. Going to Leeds museum in a bit to see just how Leeds evolved.(it didn't)"
## [24] "I'm at Leeds City Museum (Cookridge Street_ Leeds) [pic]: http://t.co/sHi28qna"                                                              
## [25] "But I do have some very useful interviews to edit for use in the museum."
plot(tw$X_tweet_location[prolif1], tw$Y_tweet_location[prolif1])

[Figure: tweet locations of user 19825421]


prolif2 <- which(tw$user_id == 16199914)  # second most prolific user (15 tweets)
tw$text[prolif2]
##  [1] "Brandon Tutthill_ shot by Paul Floyd Blake_ on show at Impression Gallery Personal Be  @ Impressions Gallery http://t.co/0FwJik1b"
##  [2] "Zine fair at Impressions Gallery is ace! Busy as well! Ace!"                                                                      
##  [3] "HIGH FIVE WITH KATSUMA! #awesomesauce  @ National Media Museum http://t.co/8qLUDkaL"                                              
##  [4] "Sky diving  @ National Media Museum http://t.co/TlBBcShr"                                                                         
##  [5] "I'm at National Media Museum for Mission: Impossible - Ghost Protocol: The IMAX Experience http://t.co/4GzXid2g"                  
##  [6] "Daniel Meadows  @ National Media Museum http://t.co/4M3Ogf9l"                                                                     
##  [7] "3D is the future (in the past)  @ National Media Museum http://t.co/BTUJdC35"                                                     
##  [8] "The massed throng awaits #moshimonster Katsumi @mediamuseum!  @ National Media Museum http://t.co/bW1ZAiNJ"                       
##  [9] "Tram stop_ Trolley bus stop_ Bus stop  @ Bradford Industrial Museum http://t.co/SXgVcHB8"                                         
## [10] "Silent Service Jr  @ National Media Museum http://t.co/wxFUckU1"                                                                  
## [11] "The Dark Knight Rises prologue. With FULL BODY SEARCH! (@ National Media Museum) http://t.co/EpzAmaFk"                            
## [12] "Dance Katsumi_ dance!  @ National Media Museum http://t.co/c9v49zrf"                                                              
## [13] "Linotype 78 #phwoar  @ Bradford Industrial Museum http://t.co/E2uE06yi"                                                           
## [14] "Is Pac Man hiding underground?  @ National Media Museum http://t.co/m8lsYHYq"                                                     
## [15] "Furry  @ National Media Museum http://t.co/27FFxxN1"
plot(tw$X_tweet_location[prolif2], tw$Y_tweet_location[prolif2])

[Figure: tweet locations of user 16199914]


summary(tw$user_id)  # tweets per user; the most frequent users are listed first
##  19825421  16199914     35853  96873013  10621322  19386828  20370009 
##        25        15        14        10         9         8         8 
## 386455399  20375397   1151411 348334015 385517913   7596312  14388335 
##         8         7         6         6         6         5         5 
##  18221877  24222444  37530200  39280228  84302881 193617734 244700139 
##         5         5         5         5         5         5         5 
## 602899508     79553   5677082   6446442  19018058  21218576  22379233 
##         5         4         4         4         4         4         4 
##  23597283  24372980  26526261  37889243  39557708  49633019 162066517 
##         4         4         4         4         4         4         4 
## 371776953 467716929 577004102     50373   8050342  13104712  14123137 
##         4         4         4         3         3         3         3 
##  15602469  16186960  17063725  19552271  19706865  20139698  20224924 
##         3         3         3         3         3         3         3 
##  20612726  21853340  25491483  25986893  27495816  63437896  76656085 
##         3         3         3         3         3         3         3 
## 106713698 127294711 149536085 454306409     64623    795801   1026931 
##         3         3         3         3         2         2         2 
##   5091661   5862532   6526072   6837332   7785312   9454082  11813992 
##         2         2         2         2         2         2         2 
##  12455292  13678922  14118277  14713527  14786188  14868904  14887218 
##         2         2         2         2         2         2         2 
##  15432355  15726425  16192509  18123429  18567792  19015555  19978868 
##         2         2         2         2         2         2         2 
##  20139600  20175413  20229810  20363628  20390884  20424362  20464213 
##         2         2         2         2         2         2         2 
##  21078394  21080647  21086595  21858606  22263434  22377951  22938442 
##         2         2         2         2         2         2         2 
##  25694631   (Other) 
##         2       548
ttable <- as.factor(table(tw$user_id))  # tweet count of each user, as a factor
summary(ttable)  # number of users with 1, 2, 3, ... tweets
##   1   2   3   4   5   6   7   8   9  10  14  15  25 
## 468  79  22  16  10   3   1   3   1   1   1   1   1
plot(ttable)  # bar plot of that distribution

[Figure: number of users by number of tweets]

nrow(tw)/length(unique(tw$user_id))  # mean number of tweets per user
## [1] 1.529
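From the distribution printed above, 468 of the 607 users (about 77%) tweeted only once and 139 (about 23%) tweeted two or more times, which supports the concern that most people appear only as a single snapshot. A minimal sketch that returns the distribution, the share of single-tweet users and the mean in one call is shown below; `tweets_per_user` is a hypothetical helper and the toy IDs are made up.

```r
# Hypothetical helper: distribution of tweets per user, share of users with a
# single tweet, and mean tweets per user.
tweets_per_user <- function(user_id) {
  counts <- table(user_id)          # tweets per user
  dist <- table(as.vector(counts))  # how many users have 1, 2, 3, ... tweets
  list(distribution = dist,
       share_single = mean(counts == 1),
       mean_tweets  = length(user_id) / length(counts))
}

# Toy data: "a" tweets 3 times, "b" twice, "c" and "d" once each
toy_ids <- c("a", "b", "a", "c", "b", "a", "d")
res <- tweets_per_user(toy_ids)
res$distribution
res$share_single
res$mean_tweets
```

Applied to `tw$user_id`, `share_single` should correspond to the 468/607 figure above and `mean_tweets` to the 1.529 average.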