tw <- read.csv("results.csv") # load
names(tw) # column names
## [1] "word" "tweet_id"
## [3] "user_id" "X_tweet_location"
## [5] "Y_tweet_location" "day"
## [7] "month" "year"
## [9] "hour" "minutes"
## [11] "seconds" "day_of_week"
## [13] "X_favoured_location" "Y_favoured_location"
## [15] "tweet_count_at_location" "text"
nrow(tw) # number of rows of data
## [1] 1553
plot(tw$X_tweet_location, tw$Y_tweet_location) # plot the tweet locations
summary(tw$word) # counts of the search words that led to selection
## exhibit exhibition gallery museum
## 27 207 352 967
This project is about museums, so it is important to estimate the proportion of tweets in the dataset that are actually about museums. To do this, let's select 20 tweets at random and read them to identify manually how many are museum-related:
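A minimal sketch of that sampling step follows. It is an illustration, not necessarily how the sample was actually drawn: `tw_demo` is a hypothetical stand-in for `tw`, and the seed value is arbitrary.

```r
# Hypothetical stand-in for `tw`; in the real analysis use the loaded data frame
tw_demo <- data.frame(
  tweet_id = 1:100,
  text = paste("example tweet", 1:100),
  stringsAsFactors = FALSE
)

set.seed(42)                      # arbitrary seed, so the same sample can be redrawn
idx <- sample(nrow(tw_demo), 20)  # 20 row numbers, drawn without replacement
tw_sample <- tw_demo[idx, ]       # the tweets to read and classify by hand
tw_sample$text
```

Fixing the seed means the same 20 tweets can be retrieved later, which helps if the manual classification needs to be checked.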
tw$text <- as.character(tw$text)
ss <- strsplit(tw$text, split = "http") # split each tweet at "http"
sapply(ss, "[", 2)[1:10] # remainder of the first URL, for the first 10 tweets
## [1] NA "://t.co/MEAOoWW6" "://t.co/69Pt2pnj"
## [4] NA NA NA
## [7] NA NA NA
## [10] NA
sapply(ss, "[", 1)[1:10] # text before the first URL, for the first 10 tweets
## [1] "@RachGreaux @lawrenx exhibit 1 here_ tarnishing me with the same old brush #AGuyCanChange"
## [2] "Remember the case I made the other day of deleting my Facebook because of the absolute dross? Observe_ exhibit two... "
## [3] "Exhibit A : a guy who has just passed me in Leeds. 13 degrees is not shorts and tshirt weather. Crazy fool! "
## [4] "I must exhibit helplessness too well. Woman who changed my coin_ spotted me from a bus stop in front of the car park. Thank you anyway"
## [5] "@rache_elizdakin get some kip. Leeds beerfest starts tmoro if your north n really good postwar painters n sculpturs exhibit at Hepworth"
## [6] "Had a day out planned for me & Stu this Sat_afternoon looking at new Joan Miró exhibit at YSP dinner there_then concert in the chapel..."
## [7] "@mdhendry Anatomy of an Angel is brilliant. Went to the exhibit a month or two ago."
## [8] "I see that life cannot exhibit all to me_ as day cannot_ I see that I am to wait for what will be exhibited by death."
## [9] "@helendaykin @lordlangley73 u want to swop and take stage and we'll have your exhibit #whathaveiagreedto : )"
## [10] "#8outof10cats repeats on 4music Sean Lock is hilarious #jedwardbanter your best chance of getting into Uni is as an exhibit in a jar! haha"
tw$ttext <- sapply(ss, "[", 1) # tweet text with everything from the first URL onwards removed
length(unique(tw$text))/nrow(tw)
## [1] 0.727
length(unique(tw$ttext))/nrow(tw)
## [1] 0.5976
head(unique(tw$ttext))
## [1] "@RachGreaux @lawrenx exhibit 1 here_ tarnishing me with the same old brush #AGuyCanChange"
## [2] "Remember the case I made the other day of deleting my Facebook because of the absolute dross? Observe_ exhibit two... "
## [3] "Exhibit A : a guy who has just passed me in Leeds. 13 degrees is not shorts and tshirt weather. Crazy fool! "
## [4] "I must exhibit helplessness too well. Woman who changed my coin_ spotted me from a bus stop in front of the car park. Thank you anyway"
## [5] "@rache_elizdakin get some kip. Leeds beerfest starts tmoro if your north n really good postwar painters n sculpturs exhibit at Hepworth"
## [6] "Had a day out planned for me & Stu this Sat_afternoon looking at new Joan Miró exhibit at YSP dinner there_then concert in the chapel..."
head(which(duplicated(tw$ttext)))
## [1] 18 19 20 21 22 23
du <- which(duplicated(tw$ttext))
length(du)/nrow(tw)
## [1] 0.4024
tw <- tw[!duplicated(tw$ttext), ] # drop duplicates (unlike tw[-du, ], safe even if none are found)
From this sample it seems that 2 tweets are unrelated to museums (entries 17 and 19, one about Amy Winehouse and the other about snooker), and many others are not about visiting museums per se but about, for example, preparing for museum interviews. Still, an impressive number (roughly half) seem to be directly related to museum appreciation or visits.
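With so few hand-read tweets, a proportion like this is only a rough estimate. The sketch below puts an exact binomial confidence interval around the unrelated share using `binom.test()` from base R. It assumes the sample contained 20 tweets, of which exactly 2 were unrelated; the variable names are illustrative.

```r
n_sample    <- 20  # size of the hand-read sample (assumed from the text)
n_unrelated <- 2   # tweets judged unrelated to museums (from the text)

p_hat <- n_unrelated / n_sample                   # point estimate of the unrelated share
ci <- binom.test(n_unrelated, n_sample)$conf.int  # exact (Clopper-Pearson) 95% interval
p_hat
ci
```

The interval is wide for a sample this small, so a larger hand-classified sample would be needed for a firmer estimate.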
It is important to know how frequently each user tweets, to see whether there is enough information to analyse the behaviour of individual agents (that is, whether people tweet many times). If most people tweet about museums or exhibits only once, the dataset is far less useful for insight into individual behaviour over time, because it provides only single snapshots of different people.
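One way to find the most active accounts without typing IDs by hand is to tabulate and sort the user IDs. This is a minimal sketch on a made-up vector; `ids` is hypothetical and would be `tw$user_id` in the real data.

```r
# Toy user IDs (hypothetical); in the real data this would be tw$user_id
ids <- c(101, 102, 101, 103, 101, 102, 104)

counts <- sort(table(ids), decreasing = TRUE)  # tweets per user, most active first
top2 <- names(counts)[1:2]                     # IDs of the two most prolific users
top2
mean(counts == 1)                              # share of users who tweeted only once
```

The share of one-time tweeters indicates how much of the dataset consists of single snapshots rather than repeated observations of the same person.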
# analyse the most prolific tweeters
summary(tw$user_id) # user_id is still numeric here, so this summarises ID values, not tweet counts
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.36e+04 1.96e+07 3.78e+07 1.19e+08 1.77e+08 7.38e+08
tw$user_id <- as.factor(tw$user_id) # convert id to factor class
prolif1 <- which(tw$user_id == 19825421)
tw$text[prolif1]
## [1] "OK_ following the 4 nations game on November 5th_ We're off to the Nat. Army museum to see the War Horse exhibition. #lovevisitinglondon"
## [2] "Looks like a trip to Huddersfield is coming up soon... http://t.co/p0DxV5p4 #artgeek Good to have a decent exhibition close by."
## [3] "@RodRhino hoping to get down to see the new exhibition before it ends."
## [4] "Other 1/2 has taken her father shopping_ before we go to Leeds City Museum. Not been before as I prefer the art gallery. should be nice."
## [5] "@chrisandbensmum I came out just after 1st Gulf ended & am now working for Regimental Museum at Duxford."
## [6] "Tomorrow I will be interviewing former members of my old regiment who served in Aden for the museum. A job I am honored to do. #vikings"
## [7] "And if your ever near Cambridge_ pop into the IWM Duxford & look at our regimental museum. We have a French eagle taken in battle!"
## [8] "Just catching up with the Elgin Marbles doc from last week on BBC. Saw them for 1st time_ last month at British Museum. Wonderful place."
## [9] "Looks like I will be recording the memories of former members of my old regiment for our museum at Duxford in the near future. A real Honor"
## [10] "@AwesomeCT LMAO. Cant make it to the next show at Electric_ we've just moved & I'm busy with museum work."
## [11] "OK_ loads of writing to do and editing of audio interviews for museum. May be away for some time..."
## [12] "The chance to do something for the #royalanglian museum (http://t.co/A1ijkXbN) that will preserve these memories forever is superb."
## [13] "@penelopedavo going to Oldham to house sit until thursday with youngest. Planning trip to science & industry museum weds."
## [14] "In 2 weeks time_ I am privileged to meet with former members of the Royal Anglians & talk about there service in Aden for our museum."
## [15] "@andybolton @ajhmurray There must be a shoe museum..."
## [16] "Dont have time to hang about on here_ I have interviews to edit for the IWM Land Warfare museum..."
## [17] "@ajhmurray Pop into the excellent Royal Anglian Museum whilst there. As an Archivist there_ I am biased_ but we do a good job..."
## [18] "@augustharvest Former member of the British Army. Rejoned civvy life 1992. Now work at Regimental museum."
## [19] "As a former Anglian_ to get to chat to old hands for the museum is a real honour. Just hope we do their tales justice."
## [20] "What a LOOOOOONG but great day. Loads of interviews to edit for Museum and a few reunions to visit in the next few months."
## [21] "@RodRhino I used to be in 6pl_ B Coy_ 1 Royal Anglain. Now got job at Regiment level interviewing lads of all eras for museum."
## [22] "@ajhmurray go there often to visit my Regimental memorial & museum. !st Bn Royal Anglian. Did you see it? Near Land warfare rooms."
## [23] "It appears to have worked. I'm getting up feeling almost refreshed. Going to Leeds museum in a bit to see just how Leeds evolved.(it didn't)"
## [24] "I'm at Leeds City Museum (Cookridge Street_ Leeds) [pic]: http://t.co/sHi28qna"
## [25] "But I do have some very useful interviews to edit for use in the museum."
plot(tw$X_tweet_location[prolif1], tw$Y_tweet_location[prolif1])
prolif2 <- which(tw$user_id == 16199914)
tw$text[prolif2]
## [1] "Brandon Tutthill_ shot by Paul Floyd Blake_ on show at Impression Gallery Personal Be @ Impressions Gallery http://t.co/0FwJik1b"
## [2] "Zine fair at Impressions Gallery is ace! Busy as well! Ace!"
## [3] "HIGH FIVE WITH KATSUMA! #awesomesauce @ National Media Museum http://t.co/8qLUDkaL"
## [4] "Sky diving @ National Media Museum http://t.co/TlBBcShr"
## [5] "I'm at National Media Museum for Mission: Impossible - Ghost Protocol: The IMAX Experience http://t.co/4GzXid2g"
## [6] "Daniel Meadows @ National Media Museum http://t.co/4M3Ogf9l"
## [7] "3D is the future (in the past) @ National Media Museum http://t.co/BTUJdC35"
## [8] "The massed throng awaits #moshimonster Katsumi @mediamuseum! @ National Media Museum http://t.co/bW1ZAiNJ"
## [9] "Tram stop_ Trolley bus stop_ Bus stop @ Bradford Industrial Museum http://t.co/SXgVcHB8"
## [10] "Silent Service Jr @ National Media Museum http://t.co/wxFUckU1"
## [11] "The Dark Knight Rises prologue. With FULL BODY SEARCH! (@ National Media Museum) http://t.co/EpzAmaFk"
## [12] "Dance Katsumi_ dance! @ National Media Museum http://t.co/c9v49zrf"
## [13] "Linotype 78 #phwoar @ Bradford Industrial Museum http://t.co/E2uE06yi"
## [14] "Is Pac Man hiding underground? @ National Media Museum http://t.co/m8lsYHYq"
## [15] "Furry @ National Media Museum http://t.co/27FFxxN1"
plot(tw$X_tweet_location[prolif2], tw$Y_tweet_location[prolif2])
summary(tw$user_id) # tweets per user, now that user_id is a factor
## 19825421 16199914 35853 96873013 10621322 19386828 20370009
## 25 15 14 10 9 8 8
## 386455399 20375397 1151411 348334015 385517913 7596312 14388335
## 8 7 6 6 6 5 5
## 18221877 24222444 37530200 39280228 84302881 193617734 244700139
## 5 5 5 5 5 5 5
## 602899508 79553 5677082 6446442 19018058 21218576 22379233
## 5 4 4 4 4 4 4
## 23597283 24372980 26526261 37889243 39557708 49633019 162066517
## 4 4 4 4 4 4 4
## 371776953 467716929 577004102 50373 8050342 13104712 14123137
## 4 4 4 3 3 3 3
## 15602469 16186960 17063725 19552271 19706865 20139698 20224924
## 3 3 3 3 3 3 3
## 20612726 21853340 25491483 25986893 27495816 63437896 76656085
## 3 3 3 3 3 3 3
## 106713698 127294711 149536085 454306409 64623 795801 1026931
## 3 3 3 3 2 2 2
## 5091661 5862532 6526072 6837332 7785312 9454082 11813992
## 2 2 2 2 2 2 2
## 12455292 13678922 14118277 14713527 14786188 14868904 14887218
## 2 2 2 2 2 2 2
## 15432355 15726425 16192509 18123429 18567792 19015555 19978868
## 2 2 2 2 2 2 2
## 20139600 20175413 20229810 20363628 20390884 20424362 20464213
## 2 2 2 2 2 2 2
## 21078394 21080647 21086595 21858606 22263434 22377951 22938442
## 2 2 2 2 2 2 2
## 25694631 (Other)
## 2 548
ttable <- as.factor(table(tw$user_id)) # tweets per user, stored as a factor
summary(ttable) # number of users (bottom row) for each tweet count (top row)
## 1 2 3 4 5 6 7 8 9 10 14 15 25
## 468 79 22 16 10 3 1 3 1 1 1 1 1
plot(ttable)
nrow(tw)/length(unique(tw$user_id)) # average number of tweets/person
## [1] 1.529
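The distribution printed above can be used to check this average and to quantify how many users appear only once. The sketch below re-enters the printed counts by hand; the names `k` and `n_users` are illustrative.

```r
# Distribution printed above: n_users[i] users sent k[i] tweets each
k       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 15, 25)
n_users <- c(468, 79, 22, 16, 10, 3, 1, 3, 1, 1, 1, 1, 1)

total_users  <- sum(n_users)        # distinct users
total_tweets <- sum(k * n_users)    # tweets left after removing duplicates
total_tweets / total_users          # mean tweets per user
n_users[k == 1] / total_users       # share of users with a single tweet
sum(n_users[k >= 5]) / total_users  # share of users with five or more tweets
```

The mean reproduces the 1.529 figure, and 468 of the 607 users (roughly three quarters) tweeted only once, so repeated observations of the same person are available for only a minority of users.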