By Helge Peters (Oxford) and Nathanael Sheehan (UCL)

Part 1: Different actors problematise water in different ways

Social studies of environmental controversies suggest that different actors problematise ‘water’ in distinct ways, likely reflecting their organisational purposes, priorities, constraints, accountabilities and audiences. Such distinct problematisations are liable to make water management a ‘wicked problem’ in which multiple ontologies of water are at stake. We propose to use text clustering to identify these distinct ontologies, or problematisations, of water and then to measure the extent to which they differ between actor categories.

Research question 1) How can natural language processing of tweets identify latent topics concerning water?

Research question 2) Can we measure the proximity/distance between these topics and map these onto actor categories?
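Research question 2 leaves the distance measure open. One simple option, offered here as an illustrative assumption rather than the measure used in this study, is cosine similarity between the word counts of two groups of tweets; one minus the similarity can then serve as a distance between actor categories. The helpers word_counts and cosine_similarity are hypothetical, use base R only, and the example tweets are made up.

```r
# Hypothetical helper: lower-cased word counts for a character vector of tweets,
# returned as a named numeric vector (splits on anything that is not a-z)
word_counts <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  words <- words[nchar(words) > 0]
  tab <- table(words)
  setNames(as.numeric(tab), names(tab))
}

# Hypothetical helper: cosine similarity between two named count vectors,
# treating words missing from one vector as zero counts
cosine_similarity <- function(a, b) {
  vocab <- union(names(a), names(b))
  va <- a[vocab]; va[is.na(va)] <- 0
  vb <- b[vocab]; vb[is.na(vb)] <- 0
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

# Example: distance between two made-up groups of tweets
activist_words <- word_counts(c("Sewage in our river again", "Stop the sewage"))
industry_words <- word_counts(c("Investing in river health", "Our sewage works"))
1 - cosine_similarity(activist_words, industry_words)
```

Raw counts are dominated by frequent function words, so in practice stop-word removal (for example with tm) or tf-idf weighting would likely be applied first.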

Data collection

To understand how different actors speak about water on Twitter, we needed to mine Twitter data from the relevant actors. We compiled a list of 170 actors and labelled each, according to its affiliations, under one of six categories: Activist, Charity, Industry, Government, Journalist and Partnership.
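The labelled actor list could be kept as a small table. The sketch below is an assumption about how such a list might be stored (the full list of 170 actors is not reproduced here): it fixes the six categories as factor levels, so a mistyped label fails loudly and per-category counts still show categories with no actors. make_actor_table is a hypothetical helper.

```r
# The six actor categories named in the text
actor_categories <- c("Activist", "Charity", "Industry",
                      "Government", "Journalist", "Partnership")

# Hypothetical helper: build a labelled actor table and reject unknown labels
make_actor_table <- function(username, category) {
  bad <- setdiff(unique(category), actor_categories)
  if (length(bad) > 0) {
    stop("Unknown categories: ", paste(bad, collapse = ", "))
  }
  data.frame(
    username = username,
    category = factor(category, levels = actor_categories),
    stringsAsFactors = FALSE
  )
}

# Example with three of the activist accounts from the activist queue
actors <- make_actor_table(
  username = c("rivercide_live", "cleansafethames", "GeorgeMonbiot"),
  category = rep("Activist", 3)
)
table(actors$category)
```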

To perform the analysis, a number of R libraries were used. To access and mine Twitter data, we used the rtweet package (httpuv is also loaded, as rtweet relies on it for browser-based authentication). Collected data was cleaned using tm, stringr and qdapRegex, and wordcloud2 is loaded for word-cloud visualisation. For data merging, formatting and munging we used the purrr, plyr, dplyr and readr packages.

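library(httpuv)      # needed by rtweet for browser-based authentication
library(rtweet)      # Twitter API access
library(tm)          # text cleaning
library(stringr)     # string manipulation
library(qdapRegex)   # regex helpers, e.g. removing URLs
library(wordcloud2)  # word clouds
library(purrr)       # functional iteration, e.g. walk()
library(plyr)        # ldply(), used to merge CSV files; load before dplyr to avoid masking dplyr verbs
library(dplyr)       # data manipulation
library(readr)       # read_csv()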
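Because the notebook depends on ten CRAN packages, a quick check for missing ones can save a failed run. This is a hypothetical convenience helper, not part of the original workflow; it only reports missing packages and leaves installation to the reader.

```r
# Packages loaded in the notebook
pkgs <- c("httpuv", "rtweet", "tm", "stringr", "qdapRegex",
          "wordcloud2", "purrr", "plyr", "dplyr", "readr")

# Hypothetical helper: names of packages whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

missing_pkgs <- missing_packages(pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "),
          ". Install them with install.packages().")
}
```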

Authentication to the Twitter API used our API keys and access tokens. The rtweet package makes this straightforward: create_token builds the token from these credentials and get_token retrieves the token currently in use.

##Set up twitter auth - these are not our actual keys :p
api_key <- "hjksfdhkjeh342324"
api_secret_key <- "jh342kjhkjfhdkjh2jk3h4kjbfdskjhkjh"
access_token <- "123u098f90ds8fshjk234lnhkljfd"
access_secret <- "fdsjkhakjfh324fkjhkj3riuhi9"


token <- create_token(
  app = "Camellia Research",
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = access_token,
  access_secret = access_secret)

get_token()
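Hard-coding keys in a notebook makes them easy to leak when it is shared. A common alternative, sketched here as an assumption rather than what we did, is to keep the four credentials in environment variables (for example in ~/.Renviron) and read them at run time. The variable names TWITTER_API_KEY and so on, and the helper read_twitter_credentials, are hypothetical; the commented usage passes them to create_token under the same argument names used above.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables. The variable names are our own convention, not an rtweet requirement.
read_twitter_credentials <- function(
    vars = c(consumer_key    = "TWITTER_API_KEY",
             consumer_secret = "TWITTER_API_SECRET",
             access_token    = "TWITTER_ACCESS_TOKEN",
             access_secret   = "TWITTER_ACCESS_SECRET")) {
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  empty <- names(values)[values == ""]
  if (length(empty) > 0) {
    stop("Missing credentials: ", paste(empty, collapse = ", "))
  }
  as.list(values)
}

# Usage (requires the variables to be set):
# creds <- read_twitter_credentials()
# token <- do.call(create_token, c(list(app = "Camellia Research"), creds))
```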

Once authenticated, we wrote a function, mineTweets, that takes a username string as its argument and mines that account. Up to 1,000 tweets were requested per account, excluding retweets, and retryonratelimit was set to TRUE so that mining would wait for Twitter's rate limit to reset rather than stop part-way through the list. The mined tweets were subset to keep only the account's screen name and the tweet text, and each account's tweets were then exported as a CSV file.

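mineTweets <- function(username){
  # Mine up to 1000 of the account's most recent tweets, excluding retweets,
  # and retry once the rate limit resets if it is hit
  tweets <- get_timelines(username, n = 1000, retryonratelimit = TRUE, include_rts = FALSE)

  # Keep only the screen name and the tweet text
  tweets <- subset(tweets, select = c(screen_name, text))

  # Export one CSV per account, e.g. "rivercide_live_minedtweets.csv"
  write.csv(tweets, paste0(username, "_minedtweets.csv"), row.names = FALSE)
}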
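As written, a single failing call (for example a protected, suspended or renamed account) stops the whole queue. A hypothetical wrapper, mine_safely, sketches one way to record failures per account and carry on; it takes the mining function as an argument so that it can be tested without calling the Twitter API.

```r
# Hypothetical helper: apply a mining function to each username, recording
# the error message for any account that fails instead of aborting the queue
mine_safely <- function(usernames, mine_fun) {
  status <- vapply(usernames, function(u) {
    tryCatch({
      mine_fun(u)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
  data.frame(username = usernames, status = unname(status),
             stringsAsFactors = FALSE)
}

# Usage with mineTweets:
# mining_log <- mine_safely(c("rivercide_live", "cleansafethames"), mineTweets)
# mining_log[mining_log$status != "ok", ]
```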

A queue was then set up for each category to mine every actor's profile in turn.

# Mining activist tweets
mineTweetsActivist <- function(outdir = "~/Twitter-Mining/mined-tweets/activist") {
  # Write the per-account CSVs into the folder the merge step reads from
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  old_wd <- setwd(outdir)
  on.exit(setwd(old_wd))

  activist_accounts <- c(
    "rivercide_live", "cleansafethames", "GeorgeMonbiot", "Feargal_Sharkey",
    "AlisonJArcher", "deedeelea", "iantokelove", "FisherLady21",
    "sascampaigns", "LDNWaterkeeper", "PymmesBrookERS", "QWAG",
    "leaboaters", "StonebridgeLock", "SaveLeaMarshes", "silkstreamers",
    "LDNgreenspaces", "StopTheDrill"
  )

  # Mine each account in turn (walk() is from purrr)
  walk(activist_accounts, mineTweets)
}

mineTweetsActivist()

Once all accounts had been mined, a second function merged the category's per-account CSV files into a single file.

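mergeTweetsActivist <- function(basedir = "~/Twitter-Mining/mined-tweets") {
  # List every per-account CSV in the activist folder
  mydir <- file.path(basedir, "activist")
  myfiles <- list.files(path = mydir, pattern = "\\.csv$", full.names = TRUE)

  # Read each file and stack them into one data frame (ldply() is from plyr)
  activist <- ldply(myfiles, read_csv)

  # Export the merged data frame next to the category folders
  write.csv(activist, file.path(basedir, "merged_activist_tweets.csv"), row.names = FALSE)
  activist
}

activist <- mergeTweetsActivist()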
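For research question 2, each tweet eventually needs its actor category attached. A base-R sketch of the merge step that adds a category column is shown here; merge_category is a hypothetical helper and assumes every CSV in the folder has the same columns.

```r
# Hypothetical helper: stack every per-account CSV in `dir` into one data
# frame and tag each row with the actor category
merge_category <- function(dir, category) {
  files <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No CSV files found in ", dir)
  tweets <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
  tweets$category <- category
  tweets
}

# Possible usage, assuming the folder layout used above:
# activist <- merge_category("~/Twitter-Mining/mined-tweets/activist", "Activist")
```

Stacking the six tagged data frames with rbind would then give a single table in which tweets can be compared across categories.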

This process was then repeated for each category, generating six CSV files, one per category, that together contain over 70,000 tweets.
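Rather than writing a separate queue and merge function for each of the six categories, the repetition could be driven from a single table of usernames and categories. run_by_category is a hypothetical helper: it groups usernames by category and hands each group to a step function supplied by the caller.

```r
# Hypothetical helper: call step(usernames, category) once per category
# present in `actors`, a data frame with username and category columns
run_by_category <- function(actors, step) {
  groups <- split(actors$username, actors$category, drop = TRUE)
  for (category in names(groups)) {
    step(groups[[category]], category)
  }
  invisible(lengths(groups))
}

# Possible usage with mineTweets (assumes one lower-case folder per category,
# as for "activist" above):
# run_by_category(actors, function(users, category) {
#   outdir <- file.path("~/Twitter-Mining/mined-tweets", tolower(category))
#   dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
#   old_wd <- setwd(outdir)
#   on.exit(setwd(old_wd))
#   for (u in users) mineTweets(u)
# })
```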

Analysis
