# install.packages("rtweet")
library(tidyverse)
library(rtweet)
library(xml2)
library(memoise)
options(width=120)
knitr::opts_chunk$set( warning = FALSE, message = FALSE)
A quick markdown file to search London gang slang on Twitter.
Rationale: The idea is to see whether these gang slang words produce sensible results from a simple twitter search in terms of references to crime and violent gang culture.
Involves:
Sources: Gang slang terms used in this document are taken from.
End Result:
A list of London gang slang terms (from shinobilifeonline.com) and comments whether tweets containing these terms refer to crime and gang activity https://github.com/sefabey/fear_of_crime_paper/blob/master/data/slangs_from_shinobi.csv
A list of London ‘Road’ slang terms (from Ebony Reid’s PhD thesis) and comments whether tweets containing these terms refer to crime and gang activity https://github.com/sefabey/fear_of_crime_paper/blob/master/data/ereid_glossary.csv
General Comments
Among inspected terms, very few terms would be useful for identifying tweets mentioning crime or criminal activity. See the above csv files.
Scraping data from: https://www.shinobilifeonline.com/index.php?topic=2973.0. Used selectorgadget to identify xpath. This code scrapes all slang terms in the web page and returns a list consisting of 120 slang terms.
html01 <- "https://www.shinobilifeonline.com/index.php?topic=2973.0"
slangs <- rvest::html(html01) %>%
rvest::html_nodes( xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "inner", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "bbc_size", " " ))]') %>%
rvest::html_text() %>%
as.tibble() %>%
mutate(slang_term=value) %>%
mutate(letter_count=str_count(slang_term)) %>%
filter(letter_count>1) %>%
# mutate(double_words= case_when( str_detect(slang_term, "/") ~ "yes",TRUE ~ "no")) %>% #no need to do this, separate_rows works as expected
separate_rows(slang_term, sep = "/") %>%
distinct(slang_term,.keep_all = T) %>%
filter(!row_number()==1) %>% #drop the first row which was a header
mutate(slang_term_lower= str_to_lower(slang_term)) %>%
select(slang_term,slang_term_lower)
slangs <- slangs %>%
mutate(slang_term_query= paste0('\"',slangs$slang_term_lower,'\"'))
# wrote this slangs data to file just in case
Using SEARCH API, I queried 50 tweets for each term in the slang list. Apparently, not every slang term returned 50 tweets (some are really obscure and uncommon terms/spellings). Ultimately, this query resulted in a dataframe consisting of 5604 rows.
I am using below chunk for memoising and/or caching purposes future reference only. I wrote query results to a csv file and I will be working with that (otherwise I need to query twitter API every time I knit the rmd. This is impractical as it (1) returns different results each time, (2) twitter rate limits are pain). Therefore, not evaluating below chunk at all.
slangs <- slangs %>%
mutate(slang_term_query= paste0('\"',slangs$slang_term_lower,'\"'))
slang_tweets <- purrr::map_df(.x = slangs$slang_term_query, .f = rtweet::search_tweets,
n = 50, include_rts=F)
rate_limit() %>%
arrange(reset)
slang_tweets %>% rtweet::write_as_csv("slang_tweets.csv")
read slang tweets from csv
slang_tweets <- rtweet::read_twitter_csv("../data/slang_tweets.csv")
Below, I will define and use a function that (1)finds tweets (scraped previously) which match nth term from the slang list, (2) randomly sample 20 tweets matching nth term, (3) print tweet text.
Then, I will read these tweets and try to get a sense of what they refer to. I will print the tweets first and then add my comments. Since this is quite repetitive, I will do this for first 40 terms. See the complete list for all terms [https://github.com/sefabey/fear_of_crime_paper/blob/master/data/slangs_from_shinobi.csv]
Note for persons with a keen eye: The reason for using double distinct in the chunk below is, some slang terms returned less than 20 results so the chunk was throwing an error when using sample_n(20). Thus had to do distinct, sample 20 with replacement and then do distinct again (for cases where unique n<20). I could have tackled this more elegantly (by dropping first distinct and using sample_n with replacement and then disctinctafter that) but since I was using cache=T the some chunks, I was in too deep and I opted to carry on with not so elegant code.
print_slang_tweets <- function(n) {
slang_tweets %>%
select(text) %>%
filter(str_detect(slang_tweets$text, pattern = regex (slangs$slang_term_lower[n],ignore_case = T))) %>%
distinct(text) %>%
sample_n(20, replace = T) %>%
distinct(text)
}
print_slang_tweets(1) %>% rmarkdown:::print.paged_df()
Comments: Term used to query twitter API was Jump Out Gang. Tweets reference to a band called jump out gang. Not very useful for studying further.
print_slang_tweets(2) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was 67. As the term is an integer, tweets can refer to anything. Not very useful for studying further.
print_slang_tweets(3) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was 86. Similar to 67 above, as the term is an integer, tweets can refer to anything. Random topics observed. Not very useful for studying further.
print_slang_tweets(4) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Harlem Spartans. Observed references to a UK band called Harlem Spartans. HS are apparently banned for references to violence an UK gang culture in their music. Some references to violence (see the 3rd tweet for instance). Maybe useful for studying further.
print_slang_tweets(5) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Ounto Nation. Returned only 3 tweets, which reference to YouTube videos. Not much sentiment observed. Not very useful for studying further.
print_slang_tweets(6) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Aggy. No mention of crime but used to express negative sentiments. Maybe useful for detecting emotions but not references to crime.
print_slang_tweets(7) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Aggro. Very similar to aggy. No mention of crime but used to express negative sentiments. Maybe useful for detecting emotions but not references to crime.
print_slang_tweets(8) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Amm. Lots of foreign language tweets. not useful.
print_slang_tweets(9) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Ahlie. random topics observed. not suitable.
print_slang_tweets(10) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Allow it. interestingly observed two instances of counter-speech but hard to discriminate normal use from slang use. not suitable.
print_slang_tweets(11) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Aired. Hard to discriminate whether slang or literal. Slang version not observed. not suitable.
print_slang_tweets(12) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bare. Hard to discriminate whether slang or literal. Rare to see slang used on Twitter. not suitable.
print_slang_tweets(13) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bands. Hard to discriminate whether slang or literal. Slang usage not observed. not suitable.
print_slang_tweets(14) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bando. all tweets are in foreign language. Slang usage not observed. not suitable.
print_slang_tweets(15) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bredrins. some referral to gang and mandem culture. some unrelated tweets. Most tweets contain references to UK so better geolocation. may be useful.
print_slang_tweets(16) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bells. Hard to discriminate whether slang or literal. Slang usage not observed. not suitable.
print_slang_tweets(17) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bun. Hard to discriminate whether slang or literal. Slang usage not observed. not suitable.
print_slang_tweets(18) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bruck. Hard to discriminate whether slang or literal. Slang usage not observed. not suitable.
print_slang_tweets(19) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Brukk. Hard to discriminate whether slang or literal. Slang usage not observed. almost all tweets in foreign lang. not suitable.
print_slang_tweets(20) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Body. Slang usage not observed. not suitable.
print_slang_tweets(21) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bait. Slang usage not observed. not suitable.
print_slang_tweets(22) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Block. Almost all tweets refer to ‘blocking action’. Slang usage not observed. not suitable.
print_slang_tweets(23) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bird. Slang usage not observed. Even slang meaning is mild. not suitable.
print_slang_tweets(24) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Batty. Slang usage observed but no reference to crime. not suitable.
print_slang_tweets(25) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bossman. Slang usage observed but slang usage carries a positive sentiment. many references to YouTube music videos. not suitable.
print_slang_tweets(26) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Beef. Slang usage observed but adopted by a wider population so the sample does not include gang reference. also has has a non-slang meaning that is commonly used. not suitable.
print_slang_tweets(27) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Blanked. Slang usage observed. No reference to crime. not suitable.
print_slang_tweets(28) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Beg . slang usage is using this term as a noun, rather than a verb. need to use NLP to distinguish. not suitable.
print_slang_tweets(29) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bumbaclart. Jamaican origin. slang usage observed. used to refer to negative things or persons. possibly many tweets from the UK. not many references to crime.
print_slang_tweets(30) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bludclart. slang usage observed. very informal spelling. used to refer to extremely negative things or persons. many tweets from the UK but not many references to crime.
print_slang_tweets(31) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bloodclart. slang usage observed. used to refer to extremely negative things or persons. many tweets from the UK, and some references to crime or violence.
print_slang_tweets(32) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Bally. slang usage not observed. not suitable.
print_slang_tweets(33) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Ballie . slang usage not observed. not suitable.
print_slang_tweets(34) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Corn. slang usage not observed. not suitable.
print_slang_tweets(35) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Crash. slang usage not observed. not suitable.
print_slang_tweets(36) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Creps. slang usage observed but not related to crime. not suitable
print_slang_tweets(37) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Case . slang usage not observed. not suitable.
print_slang_tweets(38) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Chunky. slang usage referring to violence not observed. not suitable.
print_slang_tweets(39) %>% rmarkdown:::print.paged_df()
Comments: term used to query twitter API was Chase. slang usage referring to violence not observed. not suitable.
I am only using the glossary of London gang slangs but the full thesis can be found here [https://bura.brunel.ac.uk/bitstream/2438/14817/1/FulltextThesis.pdf]. Rather than scraping tweets using rtweet, I will do the twitter search manually as is gives more control overall. The end result will be in the csv file.
glossary_text <- 'Ackee- refers to a former road man who has converted to Islam
Baby mother- the mandem use the term to describe the mother of their child
Bad up- to treat someone in a disrespectful manner. It also refers to being
victimised or victimising others
Bait- being too obvious
Boy dem- the police
Bruk- refers to having no money
Bussing a skank- to dance
Fuckery- the mandem often used this term to describe criminal activity or
violence
Garms- clothing
Grind- refers to working hard in the illegal drug economy
Gwarning with tings- doing well on road
Head back lick off- shot in the head
Hype- exaggerated/over the top behaviour
Nuff- a lot
Prick- dick head/idiot
Stunting/Stunter- to show off, a person who shows off
Take set on you- refers to being targeted and, potentially victimised by rivals
from neighbouring estates
Wasteman- useless, poor, unsuccessful
Warring- to fight, ongoing conflict' #just copy paste from pdf
glossary <- glossary_text %>%
str_split(pattern = "\n") %>% #split rows
.[[1]] %>% #select elements in the list
as.tibble()%>% #convert to a tibble
separate(value, into=c("slang", "meaning"), sep="-") %>% #separate text into two columns using -
filter(!is.na(meaning))#remove excess rows
write_csv(glossary, "../data/ereid_glossary.csv")
After inspecting terms in this glossary, it can be said that not many references to crime and gang activity observed using the glossary from E Reid’s PhD thesis. See the resulting csv file yourself[https://github.com/sefabey/fear_of_crime_paper/blob/master/data/ereid_glossary.csv]