Text as Data Blog Post 3

Data Collection for the final project

MANI KANTA GOGULA
03-06-2022

Nowadays, social media is the most popular place for people to express their feelings and opinions publicly and to interact with others online. Unfortunately, people also use these platforms to express hate in many forms, with a possible impact on other users' personal experience. Any message or post that expresses hatred towards particular groups or people, religions, races, communities, etc. is considered a hateful message. My project focuses on hate speech on social media platforms such as Twitter and studies the impact of such posts/tweets on other people and their opinions.

I have created a Twitter student developer account so that I can extract hateful tweets from Twitter and study their impact.

Because the API keys and tokens issued by Twitter are confidential, I have masked them below. The code shows the basic approach I used to extract tweets.

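# Authenticate with the Twitter API (keys masked)
library(rtweet)
mytoken <- create_token(
  app = "scrapetext",
  consumer_key = "***********",
  consumer_secret = "*****************",
  access_token = "**********************",
  access_secret = "*****************"
)

# Search for English tweets tagged #hatespeech, including retweets;
# retryonratelimit = TRUE makes rtweet wait and retry when the rate limit is hit
data <- search_tweets("#hatespeech", n = 1000, include_rts = TRUE,
                      retryonratelimit = TRUE, token = mytoken, lang = "en")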

The code above requests up to 1,000 English-language tweets containing #hatespeech, including retweets. Note that the standard search API only returns tweets from roughly the past 6-9 days.
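Since the keys must stay confidential, one way to avoid pasting them into the script at all is to read them from environment variables (for example, ones set in a ~/.Renviron file). The sketch below assumes four hypothetical variable names such as TWITTER_CONSUMER_KEY; these names are my own choice, not something rtweet or Twitter defines, and the helper get_twitter_credentials() is likewise hypothetical.

```r
# Minimal sketch: read Twitter API credentials from environment variables
# instead of hard-coding them. The variable names are hypothetical; use
# whatever names you define in ~/.Renviron.
get_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  # Sys.getenv() returns "" for variables that are not set
  values <- vapply(vars, Sys.getenv, character(1))
  unset <- vars[values == ""]
  if (length(unset) > 0) {
    stop("Missing environment variables: ", paste(unset, collapse = ", "))
  }
  # Named list: consumer_key, consumer_secret, access_token, access_secret
  as.list(values)
}

# Example usage with the create_token() call shown earlier:
# creds <- get_twitter_credentials()
# mytoken <- create_token(app = "scrapetext",
#                         consumer_key = creds$consumer_key,
#                         consumer_secret = creds$consumer_secret,
#                         access_token = creds$access_token,
#                         access_secret = creds$access_secret)
```

Failing early with a clear error when a variable is missing is a design choice: it avoids sending an empty key to the API and getting a less obvious authentication error.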

Apart from the data extracted from Twitter, I have also downloaded a few datasets from various sources. Below, I load and view one of the downloaded datasets.

# Load the downloaded dataset (columns: id, label, tweet)
library(readr)
TwitterHate <- read_csv("TwitterHate.csv")
# Preview the first six rows
head(TwitterHate)
# A tibble: 6 x 3
     id label tweet                                                   
  <dbl> <dbl> <chr>                                                   
1     1     0 "@user when a father is dysfunctional and is so selfish~
2     2     0 "@user @user thanks for #lyft credit i can't use cause ~
3     3     0 "bihday your majesty"                                   
4     4     0 "#model   i love u take with u all the time in urð\u009~
5     5     0 "factsguide: society now    #motivation"                
6     6     0 "[2/2] huge fan fare and big talking before they leave.~
# Number of rows and columns
dim(TwitterHate)
[1] 31962     3
# tidyverse also attaches dplyr, which provides select()
library(dplyr)
library(tidyverse)
# Keep only the tweet text column
select(TwitterHate, tweet)
# A tibble: 31,962 x 1
   tweet                                                              
   <chr>                                                              
 1 "@user when a father is dysfunctional and is so selfish he drags h~
 2 "@user @user thanks for #lyft credit i can't use cause they don't ~
 3 "bihday your majesty"                                              
 4 "#model   i love u take with u all the time in urð\u009f\u0093±!!!~
 5 "factsguide: society now    #motivation"                           
 6 "[2/2] huge fan fare and big talking before they leave. chaos and ~
 7 "@user camping tomorrow @user @user @user @user @user @user @user ~
 8 "the next school year is the year for exams.ð\u009f\u0098¯ can't t~
 9 "we won!!! love the land!!! #allin #cavs #champions #cleveland #cl~
10 "@user @user welcome here !  i'm   it's so #gr8 !"                 
# ... with 31,952 more rows
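A natural first step toward finding the hashtags people use is to count the hashtags already present in the tweet column. The sketch below uses only base R (gregexpr and regmatches) and treats a hashtag as "#" followed by letters, digits, or underscores; that definition is an assumption and may miss hashtags containing other characters. The helper count_hashtags() is hypothetical, and the three sample strings are rows copied from the preview above.

```r
# Minimal sketch: extract every hashtag from a character vector of tweets
# and count how often each one appears (base R only).
count_hashtags <- function(tweets) {
  # "#" followed by one or more word characters (letters, digits, underscore)
  matches <- regmatches(tweets, gregexpr("#\\w+", tweets, perl = TRUE))
  # Flatten the per-tweet lists and normalise case so #Hate and #hate merge
  tags <- tolower(unlist(matches))
  # Frequency table, most common hashtags first
  sort(table(tags), decreasing = TRUE)
}

sample_tweets <- c(
  "bihday your majesty",
  "factsguide: society now    #motivation",
  "@user @user welcome here !  i'm   it's so #gr8 !"
)
count_hashtags(sample_tweets)

# On the full dataset, the 20 most frequent hashtags would be:
# head(count_hashtags(TwitterHate$tweet), 20)
```

Running the same function only on tweets with label 1 would be one way to see which hashtags co-occur with the labelled tweets, assuming label 1 marks the hateful class in this dataset.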

Going forward, I plan to gather more data for my project and to identify the hate speech hashtags people use online, so that I can extract more tweets containing hateful messages by searching for those specific hashtags.
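Once a list of hashtags is available, several of them can be combined into a single query with Twitter's OR search operator, so one search_tweets() call covers them all. The sketch below is a hypothetical helper, build_hashtag_query(); the hashtags passed to it are placeholders rather than a curated list, and the commented search_tweets() call reuses the token from the code above.

```r
# Minimal sketch: combine several hashtags into one search query using
# Twitter's OR operator. The input hashtags are placeholders.
build_hashtag_query <- function(hashtags) {
  hashtags <- trimws(hashtags)
  hashtags <- hashtags[hashtags != ""]
  # Add a leading "#" where it is missing
  hashtags <- ifelse(startsWith(hashtags, "#"), hashtags, paste0("#", hashtags))
  # Lowercase and drop duplicates before joining with " OR "
  paste(unique(tolower(hashtags)), collapse = " OR ")
}

query <- build_hashtag_query(c("#hatespeech", "racism", "#HateSpeech"))
query
# [1] "#hatespeech OR #racism"

# Hypothetical follow-up call (needs the token created earlier):
# data <- search_tweets(query, n = 1000, include_rts = TRUE,
#                       retryonratelimit = TRUE, token = mytoken, lang = "en")
```

Normalising case before removing duplicates keeps the query short, since Twitter's hashtag search is not case-sensitive.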