For this assignment, we were tasked with pulling data from Twitter and doing something cool with it. I decided to look at #turtle, because I love turtles and wanted to see what the data could tell me about this topic on Twitter.
First off, I set up the developer connection for Twitter as we learned, and then it was off to get data! I find it quite fascinating that Twitter has built an integration that talks to software like R, and I can tell there is so much one could do with Twitter data given the right tools and know-how. I felt slightly like a bull in a china shop with this assignment though, as I feel the assignments tend to highlight all the things I don’t know about R and data management more than what I can do.
library(rtweet)
library(tidytext)
library(stringr)
library(dplyr)
library(ggplot2)
I first searched on just #turtle to see what data came back.
num_tweets <- 1000
tt <- search_tweets('#Turtle', n = num_tweets,
include_rts = FALSE)
head(tt)
## # A tibble: 6 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 480074~ 11996905~ 2019-11-27 14:05:01 donniethet~ Oh..~ Donni~
## 2 480074~ 11982397~ 2019-11-23 14:00:01 donniethet~ "Tod~ Donni~
## 3 480074~ 11996893~ 2019-11-27 14:00:01 donniethet~ "Tod~ Donni~
## 4 480074~ 11964278~ 2019-11-18 14:00:01 donniethet~ "Tod~ Donni~
## 5 480074~ 11967902~ 2019-11-19 14:00:01 donniethet~ "Tod~ Donni~
## 6 480074~ 11989658~ 2019-11-25 14:05:01 donniethet~ I'm ~ Donni~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## # lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## # quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## # quoted_name <chr>, quoted_followers_count <int>,
## # quoted_friends_count <int>, quoted_statuses_count <int>,
## # quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## # retweet_source <chr>, retweet_favorite_count <int>,
## # retweet_retweet_count <int>, retweet_user_id <chr>,
## # retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>,
## # bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## # description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## # friends_count <int>, listed_count <int>, statuses_count <int>,
## # favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
I tried a few code variations on source and screen name, but they did not give me any insight I could follow up on.
turtle_platform <- tt %>% group_by(source) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = n/sum(n)) %>%
arrange(desc(n))
turtle_platform %>% slice(1:10)
## # A tibble: 10 x 3
## source n percent_of_tweets
## <chr> <int> <dbl>
## 1 Tweets for Turtles 180 0.205
## 2 Twitter for iPhone 143 0.163
## 3 Instagram 132 0.151
## 4 Twitter Web App 118 0.135
## 5 Twitter for Android 73 0.0833
## 6 IFTTT 40 0.0457
## 7 Hootsuite Inc. 32 0.0365
## 8 Twitter Web Client 29 0.0331
## 9 TweetDeck 26 0.0297
## 10 Buffer 16 0.0183
tt %>% group_by(screen_name) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = n/sum(n)) %>%
arrange(desc(n)) %>% slice(1:10)
## # A tibble: 10 x 3
## screen_name n percent_of_tweets
## <chr> <int> <dbl>
## 1 aTurtlebot 180 0.205
## 2 TMNT_Wiz 27 0.0308
## 3 kame_fuji 15 0.0171
## 4 GreenieTurtle 14 0.0160
## 5 TurtleAloha 14 0.0160
## 6 kamepi24 13 0.0148
## 7 donnietheturtle 12 0.0137
## 8 NatureCutsTags 12 0.0137
## 9 StarCrystalDel 10 0.0114
## 10 TPE_connect 10 0.0114
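One thing these counts do suggest is that a single account, aTurtlebot, contributes 180 tweets, roughly a fifth of the sample, so its repeated posts may dominate later word counts. Below is a minimal sketch of setting such an account aside before further analysis. It uses made-up rows in place of the real `tt` tibble, and the 15% cutoff is an arbitrary assumption, not something from the assignment.

```r
library(dplyr)

# Made-up stand-in for tt; only screen_name matters for this step
toy_tt <- tibble(
  screen_name = c(rep("aTurtlebot", 6), "TMNT_Wiz", "kame_fuji",
                  "GreenieTurtle", "TurtleAloha"),
  text = paste("tweet", 1:10)
)

# Accounts responsible for more than 15% of all tweets (arbitrary cutoff)
heavy_posters <- toy_tt %>%
  count(screen_name) %>%
  mutate(share = n / sum(n)) %>%
  filter(share > 0.15) %>%
  pull(screen_name)

# Keep only tweets from the remaining accounts
tt_filtered <- toy_tt %>%
  filter(!screen_name %in% heavy_posters)
```

Whether dropping a prolific account is appropriate depends on the question being asked; for "what do people say about turtles" it probably is, for "who tweets about turtles" it is not.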
Next, I looked at the words used in the tweets to determine what is being discussed when #turtle is used. (Outside of just turtles, of course.)
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#A]))"
turtle_words <- tt %>% select(status_id, text) %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text,
"https:t.co/[A-Za-z\\d]+|&",
"")) %>%
unnest_tokens(word, text, token = "regex",
pattern = reg) %>%
filter(!word %in% stop_words$word, str_detect(
word, "[a-z]"))
turtle_words <- turtle_words %>% group_by(word) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = n/sum(n)) %>%
arrange(desc(n)) %>% top_n(20)
## Selecting by percent_of_tweets
head(turtle_words)
## # A tibble: 6 x 3
## word n percent_of_tweets
## <chr> <int> <dbl>
## 1 https 976 0.0806
## 2 #turtle 871 0.0720
## 3 #plastic 282 0.0233
## 4 #cute 206 0.0170
## 5 turtle 193 0.0159
## 6 #turtlebot 182 0.0150
This gave me a bit more to work with. You can see that the top hashtag after #turtle itself is #plastic, which leads me to believe many of the tweets involve pollution and sea turtles, something I could dig into.
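The `https` at the top of this list is likely a side effect of the URL pattern: `https:t.co/` has no `//`, so it cannot match a link of the form `https://t.co/...`, and the link then gets split into tokens. Here is a small sketch on a made-up string. The corrected pattern and the `&amp;` entity are assumptions about what was intended, not something the results above confirm.

```r
library(stringr)

# Made-up tweet text for illustration
txt <- "Save the #turtle https://t.co/AbC123 &amp; skip #plastic"

# Pattern as written above: no "//", so a real link is never matched
old_pattern <- "https:t.co/[A-Za-z\\d]+|&"

# Assumed intent: match the full short link and the escaped ampersand
new_pattern <- "https://t.co/[A-Za-z\\d]+|&amp;"

cleaned_old <- str_replace_all(txt, old_pattern, "")
cleaned_new <- str_replace_all(txt, new_pattern, "")

str_detect(cleaned_old, "https")  # link is still present
str_detect(cleaned_new, "https")  # link has been removed
```

With the link removed before tokenizing, `https` should no longer appear as a word, and the percentages for the remaining words would shift accordingly.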
Next, I tried to plot the count of words used with #turtle. Unfortunately, while I was able to generate the graph, I could not get the count on the y-axis to work correctly. It seemed to be setting every count to “1”. I tried several different graphs and attempted scaling, but could not determine why the integer counts I can see in the tibble did not carry over to the graph.
turtle_words %>% count(word, sort = TRUE) %>% top_n(15) %>%
mutate(word = reorder(word, n)) %>% ggplot(aes(x = word, y = n)) +
geom_col() + xlab(NULL) + coord_flip() + labs(x = "Top Turtle Word Use",
y = "Count",
title = "Top Twitter Searches on #Turtle")
turtle_words %>% count(word, sort = TRUE) %>% top_n(15) %>%
mutate(word = reorder(word, n)) %>% ggplot(aes(x = word, y = n)) +
geom_col() + xlab(NULL) + coord_flip() + labs(x = "Top Turtle Word Use",
y = "Count",
title = "Top Twitter Searches on #Turtle") +
ylim(0, 10)
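A likely cause of the all-ones y-axis: `turtle_words` already holds one row per word with its total in `n`, so calling `count(word)` on it tallies rows per word, which is 1 for each. The sketch below shows this on a few made-up rows that mirror the tibble above, assuming a recent dplyr version, and then plots the existing `n` column directly.

```r
library(dplyr)
library(ggplot2)

# Made-up rows shaped like the summarized turtle_words tibble
word_counts <- tibble(
  word = c("#turtle", "#plastic", "#cute", "turtle"),
  n    = c(871L, 282L, 206L, 193L)
)

# Counting an already-summarized table: one row per word, so every n is 1
recounted <- word_counts %>% count(word, sort = TRUE)

# Plotting the existing totals instead of re-counting
p <- word_counts %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Count", title = "Top words in #turtle tweets")
```

Printing `p` draws the chart. The `ylim(0, 10)` in the second attempt above would also need to go, since it restricts the axis to a range far below the real counts.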
After looking at the words used in #turtle tweets, I wanted to see whether I could learn more by joining in the NRC sentiment lexicon and looking at the tweets by emotion. It returned the range of sentiments you would expect to find.
turtle_words2 <- tt %>% select(status_id, text) %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text,
"https:t.co/[A-Za-z\\d]+|&",
"")) %>%
unnest_tokens(word, text, token = "regex",
pattern = reg) %>%
filter(!word %in% stop_words$word, str_detect(
word, "[a-z]"))
nrc <- get_sentiments("nrc") %>%
select(word, sentiment)
head(nrc)
## # A tibble: 6 x 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
turtle_words2_sentiments <- turtle_words2 %>%
inner_join(nrc, by = "word")
turtle_words2_sentiments %>%
group_by(sentiment) %>% summarize(n = n()) %>%
arrange(desc(n))
## # A tibble: 10 x 2
## sentiment n
## <chr> <int>
## 1 positive 601
## 2 joy 303
## 3 anticipation 252
## 4 trust 238
## 5 negative 200
## 6 fear 111
## 7 surprise 104
## 8 sadness 95
## 9 disgust 69
## 10 anger 66
Next, I pulled the positive posts to look at them more closely. However, when I did that I found the selection seemed to consist of tweets about jewelry, which was not what I was looking for.
pos_tt_id <- turtle_words2_sentiments %>%
filter(sentiment == "positive") %>% distinct(status_id)
tt %>% inner_join(pos_tt_id, by = "status_id") %>%
select(text) %>% slice(1:10)
## # A tibble: 10 x 1
## text
## <chr>
## 1 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 2 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 3 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 4 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 5 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 6 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 7 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 8 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 9 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 10 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
As the positive sentiment was pulling tweets that were not about real turtles, I thought a sentiment on the negative end of the spectrum might show a different picture. However, when I looked at the sadness sentiment, the returned tweets were the same jewelry ones found in the positive return.
sad_tt_id <- turtle_words2_sentiments %>% filter(sentiment == "sadness") %>%
distinct(status_id)
tt %>% inner_join(sad_tt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
## text
## <chr>
## 1 I'm not fat... I just have a big shell. #Java #Turtle #NinjaTurtles
## 2 "But most amazing ritual is that soft shell #turtle are left at pond of haya~
## 3 "@Activision @InfinityWard @ATVIAssist @JoeCecot @ashtonisVULCAN #CallofDuty~
## 4 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 5 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 6 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 7 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
## 8 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 9 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 10 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
Not to be daunted, I tried anger instead, again hoping to find tweets about pollution, as suggested by #plastic in the top results. Unfortunately, my results still showed the same tweets as in my other returns.
anger_tt_id <- turtle_words2_sentiments %>% filter(sentiment == "anger") %>%
distinct(status_id)
tt %>% inner_join(anger_tt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
## text
## <chr>
## 1 I'm not fat... I just have a big shell. #Java #Turtle #NinjaTurtles
## 2 "This #kitty prefers a #turtle over a teddy bear or cat nip.....~ #Cuteness\~
## 3 Tortious Confetti #naturecuts #confetti #cutout #partysupplies #favors #deco~
## 4 "But most amazing ritual is that soft shell #turtle are left at pond of haya~
## 5 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 6 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 7 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 8 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
## 9 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 10 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
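The same tweets show up under positive, sadness, and anger because NRC tags individual words, and one tweet can contain words from several categories (a single word can also carry several tags, as `abandon` does in the lexicon output above). Below is a rough sketch of one way to assign each tweet a single dominant sentiment instead. The data is made up, `slice_max()` assumes dplyr 1.0.0 or later, and ties are broken arbitrarily by taking the first row.

```r
library(dplyr)

# Made-up word-level sentiment rows, shaped like turtle_words2_sentiments
word_sentiments <- tibble(
  status_id = c("1", "1", "1", "2", "2"),
  sentiment = c("positive", "positive", "anger", "sadness", "sadness")
)

# Count sentiment tags per tweet, then keep the most frequent one
dominant <- word_sentiments %>%
  count(status_id, sentiment) %>%
  group_by(status_id) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Filtering `dominant` by sentiment would then list each tweet under only one emotion, rather than under every emotion any of its words happens to carry.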
So, my data was not giving me something I felt I could work with. I could not figure out how to take the data I had already pulled and filter out tweets about generic turtle items such as jewelry. Instead, I ran a narrower search on #seaturtle and followed steps similar to those above for #turtle.
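For reference, filtering the tweets already pulled might look something like the sketch below. It uses made-up rows, and the list of shop-related terms is a hypothetical starting point that would need tuning against the real data.

```r
library(dplyr)
library(stringr)

# Made-up stand-in for tt with just the columns needed here
tweets <- tibble(
  status_id = c("1", "2", "3"),
  text = c(
    "Green Aventurine wire wrapped #turtle earrings! So cute!",
    "Sea #turtle rescued from #plastic netting",
    "SEA TURTLE Print Nautical via @EtsySocial"
  )
)

# Hypothetical merchandise keywords, matched case-insensitively
shop_terms <- regex("earrings|pendant|etsy|print", ignore_case = TRUE)

# Drop any tweet whose text mentions one of the shop terms
wildlife_tweets <- tweets %>%
  filter(!str_detect(text, shop_terms))
```

A keyword list like this will always miss some listings and may drop a few genuine tweets, so the narrower search that follows is a reasonable alternative.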
stt <- search_tweets('#seaturtle', n = num_tweets,
include_rts = FALSE)
head(stt)
## # A tibble: 6 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 116855~ 11996897~ 2019-11-27 14:01:35 Dimpled548~ "Wha~ Twitt~
## 2 116855~ 11965854~ 2019-11-19 00:26:09 Dimpled548~ "Bes~ Tweet~
## 3 823792~ 11985623~ 2019-11-24 11:21:41 RGDives "Clo~ IFTTT
## 4 823792~ 11982365~ 2019-11-23 13:47:00 RGDives "Lot~ IFTTT
## 5 823792~ 11967518~ 2019-11-19 11:27:32 RGDives "Cru~ IFTTT
## 6 823792~ 11996734~ 2019-11-27 12:57:04 RGDives "Rel~ IFTTT
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## # lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## # quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## # quoted_name <chr>, quoted_followers_count <int>,
## # quoted_friends_count <int>, quoted_statuses_count <int>,
## # quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## # retweet_source <chr>, retweet_favorite_count <int>,
## # retweet_retweet_count <int>, retweet_user_id <chr>,
## # retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>,
## # bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## # description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## # friends_count <int>, listed_count <int>, statuses_count <int>,
## # favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
stt %>% group_by(screen_name) %>%
summarize(n = n()) %>%
mutate(percent_of_tweets = n/sum(n)) %>%
arrange(desc(n)) %>% slice(1:10)
## # A tibble: 10 x 3
## screen_name n percent_of_tweets
## <chr> <int> <dbl>
## 1 Makalewakan2 14 0.0773
## 2 RGDives 5 0.0276
## 3 cehart03 4 0.0221
## 4 NomadicBrits 4 0.0221
## 5 AnthonyCatucci 3 0.0166
## 6 FallHolidaze 3 0.0166
## 7 KauaiMarionette 3 0.0166
## 8 NatureCutsTags 3 0.0166
## 9 sebphotog 3 0.0166
## 10 WIDECAST1 3 0.0166
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#A]))"
seaturtle_words <- stt %>% select(status_id, text) %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text,
"https:t.co/[A-Za-z\\d]+|&",
"")) %>%
unnest_tokens(word, text, token = "regex",
pattern = reg) %>%
filter(!word %in% stop_words$word, str_detect(
word, "[a-z]"))
seaturtle_words_sentiments <- seaturtle_words %>%
inner_join(nrc, by = "word")
seaturtle_words_sentiments2 <- seaturtle_words_sentiments %>%
group_by(sentiment) %>% summarize(n = n()) %>%
arrange(desc(n))
This time I was able to see variation across the three emotions, which gave me hope for better data to work with.
pos_stt_id <- seaturtle_words_sentiments %>%
filter(sentiment == "positive") %>% distinct(status_id)
stt %>% inner_join(pos_stt_id, by = "status_id") %>%
select(text) %>% slice(1:10)
## # A tibble: 10 x 1
## text
## <chr>
## 1 "Lot of fun, Search & Recovery specialty at Curacao ... \U0001f60e\U0001~
## 2 "A lot of fun, do your IDD navigation specialty under the sun... \U0001f60e\~
## 3 Sea Turtle Vinyl Stickers https://t.co/RTuf8EltB4 #naturecuts #vinyl #vinyls~
## 4 Sea Turtle Confetti #naturecuts #confetti #cutout #partysupplies #favors #de~
## 5 Dolphin Applique #naturecuts#favors #decoration #party #event #birthday #wed~
## 6 "Help make a difference on your #travels in 2020! \U0001f49a\U0001f30d We've~
## 7 Spent 16 days roaming around: California, Hawaii, & Mexico. I love being~
## 8 "@ejn_greencareer #SeaTurtle hatchlings found & safety released into oce~
## 9 SEA TURTLE Print Nautical https://t.co/V6xJmZTIkQ via @EtsySocial #etsymntt ~
## 10 HOT: Great work by the #Vietnam authorities saw an endangered green #seaturt~
sad_stt_id <- seaturtle_words_sentiments %>% filter(sentiment == "sadness") %>%
distinct(status_id)
stt %>% inner_join(sad_stt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
## text
## <chr>
## 1 Turtle art prints, Hawaiian art, Kauai art prints, Hawaii painting, Hawaiian~
## 2 Sea Turtle Painting Hawaii Art Sea Turtle Decor Sea Turtle Wall Art Kauai Po~
## 3 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~
## 4 "Green Sea Turtle\n.\nToo cool for you or me, the green sea turtle always se~
## 5 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 6 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 7 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 8 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 9 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 10 Sea turtle key hanger, hand painted key hanger, beach hut key rack, seaside ~
Although the top five tweets are repeats of the sad segment, here you can see some more appropriate tweets in line with the sentiment on tweets six through nine.
anger_stt_id <- seaturtle_words_sentiments %>% filter(sentiment == "anger") %>%
distinct(status_id)
stt %>% inner_join(anger_stt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
## text
## <chr>
## 1 "This could be the biggest #SeaTurtle swarm ever filmed - Hundreds of thousa~
## 2 HOT: Great work by the #Vietnam authorities saw an endangered green #seaturt~
## 3 HOT: Great work by the #Vietnam authorities saw an endangered green #seaturt~
## 4 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~
## 5 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 6 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 7 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 8 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 9 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 10 "Grumpy sea turtle. Maybe she's unhappy with our treatment of the oceans and~
I wanted to use the country code of each tweeting user to plot where the people discussing sea turtles were located. I tried several ways to pull country into my sentiment tibbles. In the end, the code that did not return an error message was:
seaturtle_words_sentiments2 <- merge(seaturtle_words_sentiments2, stt, "status_id")
However, it still did not do what I wanted; as you can see, there is no country column in this tibble.
head(seaturtle_words_sentiments2)
## # A tibble: 6 x 2
## sentiment n
## <chr> <int>
## 1 positive 195
## 2 joy 74
## 3 anticipation 62
## 4 trust 54
## 5 negative 53
## 6 sadness 34
Therefore, when I tried to build a graph by country, there was no country data to pull. I ended up plotting the sentiments without country data, which was not the comparison I wanted to make.
ggplot(seaturtle_words_sentiments2, aes(x = sentiment, y = n)) +
geom_bar(stat = "identity", position = "dodge") + xlab("Sentiment") + ylab("Count") +
theme(axis.text.x = element_text(angle = 90,
hjust = 1))
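The merge above probably cannot add a country column because `seaturtle_words_sentiments2` has already been summarized down to `sentiment` and `n`, so it no longer has a `status_id` to join on. Below is a sketch of joining before summarizing. The data is made up, the column names follow the rtweet output shown earlier, and `country_code` is likely to be `NA` for many real tweets.

```r
library(dplyr)

# Made-up word-level sentiments, shaped like seaturtle_words_sentiments
word_sentiments <- tibble(
  status_id = c("1", "1", "2", "3"),
  word      = c("love", "fun", "grumpy", "great"),
  sentiment = c("positive", "joy", "sadness", "positive")
)

# Made-up stand-in for stt with the location column of interest
tweets <- tibble(
  status_id    = c("1", "2", "3"),
  country_code = c("US", NA, "MX")
)

# Attach country to each word-level row first, then summarize
by_country <- word_sentiments %>%
  left_join(select(tweets, status_id, country_code), by = "status_id") %>%
  count(country_code, sentiment, sort = TRUE)
```

The resulting table has one row per country and sentiment pair, which could then be passed to `ggplot()` with `fill = country_code` to compare locations.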
I find R fascinating and can see how it could make analysis so much easier and more efficient than Excel. However, as we approach the end of this semester, I find this course has taught me some basics but has mostly highlighted how much I still don’t know, as I run into what I see as coding failure time and time again. I look forward to the future analytics courses and hope they add to the little box of R knowledge I have started.