Assignment 4

Overview

For this assignment we were tasked with pulling data from Twitter and doing something cool with it. I decided to look at #turtle, because I love turtles and wanted to see what this assignment could tell me about the topic on Twitter.

First off, I set up the developer connection for Twitter as we learned, and then it was off to get data! I find it fascinating that Twitter has an integration built to talk to software like R, and I can tell there is so much one could do with Twitter data given the right tools and know-how. I felt slightly like a bull in a china shop with this assignment though, as the assignments tend to highlight all the things I don’t know about R and data management more than what I can do.

library(rtweet)
library(tidytext)
library(stringr)
library(dplyr)
library(ggplot2)

I first searched on just #turtle to see what data came back.

num_tweets <- 1000
tt <- search_tweets('#Turtle', n = num_tweets, 
                    include_rts = FALSE)
head(tt)
## # A tibble: 6 x 90
##   user_id status_id created_at          screen_name text  source
##   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
## 1 480074~ 11996905~ 2019-11-27 14:05:01 donniethet~ Oh..~ Donni~
## 2 480074~ 11982397~ 2019-11-23 14:00:01 donniethet~ "Tod~ Donni~
## 3 480074~ 11996893~ 2019-11-27 14:00:01 donniethet~ "Tod~ Donni~
## 4 480074~ 11964278~ 2019-11-18 14:00:01 donniethet~ "Tod~ Donni~
## 5 480074~ 11967902~ 2019-11-19 14:00:01 donniethet~ "Tod~ Donni~
## 6 480074~ 11989658~ 2019-11-25 14:05:01 donniethet~ I'm ~ Donni~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>

I tried a few code variations on source and screen name, but they did not give me any insight I could follow up on.

turtle_platform <- tt %>% group_by(source) %>% 
  summarize(n = n()) %>% 
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n))
turtle_platform %>% slice(1:10)
## # A tibble: 10 x 3
##    source                  n percent_of_tweets
##    <chr>               <int>             <dbl>
##  1 Tweets for Turtles    180            0.205 
##  2 Twitter for iPhone    143            0.163 
##  3 Instagram             132            0.151 
##  4 Twitter Web App       118            0.135 
##  5 Twitter for Android    73            0.0833
##  6 IFTTT                  40            0.0457
##  7 Hootsuite Inc.         32            0.0365
##  8 Twitter Web Client     29            0.0331
##  9 TweetDeck              26            0.0297
## 10 Buffer                 16            0.0183
tt %>% group_by(screen_name) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>% 
  arrange(desc(n)) %>% slice(1:10)
## # A tibble: 10 x 3
##    screen_name         n percent_of_tweets
##    <chr>           <int>             <dbl>
##  1 aTurtlebot        180            0.205 
##  2 TMNT_Wiz           27            0.0308
##  3 kame_fuji          15            0.0171
##  4 GreenieTurtle      14            0.0160
##  5 TurtleAloha        14            0.0160
##  6 kamepi24           13            0.0148
##  7 donnietheturtle    12            0.0137
##  8 NatureCutsTags     12            0.0137
##  9 StarCrystalDel     10            0.0114
## 10 TPE_connect        10            0.0114

Next, I looked at the words used in the tweets to determine what is being discussed when #turtle is used. (Outside of just turtles, of course.)

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#A]))"
turtle_words <- tt %>% select(status_id, text) %>% 
  filter(!str_detect(text, '^"')) %>% 
  mutate(text = str_replace_all(text,
                                "https:t.co/[A-Za-z\\d]+|&amp;",
                                "")) %>% 
  unnest_tokens(word, text, token = "regex", 
                pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(
    word, "[a-z]"))
turtle_words <- turtle_words %>% group_by(word) %>% 
  summarize(n = n()) %>% 
  mutate(percent_of_tweets = n/sum(n)) %>% 
  arrange(desc(n)) %>% top_n(20) 
## Selecting by percent_of_tweets
head(turtle_words)
## # A tibble: 6 x 3
##   word           n percent_of_tweets
##   <chr>      <int>             <dbl>
## 1 https        976            0.0806
## 2 #turtle      871            0.0720
## 3 #plastic     282            0.0233
## 4 #cute        206            0.0170
## 5 turtle       193            0.0159
## 6 #turtlebot   182            0.0150

This gave me a bit more to work with. After https and the search term itself, the next result is #plastic, which leads me to believe there are many tweets about pollution and sea turtles that we could work with.
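
A note on the top row: https showing up as the most common "word" suggests the link-stripping step did not match anything. The pattern in the chunk above reads https:t.co/ without the two slashes, so a link of the form https://t.co/... would be left in the text and then tokenized. A minimal sketch of the difference, using a made-up tweet string (the example text and the variable names are illustrative, not taken from my data):

```r
library(stringr)

# Hypothetical tweet text, for illustration only
example_text <- "Save the #turtle https://t.co/abc123 &amp; friends"

# Pattern as written in the chunk above (no "//"): only "&amp;" is removed
pattern_as_written <- "https:t.co/[A-Za-z\\d]+|&amp;"
as_written <- str_replace_all(example_text, pattern_as_written, "")

# Same pattern with "//" added: the whole link is removed as well
pattern_with_slashes <- "https://t.co/[A-Za-z\\d]+|&amp;"
with_slashes <- str_replace_all(example_text, pattern_with_slashes, "")

# Compare the two results
as_written
with_slashes
```

If this is what happened, adding the slashes in the word-cleaning step should take https out of the top of the table, though the remaining counts would need to be rerun to confirm.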

Next, I tried to plot the count of words used with #turtle. Unfortunately, while I was able to generate the graph, I could not get the y-axis count to work correctly. It seemed to set every count to 1. I tried several different graphs and attempted scaling, but could not determine why the integer counts I can see in the tibble did not carry over to the graph.

turtle_words %>% count(word, sort = TRUE) %>% top_n(15) %>% 
  mutate(word = reorder(word, n)) %>% ggplot(aes(x = word, y = n)) + 
  geom_col() + xlab(NULL) + coord_flip() + labs(x = "Top Turtle Word Use", 
                                                y = "Count", 
                                                title = "Top Twitter Searches on #Turtle")

turtle_words %>% count(word, sort = TRUE) %>% top_n(15) %>% 
  mutate(word = reorder(word, n)) %>% ggplot(aes(x = word, y = n)) + 
  geom_col() + xlab(NULL) + coord_flip() + labs(x = "Top Turtle Word Use", 
                                                y = "Count", 
                                                title = "Top Twitter Searches on #Turtle") +
  ylim(0, 10)
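
A likely cause of the flat y-axis: turtle_words was already summarized to one row per word, so calling count(word) on it counts rows per word, which is always 1, instead of using the n column that is already there. A minimal sketch with a toy tibble (the four values are copied from the table above; the names toy_words, recounted, and p are illustrative):

```r
library(dplyr)
library(ggplot2)

# Toy version of the summarized tibble: one row per word, n already computed
toy_words <- tibble(
  word = c("#turtle", "#plastic", "#cute", "turtle"),
  n    = c(871L, 282L, 206L, 193L)
)

# count() tallies rows per word, so every word comes back with a count of 1
recounted <- toy_words %>% count(word, sort = TRUE)
recounted

# Plotting the existing n directly keeps the real counts
p <- toy_words %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Top Turtle Word Use", y = "Count")
p
```

If this is the cause, the ylim(0, 10) in the second attempt would also need to go, since the real counts run into the hundreds and bars outside the limits are dropped.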

A Deeper Dive

After looking at the words used in #turtle tweets, I wanted to see if more could be gleaned by joining in the NRC sentiment lexicon and looking at the tweets by emotion. It returned the range of sentiments you would expect to find.

turtle_words2 <- tt %>% select(status_id, text) %>% 
  filter(!str_detect(text, '^"')) %>% 
  mutate(text = str_replace_all(text,
                                "https:t.co/[A-Za-z\\d]+|&amp;",
                                "")) %>% 
  unnest_tokens(word, text, token = "regex", 
                pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(
    word, "[a-z]"))
nrc <- get_sentiments("nrc") %>%
  select(word, sentiment)
head(nrc)
## # A tibble: 6 x 2
##   word      sentiment
##   <chr>     <chr>    
## 1 abacus    trust    
## 2 abandon   fear     
## 3 abandon   negative 
## 4 abandon   sadness  
## 5 abandoned anger    
## 6 abandoned fear
turtle_words2_sentiments <- turtle_words2 %>% 
  inner_join(nrc, by = "word")
turtle_words2_sentiments %>% 
  group_by(sentiment) %>% summarize(n = n()) %>% 
  arrange(desc(n))
## # A tibble: 10 x 2
##    sentiment        n
##    <chr>        <int>
##  1 positive       601
##  2 joy            303
##  3 anticipation   252
##  4 trust          238
##  5 negative       200
##  6 fear           111
##  7 surprise       104
##  8 sadness         95
##  9 disgust         69
## 10 anger           66

Positive #Turtle

Next, I pulled the positive posts to look at them more closely. However, the selection was dominated by repetitive posts (in the output below, ten copies of an automated "Today's Routine" tweet) and by tweets about jewelry, which was not what I was looking for.

pos_tt_id <- turtle_words2_sentiments %>% 
  filter(sentiment == "positive") %>% distinct(status_id)
tt %>% inner_join(pos_tt_id, by = "status_id") %>% 
  select(text) %>% slice(1:10)
## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  2 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  3 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  4 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  5 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  6 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  7 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  8 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
##  9 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~
## 10 "Today's Routine:\n\nSunrise: 7:00 am\nBreakfast: 7:05 am\nDinner: 7:30 pm\n~

Sad #Turtle

As the positive sentiment was pulling information not about real turtles, I thought a sentiment on the negative end of the spectrum might show a different picture. However, when I looked at the sad sentiment, the returned tweets were mostly the same kind of jewelry posts.

sad_tt_id <- turtle_words2_sentiments %>% filter(sentiment == "sadness") %>% 
  distinct(status_id)
tt %>% inner_join(sad_tt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 I'm not fat... I just have a big shell. #Java #Turtle #NinjaTurtles          
##  2 "But most amazing ritual is that soft shell #turtle are left at pond of haya~
##  3 "@Activision @InfinityWard @ATVIAssist @JoeCecot @ashtonisVULCAN #CallofDuty~
##  4 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  5 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  6 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  7 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
##  8 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  9 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 10 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~

Anger #Turtle

Not to be daunted, I tried anger instead, again hoping the search would surface tweets about pollution, as suggested by #plastic in the top results. Unfortunately, my results yet again showed the same tweets as my other returns.

anger_tt_id <- turtle_words2_sentiments %>% filter(sentiment == "anger") %>% 
  distinct(status_id)
tt %>% inner_join(anger_tt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 I'm not fat... I just have a big shell. #Java #Turtle #NinjaTurtles          
##  2 "This #kitty prefers a #turtle over a teddy bear or cat nip.....~ #Cuteness\~
##  3 Tortious Confetti #naturecuts #confetti #cutout #partysupplies #favors #deco~
##  4 "But most amazing ritual is that soft shell #turtle are left at pond of haya~
##  5 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  6 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  7 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  8 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
##  9 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 10 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
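
One possible reason the same tweets keep coming back: in the NRC lexicon a single word can carry several sentiments (abandon appears above under fear, negative, and sadness), and distinct(status_id) keeps a tweet as soon as any one of its words matches. On top of that, slice(1:10) only shows the first ten matches in the order of tt. A small sketch of the overlap, using the lexicon rows from the head(nrc) output above and made-up tokens (toy_tokens and its status_id values are hypothetical):

```r
library(dplyr)

# Lexicon rows copied from the head(nrc) output above
toy_lexicon <- tibble(
  word      = c("abandon", "abandon", "abandon", "abandoned", "abandoned"),
  sentiment = c("fear", "negative", "sadness", "anger", "fear")
)

# Hypothetical tokenized tweets: status_id values and words are made up
toy_tokens <- tibble(
  status_id = c("1", "1", "2"),
  word      = c("abandoned", "turtle", "abandon")
)

# Each token is repeated once per sentiment attached to it
toy_sentiments <- toy_tokens %>% inner_join(toy_lexicon, by = "word")

# Same filtering pattern as above, for two different sentiments
anger_ids <- toy_sentiments %>% filter(sentiment == "anger") %>% distinct(status_id)
fear_ids  <- toy_sentiments %>% filter(sentiment == "fear") %>% distinct(status_id)

# Tweet "1" shows up under both sentiments because of one word
intersect(anger_ids$status_id, fear_ids$status_id)
```

So a tweet appearing in the positive, sad, and anger pulls does not have to mean the code is wrong; it may just contain words the lexicon tags more than one way.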

Tactical Change

So, my data was not giving me something I felt I could work with. I could not figure out how to take the data I had already pulled and filter out tweets about generic turtle items such as jewelry. Instead, I ran a narrower search on #seaturtle and followed steps similar to those executed above on #turtle.
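
As an aside on the filtering question, one approach that might have worked on the data already pulled is to drop tweets whose text matches a hand-picked list of product words. A rough sketch, assuming such a list (the terms, the toy tibble, and the names toy_tt, shop_terms, and toy_filtered are illustrative; the first two texts are shortened from the output above and the third is made up):

```r
library(dplyr)
library(stringr)

# Toy stand-in for tt with only the columns needed here
toy_tt <- tibble(
  status_id = c("1", "2", "3"),
  text = c(
    "Green Aventurine wire wrapped #turtle earrings! So cute!",
    "I'm not fat... I just have a big shell. #Java #Turtle #NinjaTurtles",
    "Sea turtle tangled in #plastic on the beach"  # made-up example
  )
)

# Hand-picked product words (an assumption, not a vetted list)
shop_terms <- regex("earrings|pendant|etsy|jewelry", ignore_case = TRUE)

# Keep only tweets whose text does not mention any of the product words
toy_filtered <- toy_tt %>% filter(!str_detect(text, shop_terms))
toy_filtered
```

A word list like this is crude and would need tuning against the real tweets. The narrower #seaturtle search itself follows.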

stt <- search_tweets('#seaturtle', n = num_tweets, 
                     include_rts = FALSE)
head(stt)
## # A tibble: 6 x 90
##   user_id status_id created_at          screen_name text  source
##   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
## 1 116855~ 11996897~ 2019-11-27 14:01:35 Dimpled548~ "Wha~ Twitt~
## 2 116855~ 11965854~ 2019-11-19 00:26:09 Dimpled548~ "Bes~ Tweet~
## 3 823792~ 11985623~ 2019-11-24 11:21:41 RGDives     "Clo~ IFTTT 
## 4 823792~ 11982365~ 2019-11-23 13:47:00 RGDives     "Lot~ IFTTT 
## 5 823792~ 11967518~ 2019-11-19 11:27:32 RGDives     "Cru~ IFTTT 
## 6 823792~ 11996734~ 2019-11-27 12:57:04 RGDives     "Rel~ IFTTT 
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>
stt %>% group_by(screen_name) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>% 
  arrange(desc(n)) %>% slice(1:10)
## # A tibble: 10 x 3
##    screen_name         n percent_of_tweets
##    <chr>           <int>             <dbl>
##  1 Makalewakan2       14            0.0773
##  2 RGDives             5            0.0276
##  3 cehart03            4            0.0221
##  4 NomadicBrits        4            0.0221
##  5 AnthonyCatucci      3            0.0166
##  6 FallHolidaze        3            0.0166
##  7 KauaiMarionette     3            0.0166
##  8 NatureCutsTags      3            0.0166
##  9 sebphotog           3            0.0166
## 10 WIDECAST1           3            0.0166
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#A]))"
seaturtle_words <- stt %>% select(status_id, text) %>% 
  filter(!str_detect(text, '^"')) %>% 
  mutate(text = str_replace_all(text,
                                "https:t.co/[A-Za-z\\d]+|&amp;",
                                "")) %>% 
  unnest_tokens(word, text, token = "regex", 
                pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(
    word, "[a-z]"))
seaturtle_words_sentiments <- seaturtle_words %>% 
  inner_join(nrc, by = "word")
seaturtle_words_sentiments2 <- seaturtle_words_sentiments %>% 
  group_by(sentiment) %>% summarize(n = n()) %>% 
  arrange(desc(n))

This time I was able to see variation across the three emotions, which gave me hope for better data to work with.

Positive #seaturtle

pos_stt_id <- seaturtle_words_sentiments %>% 
  filter(sentiment == "positive") %>% distinct(status_id)
stt %>% inner_join(pos_stt_id, by = "status_id") %>% 
  select(text) %>% slice(1:10)
## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 "Lot of fun, Search &amp; Recovery specialty at Curacao ... \U0001f60e\U0001~
##  2 "A lot of fun, do your IDD navigation specialty under the sun... \U0001f60e\~
##  3 Sea Turtle Vinyl Stickers https://t.co/RTuf8EltB4 #naturecuts #vinyl #vinyls~
##  4 Sea Turtle Confetti #naturecuts #confetti #cutout #partysupplies #favors #de~
##  5 Dolphin Applique #naturecuts#favors #decoration #party #event #birthday #wed~
##  6 "Help make a difference on your #travels in 2020! \U0001f49a\U0001f30d We've~
##  7 Spent 16 days roaming around: California, Hawaii, &amp; Mexico. I love being~
##  8 "@ejn_greencareer #SeaTurtle hatchlings found &amp; safety released into oce~
##  9 SEA TURTLE Print Nautical https://t.co/V6xJmZTIkQ via @EtsySocial #etsymntt ~
## 10 HOT: Great work by the #Vietnam authorities saw an endangered green #seaturt~

Sad #seaturtle

sad_stt_id <- seaturtle_words_sentiments %>% filter(sentiment == "sadness") %>% 
  distinct(status_id)
stt %>% inner_join(sad_stt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 Turtle art prints, Hawaiian art, Kauai art prints, Hawaii painting, Hawaiian~
##  2 Sea Turtle Painting Hawaii Art Sea Turtle Decor Sea Turtle Wall Art Kauai Po~
##  3 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~
##  4 "Green Sea Turtle\n.\nToo cool for you or me, the green sea turtle always se~
##  5 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  6 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  7 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  8 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  9 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 10 Sea turtle key hanger, hand painted key hanger, beach hut key rack, seaside ~

Anger #seaturtle

Although rows four through nine below repeat tweets from the sad segment, here you can see some tweets more in line with the sentiment, such as row ten.

anger_stt_id <- seaturtle_words_sentiments %>% filter(sentiment == "anger") %>% 
  distinct(status_id)
stt %>% inner_join(anger_stt_id, by = "status_id") %>% select(text) %>% slice(1:10)
## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 "This could be the biggest #SeaTurtle swarm ever filmed - Hundreds of thousa~
##  2 HOT: Great work by the #Vietnam authorities saw an endangered green #seaturt~
##  3 HOT: Great work by the #Vietnam authorities saw an endangered green #seaturt~
##  4 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~
##  5 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  6 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  7 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  8 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  9 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 10 "Grumpy sea turtle. Maybe she's unhappy with our treatment of the oceans and~

Using the data

I wanted to use the country code of the users who posted the tweets to plot where the people discussing sea turtles were located. I tried multiple ways to pull country into my sentiment tibbles. In the end, the code that did not return an error message was:

seaturtle_words_sentiments2 <- merge(seaturtle_words_sentiments2, stt, "status_id")

However, it still did not do what I wanted; as you can see, there is no country column in this tibble.

head(seaturtle_words_sentiments2)
## # A tibble: 6 x 2
##   sentiment        n
##   <chr>        <int>
## 1 positive       195
## 2 joy             74
## 3 anticipation    62
## 4 trust           54
## 5 negative        53
## 6 sadness         34
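
A possible explanation: the summarized tibble holds only sentiment and n, so there is no status_id left in it to join on. Joining the per-word sentiment table to the tweet table first, and summarizing afterwards, might get closer. A sketch with toy stand-ins (toy_stt, toy_word_sentiments, and all their values, including the sentiment tags, are made up; only the column names status_id, country_code, word, and sentiment come from the data above):

```r
library(dplyr)

# Toy stand-in for stt: one row per tweet
toy_stt <- tibble(
  status_id    = c("1", "2", "3"),
  country_code = c("US", NA, "MX")
)

# Toy stand-in for seaturtle_words_sentiments: one row per word and sentiment
toy_word_sentiments <- tibble(
  status_id = c("1", "1", "2", "3"),
  word      = c("love", "lost", "fun", "love"),
  sentiment = c("positive", "sadness", "positive", "positive")
)

# Join while status_id is still present, then count by country and sentiment
sentiment_by_country <- toy_word_sentiments %>%
  left_join(toy_stt %>% select(status_id, country_code), by = "status_id") %>%
  filter(!is.na(country_code)) %>%
  count(country_code, sentiment, sort = TRUE)
sentiment_by_country
```

One caveat I am assuming rather than have checked on this data: country_code tends to be filled in only for tweets that carry place information, so the result may be sparse.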

Therefore, when I tried to code it into a graph, it could not pull the data. I ended up plotting the sentiments without country data, which was not the comparison I wanted.

ggplot(seaturtle_words_sentiments2, aes(x = sentiment, y = n)) +
  geom_bar(stat = "identity", position = "dodge") + xlab("Sentiment") + ylab("Count") +
  theme(axis.text.x = element_text(angle = 90,
                                   hjust = 1))

Conclusion

I find R fascinating and can see how it potentially makes analysis so much easier and more efficient than Excel. However, as we approach the end of this semester, I find this course has taught me some basics but has mostly highlighted how much I still don’t know, as I back into what I see as coding failure time and time again. I look forward to the future analytics courses and hope they add to the little box of R knowledge I have started.