Assignment 4

Overview

For this assignment we were tasked with pulling data from Twitter and doing something cool with it. I decided to look at data on #turtle because I love turtles and wanted to see what this assignment could tell me about the topic on Twitter.

First off, I set up the developer connection for Twitter as we learned, and then it was off to get more data! I find it quite fascinating that Twitter has this integration built to talk with software like R, and I can tell there is so much one could do with Twitter data given the right tools and know-how. I felt slightly like a bull in a china shop with this assignment, though, as I feel the assignments tend to highlight all the things I don’t know about R and data management rather than what I can do.

library(rtweet)    # create_token(), search_tweets()
library(dplyr)
library(stringr)
library(tidytext)  # unnest_tokens(), stop_words, get_sentiments()
library(ggplot2)

# Credentials are read from environment variables rather than hard-coded.
# The variable names below are placeholders, not something rtweet defines.
app <- "MBA676 Assignment4"
my_token <- create_token(app = app,
                         consumer_key = Sys.getenv("TWITTER_CONSUMER_KEY"),
                         consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
                         access_token = Sys.getenv("TWITTER_ACCESS_TOKEN"),
                         access_secret = Sys.getenv("TWITTER_ACCESS_SECRET"))

I first searched on just #turtle to see what data came back.

num_tweets <- 1000
tt <- search_tweets('#Turtle', n = num_tweets, 
                    include_rts = FALSE)
head(tt)

## # A tibble: 6 x 90
##   user_id status_id created_at          screen_name text  source
##   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
## 1 105368~ 11995505~ 2019-11-27 04:48:36 Thedailyme~ Mega~ Twitt~
## 2 852037~ 11995469~ 2019-11-27 04:34:03 Orange2016~ #<U+81ED><U+6854> ~ Twitt~
## 3 852037~ 11970859~ 2019-11-20 09:35:06 Orange2016~ #<U+81ED><U+6854> ~ Twitt~
## 4 108381~ 11995399~ 2019-11-27 04:06:22 crochetand~ Had ~ Twitt~
## 5 764651~ 11995399~ 2019-11-27 04:06:22 CecilsJust~ This~ Faceb~
## 6 307109~ 11995395~ 2019-11-27 04:04:48 RealCoastal This~ Faceb~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>

I tried a few variations that grouped the tweets by source and by screen name, but they did not give me any insight I could follow up on.

turtle_platform <- tt %>% group_by(source) %>% 
  summarize(n = n()) %>% 
  mutate(percent_of_tweets = n/sum(n)) %>%
  arrange(desc(n))

turtle_platform %>% slice(1:10)

## # A tibble: 10 x 3
##    source                  n percent_of_tweets
##    <chr>               <int>             <dbl>
##  1 Tweets for Turtles    185            0.214 
##  2 Twitter for iPhone    147            0.170 
##  3 Instagram             130            0.150 
##  4 Twitter Web App       109            0.126 
##  5 Twitter for Android    72            0.0832
##  6 IFTTT                  42            0.0486
##  7 Hootsuite Inc.         31            0.0358
##  8 Twitter Web Client     30            0.0347
##  9 Buffer                 17            0.0197
## 10 TweetDeck              16            0.0185

tt %>% group_by(screen_name) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>% 
  arrange(desc(n)) %>% slice(1:10)

## # A tibble: 10 x 3
##    screen_name         n percent_of_tweets
##    <chr>           <int>             <dbl>
##  1 aTurtlebot        185            0.214 
##  2 TMNT_Wiz           26            0.0301
##  3 kame_fuji          15            0.0173
##  4 GreenieTurtle      14            0.0162
##  5 kamepi24           14            0.0162
##  6 TurtleAloha        14            0.0162
##  7 NatureCutsTags     11            0.0127
##  8 StarCrystalDel     11            0.0127
##  9 donnietheturtle    10            0.0116
## 10 RedEaredSliderz    10            0.0116
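
One thing the two tables do hint at: the source "Tweets for Turtles" and the account aTurtlebot each account for 185 tweets, roughly 21% of the sample, which suggests a single automated account. A minimal sketch of setting such accounts aside follows. It runs on a toy tibble rather than the real `tt`, and the 20% cutoff is an arbitrary assumption.

```r
library(dplyr)

# Toy stand-in for `tt`; the real data would come from search_tweets().
tt_demo <- tibble(
  status_id   = as.character(1:8),
  screen_name = c(rep("aTurtlebot", 5), "TMNT_Wiz", "kame_fuji", "TurtleAloha"),
  text        = paste("tweet", 1:8)
)

# Hypothetical cutoff: flag any account posting more than 20% of the sample.
max_share <- 0.20

heavy_posters <- tt_demo %>%
  count(screen_name, name = "n_tweets") %>%
  mutate(share = n_tweets / sum(n_tweets)) %>%
  filter(share > max_share)

# Keep only tweets from accounts that were not flagged.
tt_filtered <- tt_demo %>%
  anti_join(heavy_posters, by = "screen_name")
```

Whether dropping the account is appropriate depends on the question being asked; for a look at what people say about turtles, it probably is.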

Next, I looked at the words used in the tweets to determine what is being discussed when #turtle is used. (Outside of just turtles, of course.)

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#A]))"
turtle_words <- tt %>% select(status_id, text) %>% 
  filter(!str_detect(text, '^"')) %>% 
  mutate(text = str_replace_all(text,
                                "https:t.co/[A-Za-z\\d]+|&amp;",
                                "")) %>% 
  unnest_tokens(word, text, token = "regex", 
                pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(
    word, "[a-z]"))
turtle_words <- turtle_words %>% group_by(word) %>% 
  summarize(n = n()) %>% 
  mutate(percent_of_tweets = n/sum(n)) %>% 
  arrange(desc(n)) %>% top_n(20)

## Selecting by percent_of_tweets

head(turtle_words)

## # A tibble: 6 x 3
##   word           n percent_of_tweets
##   <chr>      <int>             <dbl>
## 1 https        971            0.0814
## 2 #turtle      860            0.0721
## 3 #plastic     291            0.0244
## 4 #cute        211            0.0177
## 5 turtle       194            0.0163
## 6 #turtlebot   187            0.0157

This gave me a bit more to work with. Setting aside https (a URL fragment the cleaning step left behind) and #turtle itself, the top term is #plastic, which leads me to believe there are many tweets about pollution and sea turtles that we could look into.
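
The word https tops the list because the cleaning pattern `"https:t.co/..."` has no `//` after the colon, so it never matches a real `https://t.co/...` link. The sketch below compares the pattern as written with an adjusted one. The adjusted pattern is an assumption about the intent, and the sample text is loosely based on a tweet shown later in this report.

```r
library(stringr)

# Example tweet text (illustrative only).
tweet <- "Gemstone Sea Turtle Pendant https://t.co/2ZcFAJyWon #Etsy &amp; more"

# Pattern as used above: without "//", the link is left in place.
original_pattern <- "https:t.co/[A-Za-z\\d]+|&amp;"
# Adjusted pattern (assumed intent): include the "//".
adjusted_pattern <- "https://t.co/[A-Za-z\\d]+|&amp;"

kept_link    <- str_replace_all(tweet, original_pattern, "")
removed_link <- str_replace_all(tweet, adjusted_pattern, "")
```

Note also that `percent_of_tweets` in the word table is each word's share of all remaining words, not a share of tweets.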

Next, I tried to plot the count of words used with #turtle. Unfortunately, while I was able to generate the graph, I could not get the count axis to work correctly. Every count appeared to be set to “1”. I tried several different graphs and attempted rescaling, but could not determine why the integer counts I can see in the tibble did not carry over to the graph.

turtle_words %>% count(word, sort = TRUE) %>% top_n(15) %>% 
  mutate(word = reorder(word, n)) %>% ggplot(aes(x = word, y = n)) + 
  geom_col() + xlab(NULL) + coord_flip() + labs(x = "Top Turtle Word Use", 
                                                y = "Count", 
                                                title = "Top Twitter Searches on #Turtle")

turtle_words %>% count(word, sort = TRUE) %>% top_n(15) %>% 
  mutate(word = reorder(word, n)) %>% ggplot(aes(x = word, y = n)) + 
  geom_col() + xlab(NULL) + coord_flip() + labs(x = "Top Turtle Word Use", 
                                                y = "Count", 
                                                title = "Top Twitter Searches on #Turtle") +
  ylim(0, 10)
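
A likely cause is that `turtle_words` already holds one row per word with its total in `n`, so calling `count(word)` again counts rows (one per word) instead of reusing those totals; the exact behavior may depend on the dplyr version. A possible fix is to skip the second `count()` and plot `n` directly. The sketch below uses a small stand-in tibble built from the six rows printed earlier.

```r
library(dplyr)
library(ggplot2)

# Stand-in for `turtle_words`, using the six rows printed above.
turtle_words_demo <- tibble(
  word = c("https", "#turtle", "#plastic", "#cute", "turtle", "#turtlebot"),
  n    = c(971L, 860L, 291L, 211L, 194L, 187L)
)

# The totals already live in `n`, so no second count() step is needed.
plot_data <- turtle_words_demo %>%
  top_n(15, n) %>%
  mutate(word = reorder(word, n))   # order bars by count

word_plot <- ggplot(plot_data, aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Top Turtle Word Use", y = "Count",
       title = "Top Twitter Searches on #Turtle")
```

With real counts in the hundreds, the `ylim(0, 10)` from the second attempt would also have to go, since it would cut off every bar.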

A Deeper Dive

After looking at the words used in #turtle tweets, I wanted to see whether more could be gleaned by joining in the NRC sentiment lexicon and looking at the tweets by emotion. It returned the range of sentiments you would expect to find.

turtle_words2 <- tt %>% select(status_id, text) %>% 
  filter(!str_detect(text, '^"')) %>% 
  mutate(text = str_replace_all(text,
                                "https:t.co/[A-Za-z\\d]+|&amp;",
                                "")) %>% 
  unnest_tokens(word, text, token = "regex", 
                pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(
    word, "[a-z]"))
nrc <- get_sentiments("nrc") %>%
  select(word, sentiment)
head(nrc)

## # A tibble: 6 x 2
##   word      sentiment
##   <chr>     <chr>    
## 1 abacus    trust    
## 2 abandon   fear     
## 3 abandon   negative 
## 4 abandon   sadness  
## 5 abandoned anger    
## 6 abandoned fear

turtle_words2_sentiments <- turtle_words2 %>% 
  inner_join(nrc, by = "word")
turtle_words2_sentiments %>% 
  group_by(sentiment) %>% summarize(n = n()) %>% 
  arrange(desc(n))

## # A tibble: 10 x 2
##    sentiment        n
##    <chr>        <int>
##  1 positive       597
##  2 joy            304
##  3 anticipation   256
##  4 trust          234
##  5 negative       179
##  6 surprise       109
##  7 sadness         97
##  8 fear            96
##  9 anger           62
## 10 disgust         60

Positive #Turtle

Next, I pulled the positive posts to look at them more closely. However, when I did that I found the selection seemed to consist of tweets about jewelry, which was not what I was looking for.

pos_tt_id <- turtle_words2_sentiments %>% 
  filter(sentiment == "positive") %>% distinct(status_id)
tt %>% inner_join(pos_tt_id, by = "status_id") %>% 
  select(text) %>% slice(1:10)

## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 Megan Brittany of 32nd East Side said she found a turtle intruder in her apa~
##  2 This could be the biggest #turtle swarm ever filmed at sea - They were... ht~
##  3 This could be the biggest #turtle swarm ever filmed at sea - “This is the...~
##  4 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  5 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  6 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  7 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  8 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
##  9 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
## 10 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~

Sad #Turtle

As the positive sentiment was pulling in tweets that were not about real turtles, I thought a sentiment on the negative end of the spectrum might show a different picture. However, the tweets returned for the sad sentiment were mostly the same jewelry ones found in the positive results.

sad_tt_id <- turtle_words2_sentiments %>% filter(sentiment == "sadness") %>% 
  distinct(status_id)
tt %>% inner_join(sad_tt_id, by = "status_id") %>% select(text) %>% slice(1:10)

## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 Megan Brittany of 32nd East Side said she found a turtle intruder in her apa~
##  2 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  3 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  4 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  5 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  6 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
##  7 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  8 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  9 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
## 10 Sea Turtle Painting Hawaii Art Sea Turtle Decor Sea Turtle Wall Art Kauai Po~

Anger #Turtle

Not to be daunted, I tried anger instead, again hoping to find tweets about pollution, as the #plastic result near the top of the word list suggested. Unfortunately, my results yet again showed the same tweets as my other searches.

anger_tt_id <- turtle_words2_sentiments %>% filter(sentiment == "anger") %>% 
  distinct(status_id)
tt %>% inner_join(anger_tt_id, by = "status_id") %>% select(text) %>% slice(1:10)

## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 Megan Brittany of 32nd East Side said she found a turtle intruder in her apa~
##  2 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  3 ".Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  4 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  5 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  6 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
##  7 "0Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  8 "`Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply av~
##  9 "Green Aventurine wire wrapped #turtle earrings! So cute! Limited supply ava~
## 10 #turtle #plushtoy #stuffedanimals Baby Greenie and the Donuts: One time when~
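
One reason the same tweets keep coming back: the NRC lexicon can tag a single word with several sentiments (abandon is listed above under fear, negative, and sadness), so one tweet can match positive, sadness, and anger at once, and `slice(1:10)` simply returns the first matches in the original order of `tt`. A possible alternative is to assign each tweet only its most frequent sentiment. The sketch below uses invented word matches, not real NRC tags.

```r
library(dplyr)

# Toy word-level matches standing in for `turtle_words2_sentiments`.
# The sentiment tags are made up for illustration, not taken from NRC.
matches_demo <- tibble(
  status_id = c("1", "1", "1", "2", "2", "2", "3"),
  word      = c("cute", "love", "limited", "death", "devastating", "death", "grumpy"),
  sentiment = c("positive", "positive", "sadness", "sadness", "sadness", "anger", "anger")
)

# Count matched words per tweet and sentiment, then keep each tweet's top sentiment.
dominant <- matches_demo %>%
  count(status_id, sentiment, name = "n_words") %>%
  group_by(status_id) %>%
  filter(n_words == max(n_words)) %>%   # ties keep every top sentiment
  ungroup()

# Tweets whose dominant sentiment is sadness.
sad_ids <- dominant %>%
  filter(sentiment == "sadness") %>%
  pull(status_id)
```

With this approach a tweet lands in the sadness list only when sadness words outnumber (or tie) its other matches, which should reduce the overlap between lists.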

Tactical Change

So, my data was not giving me something I felt I could work with. I could not figure out how to take the data I had already pulled and filter out tweets about generic turtle items such as jewelry. Instead, I ran a narrower search on #seaturtle and followed steps similar to those used above on #turtle.

stt <- search_tweets('#seaturtle', n = num_tweets, 
                     include_rts = FALSE)
head(stt)

## # A tibble: 6 x 90
##   user_id status_id created_at          screen_name text  source
##   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
## 1 315200~ 11995365~ 2019-11-27 03:53:04 AnthonyCat~ "Sea~ Twitt~
## 2 315200~ 11980452~ 2019-11-23 01:07:02 AnthonyCat~ "Woo~ Twitt~
## 3 315200~ 11990911~ 2019-11-25 22:23:05 AnthonyCat~ "WOW~ Twitt~
## 4 235010~ 11995225~ 2019-11-27 02:57:12 smarturban~ This~ Twitt~
## 5 235010~ 11977028~ 2019-11-22 02:26:20 smarturban~ Marc~ Twitt~
## 6 702145~ 11995179~ 2019-11-27 02:38:55 OfficialGa~ "10 ~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>
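
As an aside on the filtering problem mentioned above: one way to drop product tweets from data already pulled is to filter on the tweet text. The sketch below is only a starting point. The keyword list is a hypothetical guess that would need tuning on the real data, and the sample texts are shortened versions of tweets shown in this report.

```r
library(dplyr)
library(stringr)

# Toy stand-in for `tt` with shortened tweet texts.
tt_demo <- tibble(
  status_id = c("1", "2", "3", "4"),
  text = c(
    "Green Aventurine wire wrapped #turtle earrings! So cute!",
    "This could be the biggest #turtle swarm ever filmed at sea",
    "Gemstone Sea Turtle Pendant #Etsy",
    "Increase of #seaturtle death in Bengkulu"
  )
)

# Hypothetical list of product-related terms, matched case-insensitively.
product_terms   <- c("earrings", "pendant", "etsy", "painting", "decor")
product_pattern <- regex(str_c(product_terms, collapse = "|"), ignore_case = TRUE)

# Keep only tweets that mention none of the product terms.
tt_wildlife <- tt_demo %>%
  filter(!str_detect(text, product_pattern))
```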

stt %>% group_by(screen_name) %>% 
  summarize(n = n()) %>%
  mutate(percent_of_tweets = n/sum(n)) %>% 
  arrange(desc(n)) %>% slice(1:10)

## # A tibble: 10 x 3
##    screen_name         n percent_of_tweets
##    <chr>           <int>             <dbl>
##  1 Makalewakan2       15            0.0838
##  2 NomadicBrits        5            0.0279
##  3 cehart03            4            0.0223
##  4 RGDives             4            0.0223
##  5 AnthonyCatucci      3            0.0168
##  6 FallHolidaze        3            0.0168
##  7 KauaiMarionette     3            0.0168
##  8 NatureCutsTags      3            0.0168
##  9 sebphotog           3            0.0168
## 10 StylingTech         3            0.0168

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#A]))"
seaturtle_words <- stt %>% select(status_id, text) %>% 
  filter(!str_detect(text, '^"')) %>% 
  mutate(text = str_replace_all(text,
                                "https:t.co/[A-Za-z\\d]+|&amp;",
                                "")) %>% 
  unnest_tokens(word, text, token = "regex", 
                pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(
    word, "[a-z]"))
seaturtle_words_sentiments <- seaturtle_words %>% 
  inner_join(nrc, by = "word")
seaturtle_words_sentiments2 <- seaturtle_words_sentiments %>% 
  group_by(sentiment) %>% summarize(n = n()) %>% 
  arrange(desc(n))

This time I was able to see a variation in the three emotions, which gave hope for better data to work with.

Positive #seaturtle

pos_stt_id <- seaturtle_words_sentiments %>% 
  filter(sentiment == "positive") %>% distinct(status_id)
stt %>% inner_join(pos_stt_id, by = "status_id") %>% 
  select(text) %>% slice(1:10)

## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 "Sea Turtle!!! This is big 40”x20” and it’s awesome!!! This is ready to go f~
##  2 "Woohoo!!! Just finished this awesome 40”x17” Sea Turtle!! I love the colors~
##  3 "WOW!!! This is an incredible 40”x15” front facing Sea Turtle!!! Ready to go~
##  4 This could be the biggest turtle swarm ever filmed at sea https://t.co/5NJMj~
##  5 Marco Island could have new sea turtle ordinance for 2020 nesting season htt~
##  6 Sea Turtle Painting Hawaii Art Sea Turtle Decor Sea Turtle Wall Art Kauai Po~
##  7 Turtle art prints, Hawaiian art, Kauai art prints, Hawaii painting, Hawaiian~
##  8 Gemstone Sea Turtle Pendant https://t.co/2ZcFAJyWon #FallHolidaze #Etsy #Sea~
##  9 Gemstone Sea Turtle Pendant https://t.co/2ZcFAJyWon #FallHolidaze #Etsy #Sea~
## 10 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~

Sad #seaturtle

sad_stt_id <- seaturtle_words_sentiments %>% filter(sentiment == "sadness") %>% 
  distinct(status_id)
stt %>% inner_join(sad_stt_id, by = "status_id") %>% select(text) %>% slice(1:10)

## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 Sea Turtle Painting Hawaii Art Sea Turtle Decor Sea Turtle Wall Art Kauai Po~
##  2 Turtle art prints, Hawaiian art, Kauai art prints, Hawaii painting, Hawaiian~
##  3 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~
##  4 "Green Sea Turtle\n.\nToo cool for you or me, the green sea turtle always se~
##  5 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  6 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  7 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  8 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  9 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
## 10 Sea turtle key hanger, hand painted key hanger, beach hut key rack, seaside ~

Anger #seaturtle

Although the first six tweets repeat ones from the sad segment, tweets seven through ten are more in line with the sentiment.

anger_stt_id <- seaturtle_words_sentiments %>% filter(sentiment == "anger") %>% 
  distinct(status_id)
stt %>% inner_join(anger_stt_id, by = "status_id") %>% select(text) %>% slice(1:10)

## # A tibble: 10 x 1
##    text                                                                         
##    <chr>                                                                        
##  1 Blue Sea Sediment Stone Sea Turtle Pendant https://t.co/oX7obzsxaj #Etsy #Fa~
##  2 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  3 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  4 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  5 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  6 He's not really grumpy! https://t.co/TkDYbZ1utZ #seaturtle children #babies ~
##  7 "Grumpy sea turtle. Maybe she's unhappy with our treatment of the oceans and~
##  8 "Devastating news \U0001f422 #seaturtle\nhttps://t.co/dMXDuTKrvC"            
##  9 Increase of #seaturtle death in Bengkulu. suspected caused by increase in pl~
## 10 "Our recent tweet is evidence of exactly this! Read more from @MongabayID ab~

Using the data

I wanted to use the country code of the users who posted the tweets to plot where the people discussing sea turtles were located. I tried multiple ways of pulling country into my sentiment tibbles. In the end, the code that did not return an error message was:

seaturtle_words_sentiments2 <- merge(seaturtle_words_sentiments2, stt, "status_id")

However, it still did not do what I wanted: as you can see, there is no country column in this tibble.

head(seaturtle_words_sentiments2)

## # A tibble: 6 x 2
##   sentiment        n
##   <chr>        <int>
## 1 positive       187
## 2 joy             70
## 3 anticipation    55
## 4 negative        48
## 5 trust           47
## 6 sadness         35

Therefore, when I tried to build a graph from it, it could not pull the country data. I ended up plotting the sentiments without country data, although that was not the comparison I wanted to make.

ggplot(seaturtle_words_sentiments2, aes(x = sentiment, y = n)) +
  geom_bar(stat = "identity", position = "dodge") + xlab("Sentiment") + ylab("Count") +
  theme(axis.text.x = element_text(angle = 90,
                                   hjust = 1))
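
For the country comparison that did not work out above: `seaturtle_words_sentiments2` was built with `group_by(sentiment) %>% summarize(n = n())`, so it has only `sentiment` and `n` and no `status_id` left to merge on. A possible approach is to join the country code onto the word-level tibble first and summarize afterwards. The sketch below uses invented toy data, and `country_code` may well be NA for many real tweets.

```r
library(dplyr)

# Toy stand-ins: word-level sentiment matches (like `seaturtle_words_sentiments`)
# and tweet-level data (like `stt`). All values are invented for illustration.
matches_demo <- tibble(
  status_id = c("1", "1", "2", "3"),
  word      = c("awesome", "love", "death", "grumpy"),
  sentiment = c("positive", "positive", "sadness", "anger")
)
stt_demo <- tibble(
  status_id    = c("1", "2", "3"),
  country_code = c("US", "ID", NA)   # a tweet may carry no location at all
)

# Join while status_id is still present, then summarize by country and sentiment.
sentiment_by_country <- matches_demo %>%
  left_join(select(stt_demo, status_id, country_code), by = "status_id") %>%
  count(country_code, sentiment, name = "n_words")
```

The resulting tibble has one row per country and sentiment, which could then be passed to `ggplot()` with `fill = country_code` or a facet per country.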

Conclusion

I find R fascinating and can see how it could make analysis much easier and more efficient than Excel. However, as we approach the end of the semester, I find this course has taught me some basics but has mostly highlighted how much I still don’t know, as I run into what I see as coding failure time and time again. I look forward to future analytics courses and hope they add to the little box of R knowledge I have started.

MBA676 Assignment4

Amanda Casey

11/25/2019