Brand perception is the result of a consumer’s experiences with a brand. With the emergence of social media, large amounts of structured and unstructured information are shared through sources such as Twitter and Facebook, and this data reflects user sentiment.
The purpose of this article is to gauge the image and awareness of the brand among consumers, that is, what they really think and feel about #nike as a brand, through sentiment analysis.
Nike just announced its partnership with Colin Kaepernick as the face of the 30th anniversary of its JustDoIt campaign, using the slogan “Believe in something, even if it means sacrificing everything.” Some time earlier, Kaepernick had made a controversial decision not to stand during the national anthem, as a protest against police brutality. This stirred a heated debate and became a major national issue, especially after Donald Trump commented on it.
This dataset contains roughly 5,000 tweets (5,089 rows once loaded) that contain the hashtag #JustDoIt. All tweets were posted on September 7, 2018, a few days after Nike announced its endorsement of Kaepernick.
https://towardsdatascience.com/with-the-emergence-of-social-media-high-quality-of-structured-and-unstructured-information-shared-b16103f8bb2e
http://rpubs.com/Malvika30/Brand-Perception-Sentiment-Analysis-R
https://machinelearnings.co/how-to-apply-ai-to-marketing-31f45e517dcf
https://www.kaggle.com/eliasdabbas/5000-justdoit-tweets-dataset
Text mining is used to process text data: meaningful words are extracted from the text with the aim of building a predictive model.
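As a rough illustration of what "extracting meaningful words" can look like, the sketch below tokenizes two made-up tweets in base R, strips URLs and punctuation, and drops a few stopwords. The sample tweets, the `extract_words` helper, and the tiny stopword list are all assumptions made for this example. They are not taken from the dataset or from the analysis that follows, and a real analysis would typically use a fuller stopword list.

```r
# Hypothetical sample tweets, invented for illustration only
sample_text <- c(
  "Nike it is! #JustDoIt https://t.co/abc123",
  "Believe in something. #justdoit #nike"
)

# A tiny illustrative stopword list (assumption, far from complete)
stop_words <- c("it", "is", "in", "the", "a")

# Hypothetical helper: lower-case, clean, split into words, drop stopwords
extract_words <- function(x, stop_words) {
  x <- tolower(x)
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # remove URLs
  x <- gsub("[^a-z#@ ]", " ", x)                   # keep letters, '#', '@' and spaces
  words <- unlist(strsplit(x, "\\s+"))
  words <- words[nzchar(words)]                    # drop empty tokens
  words[!words %in% stop_words]
}

# Frequency table of the remaining words, most frequent first
word_counts <- sort(table(extract_words(sample_text, stop_words)),
                    decreasing = TRUE)
print(word_counts)
```

Keeping `#` and `@` means hashtags and mentions stay distinct from ordinary words. Whether that is desirable depends on the question being asked of the data.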
# load data
tweet <- read.csv("data/justdoit_tweets_2018_09_07_2.csv", encoding = "UTF-8")
str(tweet)
## 'data.frame': 5089 obs. of 71 variables:
## $ tweet_coordinates : Factor w/ 105 levels "","{'type': 'Point', 'coordinates': [-0.2, 5.55]}",..: 1 1 1 72 1 1 1 1 1 1 ...
## $ tweet_created_at : Factor w/ 4730 levels "Fri Sep 07 00:17:13 +0000 2018",..: 4730 4729 4728 4727 4726 4725 4724 4723 4722 4721 ...
## $ tweet_display_text_range : Factor w/ 1027 levels "[0, 100]","[0, 101]",..: 254 148 81 267 33 697 268 39 45 23 ...
## $ tweet_entities : Factor w/ 4891 levels "{'hashtags': [{'text': '100cliches', 'indices': [73, 84]}, {'text': 'justsayno', 'indices': [93, 103]}, {'text'"| __truncated__,..: 4405 579 1156 3592 4538 3349 4432 1173 933 508 ...
## $ tweet_extended_entities : Factor w/ 1327 levels "","{'media': [{'id': 1029964547774238720, 'id_str': '1029964547774238720', 'indices': [235, 258], 'media_url': 'ht"| __truncated__,..: 1326 1 1325 1 1324 1 1 1 1317 1323 ...
## $ tweet_favorite_count : int 0 0 0 0 0 0 0 0 0 0 ...
## $ tweet_favorited : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ tweet_full_text : Factor w/ 5076 levels "'Dream Crazy' campaign has @Nike believing in something, even if it means sacrificing everything. https://t.co/"| __truncated__,..: 2443 4088 4460 278 3888 1268 4912 1005 2677 3453 ...
## $ tweet_geo : Factor w/ 105 levels "","{'type': 'Point', 'coordinates': [-1.27847075, 36.82136536]}",..: 1 1 1 50 1 1 1 1 1 1 ...
## $ tweet_id : num 1.04e+18 1.04e+18 1.04e+18 1.04e+18 1.04e+18 ...
## $ tweet_id_str : num 1.04e+18 1.04e+18 1.04e+18 1.04e+18 1.04e+18 ...
## $ tweet_in_reply_to_screen_name : Factor w/ 593 levels "","__mellybeann",..: 1 1 1 1 1 440 1 387 1 1 ...
## $ tweet_in_reply_to_status_id : num NA NA NA NA NA ...
## $ tweet_in_reply_to_status_id_str : num NA NA NA NA NA ...
## $ tweet_in_reply_to_user_id : num NA NA NA NA NA ...
## $ tweet_in_reply_to_user_id_str : num NA NA NA NA NA ...
## $ tweet_is_quote_status : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ tweet_lang : Factor w/ 1 level "en": 1 1 1 1 1 1 1 1 1 1 ...
## $ tweet_metadata : Factor w/ 1 level "{'iso_language_code': 'en', 'result_type': 'recent'}": 1 1 1 1 1 1 1 1 1 1 ...
## $ tweet_place : Factor w/ 309 levels "","{'id': '0008cb6457ff0b55', 'url': 'https://api.twitter.com/1.1/geo/id/0008cb6457ff0b55.json', 'place_type': 'ci"| __truncated__,..: 1 1 1 85 1 1 1 97 1 1 ...
## $ tweet_possibly_sensitive : logi FALSE FALSE FALSE FALSE FALSE NA ...
## $ tweet_quoted_status : Factor w/ 469 levels "","{'created_at': 'Fri Aug 31 18:32:47 +0000 2018', 'id': 1035596276778188800, 'id_str': '1035596276778188800', 'f"| __truncated__,..: 1 1 1 1 1 1 320 1 1 1 ...
## $ tweet_quoted_status_id : num NA NA NA NA NA ...
## $ tweet_quoted_status_id_str : num NA NA NA NA NA ...
## $ tweet_retweet_count : int 0 0 0 0 0 0 0 0 0 0 ...
## $ tweet_retweeted : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ tweet_source : Factor w/ 59 levels "<a href=\"http://dokmz.com\" rel=\"nofollow\">autoposta16</a>",..: 48 21 15 4 12 15 12 15 12 14 ...
## $ tweet_truncated : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ tweet_user : Factor w/ 4639 levels "{'id': 1000408470364065792, 'id_str': '1000408470364065792', 'name': 'raycheal\U0001f3a5\U0001f3ac\U0001f3a4', "| __truncated__,..: 2211 941 2276 882 1264 664 870 3711 1140 128 ...
## $ user_contributors_enabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_created_at : Factor w/ 4635 levels "Fri Apr 01 15:15:35 +0000 2011",..: 512 165 51 1974 1536 4280 3277 382 4152 4031 ...
## $ user_default_profile : logi TRUE FALSE FALSE TRUE FALSE TRUE ...
## $ user_default_profile_image : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_description : Factor w/ 4134 levels "","'96Olympic Champ/'95World Champion/HOFer. 4 bookings: davidghawk@gmail.com https://t.co/XblvzJQDmu - https://t.co/ecTUy1vK8n",..: 2137 704 2738 1082 1855 1400 1859 367 2586 392 ...
## $ user_entities : Factor w/ 1922 levels "","{'description': {'urls': []}}",..: 1014 172 843 2 181 2 188 2 2 696 ...
## $ user_favourites_count : int 307 1178 11864 487 32971 9622 16358 921 14866 15 ...
## $ user_follow_request_sent : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_followers_count : int 57983 13241 11377 218 13731 64 11555 88 393 7 ...
## $ user_following : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_friends_count : int 48721 5489 2386 965 13629 175 4760 129 412 8 ...
## $ user_geo_enabled : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ user_has_extended_profile : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ user_id : num 3.19e+09 1.84e+07 3.26e+07 1.76e+08 2.23e+07 ...
## $ user_id_str : num 3.19e+09 1.84e+07 3.26e+07 1.76e+08 2.23e+07 ...
## $ user_is_translation_enabled : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_is_translator : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_lang : Factor w/ 19 levels "","ar","de","en",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ user_listed_count : int 629 150 193 1 181 1 247 2 4 0 ...
## $ user_location : Factor w/ 2325 levels "","'Merica","'Murica, Zion",..: 403 1252 959 1972 760 239 2296 1403 1 708 ...
## $ user_name : Factor w/ 4598 levels "","— myah \U0001f36f\u2728\U0001f34c\U0001f49b",..: 4372 4571 3470 1527 3225 4082 3015 2118 1470 1874 ...
## $ user_notifications : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_profile_background_color : Factor w/ 424 levels "","0","0.00E+00",..: 306 72 423 306 33 306 306 388 150 2 ...
## $ user_profile_background_image_url : Factor w/ 21 levels "","http://abs.twimg.com/images/themes/theme1/bg.png",..: 2 7 2 2 16 2 2 1 17 2 ...
## $ user_profile_background_image_url_https: Factor w/ 21 levels "","https://abs.twimg.com/images/themes/theme1/bg.png",..: 2 7 2 2 16 2 2 1 17 2 ...
## $ user_profile_background_tile : logi FALSE TRUE FALSE FALSE FALSE FALSE ...
## $ user_profile_banner_url : Factor w/ 4006 levels "","https://pbs.twimg.com/profile_banners/1000408470364065792/1527354750",..: 1941 822 1995 769 1108 573 758 3219 1 108 ...
## $ user_profile_image_url : Factor w/ 4577 levels "","http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png",..: 2230 3777 4344 2533 1791 2600 2775 3682 3369 1242 ...
## $ user_profile_image_url_https : Factor w/ 4577 levels "","https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png",..: 2230 3777 4344 2533 1791 2600 2775 3682 3369 1242 ...
## $ user_profile_link_color : Factor w/ 571 levels "","0","0000FF",..: 132 204 242 132 93 132 20 132 163 124 ...
## $ user_profile_sidebar_border_color : Factor w/ 188 levels "","0","00000B",..: 129 188 188 129 14 129 2 129 88 2 ...
## $ user_profile_sidebar_fill_color : Factor w/ 330 levels "","0","000C29",..: 233 269 307 233 85 233 233 233 149 2 ...
## $ user_profile_text_color : Factor w/ 322 levels "","0","001A80",..: 87 87 128 87 44 87 87 87 104 2 ...
## $ user_profile_use_background_image : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ user_protected : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ user_screen_name : Factor w/ 4635 levels "","___biodun",..: 4363 4603 3396 1344 3122 4020 2856 1950 1300 1817 ...
## $ user_statuses_count : int 91870 618822 48075 1983 24989 2213 179968 2167 12964 16 ...
## $ user_time_zone : logi NA NA NA NA NA NA ...
## $ user_translator_type : Factor w/ 3 levels "","none","regular": 2 2 2 2 2 2 3 2 2 2 ...
## $ user_url : Factor w/ 1850 levels "","http://instagram.com/Khulu_Mosia",..: 944 102 773 1 111 1 118 1 1 626 ...
## $ user_utc_offset : logi NA NA NA NA NA NA ...
## $ user_verified : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
When reading data that contains text, it is sometimes necessary to supply the additional argument encoding = "UTF-8" so that text stored as UTF-8 is read correctly. Any characters that do not conform to UTF-8 are displayed as an empty box.
View the first 6 rows using head(tweet):
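The effect of the `encoding` argument can be checked on a small, self-contained example. The sketch below writes a two-line CSV containing a non-ASCII character to a temporary file, reads it back with `encoding = "UTF-8"`, and inspects the result. The file contents and the `sample_tweets` name are made up for illustration. `stringsAsFactors = FALSE` is added here so the text column stays a character vector, which is a choice made for this sketch rather than something the original load call does.

```r
# Write a tiny UTF-8 CSV to a temporary file (hypothetical sample data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,text", "1,caf\u00e9 #JustDoIt"), tmp, useBytes = TRUE)

# Read it back, declaring that the strings in the file are UTF-8
sample_tweets <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)

# Inspect the declared encoding and check that the bytes are valid UTF-8
print(Encoding(sample_tweets$text))
print(validUTF8(sample_tweets$text))
print(sample_tweets$text)

unlink(tmp)  # clean up the temporary file
```

`validUTF8()` is a quick way to spot strings that were not read as valid UTF-8 before any text cleaning is attempted.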
## tweet_coordinates
## 1
## 2
## 3
## 4 {'type': 'Point', 'coordinates': [-86.45594032, 35.85402047]}
## 5
## 6
## tweet_created_at tweet_display_text_range
## 1 Fri Sep 07 16:25:06 +0000 2018 [0, 75]
## 2 Fri Sep 07 16:24:59 +0000 2018 [0, 237]
## 3 Fri Sep 07 16:24:50 +0000 2018 [0, 176]
## 4 Fri Sep 07 16:24:44 +0000 2018 [0, 88]
## 5 Fri Sep 07 16:24:39 +0000 2018 [0, 132]
## 6 Fri Sep 07 16:24:35 +0000 2018 [17, 96]
## tweet_entities
## 1 {'hashtags': [{'text': 'quote', 'indices': [47, 53]}, {'text': 'motivation', 'indices': [54, 65]}, {'text': 'justdoit', 'indices': [66, 75]}], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1038100853872197632, 'id_str': '1038100853872197632', 'indices': [76, 99], 'media_url': 'http://pbs.twimg.com/media/DmgTOfwVAAAJqoh.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DmgTOfwVAAAJqoh.jpg', 'url': 'https://t.co/J9lLdszdW6', 'display_url': 'pic.twitter.com/J9lLdszdW6', 'expanded_url': 'https://twitter.com/UltraYOUwoman/status/1038100857932394496/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 612, 'h': 612, 'resize': 'fit'}, 'large': {'w': 612, 'h': 612, 'resize': 'fit'}, 'medium': {'w': 612, 'h': 612, 'resize': 'fit'}}}]}
## 2 {'hashtags': [{'text': 'hero', 'indices': [90, 95]}, {'text': 'fdny', 'indices': [96, 101]}, {'text': 'likesforlikes', 'indices': [102, 116]}, {'text': 'promo', 'indices': [117, 123]}, {'text': 'music', 'indices': [124, 130]}, {'text': 'instagood', 'indices': [131, 141]}, {'text': 'instadaily', 'indices': [142, 153]}, {'text': 'postoftheday', 'indices': [154, 167]}, {'text': 'bestoftheday', 'indices': [168, 181]}, {'text': 'justdoit', 'indices': [182, 191]}, {'text': 'nike', 'indices': [192, 197]}, {'text': 'picoftheday', 'indices': [198, 210]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/sFobQ2ukpo', 'expanded_url': 'https://www.facebook.com/241998672453/posts/10156871528042454/', 'display_url': 'facebook.com/241998672453/p…', 'indices': [214, 237]}]}
## 3 {'hashtags': [{'text': 'JustDoIt', 'indices': [127, 136]}, {'text': '4YourMorning', 'indices': [137, 150]}, {'text': '4YourMemeCollection', 'indices': [151, 171]}], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1038100773396041728, 'id_str': '1038100773396041728', 'indices': [177, 200], 'media_url': 'http://pbs.twimg.com/media/DmgTJz9UUAA57tu.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DmgTJz9UUAA57tu.jpg', 'url': 'https://t.co/6ok9qR6k6M', 'display_url': 'pic.twitter.com/6ok9qR6k6M', 'expanded_url': 'https://twitter.com/rachelbogle/status/1038100793147248640/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1957, 'h': 2048, 'resize': 'fit'}, 'medium': {'w': 1147, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 650, 'h': 680, 'resize': 'fit'}}}]}
## 4 {'hashtags': [{'text': 'kapernickeffect', 'indices': [0, 16]}, {'text': 'swoosh', 'indices': [17, 24]}, {'text': 'justdoit', 'indices': [25, 34]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/BhPBnjOkuU', 'expanded_url': 'https://www.instagram.com/p/Bnbna3mhMgN21kk-5cZNvdYBBLoalVeRhnUJh00/?utm_source=ig_twitter_share&igshid=17b366pgjetob', 'display_url': 'instagram.com/p/Bnbna3mhMgN2…', 'indices': [65, 88]}]}
## 5 {'hashtags': [{'text': 'shaquem', 'indices': [74, 82]}, {'text': 'NFL', 'indices': [84, 88]}, {'text': 'Seattle', 'indices': [89, 97]}, {'text': 'Seahawks', 'indices': [98, 107]}, {'text': 'griffin', 'indices': [108, 116]}, {'text': 'JustDoIt', 'indices': [117, 126]}, {'text': 'Nike', 'indices': [127, 132]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/0EbEmwULLF', 'expanded_url': 'https://www.noluckneeded.com/one-hand-one-dream-shaquem-griffin-the-nfl-rookie-t21056.html', 'display_url': 'noluckneeded.com/one-hand-one-d…', 'indices': [48, 71]}], 'media': [{'id': 1038100736595255296, 'id_str': '1038100736595255296', 'indices': [133, 156], 'media_url': 'http://pbs.twimg.com/media/DmgTHq3U4AAYsl3.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DmgTHq3U4AAYsl3.jpg', 'url': 'https://t.co/pr8eosDZS7', 'display_url': 'pic.twitter.com/pr8eosDZS7', 'expanded_url': 'https://twitter.com/NoLuckNeeded/status/1038100745344753665/photo/1', 'type': 'photo', 'sizes': {'medium': {'w': 759, 'h': 420, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 376, 'resize': 'fit'}, 'large': {'w': 759, 'h': 420, 'resize': 'fit'}}}]}
## 6 {'hashtags': [{'text': 'JUSTDOIT', 'indices': [87, 96]}], 'symbols': [], 'user_mentions': [{'screen_name': 'realDonaldTrump', 'name': 'Donald J. Trump', 'id': 25073877, 'id_str': '25073877', 'indices': [0, 16]}], 'urls': []}
## tweet_extended_entities
## 1 {'media': [{'id': 1038100853872197632, 'id_str': '1038100853872197632', 'indices': [76, 99], 'media_url': 'http://pbs.twimg.com/media/DmgTOfwVAAAJqoh.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DmgTOfwVAAAJqoh.jpg', 'url': 'https://t.co/J9lLdszdW6', 'display_url': 'pic.twitter.com/J9lLdszdW6', 'expanded_url': 'https://twitter.com/UltraYOUwoman/status/1038100857932394496/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 612, 'h': 612, 'resize': 'fit'}, 'large': {'w': 612, 'h': 612, 'resize': 'fit'}, 'medium': {'w': 612, 'h': 612, 'resize': 'fit'}}}]}
## 2
## 3 {'media': [{'id': 1038100773396041728, 'id_str': '1038100773396041728', 'indices': [177, 200], 'media_url': 'http://pbs.twimg.com/media/DmgTJz9UUAA57tu.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DmgTJz9UUAA57tu.jpg', 'url': 'https://t.co/6ok9qR6k6M', 'display_url': 'pic.twitter.com/6ok9qR6k6M', 'expanded_url': 'https://twitter.com/rachelbogle/status/1038100793147248640/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1957, 'h': 2048, 'resize': 'fit'}, 'medium': {'w': 1147, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 650, 'h': 680, 'resize': 'fit'}}}]}
## 4
## 5 {'media': [{'id': 1038100736595255296, 'id_str': '1038100736595255296', 'indices': [133, 156], 'media_url': 'http://pbs.twimg.com/media/DmgTHq3U4AAYsl3.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DmgTHq3U4AAYsl3.jpg', 'url': 'https://t.co/pr8eosDZS7', 'display_url': 'pic.twitter.com/pr8eosDZS7', 'expanded_url': 'https://twitter.com/NoLuckNeeded/status/1038100745344753665/photo/1', 'type': 'photo', 'sizes': {'medium': {'w': 759, 'h': 420, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 376, 'resize': 'fit'}, 'large': {'w': 759, 'h': 420, 'resize': 'fit'}}}]}
## 6
## tweet_favorite_count tweet_favorited
## 1 0 FALSE
## 2 0 FALSE
## 3 0 FALSE
## 4 0 FALSE
## 5 0 FALSE
## 6 0 FALSE
## tweet_full_text
## 1 Done is better than perfect. — Sheryl Sandberg #quote #motivation #justdoit https://t.co/J9lLdszdW6
## 2 Shout out to the Great Fire Department and the tour! <U+0001F468><U+200D><U+0001F692><U+0001F468><U+200D><U+0001F692> Much love to NYC! <U+0001F4AF><U+0001F3A5><U+0001F525><U+270A><U+0001F3FF>\n•\n•\n•\n#hero #fdny #likesforlikes #promo #music #instagood #instadaily #postoftheday #bestoftheday #justdoit #nike #picoftheday... https://t.co/sFobQ2ukpo
## 3 There are some AMAZINGLY hilarious Nike Ad memes happening on my newsfeed. Soooo, I decided to get a little creative too... \n\n#JustDoIt #4YourMorning #4YourMemeCollection \n\n<U+0001F36A><U+0001F602> https://t.co/6ok9qR6k6M
## 4 #kapernickeffect #swoosh #justdoit @ Lucas Bishop's Cigar Lounge https://t.co/BhPBnjOkuU
## 5 One Hand, One Dream: The Shaquem Griffin Story https://t.co/0EbEmwULLF #shaquem #NFL #Seattle #Seahawks #griffin #JustDoIt #Nike https://t.co/pr8eosDZS7
## 6 @realDonaldTrump It's time for me to stock up on some new running apparel. Nike it is! #JUSTDOIT
## tweet_geo tweet_id
## 1 1.04e+18
## 2 1.04e+18
## 3 1.04e+18
## 4 {'type': 'Point', 'coordinates': [35.85402047, -86.45594032]} 1.04e+18
## 5 1.04e+18
## 6 1.04e+18
## tweet_id_str tweet_in_reply_to_screen_name tweet_in_reply_to_status_id
## 1 1.04e+18 NA
## 2 1.04e+18 NA
## 3 1.04e+18 NA
## 4 1.04e+18 NA
## 5 1.04e+18 NA
## 6 1.04e+18 realDonaldTrump 1.04e+18
## tweet_in_reply_to_status_id_str tweet_in_reply_to_user_id
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 1.04e+18 25073877
## tweet_in_reply_to_user_id_str tweet_is_quote_status tweet_lang
## 1 NA FALSE en
## 2 NA FALSE en
## 3 NA FALSE en
## 4 NA FALSE en
## 5 NA FALSE en
## 6 25073877 FALSE en
## tweet_metadata
## 1 {'iso_language_code': 'en', 'result_type': 'recent'}
## 2 {'iso_language_code': 'en', 'result_type': 'recent'}
## 3 {'iso_language_code': 'en', 'result_type': 'recent'}
## 4 {'iso_language_code': 'en', 'result_type': 'recent'}
## 5 {'iso_language_code': 'en', 'result_type': 'recent'}
## 6 {'iso_language_code': 'en', 'result_type': 'recent'}
## tweet_place
## 1
## 2
## 3
## 4 {'id': '19e2bff2e89dc38e', 'url': 'https://api.twitter.com/1.1/geo/id/19e2bff2e89dc38e.json', 'place_type': 'city', 'name': 'Murfreesboro', 'full_name': 'Murfreesboro, TN', 'country_code': 'US', 'country': 'United States', 'contained_within': [], 'bounding_box': {'type': 'Polygon', 'coordinates': [[[-86.505805, 35.751433], [-86.313415, 35.751433], [-86.313415, 35.943407], [-86.505805, 35.943407]]]}, 'attributes': {}}
## 5
## 6
## tweet_possibly_sensitive tweet_quoted_status tweet_quoted_status_id
## 1 FALSE NA
## 2 FALSE NA
## 3 FALSE NA
## 4 FALSE NA
## 5 FALSE NA
## 6 NA NA
## tweet_quoted_status_id_str tweet_retweet_count tweet_retweeted
## 1 NA 0 FALSE
## 2 NA 0 FALSE
## 3 NA 0 FALSE
## 4 NA 0 FALSE
## 5 NA 0 FALSE
## 6 NA 0 FALSE
## tweet_source
## 1 <a href="https://statusbrew.com" rel="nofollow">Statusbrew</a>
## 2 <a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>
## 3 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 4 <a href="http://instagram.com" rel="nofollow">Instagram</a>
## 5 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 6 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## tweet_truncated
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## tweet_user
## 1 {'id': 3188618684, 'id_str': '3188618684', 'name': 'Ultra YOU Woman', 'screen_name': 'UltraYOUwoman', 'location': 'California, USA', 'description': 'I share tips to achieve your health goals and be your best self inside & out! Plus healthy living, weight loss success stories, skincare & post-birth snap back!', 'url': 'https://t.co/jGlJswxjwS', 'entities': {'url': {'urls': [{'url': 'https://t.co/jGlJswxjwS', 'expanded_url': 'https://about.me/ultrayouwoman', 'display_url': 'about.me/ultrayouwoman', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 57983, 'friends_count': 48721, 'listed_count': 629, 'created_at': 'Fri May 08 10:27:51 +0000 2015', 'favourites_count': 307, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 91870, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'C0DEED', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/597000926272954368/eQ-8VrVk_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/597000926272954368/eQ-8VrVk_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/3188618684/1431170427', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': False, 'default_profile': True, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}
## 2 {'id': 18387174, 'id_str': '18387174', 'name': 'Yung Cut Up (Videos)', 'screen_name': 'yungcutup', 'location': 'Miami, Florida', 'description': 'All Business inquiries contact cluuxx@gmail.com / Support & Download my new mixtape "Clear Skies" https://t.co/0tOeBuJHHH', 'url': 'http://t.co/lVm8vfDbfO', 'entities': {'url': {'urls': [{'url': 'http://t.co/lVm8vfDbfO', 'expanded_url': 'http://youtube.com/yungcutuptv', 'display_url': 'youtube.com/yungcutuptv', 'indices': [0, 22]}]}, 'description': {'urls': [{'url': 'https://t.co/0tOeBuJHHH', 'expanded_url': 'http://piff.me/6613310', 'display_url': 'piff.me/6613310', 'indices': [98, 121]}]}}, 'protected': False, 'followers_count': 13241, 'friends_count': 5489, 'listed_count': 150, 'created_at': 'Fri Dec 26 09:30:23 +0000 2008', 'favourites_count': 1178, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 618822, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '131516', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme14/bg.gif', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme14/bg.gif', 'profile_background_tile': True, 'profile_image_url': 'http://pbs.twimg.com/profile_images/945333114582298625/C8zA_uvh_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/945333114582298625/C8zA_uvh_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/18387174/1488819752', 'profile_link_color': '3B94D9', 'profile_sidebar_border_color': 'FFFFFF', 'profile_sidebar_fill_color': 'EFEFEF', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}
## 3 {'id': 32645612, 'id_str': '32645612', 'name': 'Rachel Bogle', 'screen_name': 'rachelbogle', 'location': 'Indianapolis, IN', 'description': 'Morning Traffic Reporter @CBS4Indy | Traffic Authority | Radio <U+0001F4FB> to TV <U+0001F4FA> | Indiana Raised | @IUBloomington Alum | Morkie Mom to Gizmo | Ms. USA Universal 2018', 'url': 'https://t.co/g9exqgZp9x', 'entities': {'url': {'urls': [{'url': 'https://t.co/g9exqgZp9x', 'expanded_url': 'http://www.cbs4indy.com', 'display_url': 'cbs4indy.com', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 11377, 'friends_count': 2386, 'listed_count': 193, 'created_at': 'Fri Apr 17 23:04:15 +0000 2009', 'favourites_count': 11864, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': True, 'statuses_count': 48075, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FFFAFF', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/986345956357615619/4zpa5kxF_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/986345956357615619/4zpa5kxF_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/32645612/1485823278', 'profile_link_color': '050505', 'profile_sidebar_border_color': 'FFFFFF', 'profile_sidebar_fill_color': 'FC6A71', 'profile_text_color': '050505', 'profile_use_background_image': True, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}
## 4 {'id': 175932740, 'id_str': '175932740', 'name': 'Ervin Youngblood', 'screen_name': 'ErvGotti609', 'location': 'Tennessee by way of New Jersey', 'description': "Christ-Family-Career.. \\rNY\\nGiants, Mets, 76ers, Penguins, Florida State, Tar Heel Men's BB", 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 218, 'friends_count': 965, 'listed_count': 1, 'created_at': 'Sun Aug 08 02:02:56 +0000 2010', 'favourites_count': 487, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': False, 'statuses_count': 1983, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'C0DEED', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/724407937234550784/6Jrvt3mv_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/724407937234550784/6Jrvt3mv_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/175932740/1357086566', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': True, 'default_profile': True, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}
## 5 {'id': 22306628, 'id_str': '22306628', 'name': 'NoLuckNeeded.com', 'screen_name': 'NoLuckNeeded', 'location': 'Gambleville', 'description': 'https://t.co/Lnr5uRql8x is a Friendly Online Gambling Forum that was established in 2004 <U+2663> Be Gamble Aware 18+ https://t.co/2RyHF1JlEt', 'url': 'http://t.co/MMGF9RfLz0', 'entities': {'url': {'urls': [{'url': 'http://t.co/MMGF9RfLz0', 'expanded_url': 'http://noluckneeded.com', 'display_url': 'noluckneeded.com', 'indices': [0, 22]}]}, 'description': {'urls': [{'url': 'https://t.co/Lnr5uRql8x', 'expanded_url': 'http://NoLuckNeeded.com', 'display_url': 'NoLuckNeeded.com', 'indices': [0, 23]}, {'url': 'https://t.co/2RyHF1JlEt', 'expanded_url': 'http://gambleaware.co.uk', 'display_url': 'gambleaware.co.uk', 'indices': [112, 135]}]}}, 'protected': False, 'followers_count': 13731, 'friends_count': 13629, 'listed_count': 181, 'created_at': 'Sat Feb 28 23:13:57 +0000 2009', 'favourites_count': 32971, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 24989, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '0A2185', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme4/bg.gif', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme4/bg.gif', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/3028148602/dfd898817c8d7e5c71e66df2f2fa6b48_normal.jpeg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/3028148602/dfd898817c8d7e5c71e66df2f2fa6b48_normal.jpeg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/22306628/1469370851', 'profile_link_color': '111BBB', 'profile_sidebar_border_color': '0B0C0F', 'profile_sidebar_fill_color': '3BA4CE', 'profile_text_color': '1A1E1A', 'profile_use_background_image': True, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': 
False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}
## 6 {'id': 15566700, 'id_str': '15566700', 'name': 'tazman69', 'screen_name': 'tazman69', 'location': 'Austin, TX', 'description': 'Enjoys cycling, running & spending a relaxing day @ the lake. Equality and dignity for all human beings.', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 64, 'friends_count': 175, 'listed_count': 1, 'created_at': 'Wed Jul 23 16:43:42 +0000 2008', 'favourites_count': 9622, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 2213, 'lang': 'en', 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'C0DEED', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/743752426256142341/GJeLyn-J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/743752426256142341/GJeLyn-J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/15566700/1466159294', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'has_extended_profile': False, 'default_profile': True, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}
## user_contributors_enabled user_created_at
## 1 FALSE Fri May 08 10:27:51 +0000 2015
## 2 FALSE Fri Dec 26 09:30:23 +0000 2008
## 3 FALSE Fri Apr 17 23:04:15 +0000 2009
## 4 FALSE Sun Aug 08 02:02:56 +0000 2010
## 5 FALSE Sat Feb 28 23:13:57 +0000 2009
## 6 FALSE Wed Jul 23 16:43:42 +0000 2008
## user_default_profile user_default_profile_image
## 1 TRUE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 TRUE FALSE
## 5 FALSE FALSE
## 6 TRUE FALSE
## user_description
## 1 I share tips to achieve your health goals and be your best self inside & out! Plus healthy living, weight loss success stories, skincare & post-birth snap back!
## 2 All Business inquiries contact cluuxx@gmail.com / Support & Download my new mixtape "Clear Skies" https://t.co/0tOeBuJHHH
## 3 Morning Traffic Reporter @CBS4Indy | Traffic Authority | Radio <U+0001F4FB> to TV <U+0001F4FA> | Indiana Raised | @IUBloomington Alum | Morkie Mom to Gizmo | Ms. USA Universal 2018
## 4 Christ-Family-Career.. \nNY\nGiants, Mets, 76ers, Penguins, Florida State, Tar Heel Men's BB
## 5 https://t.co/Lnr5uRql8x is a Friendly Online Gambling Forum that was established in 2004 <U+2663> Be Gamble Aware 18+ https://t.co/2RyHF1JlEt
## 6 Enjoys cycling, running & spending a relaxing day @ the lake. Equality and dignity for all human beings.
## user_entities
## 1 {'url': {'urls': [{'url': 'https://t.co/jGlJswxjwS', 'expanded_url': 'https://about.me/ultrayouwoman', 'display_url': 'about.me/ultrayouwoman', 'indices': [0, 23]}]}, 'description': {'urls': []}}
## 2 {'url': {'urls': [{'url': 'http://t.co/lVm8vfDbfO', 'expanded_url': 'http://youtube.com/yungcutuptv', 'display_url': 'youtube.com/yungcutuptv', 'indices': [0, 22]}]}, 'description': {'urls': [{'url': 'https://t.co/0tOeBuJHHH', 'expanded_url': 'http://piff.me/6613310', 'display_url': 'piff.me/6613310', 'indices': [98, 121]}]}}
## 3 {'url': {'urls': [{'url': 'https://t.co/g9exqgZp9x', 'expanded_url': 'http://www.cbs4indy.com', 'display_url': 'cbs4indy.com', 'indices': [0, 23]}]}, 'description': {'urls': []}}
## 4 {'description': {'urls': []}}
## 5 {'url': {'urls': [{'url': 'http://t.co/MMGF9RfLz0', 'expanded_url': 'http://noluckneeded.com', 'display_url': 'noluckneeded.com', 'indices': [0, 22]}]}, 'description': {'urls': [{'url': 'https://t.co/Lnr5uRql8x', 'expanded_url': 'http://NoLuckNeeded.com', 'display_url': 'NoLuckNeeded.com', 'indices': [0, 23]}, {'url': 'https://t.co/2RyHF1JlEt', 'expanded_url': 'http://gambleaware.co.uk', 'display_url': 'gambleaware.co.uk', 'indices': [112, 135]}]}}
## 6 {'description': {'urls': []}}
## user_favourites_count user_follow_request_sent user_followers_count
## 1 307 FALSE 57983
## 2 1178 FALSE 13241
## 3 11864 FALSE 11377
## 4 487 FALSE 218
## 5 32971 FALSE 13731
## 6 9622 FALSE 64
## user_following user_friends_count user_geo_enabled
## 1 FALSE 48721 FALSE
## 2 FALSE 5489 FALSE
## 3 FALSE 2386 FALSE
## 4 FALSE 965 TRUE
## 5 FALSE 13629 FALSE
## 6 FALSE 175 FALSE
## user_has_extended_profile user_id user_id_str
## 1 FALSE 3188618684 3188618684
## 2 FALSE 18387174 18387174
## 3 FALSE 32645612 32645612
## 4 TRUE 175932740 175932740
## 5 FALSE 22306628 22306628
## 6 FALSE 15566700 15566700
## user_is_translation_enabled user_is_translator user_lang
## 1 FALSE FALSE en
## 2 FALSE FALSE en
## 3 FALSE FALSE en
## 4 FALSE FALSE en
## 5 FALSE FALSE en
## 6 FALSE FALSE en
## user_listed_count user_location user_name
## 1 629 California, USA Ultra YOU Woman
## 2 150 Miami, Florida Yung Cut Up (Videos)
## 3 193 Indianapolis, IN Rachel Bogle
## 4 1 Tennessee by way of New Jersey Ervin Youngblood
## 5 181 Gambleville NoLuckNeeded.com
## 6 1 Austin, TX tazman69
## user_notifications user_profile_background_color
## 1 FALSE C0DEED
## 2 FALSE 131516
## 3 FALSE FFFAFF
## 4 FALSE C0DEED
## 5 FALSE 0A2185
## 6 FALSE C0DEED
## user_profile_background_image_url
## 1 http://abs.twimg.com/images/themes/theme1/bg.png
## 2 http://abs.twimg.com/images/themes/theme14/bg.gif
## 3 http://abs.twimg.com/images/themes/theme1/bg.png
## 4 http://abs.twimg.com/images/themes/theme1/bg.png
## 5 http://abs.twimg.com/images/themes/theme4/bg.gif
## 6 http://abs.twimg.com/images/themes/theme1/bg.png
## user_profile_background_image_url_https
## 1 https://abs.twimg.com/images/themes/theme1/bg.png
## 2 https://abs.twimg.com/images/themes/theme14/bg.gif
## 3 https://abs.twimg.com/images/themes/theme1/bg.png
## 4 https://abs.twimg.com/images/themes/theme1/bg.png
## 5 https://abs.twimg.com/images/themes/theme4/bg.gif
## 6 https://abs.twimg.com/images/themes/theme1/bg.png
## user_profile_background_tile
## 1 FALSE
## 2 TRUE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## user_profile_banner_url
## 1 https://pbs.twimg.com/profile_banners/3188618684/1431170427
## 2 https://pbs.twimg.com/profile_banners/18387174/1488819752
## 3 https://pbs.twimg.com/profile_banners/32645612/1485823278
## 4 https://pbs.twimg.com/profile_banners/175932740/1357086566
## 5 https://pbs.twimg.com/profile_banners/22306628/1469370851
## 6 https://pbs.twimg.com/profile_banners/15566700/1466159294
## user_profile_image_url
## 1 http://pbs.twimg.com/profile_images/597000926272954368/eQ-8VrVk_normal.jpg
## 2 http://pbs.twimg.com/profile_images/945333114582298625/C8zA_uvh_normal.jpg
## 3 http://pbs.twimg.com/profile_images/986345956357615619/4zpa5kxF_normal.jpg
## 4 http://pbs.twimg.com/profile_images/724407937234550784/6Jrvt3mv_normal.jpg
## 5 http://pbs.twimg.com/profile_images/3028148602/dfd898817c8d7e5c71e66df2f2fa6b48_normal.jpeg
## 6 http://pbs.twimg.com/profile_images/743752426256142341/GJeLyn-J_normal.jpg
## user_profile_image_url_https
## 1 https://pbs.twimg.com/profile_images/597000926272954368/eQ-8VrVk_normal.jpg
## 2 https://pbs.twimg.com/profile_images/945333114582298625/C8zA_uvh_normal.jpg
## 3 https://pbs.twimg.com/profile_images/986345956357615619/4zpa5kxF_normal.jpg
## 4 https://pbs.twimg.com/profile_images/724407937234550784/6Jrvt3mv_normal.jpg
## 5 https://pbs.twimg.com/profile_images/3028148602/dfd898817c8d7e5c71e66df2f2fa6b48_normal.jpeg
## 6 https://pbs.twimg.com/profile_images/743752426256142341/GJeLyn-J_normal.jpg
## user_profile_link_color user_profile_sidebar_border_color
## 1 1DA1F2 C0DEED
## 2 3B94D9 FFFFFF
## 3 50505 FFFFFF
## 4 1DA1F2 C0DEED
## 5 111BBB 0B0C0F
## 6 1DA1F2 C0DEED
## user_profile_sidebar_fill_color user_profile_text_color
## 1 DDEEF6 333333
## 2 EFEFEF 333333
## 3 FC6A71 50505
## 4 DDEEF6 333333
## 5 3BA4CE 1A1E1A
## 6 DDEEF6 333333
## user_profile_use_background_image user_protected user_screen_name
## 1 TRUE FALSE UltraYOUwoman
## 2 TRUE FALSE yungcutup
## 3 TRUE FALSE rachelbogle
## 4 TRUE FALSE ErvGotti609
## 5 TRUE FALSE NoLuckNeeded
## 6 TRUE FALSE tazman69
## user_statuses_count user_time_zone user_translator_type
## 1 91870 NA none
## 2 618822 NA none
## 3 48075 NA none
## 4 1983 NA none
## 5 24989 NA none
## 6 2213 NA none
## user_url user_utc_offset user_verified
## 1 https://t.co/jGlJswxjwS NA FALSE
## 2 http://t.co/lVm8vfDbfO NA FALSE
## 3 https://t.co/g9exqgZp9x NA TRUE
## 4 NA FALSE
## 5 http://t.co/MMGF9RfLz0 NA FALSE
## 6 NA FALSE
# %Y matches a four-digit year (assumes the values start with a date such as 2018-09-07)
tweet$tweet_created_at <- as.Date(tweet$tweet_created_at, format = "%Y-%m-%d")
tweet$text <- as.character(tweet$tweet_full_text)
The text is then converted into a corpus, the data type used for text mining. Each tweet (one row of text content) becomes one document in the corpus. To convert a data table into a corpus, use VCorpus(VectorSource()) from the tm package.
# Create a document corpus from the tweet text.
library(dplyr)  # %>% and pull()
library(tm)     # VectorSource() and VCorpus()
tweet_corpus <- tweet %>%
  pull(tweet_full_text) %>%
  VectorSource() %>%
  VCorpus()
tweet_corpus
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 0
## Content: documents: 5089
## [1] "Done is better than perfect. — Sheryl Sandberg #quote #motivation #justdoit https://t.co/J9lLdszdW6"
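Output like the line above can be obtained by pulling the text out of a single corpus document. A minimal sketch on a two-document toy corpus, assuming the tm package is installed; `toy_docs` and `toy_corpus` are illustrative names and are not part of the original analysis:

```r
library(tm)

# Two made-up documents standing in for tweets (illustrative only)
toy_docs <- c("Done is better than perfect #justdoit",
              "Believe in something #justdoit")
toy_corpus <- VCorpus(VectorSource(toy_docs))

# Each element of the corpus is one document; content() returns its text
first_text <- content(toy_corpus[[1]])
print(first_text)

# The corpus holds one document per input string
print(length(toy_corpus))
```

On the real data the same pattern would presumably be applied to `tweet_corpus`.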
tolower: converts all uppercase letters to lowercase.
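# Remove URLs (anything starting with "http" up to the next whitespace)
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
# Remove everything except letters and whitespace (punctuation, digits, symbols)
remove_punct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
# Remove usernames (anything starting with "@" up to the next whitespace)
removeUsername <- function(x) gsub("@[^[:space:]]*", "", x)
# Remove single characters that stand alone between two spaces
removeSingle <- function(x) gsub(" . ", " ", x)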
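The helper functions above can be tried on a single string before they are applied to the whole corpus. A small sketch in base R; the sample tweet is made up for illustration, and the order of the calls here is just one possible choice:

```r
# Helper definitions copied from the text above
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
remove_punct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
removeUsername <- function(x) gsub("@[^[:space:]]*", "", x)
removeSingle <- function(x) gsub(" . ", " ", x)

# A made-up tweet (illustrative only)
sample_tweet <- "@nike this is a test 2018! https://t.co/abc123 #justdoit"

no_url <- removeURL(sample_tweet)        # drops the link
no_user <- removeUsername(no_url)        # drops the @handle
letters_only <- remove_punct(no_user)    # drops digits, "!" and "#"
cleaned <- removeSingle(letters_only)    # drops the lone "a"
print(cleaned)
```

Note that the `.` in `removeSingle` matches any character, not only letters, and that it only removes characters with a space on both sides, so a single character at the very start or end of a string is kept.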
# Clean the text in `tweet_corpus`
tweet_corpus_clean <- tweet_corpus %>%
  # Normalization transforms the text uniformly; start by converting it to lowercase
  tm_map(content_transformer(tolower)) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%
  # Stop words are common words that carry little meaning; stopwords("english") lists what gets removed
  tm_map(removeWords, stopwords("english")) %>%
  tm_map(stripWhitespace) %>%
  tm_map(stemDocument) %>%
  tm_map(content_transformer(removeURL)) %>%
  tm_map(content_transformer(remove_punct)) %>%
  tm_map(content_transformer(removeUsername)) %>%
  # Remove an additional, custom set of stopwords, given as a character vector
  tm_map(removeWords, c("keep", "check", "can","just","isnt","hey","ask","theyr","dont","theyre","cmon","htt","everything","even","enough","rt")) %>%
  tm_map(content_transformer(removeSingle))
## [1] "Done is better than perfect. — Sheryl Sandberg #quote #motivation #justdoit https://t.co/J9lLdszdW6"
## [1] "done better perfect sheryl sandberg quot motiv justdoit "
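The order of the cleaning steps matters. In the pipeline above, removePunctuation runs before removeUsername, so the @ sign is already gone by the time the username pattern is applied. This would explain why a handle such as realdonaldtrump can still appear as a term later on. A small sketch of the effect, assuming the tm package is installed; the sample string is made up:

```r
library(tm)

removeUsername <- function(x) gsub("@[^[:space:]]*", "", x)

# A made-up tweet (illustrative only)
tweet_text <- "@Nike thanks! #justdoit"

# Punctuation first: "@" is stripped, so the username pattern finds nothing
punct_first <- removeUsername(removePunctuation(tweet_text))

# Username first: the whole handle is removed before punctuation is stripped
user_first <- removePunctuation(removeUsername(tweet_text))

print(punct_first)
print(user_first)
```

Whether handles should be kept depends on the analysis; if they should be dropped, the username step needs to run before any step that removes punctuation.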
Next, the corpus is converted into a DocumentTermMatrix (DTM). The text content of each document is split into individual words, which become the predictors (columns). The output is a matrix with one row per document and one column per term.
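The call that builds the matrix is not shown in the text. A minimal sketch, assuming it was created with tm's DocumentTermMatrix() and printed with inspect(); the toy documents and the names `toy_corpus`, `toy_dtm` and `toy_matrix` are illustrative:

```r
library(tm)

# Three made-up, already cleaned documents (illustrative only)
toy_docs <- c("nike justdoit kaepernick",
              "justdoit believ nike",
              "justdoit justdoit")
toy_corpus <- VCorpus(VectorSource(toy_docs))

# Rows are documents, columns are terms, cells are term counts
toy_dtm <- DocumentTermMatrix(toy_corpus)
inspect(toy_dtm)

# A dense copy is fine for a toy example of this size
toy_matrix <- as.matrix(toy_dtm)
print(dim(toy_matrix))
```

On the full cleaned corpus the equivalent call would presumably be DocumentTermMatrix(tweet_corpus_clean), stored as tweet_dtm, the name used later in the text.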
## <<DocumentTermMatrix (documents: 5089, terms: 11034)>>
## Non-/sparse entries: 60221/56091805
## Sparsity : 100%
## Maximal term length: 59
## Weighting : term frequency (tf)
## Sample :
## Terms
## Docs amp believ commerci crazi justdoit kaepernick like nike one
## 130 0 0 0 0 1 1 0 1 0
## 220 0 0 0 0 1 1 0 1 0
## 3395 0 0 0 0 1 1 0 1 0
## 4154 0 0 0 0 1 1 0 1 0
## 4156 0 0 0 0 2 0 0 1 0
## 4167 0 0 0 0 2 1 0 1 0
## 4194 0 1 0 0 2 0 0 1 0
## 4427 1 0 0 0 1 1 0 1 0
## 4433 1 0 0 0 2 0 0 0 0
## 4589 0 0 0 0 1 1 0 1 0
## Terms
## Docs realdonaldtrump
## 130 0
## 220 0
## 3395 0
## 4154 1
## 4156 0
## 4167 0
## 4194 0
## 4427 1
## 4433 0
## 4589 1
Based on the DocumentTermMatrix() output, the terms are sorted by how often they occur. The most frequent term in the tweets is justdoit, whereas trump is the least frequent of the terms kept for the plot (those occurring more than 200 times).
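freq <- sort(colSums(as.matrix(tweet_dtm)), decreasing = TRUE)
wf <- data.frame(word = names(freq), freq = freq)
# Bar chart of the terms that occur more than 200 times
library(ggplot2)
subset(wf, freq > 200) %>%
  ggplot(aes(word, freq)) +
  geom_bar(stat = "identity", fill = "darkred", colour = "darkgreen") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Word cloud of the terms that occur at least 100 times
library(wordcloud)  # also attaches RColorBrewer, which provides brewer.pal()
set.seed(100)
wordcloud(names(freq), freq, min.freq = 100, colors = brewer.pal(6, "Dark2"))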
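Calling as.matrix() on the DTM expands the sparse matrix into a dense one, about 56 million cells for the matrix inspected earlier, which can be heavy on memory. A lighter sketch, assuming the slam package (a dependency of tm) is available; it sums the columns without leaving the sparse representation. The toy documents and the `toy_` names are illustrative:

```r
library(tm)
library(slam)

# Three made-up, already cleaned documents (illustrative only)
toy_docs <- c("nike justdoit kaepernick",
              "justdoit believ nike",
              "justdoit justdoit trump")
toy_dtm <- DocumentTermMatrix(VCorpus(VectorSource(toy_docs)))

# col_sums() works directly on the sparse matrix
freq <- sort(col_sums(toy_dtm), decreasing = TRUE)
wf <- data.frame(word = names(freq), freq = freq, row.names = NULL)
print(wf)

# findFreqTerms() lists the terms that occur at least `lowfreq` times
frequent_terms <- findFreqTerms(toy_dtm, lowfreq = 2)
print(frequent_terms)
```

The resulting `freq` vector has the same shape as the one built with colSums(as.matrix(...)), so it could be passed to the plotting code unchanged.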