In this assignment, I compare the opinions expressed on Twitter about Medicare and Medicaid, and then focus on a sentiment analysis of all the collected tweets.
We start by creating the dataset: importing the tweets and preparing them for analysis. To authenticate, you need the api_key and api_secret from your app's settings page in Twitter Application Management. I loaded the data with search_tweets from the rtweet library, setting the number of tweets to load to 3000. I then converted the list of tweets to a data frame so I could manipulate the dataset, kept only the variables used in this project, and removed the rest.
## [1] "Using direct authentication"
To study the authors of the Medicare and Medicaid tweets, we establish the list of the 10 most prolific authors with the number of messages each one sent. We count the messages by author, sort the counts in descending order, and display the first 10.
##
##   CovensureLLC    scottsocial      MaxRemind          CCEHI pahealthaccess
##             14             14             11             10              9
##        lusk_jr  grumpybirdieS HealthActionUS lisa_maskevich    Deemoney521
##              7              6              6              6              5
## [1] 2501
The concentration of authors is low. Relative to the number of tweets, the number of unique authors is high, at about 2,500, and the most prolific authors sent only 14 messages each.
We can plot the authors who sent 4 or more tweets using the counts computed above.
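The counting steps described above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the code used for this report: the `screen_name` vector and its values are hypothetical stand-ins for the author column of the tweet data frame.

```r
# Toy stand-in for the author column of the tweet data frame (hypothetical data).
screen_name <- c("alice", "bob", "alice", "carol", "alice", "bob",
                 "alice", "dave", "bob", "bob", "erin")

# Count the messages per author and sort the counts in descending order.
counts <- sort(table(screen_name), decreasing = TRUE)

# Keep the 10 most prolific authors (the toy data only has five).
top10 <- head(counts, 10)
print(top10)

# Number of distinct authors.
n_authors <- length(unique(screen_name))
print(n_authors)

# Authors who sent 4 or more tweets.
frequent <- counts[counts >= 4]
print(frequent)
```

Passing `frequent` to `barplot()` would then draw the chart of authors with 4 or more tweets.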
##
##     CarolForden Shane_not_Shawn           CCEHI  pahealthaccess
##              15              11              10               9
## healthadvicefor    charles_gaba      estarianne         fineout
##               8               6               6               6
##       MaxRemind     MikeBertaut
##               6               6
## [1] 2615
Here too, the concentration of authors is low. Relative to the number of tweets, the number of unique authors is high, at about 2,600, and the most prolific author sent only 15 messages.
As before, we can plot the authors who sent 4 or more tweets using the counts computed above.
Here I analyse the sources (the client applications) people use to post their tweets and display their shares as percentages for both Medicare and Medicaid.
Here we view the top screen names by number of tweets, together with their locations, for both the Medicaid and Medicare tweets. Not surprisingly, few users appear repeatedly under both hashtags.
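A possible way to compute such percentages in base R is sketched below. The `src` vector is hypothetical, and wrapping the client name in an HTML anchor is an assumption about how the source field may be returned. The tag-stripping step is only needed in that case.

```r
# Hypothetical source field; some entries are wrapped in an HTML anchor (assumption).
src <- c('<a href="http://twitter.com/download/iphone">Twitter for iPhone</a>',
         '<a href="http://twitter.com">Twitter Web Client</a>',
         "Twitter for iPhone",
         '<a href="http://twitter.com/download/android">Twitter for Android</a>')

# Strip any HTML tags so that identical clients are counted together.
src_clean <- gsub("<[^>]+>", "", src)

# Share of each source in percent, sorted from most to least used.
src_pct <- sort(round(100 * prop.table(table(src_clean)), 1), decreasing = TRUE)
print(src_pct)
```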
Users retweet when they appreciate the content. Among the messages that are retweets, we isolate the 2 most popular. We first note the number of messages that are retweets, then build a vector of the retweet counts for those messages, create an index that sorts them in descending order, and finally display the first 2 messages with their authors and identifiers.
##      screen_name      user_id retweet_count
## 2324   NickAyer3 9.945182e+17         20620
## 2736      ryoatl 5.893344e+08         11457
## [1] Every Democrat running for House and Senate should be talking about this. The GOP has made it plain: if they win in 2018, they will cut Social Security and Medicare. It's THAT SIMPLE.\n\nhttps://t.co/87D6zXOnxh
## [2] .@google needs to explain why this isn’t a threat to the Republic. Watch the video. Google believes they can shape your search results and videos to make you “have their values”. Open borders. Socialism. Medicare 4 all. Congressional hearings! Investigate\n\nhttps://t.co/jlbSgMMrLT
## 1365 Levels: ...but let's allow those two horrific states go and form their own union. Let's stop supporting #Alabama & #Mississippi with federal tax dollars. Let them support themselves & see how it goes. You know, #SocialSecurity, #Medicare, #FEMA, #foodstamps, let's take it all away. ...
This result is not satisfactory because the same message can appear more than once, so we must remove duplicate messages. We use the duplicated() function to identify them. We now have two distinct messages, each of which was retweeted many times.
## [1] Every Democrat running for House and Senate should be talking about this. The GOP has made it plain: if they win in 2018, they will cut Social Security and Medicare. It's THAT SIMPLE.\n\nhttps://t.co/87D6zXOnxh
## [2] .@google needs to explain why this isn’t a threat to the Republic. Watch the video. Google believes they can shape your search results and videos to make you “have their values”. Open borders. Socialism. Medicare 4 all. Congressional hearings! Investigate\n\nhttps://t.co/jlbSgMMrLT
## 1365 Levels: ...but let's allow those two horrific states go and form their own union. Let's stop supporting #Alabama & #Mississippi with federal tax dollars. Let them support themselves & see how it goes. You know, #SocialSecurity, #Medicare, #FEMA, #foodstamps, let's take it all away. ...
## [1] 20620 11457
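The retweet ranking and the removal of duplicates can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the output above: the `tweets` data frame is made up, and the `is_retweet` and `text` column names are hypothetical (only `screen_name` and `retweet_count` appear in the output).

```r
# Made-up tweets; is_retweet and text are hypothetical column names.
tweets <- data.frame(
  screen_name   = c("u1", "u2", "u3", "u4", "u5"),
  text          = c("message A", "message B", "message A", "message D", "message C"),
  retweet_count = c(500, 120, 500, 0, 30),
  is_retweet    = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the messages that are retweets and note how many there are.
rt <- tweets[tweets$is_retweet, ]
n_rt <- nrow(rt)

# Build a descending index on the retweet counter and reorder the rows.
idx <- order(rt$retweet_count, decreasing = TRUE)
rt_sorted <- rt[idx, ]

# The naive top 2 can contain the same message twice.
naive_top2 <- head(rt_sorted$text, 2)

# duplicated() flags repeats of a text already seen; dropping them keeps
# the first (most retweeted) copy of each message.
rt_unique <- rt_sorted[!duplicated(rt_sorted$text), ]
top2 <- head(rt_unique, 2)
print(top2[, c("screen_name", "retweet_count", "text")])
```

Sorting before deduplicating matters here: `duplicated()` keeps the first occurrence, so the retained row is the one with the highest count.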
We can represent the number of occurrences of the first 15 tweets, labelled by screen name, with a bar plot.
##         screen_name      user_id retweet_count
## 2442 AmandaThigpen4 9.534748e+17         17139
## 2482     RealJobRob 5.145079e+08         17139
## [1] When a valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood \n-Getting Married \n-Adopting a Pet \n\nWhen a valid ID is NOT required:\n\nVoting
## [2] When a valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood \n-Getting Married \n-Adopting a Pet \n\nWhen a valid ID is NOT required:\n\nVoting
## 1073 Levels: - Iowa children would be routinely screened for mental illness under new state plan\n- Iowa tries to... https://t.co/1MviR4UVoE ...
This result is not satisfactory because the same message appears twice, so we must remove duplicate messages. We use the duplicated() function to identify them. We now have two distinct messages, each of which was retweeted many times.
## [1] When a valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood \n-Getting Married \n-Adopting a Pet \n\nWhen a valid ID is NOT required:\n\nVoting
## [2] Valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood \n-Getting Married \n-Adopting a Pet \n\nValid ID is NOT required:\n\n-Voting\n\nSee the problem?
## 1073 Levels: - Iowa children would be routinely screened for mental illness under new state plan\n- Iowa tries to... https://t.co/1MviR4UVoE ...
## [1] 17139 4049
We can represent the number of occurrences of the first 15 tweets, labelled by screen name, with a bar plot.
We first want to look at the references to themes (#) and authors (@) that appear in the messages. A first cleaning pass is needed to eliminate elements that generate noise and hinder the analysis. We know that the tweets contain repetitions, so we eliminate the duplicates.
## [1] 1365
## [1] 1365
## [1] The WWFH team is on Capitol Hill today. Thanks to Anna Platt with @RepAnthonyBrown for taking the time to meet with us about saving Medicare #PartD. https://t.co/DCI7vvGHYn
## 1365 Levels: ...but let's allow those two horrific states go and form their own union. Let's stop supporting #Alabama & #Mississippi with federal tax dollars. Let them support themselves & see how it goes. You know, #SocialSecurity, #Medicare, #FEMA, #foodstamps, let's take it all away. ...
We have 1,365 distinct tweets, which we collect in a dedicated vector.
## [1] 1073
## [1] 1073
## [1] The Hyde Amendment disproportionately impacts low-income women, women of color, immigrants, and young people who rely on Medicaid for their healthcare coverage. It’s time to #RepealHyde! \nhttps://t.co/W1j9hKNpAj
## 1073 Levels: - Iowa children would be routinely screened for mental illness under new state plan\n- Iowa tries to... https://t.co/1MviR4UVoE ...
We have 1,073 distinct tweets, which we collect in a dedicated vector.
Several elements that we will not use can disrupt the analysis. We delete them and display a message again.
## [1] "the so-called 'democrats' demanding more power for republicans are associated with no labels, a billionaire front group that's been trying to gut social security and medicare for years"
## [1] 1282
## [1] "@ccehi @cmsgov “we need our health care system to reach out to us, to extend a hand.” sherman pines, chair of ri’s implementation council, credits his health plan’s (a medicare/medicaid dual special needs plan) rn case manager with helping him recover from a recent hospital stay. #dualsfuture"
## [1] 1026
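A cleaning pass of this kind might look like the sketch below. The exact rules used for this report are not shown, so the choices here are assumptions inferred from the cleaned samples above (lower-cased text, no URLs, mentions and hashtags kept). The `clean_tweet` helper and the `raw` examples are hypothetical.

```r
# Hypothetical helper: lower-case the text, drop URLs, and tidy whitespace.
# Mentions (@) and hashtags (#) are deliberately kept for the later analysis.
clean_tweet <- function(x) {
  x <- tolower(x)
  x <- gsub("https?://[^[:space:]]+", "", x)  # remove URLs
  x <- gsub("[[:space:]]+", " ", x)           # collapse spaces and newlines
  trimws(x)
}

# Made-up raw tweets: the first two differ only by case, spacing, and link.
raw <- c("Saving Medicare #PartD. https://t.co/abc",
         "SAVING medicare #partd.   https://t.co/xyz",
         "@someone Medicaid matters\nhttps://t.co/def")

cleaned <- clean_tweet(raw)

# Messages that become identical after cleaning can be dropped again.
cleaned_unique <- unique(cleaned)
print(cleaned_unique)
print(length(cleaned_unique))
```

Removing duplicates a second time after cleaning would be one explanation for the tweet counts being lower than before, since messages that differed only by their link collapse into one.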
The “#” character plays a special role on Twitter: it designates a hashtag, a subject related to the message being written or to the author's concerns. Several hashtags can appear in the same message. We list all the topics mentioned as hashtags across our tweets.
## [1] 1044
Then we count the occurrences of each hashtag, sort the counts by decreasing frequency, and display the 10 most frequent hashtags.
## liste_hashtags_Medicare
##          #medicare        #healthcare    #socialsecurity
##                153                 31                 24
##          #medicaid    #medicareforall        #goptaxscam
##                 19                 19                 16
##       #dualsfuture     #allhealthlive #medicaremarketing
##                 13                 12                 12
##               #cms
##                 10
After #medicare itself, the dominant themes are #healthcare, #socialsecurity, #medicaid, #medicareforall and #goptaxscam. We can identify all the hashtags associated with the term “Medicare”.
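The hashtag extraction and counting can be sketched with base R regular expressions. The `tweets` vector is made up, and the pattern is an assumption: it requires at least one character after “#”, so a lone “#” is not counted as a hashtag.

```r
# Made-up cleaned tweets.
tweets <- c("saving #medicare and #socialsecurity",
            "#medicare for seniors #healthcare",
            "no hashtags here",
            "#medicare #medicaid dual plans")

# Extract every hashtag from every tweet; a tweet can contribute several or none.
matches <- regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets))
hashtags <- unlist(matches)
n_hashtags <- length(hashtags)
print(n_hashtags)

# Count the occurrences and sort by decreasing frequency.
counts <- sort(table(hashtags), decreasing = TRUE)
top10 <- head(counts, 10)
print(top10)

# Hashtags other than the search term itself.
others <- counts[names(counts) != "#medicare"]
print(others)
```

A vector such as `others`, with the search hashtag removed, is the kind of input a word cloud of associated themes would take.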
## [1] 604
Then we count the occurrences of each hashtag, sort the counts by decreasing frequency, and display the 10 most frequent hashtags.
## liste_hashtags_Medicaid
##    #medicaid    #medicare  #goptaxscam #dualsfuture  #healthcare
##          107           26           15           14           14
##         #aca      #txlege            #         #cms          #ct
##            8            6            5            4            4
After #medicaid itself, the dominant themes are #medicare, #goptaxscam, #dualsfuture, #healthcare and #aca. We can identify all the hashtags associated with the term “Medicaid”.
The results of both word clouds are quite striking: they share some words, although not in the same proportions.
We plot the word cloud of themes excluding #Medicare, which would obviously dominate because it is the initial search hashtag.
We plot the word cloud of themes excluding #Medicaid, which would obviously dominate because it is the initial search hashtag.
The sentiment frequencies show that opinions differ, with notable differences between the sentiments of the #Medicaid and #Medicare tweets. Overall it is not easy to draw a clear conclusion: #Medicaid tweets contain a greater frequency of negative, disgust, trust and joy words, while #Medicare tweets contain more positive, anticipation, fear, sadness and surprise words.
The study of tweets is a strong focus of social media analysis because Twitter has become an important communication channel. This example shows that it is easy to start a first analysis on data extracted directly online. Exploring in depth the information contained in the messages is another matter. The data preparation phase is as important as ever: the credibility of the results we produce depends on the rigor we bring to this step.
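To make the idea of a sentiment frequency concrete, here is a minimal lexicon-based sketch in base R. It is not the method used for the results above: the six-word `lexicon`, the `sentiment_share` helper, and the example tweets are all hypothetical, and a real analysis would rely on a full emotion lexicon.

```r
# Hypothetical mini-lexicon mapping words to sentiment categories.
lexicon <- data.frame(
  word      = c("cut", "threat", "save", "trust", "help", "fear"),
  sentiment = c("negative", "fear", "positive", "trust", "positive", "fear"),
  stringsAsFactors = FALSE
)

# Share of each sentiment category among the lexicon words found in the texts.
sentiment_share <- function(texts, lexicon) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  hits <- lexicon$sentiment[match(words, lexicon$word)]
  hits <- hits[!is.na(hits)]
  categories <- sort(unique(lexicon$sentiment))
  vapply(categories, function(s) sum(hits == s), numeric(1)) / length(hits)
}

# Made-up tweets for each topic.
medicare <- c("They will cut Medicare", "A threat we fear", "Help save Medicare")
medicaid <- c("I trust Medicaid to help", "Do not cut Medicaid")

# One row per topic, one column per sentiment category.
shares <- rbind(medicare = sentiment_share(medicare, lexicon),
                medicaid = sentiment_share(medicaid, lexicon))
print(round(shares, 2))
```

Comparing shares rather than raw counts keeps the two topics comparable when they do not have the same number of tweets.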