1 Introduction

In this assignment, I compare the sentiment expressed in tweets about Medicare with the sentiment expressed in tweets about Medicaid. After a descriptive look at both sets of tweets, I focus on a sentiment analysis of all the tweets.

2 Load Tweets

We start by creating the dataset: we import the tweets and reshape them so they are easier to use in the analysis. To authenticate, you need the api_key and api_secret from the app settings page of Twitter Application Management. I loaded the data with search_tweets from the rtweet library, setting the number of tweets to load to 3000. I then converted the list of tweets to a data frame so that I could manipulate it, kept only the variables used in this project, and removed the rest.

## [1] "Using direct authentication"
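The download itself needs personal credentials, so it is not reproduced here. As a minimal sketch of the last step only (keeping the variables used later), the following base R example works on a toy data frame. The rows and the column names (`screen_name`, `text`, `source`, `retweet_count`, `lang`, `favorited`) are assumptions for illustration, not the actual dataset.

```r
# Toy stand-in for the data frame obtained from the Twitter search.
# Rows and column names are hypothetical.
tweets <- data.frame(
  screen_name   = c("alice", "bob", "alice"),
  text          = c("Medicare matters",
                    "RT @alice: Medicare matters",
                    "More on #Medicare"),
  source        = c("Twitter Web Client", "Twitter for iPhone",
                    "Twitter Web Client"),
  retweet_count = c(3L, 3L, 0L),
  lang          = c("en", "en", "en"),
  favorited     = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the variables used in the rest of the analysis.
keep <- c("screen_name", "text", "source", "retweet_count")
tweets <- tweets[, keep]

str(tweets)
```

Selecting columns by name rather than by position keeps the step robust if the search returns its variables in a different order.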

3 Extract Tweets & Analysis

3.1 Medicare Tweets

3.2 Medicaid Tweets

4 Descriptive Analysis

For the authors of the Medicare and Medicaid tweets, we can list the 10 most prolific together with the number of messages each one sent. We count the messages per author, sort the counts in descending order, and display the first 10.
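The counting step can be sketched in base R as follows. The `authors` vector is a hypothetical stand-in for the screen-name column of the tweets; the original code is not shown in this report, so this is one possible way to do it rather than the method actually used.

```r
# Hypothetical screen names, one entry per tweet.
authors <- c("a", "b", "a", "c", "a", "b", "d")

# Count the messages per author and sort in descending order.
counts <- sort(table(authors), decreasing = TRUE)

# Display the first 10 (here there are fewer than 10 authors).
top10 <- head(counts, 10)
print(top10)

# Number of distinct authors, to judge how concentrated the activity is.
n_authors <- length(unique(authors))
print(n_authors)
```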

4.1 Analysis of Medicare

## 
##   CovensureLLC    scottsocial      MaxRemind          CCEHI pahealthaccess 
##             14             14             11             10              9 
##        lusk_jr  grumpybirdieS HealthActionUS lisa_maskevich    Deemoney521 
##              7              6              6              6              5
## [1] 2501

Overall, the authors are not strongly concentrated: the most active accounts sent only 14 tweets each, and the number of unique authors is high relative to the number of tweets (2501 in the output above).

We can graph the list of authors who sent 4 or more tweets using the count variable previously defined.
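A minimal sketch of this filtering and plotting step, assuming the counts are stored in a table (the `counts` object below is built from made-up data, and the name is an assumption):

```r
# Hypothetical per-author counts.
counts <- table(c(rep("a", 5), rep("b", 4), rep("c", 2), "d"))

# Keep the authors with 4 or more tweets, most active first.
frequent <- sort(counts[counts >= 4], decreasing = TRUE)

# Draw to a null PDF device so the sketch also runs without a display;
# drop these two device calls in an interactive session.
pdf(NULL)
barplot(frequent, las = 2, ylab = "Number of tweets")
invisible(dev.off())

print(frequent)
```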

4.2 Analysis of Medicaid

## 
##     CarolForden Shane_not_Shawn           CCEHI  pahealthaccess 
##              15              11              10               9 
## healthadvicefor    charles_gaba      estarianne         fineout 
##               8               6               6               6 
##       MaxRemind     MikeBertaut 
##               6               6
## [1] 2615

Here again, the authors are not strongly concentrated: the most active account sent only 15 tweets, and the number of unique authors is high relative to the number of tweets (2615 in the output above).

We can graph the list of authors who sent 4 or more tweets using the count variable previously defined.

5 The Authors of Original Tweets

This first ranking gives an initial view of the authors' activity, but it may be biased because some tweets are simple retweets. To identify the authors of “original” tweets, which add real value to the exchange, we isolate the messages that are not retweets and then count the authors again.
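As a sketch of this step, assume the data frame carries a logical column flagging retweets. The column name `is_retweet` and the rows below are assumptions for illustration; if no such column is available, a pattern test on the text (for example, messages starting with "RT @") is a common substitute.

```r
# Toy data: which tweets are retweets is marked in a logical column.
tweets <- data.frame(
  screen_name = c("alice", "bob", "carol", "alice"),
  is_retweet  = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the messages that are not retweets.
original <- tweets[!tweets$is_retweet, ]
n_original <- nrow(original)
print(n_original)

# Count the authors again on the original tweets only.
original_counts <- sort(table(original$screen_name), decreasing = TRUE)
print(original_counts)
```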

5.1 Original Tweets by Medicare

## [1] 1149

There are 1149 original tweets for Medicare (out of 3000 initial tweets). We repeat the per-author count, sort the result in descending order, and plot the authors with 5 or more tweets. Note that few authors have written even 3 or more original tweets.

5.2 Original Tweets by Medicaid

## [1] 926

There are 926 original tweets for Medicaid (out of 3000 initial tweets). We repeat the per-author count, sort the result in descending order, and plot the authors with 5 or more tweets. Note that few authors have written even 3 or more original tweets.

6 Tweet Sources

Here I analyse the source people use to post their tweets and display it as a percentage for both Medicare and Medicaid.
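The percentages can be computed with `table` and `prop.table`. The sketch below uses a made-up `sources` vector; the source labels are illustrative only.

```r
# Hypothetical source labels, one per tweet.
sources <- c("Twitter for iPhone", "Twitter Web Client",
             "Twitter for iPhone", "Twitter for Android")

# Share of each source as a percentage, largest first.
pct <- round(100 * prop.table(table(sources)), 1)
pct <- sort(pct, decreasing = TRUE)
print(pct)
```

Running the same lines once per dataset gives two comparable percentage tables.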

6.1 Sources For Medicare

6.2 Sources For Medicaid

7 Top Tweets

Here we view the top screen names by number of tweets, together with their location, for both the Medicaid and the Medicare tweets. Not surprisingly, few users appear repeatedly under both hashtags.

7.1 Top Tweets by Medicare

7.2 Top Tweets by Medicaid

8 Popularity Analysis Through Retweets

Users retweet when they appreciate the content. Among the messages that are retweets, we isolate the 2 most popular tweets. We first note the number of messages that are retweets, then build a vector of the retweet counts for those messages, create an index that sorts them in descending order, and finally display the first 2 messages along with their authors and identifiers.

8.1 Retweets Medicare

##      screen_name      user_id retweet_count
## 2324   NickAyer3 9.945182e+17         20620
## 2736      ryoatl 5.893344e+08         11457
## [1] Every Democrat running for House and Senate should be talking about this.  The GOP has made it plain:  if they win in 2018, they will cut Social Security and Medicare.  It's THAT SIMPLE.\n\nhttps://t.co/87D6zXOnxh                                                                     
## [2] .@google needs to explain why this isn’t a threat to the Republic. Watch the video. Google believes they can shape your search results and videos to make you “have their values”. Open borders. Socialism. Medicare 4 all. Congressional hearings! Investigate\n\nhttps://t.co/jlbSgMMrLT
## 1365 Levels: ...but let's allow those two horrific states go and form their own union. Let's stop supporting #Alabama & #Mississippi with federal tax dollars. Let them support themselves & see how it goes. You know, #SocialSecurity, #Medicare, #FEMA, #foodstamps, let's take it all away. ...

This result is not satisfactory, because the same message can appear more than once: we must remove duplicate messages. We use the duplicated() function to identify them. We now have two distinct messages, each of which was retweeted many times.

## [1] Every Democrat running for House and Senate should be talking about this.  The GOP has made it plain:  if they win in 2018, they will cut Social Security and Medicare.  It's THAT SIMPLE.\n\nhttps://t.co/87D6zXOnxh                                                                     
## [2] .@google needs to explain why this isn’t a threat to the Republic. Watch the video. Google believes they can shape your search results and videos to make you “have their values”. Open borders. Socialism. Medicare 4 all. Congressional hearings! Investigate\n\nhttps://t.co/jlbSgMMrLT
## 1365 Levels: ...but let's allow those two horrific states go and form their own union. Let's stop supporting #Alabama & #Mississippi with federal tax dollars. Let them support themselves & see how it goes. You know, #SocialSecurity, #Medicare, #FEMA, #foodstamps, let's take it all away. ...
## [1] 20620 11457
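The ranking and de-duplication can be sketched as follows, on a toy data frame whose rows and column names are assumptions. The point is the order of operations: sort by retweet count first, then drop repeated texts with `duplicated()`, then take the first 2.

```r
# Toy retweeted messages; "msg A" appears twice under different accounts.
retweets <- data.frame(
  screen_name   = c("u1", "u2", "u3", "u4"),
  text          = c("msg A", "msg A", "msg B", "msg C"),
  retweet_count = c(500L, 500L, 120L, 30L),
  stringsAsFactors = FALSE
)

# Index that sorts the messages by decreasing retweet count.
ord <- order(retweets$retweet_count, decreasing = TRUE)
ranked <- retweets[ord, ]

# Drop repeated texts, keeping the first occurrence of each.
distinct <- ranked[!duplicated(ranked$text), ]

# The 2 most retweeted distinct messages.
top2 <- head(distinct, 2)
print(top2[, c("screen_name", "retweet_count")])
print(top2$text)
```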

We can plot the number of occurrences of the first 15 tweets, labelled by screen name, using barplot.

8.2 Retweets Medicaid

##         screen_name      user_id retweet_count
## 2442 AmandaThigpen4 9.534748e+17         17139
## 2482     RealJobRob 5.145079e+08         17139
## [1] When a valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps  \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood  \n-Getting Married \n-Adopting a Pet \n\nWhen a valid ID is NOT required:\n\nVoting
## [2] When a valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps  \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood  \n-Getting Married \n-Adopting a Pet \n\nWhen a valid ID is NOT required:\n\nVoting
## 1073 Levels: - Iowa children would be routinely screened for mental illness under new state plan\n- Iowa tries to... https://t.co/1MviR4UVoE ...

This result is not satisfactory, because the same message appears twice: we must remove duplicate messages. We use the duplicated() function to identify them. We now have two distinct messages, each of which was retweeted many times.

## [1] When a valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps  \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood  \n-Getting Married \n-Adopting a Pet \n\nWhen a valid ID is NOT required:\n\nVoting       
## [2] Valid ID is required:\n\n-Boarding an Airplane\n-Getting a Prescription \n-Applying for a Job\n-Cashing a Check\n-Applying for Food Stamps  \n-Obtaining Medicare/Medicaid \n-Driving\n-Donating Blood  \n-Getting Married \n-Adopting a Pet \n\nValid ID is NOT required:\n\n-Voting\n\nSee the problem?
## 1073 Levels: - Iowa children would be routinely screened for mental illness under new state plan\n- Iowa tries to... https://t.co/1MviR4UVoE ...
## [1] 17139  4049

We can plot the number of occurrences of the first 15 tweets, labelled by screen name, using barplot.

9 Content Analysis of Tweets

9.1 Analysis of themes and individuals

We first look at the references to themes (#) and authors (@) that appear in the messages. An initial cleaning step is needed to remove elements that add noise and hinder the analysis. Since we know that the tweets contain repetitions, we begin by eliminating duplicates.

9.1.1 Medicare

## [1] 1365
## [1] 1365
## [1] The WWFH team is on Capitol Hill today.  Thanks to Anna Platt with @RepAnthonyBrown for taking the time to meet with us about saving Medicare #PartD. https://t.co/DCI7vvGHYn
## 1365 Levels: ...but let's allow those two horrific states go and form their own union. Let's stop supporting #Alabama & #Mississippi with federal tax dollars. Let them support themselves & see how it goes. You know, #SocialSecurity, #Medicare, #FEMA, #foodstamps, let's take it all away. ...

We have 1365 distinct tweets, which we collect in a dedicated vector.

9.1.2 Medicaid

## [1] 1073
## [1] 1073
## [1] The Hyde Amendment disproportionately impacts low-income women, women of color, immigrants, and young people who rely on Medicaid for their healthcare coverage. It’s time to #RepealHyde! \nhttps://t.co/W1j9hKNpAj
## 1073 Levels: - Iowa children would be routinely screened for mental illness under new state plan\n- Iowa tries to... https://t.co/1MviR4UVoE ...

We have 1073 distinct tweets, which we collect in a dedicated vector.

9.2 Removing Line Breaks and Web References

Several elements that we will not use, such as line breaks and web references, can disrupt the analysis. We delete them and then inspect a message again.
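A minimal sketch of such a cleaning step with `gsub`, on two made-up messages. The regular expressions are one reasonable choice, not necessarily the ones used for the outputs in this report. The conversion to lower case is included because the cleaned messages shown in the outputs are in lower case.

```r
# Hypothetical raw messages containing line breaks and links.
raw <- c("Saving Medicare #PartD.\nhttps://t.co/abc123",
         "Read THIS\n\nhttp://example.com/x now")

# Replace line breaks with a single space.
clean <- gsub("[\r\n]+", " ", raw)

# Remove web references (anything starting with http:// or https://).
clean <- gsub("https?://\\S+", "", clean, perl = TRUE)

# Lower-case, collapse repeated spaces, and trim the ends.
clean <- tolower(clean)
clean <- trimws(gsub("\\s+", " ", clean, perl = TRUE))

print(clean)
```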

9.2.1 Remove Line Breaks for Medicare

## [1] "the so-called 'democrats' demanding more power for republicans are associated with no labels, a billionaire front group that's been trying to gut social security and medicare for years"
## [1] 1282

9.2.2 Remove Line Breaks for Medicaid

## [1] "@ccehi @cmsgov “we need our health care system to reach out to us, to extend a hand.” sherman pines, chair of ri’s implementation council, credits his health plan’s (a medicare/medicaid dual special needs plan) rn case manager with helping him recover from a recent hospital stay. #dualsfuture"
## [1] 1026

9.3 Analysis of Themes

The “#” character plays a special role on Twitter. It marks a hashtag, that is, a subject related to the message or to the author's concerns. Several hashtags can appear in the same message. We list all the topics mentioned as hashtags across all our tweets.

9.3.1 Themes Medicare

## [1] 1044

Then we count how often each hashtag appears, sort the counts by decreasing frequency, and display the 10 most frequent hashtags.

## liste_hashtags_Medicare
##          #medicare        #healthcare    #socialsecurity 
##                153                 31                 24 
##          #medicaid    #medicareforall        #goptaxscam 
##                 19                 19                 16 
##       #dualsfuture     #allhealthlive #medicaremarketing 
##                 13                 12                 12 
##               #cms 
##                 10

After #medicare itself, the dominant themes are #healthcare, #socialsecurity, #medicaid, #medicareforall and #goptaxscam. We can thus identify all the hashtags associated with the term “Medicare”.
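Hashtag extraction and counting can be sketched with `gregexpr` and `regmatches` from base R. The `texts` vector is made up, and the pattern `#[[:alnum:]_]+` is an assumption about what counts as a hashtag; it requires at least one character after the `#`.

```r
# Hypothetical cleaned, lower-cased messages.
texts <- c("saving #medicare and #socialsecurity",
           "#medicare for all #healthcare",
           "no tags here",
           "#medicare #healthcare")

# Extract every hashtag from every message into one vector.
hashtags <- unlist(regmatches(texts, gregexpr("#[[:alnum:]_]+", texts)))
n_hashtags <- length(hashtags)
print(n_hashtags)

# Count occurrences, sort by decreasing frequency, show the 10 most frequent.
freq <- sort(table(hashtags), decreasing = TRUE)
print(head(freq, 10))
```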

9.3.2 Themes Medicaid

## [1] 604

Then we count how often each hashtag appears, sort the counts by decreasing frequency, and display the 10 most frequent hashtags.

## liste_hashtags_Medicaid
##    #medicaid    #medicare  #goptaxscam #dualsfuture  #healthcare 
##          107           26           15           14           14 
##         #aca      #txlege            #         #cms          #ct 
##            8            6            5            4            4

After #medicaid itself, the dominant themes are #medicare, #goptaxscam, #dualsfuture, #healthcare and #aca. We can thus identify all the hashtags associated with the term “Medicaid”.

10 Cloud Plot

The two word clouds are quite striking: they share some words, even though these words are not used in the same proportions.

10.1 Medicare Cloud

We plot a word cloud of the themes, excluding #Medicare, which would otherwise dominate since it is the initial hashtag.

10.2 Medicaid Cloud

We plot a word cloud of the themes, excluding #Medicaid, which would otherwise dominate since it is the initial hashtag.

11 Sentiment Analysis

The sentiment frequencies show that opinions differ: there is a noticeable difference between the sentiments of the #Medicaid and #Medicare tweets. Overall, however, it is not easy to draw a clear conclusion, with #Medicaid tweets containing a greater frequency of negative, disgust, trust and joy words and #Medicare tweets containing more positive, anticipation, fear, sadness and surprise words.
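To make the idea of counting sentiment words concrete, here is a minimal base R sketch of lexicon-based counting. The mini-lexicon, the messages, and the helper `count_sentiments` are all hypothetical. The categories named above resemble those of the NRC emotion lexicon, but this report does not state which lexicon or package was used, so the sketch should not be read as the method behind the results.

```r
# Hypothetical mini-lexicon mapping words to a sentiment category.
# This is NOT a real lexicon; it only illustrates the mechanics.
lexicon <- data.frame(
  word      = c("cut", "threat", "help", "recover", "trust"),
  sentiment = c("negative", "fear", "positive", "positive", "trust"),
  stringsAsFactors = FALSE
)

# Count, per category, how many words of the messages are in the lexicon.
count_sentiments <- function(texts, lexicon) {
  words <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  hits <- lexicon$sentiment[match(words, lexicon$word)]
  categories <- unique(lexicon$sentiment)
  vapply(categories, function(s) sum(hits == s, na.rm = TRUE), integer(1))
}

# Made-up messages for each topic.
medicare <- c("They will cut Medicare", "A threat to seniors")
medicaid <- c("Medicaid will help him recover", "We trust the plan")

# One row per topic, one column per sentiment category.
scores <- rbind(Medicare = count_sentiments(medicare, lexicon),
                Medicaid = count_sentiments(medicaid, lexicon))
print(scores)
```

Because the two sets of tweets differ in size, comparing proportions (each row divided by its total) is usually more informative than comparing raw counts.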

11.1 Sentiment Score for Combined Words

11.2 Contribution to Sentiment by Retweets

11.2.1 For Medicare

11.2.2 For Medicaid

11.3 Words Contribution to Sentiment Score

11.3.1 For Medicare

11.3.2 For Medicaid

11.4 Sentiment by Sources

12 Conclusion

The study of tweets is a strong focus of social media analysis because Twitter has become an important communication channel. This example shows that it is easy to start a first analysis from data extracted directly online. Exploring in depth the information contained in the messages is another matter. The data preparation phase is as important as ever: the credibility of the results we produce depends on the rigor we bring to this step.