The RPubs link to this report can be found here

Facebook Posts from Donald Trump and Hillary Clinton during their campaigns

Social media had a bigger presence during this election cycle than ever before. With Facebook and Twitter serving as news sources and as outlets for the candidates to express their opinions, social media clearly played a role. Since social media is a relatively new political factor, I wanted to look into it more. I scraped the public Facebook status data of both candidates using a Python script. The data include the number of reactions, the number of each type of reaction, the date of publication, the number of shares, the number of comments, and the status type. They cover the period from the date each candidate announced their candidacy until November 8, 2016. Trump announced his candidacy on June 16, 2015, and Clinton announced hers on April 12, 2015.

I looked into what made a post popular. To measure popularity, I used the number of reactions. Another measure I could have used is the number of shares, but the two are generally closely correlated, so I didn't think the choice mattered much. Also, the top 25 posts by shares and the top 25 posts by reactions are exactly the same.
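Both checks in this paragraph can be computed with a short script. The sketch below uses Python and assumes each post is a dict with the hypothetical keys `status_id`, `num_reactions`, and `num_shares`. The report's own analysis lives in its Rmd file and is not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def top_k_ids(posts, key, k=25):
    """Ids of the k posts with the largest value of `key`."""
    ranked = sorted(posts, key=lambda p: p[key], reverse=True)
    return {p["status_id"] for p in ranked[:k]}

def top_k_overlap(posts, k=25):
    """Share of the top-k posts by reactions that are also in the top k by shares."""
    common = top_k_ids(posts, "num_reactions", k) & top_k_ids(posts, "num_shares", k)
    return len(common) / k

# Toy data, not from the real dataset.
posts = [
    {"status_id": "a", "num_reactions": 100, "num_shares": 10},
    {"status_id": "b", "num_reactions": 300, "num_shares": 40},
    {"status_id": "c", "num_reactions": 200, "num_shares": 20},
]
r = pearson([p["num_reactions"] for p in posts], [p["num_shares"] for p in posts])
print(f"r = {r:.2f}")
print(top_k_overlap(posts, k=2))  # 1.0
```

An overlap of 1.0 with `k=25` would correspond to the finding that the two top-25 lists are identical.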

I investigated whether the following variables had any effect on the popularity of a post:

  • The proportion of total reactions that is a certain type (like, love, wow, haha, sad, angry)
  • Status type (status, video, photo, link)
  • Time of publication
  • Trends in national polling averages
  • A key event for each candidate
  • Trump’s posts including the words “Crooked Hillary” in the post

In the end, most of the variables didn’t show any significant correlation with popularity. This report offers an analysis of each of these variables, as well as some suggestions for variables that might be better to look at.

Initial investigation and overview of the data

Some basic summary facts about the Facebook posts will be important to keep in mind during the rest of the analysis.

Below, we see that, on average, Trump gets about 3.4 times more reactions per post than Clinton (74,354 compared to 21,668). This is despite the fact that Clinton had about 25% more posts than Trump. Clinton’s more frequent posting does not make up the difference, though. Multiplying each average by the number of posts, Trump’s total reactions over the campaign period (roughly 242 million) are still nearly three times Clinton’s (roughly 88 million).

Total posts during campaign period
user total_posts
Clinton 4084
Trump 3254
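The ratios quoted above follow directly from the reported figures. Because the averages are rounded, the implied totals (average reactions per post times number of posts) are approximations of the true campaign-long sums.

```python
# Figures taken from the report: average reactions per post and total posts.
avg_rxns = {"Clinton": 21_668, "Trump": 74_354}
total_posts = {"Clinton": 4_084, "Trump": 3_254}

# How many times more reactions per post Trump received than Clinton.
per_post_ratio = avg_rxns["Trump"] / avg_rxns["Clinton"]

# How many times more posts Clinton made than Trump.
post_ratio = total_posts["Clinton"] / total_posts["Trump"]

# Implied campaign-long totals: average reactions per post times number of posts.
implied_totals = {user: avg_rxns[user] * total_posts[user] for user in avg_rxns}

print(f"per-post ratio: {per_post_ratio:.2f}")  # per-post ratio: 3.43
print(f"post ratio: {post_ratio:.3f}")          # post ratio: 1.255
print(implied_totals)  # {'Clinton': 88492112, 'Trump': 241947916}
```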

The graph below shows the number of reactions per post over time, with smoothed lines showing the average trends. Vertical clusters of many unpopular posts on the same day correspond to advertisements. On January 1, Trump posted the same advertisement asking for campaign donations 35 times. On July 21, he shared an advertisement photo about the GOP Convention 14 times.
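Clusters like these could be flagged automatically by counting identical messages posted on the same day. This is a minimal sketch, not the report's method. The keys `created_date` (a `YYYY-MM-DD` string) and `message` are hypothetical, and the threshold of 10 copies is an arbitrary choice. Photo ads with no caption would need to be matched on some other field.

```python
from collections import Counter

def repeated_posts_by_day(posts, min_copies=10):
    """Return {(date, message): count} for messages posted at least min_copies times in one day."""
    counts = Counter((p["created_date"], p["message"]) for p in posts)
    return {key: n for key, n in counts.items() if n >= min_copies}

# Toy data: one message repeated 12 times on the same day, plus one ordinary post.
posts = [{"created_date": "2016-03-05", "message": "Donate now"} for _ in range(12)]
posts.append({"created_date": "2016-03-05", "message": "Thank you!"})
print(repeated_posts_by_day(posts))  # {('2016-03-05', 'Donate now'): 12}
```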

Effect of the type of reaction

I analyzed whether the proportion of total reactions of each type was associated with the popularity of posts. Reaction options on Facebook first became available on February 24, 2016, so any posts from before that date have the value “NA” for the proportion of each reaction type.
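The per-post proportions, including the NA rule for posts published before February 24, 2016, can be sketched as follows. This assumes each post is a dict with a `published` date and a hypothetical `counts` dict of reaction counts. Returning `None` (Python's stand-in for NA) for a post with zero reactions is an added assumption.

```python
from datetime import date

REACTIONS_LAUNCH = date(2016, 2, 24)  # reactions became available on this date (per the report)
REACTION_TYPES = ("like", "love", "wow", "haha", "sad", "angry")

def reaction_proportions(post):
    """Share of a post's reactions of each type, or None (NA) when it is undefined."""
    if post["published"] < REACTIONS_LAUNCH:
        # Before the launch only likes existed, so a breakdown by type is not meaningful.
        return {r: None for r in REACTION_TYPES}
    counts = post["counts"]
    total = sum(counts.get(r, 0) for r in REACTION_TYPES)
    if total == 0:
        # Assumption: a post with no reactions has no defined proportions.
        return {r: None for r in REACTION_TYPES}
    return {r: counts.get(r, 0) / total for r in REACTION_TYPES}

example = {
    "published": date(2016, 3, 1),
    "counts": {"like": 90, "love": 5, "wow": 0, "haha": 0, "sad": 0, "angry": 5},
}
print(reaction_proportions(example)["like"])  # 0.9
```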

Below, you can see that for the top 25 posts compared with all posts, the proportion of reactions that are “likes” decreases drastically and the proportion that are “angry” increases drastically. The proportion of “love” reactions (prop_love) actually decreases. It seems that posts that elicit anger may be more popular.

Breakdown of reactions by type for all posts
user avg_p_like avg_p_love avg_p_wow avg_p_haha avg_p_sad avg_p_angry
Clinton 0.8818 0.0719 0.0065 0.0114 0.0064 0.0218
Trump 0.9036 0.0592 0.0059 0.0096 0.0032 0.0184
Breakdown of reactions by type for top 25 posts of each user
user avg_p_like avg_p_love avg_p_wow avg_p_haha avg_p_sad avg_p_angry
Clinton 0.6422 0.0099 0.0522 0.0144 0.0300 0.2514
Trump 0.6632 0.0062 0.0503 0.0122 0.0091 0.2590
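A comparison like the two tables above could be produced by averaging each proportion over a candidate's top 25 posts and over all of their posts. This is a sketch under assumptions: the row keys `num_reactions`, `p_like`, and `p_angry` are hypothetical, and skipping NA values when averaging is a guess at how the report handled pre-reaction posts. It would be run once per candidate.

```python
def mean_proportions(rows, keys):
    """Average each proportion across rows, skipping None (NA) values."""
    means = {}
    for key in keys:
        values = [row[key] for row in rows if row[key] is not None]
        means[key] = sum(values) / len(values) if values else None
    return means

def top_vs_all(rows, keys, k=25):
    """(means over the k most-reacted rows, means over all rows) for one candidate."""
    top = sorted(rows, key=lambda row: row["num_reactions"], reverse=True)[:k]
    return mean_proportions(top, keys), mean_proportions(rows, keys)

# Toy rows for a single candidate, not real data.
rows = [
    {"num_reactions": 500, "p_like": 0.6, "p_angry": 0.3},
    {"num_reactions": 400, "p_like": 0.7, "p_angry": 0.2},
    {"num_reactions": 50, "p_like": 0.9, "p_angry": 0.0},
    {"num_reactions": 40, "p_like": None, "p_angry": None},
]
top, overall = top_vs_all(rows, ["p_like", "p_angry"], k=2)
print(top, overall)
```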

However, across all posts there is not much of a correlation between the proportion of angry reactions and the number of reactions. “prop_angry” appears to be related mostly to the date of publication: closer to the election, after the primaries, there are more posts with a high “prop_angry”.

Just out of curiosity, I looked at what some of the most angering posts of the election were. Here they are!

Outliers for Clinton:

  • Purple Heart
  • Trump endorses “some sort of punishment” for women who get abortions

Outliers for Trump:

  • “Hillary Clinton will increase Syrian refugees by 550% without a realistic screening process”
  • “Four brave Americans lost their lives in Benghazi and Hillary Clinton falsely said tonight that we did not lose one American life in Libya. SAD!”
  • “Animals representing Hillary Clinton and Dems in North Carolina just firebombed our office in Orange County because we are winning.”
  • “Hours after being interviewed by the FBI, Hillary Clinton told MSNBC she broke no laws because she never sent or received confidential emails.”

Other reaction types

Next, I looked at the other reaction types. The plots for all types except “like” are set not to show in the report, but they can be run in the Rmd file. I left them out because they were not very interesting. For likes, however, there is a general (weak) trend: posts with more total reactions have a lower proportion of likes, and thus more variety in reaction type, since like is the default reaction.

Investigating status type

The following graphics show how the average number of reactions on a post differs between status types.

user status_type mean_rxns num_posts
Trump status 96398.539 750
Trump video 79032.250 584
Trump photo 65198.850 1277
Trump link 62658.510 643
Clinton photo 27957.806 1349
Clinton status 25920.932 146
Clinton video 23037.512 1225
Clinton link 13823.703 1354
Trump event 6314.000 1
Clinton event 3300.000 3
Clinton note 3132.714 7
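A table of this shape is a group-by over (user, status_type), taking the mean and the count of reactions and then sorting by the mean. The sketch below assumes posts are dicts with the hypothetical keys `user`, `status_type`, and `num_reactions`. The count column matters because groups such as Trump's single event post yield means based on very little data.

```python
from collections import defaultdict

def mean_reactions_by_type(posts):
    """Rows of (user, status_type, mean_rxns, num_posts), most reactions first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        key = (post["user"], post["status_type"])
        totals[key] += post["num_reactions"]
        counts[key] += 1
    rows = [(user, kind, totals[(user, kind)] / counts[(user, kind)], counts[(user, kind)])
            for (user, kind) in totals]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Toy data, not from the real dataset.
posts = [
    {"user": "Trump", "status_type": "status", "num_reactions": 100},
    {"user": "Trump", "status_type": "status", "num_reactions": 80},
    {"user": "Trump", "status_type": "photo", "num_reactions": 60},
    {"user": "Clinton", "status_type": "photo", "num_reactions": 30},
]
for row in mean_reactions_by_type(posts):
    print(row)
```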

It is interesting to note that the most popular status type is different for each candidate. The table also shows that the more popular status types don’t have the most posts. It doesn’t appear that either campaign believed the type of status had enough of an effect to change its behavior.

Time of day posted

The number of reactions on a post is mostly unaffected by the time of day it is published, although the pattern is similar for both candidates: posts in the latest hours of the night are the most popular. However, these results are hard to use, because the local time of publication depends on where the candidate is when posting, and the candidates traveled often during the campaign.
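One partial workaround is to convert every timestamp to a single fixed clock before bucketing by hour. That makes the hours comparable across posts, but it does not recover the candidate's local time, which would require location data this dataset lacks. The sketch assumes a hypothetical `created_time` ISO-8601 string that carries a UTC offset. The fixed UTC-5 offset is an arbitrary choice that ignores daylight saving.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical choice: a fixed UTC-5 offset (US Eastern standard time, ignoring daylight saving).
FIXED_TZ = timezone(timedelta(hours=-5))

def mean_reactions_by_hour(posts, tz=FIXED_TZ):
    """Average reactions per post for each hour of the day, measured on a single clock."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        # created_time must carry a UTC offset, e.g. "2016-11-08T03:15:00+00:00".
        hour = datetime.fromisoformat(post["created_time"]).astimezone(tz).hour
        totals[hour] += post["num_reactions"]
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}

# Toy data, not from the real dataset.
posts = [
    {"created_time": "2016-11-08T03:15:00+00:00", "num_reactions": 900},
    {"created_time": "2016-11-08T03:45:00+00:00", "num_reactions": 700},
    {"created_time": "2016-11-08T15:00:00+00:00", "num_reactions": 200},
]
print(mean_reactions_by_hour(posts))  # {10: 200.0, 22: 800.0}
```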

Polls

I compared the trends in reactions to the trends in national polling percentages (sampled every 5th day from USA Today). Looking at the polling percentage trends next to the Facebook reaction trends, there does not appear to be much of a correlation between changes in polling percentage and the number of reactions.
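A visual comparison like this one could be backed by a number. Align both series on the sampled poll dates, take the change between consecutive samples, and correlate those changes. This is a minimal sketch, not the report's method. Both inputs are hypothetical `{date: value}` dicts with ISO date-string keys, so that sorting the keys puts them in chronological order. The correlation is undefined if either series of changes is constant.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def differences(values):
    """Change between each pair of consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def poll_reaction_correlation(poll_pct, mean_rxns):
    """Correlate changes in poll percentage with changes in mean reactions on shared dates."""
    dates = sorted(set(poll_pct) & set(mean_rxns))
    poll_changes = differences([poll_pct[d] for d in dates])
    rxn_changes = differences([mean_rxns[d] for d in dates])
    return pearson(poll_changes, rxn_changes)

# Toy data sampled every 5th day; "2016-09-03" has no poll value and is dropped.
poll = {"2016-09-01": 45.0, "2016-09-06": 46.0, "2016-09-11": 44.0, "2016-09-16": 47.0}
rxns = {"2016-09-01": 100.0, "2016-09-03": 90.0, "2016-09-06": 120.0,
        "2016-09-11": 80.0, "2016-09-16": 140.0}
print(round(poll_reaction_correlation(poll, rxns), 3))
```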

Key events

I looked at one key event for each candidate to see whether it had an effect on the popularity of posts. The events were:

  • Clinton calls half of Trump’s supporters a “basket of deplorables”: September 9, 2016
  • A video is released in which Trump says “grab her by the pussy”: October 7, 2016

In both cases, there was a slight increase in the average number of reactions after the candidate’s negative event, which is not what we would expect.
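A before/after comparison like this one can be sketched as two window means around the event date. The report does not state the window it used, so the 14 days here is hypothetical. Counting posts from the event day itself as "after" is also a choice made for this sketch. Posts are assumed to be dicts with a `published` date and `num_reactions`.

```python
from datetime import date, timedelta

def mean_or_none(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None

def before_after_means(posts, event_day, window_days=14):
    """Mean reactions in the window before event_day and in the window starting on it."""
    start = event_day - timedelta(days=window_days)
    end = event_day + timedelta(days=window_days)
    before = [p["num_reactions"] for p in posts if start <= p["published"] < event_day]
    after = [p["num_reactions"] for p in posts if event_day <= p["published"] < end]
    return mean_or_none(before), mean_or_none(after)

DEPLORABLES = date(2016, 9, 9)  # event date from the report

# Toy posts, not real data.
posts = [
    {"published": date(2016, 9, 1), "num_reactions": 20000},
    {"published": date(2016, 9, 5), "num_reactions": 22000},
    {"published": date(2016, 9, 10), "num_reactions": 25000},
]
print(before_after_means(posts, DEPLORABLES))  # (21000.0, 25000.0)
```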

“Crooked Hillary”

183 of Trump’s posts included the words “Crooked Hillary.” Does his mention of his opponent affect the popularity of a post?

Breakdown of reactions by type for posts that include ‘Crooked Hillary’
crooked_hillary avg_p_like avg_p_love avg_p_wow avg_p_haha avg_p_sad avg_p_angry
FALSE 0.9035 0.0616 0.0059 0.0085 0.0032 0.0173
TRUE 0.9045 0.0330 0.0066 0.0226 0.0025 0.0305
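The grouping above depends on how a post is flagged. A plausible rule is a case-insensitive substring match. The exact rule used in the report is not stated, so this one is an assumption. Under this rule a hashtag such as “#CrookedHillary” would not match. The `message` and `p_angry` keys are hypothetical.

```python
def mentions_crooked_hillary(message):
    """True if the post text contains 'crooked hillary', ignoring case."""
    return message is not None and "crooked hillary" in message.lower()

def mean_angry_by_flag(posts):
    """{flag: mean p_angry} over posts with a defined p_angry, split by the flag."""
    groups = {True: [], False: []}
    for post in posts:
        if post["p_angry"] is not None:
            groups[mentions_crooked_hillary(post["message"])].append(post["p_angry"])
    return {flag: sum(vals) / len(vals) for flag, vals in groups.items() if vals}

# Toy posts, not real data.
posts = [
    {"message": "Crooked Hillary strikes again!", "p_angry": 0.04},
    {"message": "Thank you, Iowa!", "p_angry": 0.01},
    {"message": None, "p_angry": 0.02},
]
print(mean_angry_by_flag(posts))
```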

Well, posts about “Crooked Hillary” definitely draw a larger share of angry reactions (and of haha reactions)! But do they draw more attention? Not really…

Conclusion

With so much variation and no strong associations in this dataset, I would conclude that there isn’t as much to be learned from Facebook data (that is accessible to the public) as I thought there would be. It is hard to interpret what the goal of a post is. It may be to have as many people see it as possible, or perhaps to have a certain impact on those who do see it. If we don’t know what the goal of a post is, there isn’t much use in analyzing its popularity. Additionally, reactions and shares don’t tell us everything about how many people saw the post, since not everyone who sees a post clicks like, comment, or share. If I wanted to analyze the factors that influence the popularity of a post, I would want data on how many times a post is viewed, as well as data on the demographics of the users who viewed it. This is important because most posts are targeted towards a specific demographic.