The RPubs link to this report can be found here
The presence of social media was bigger during this election cycle than ever before. With Facebook and Twitter serving as news sources and as outlets for the candidates to express their opinions, social media clearly played a role. Since social media is a relatively new political factor, I wanted to look into it more. I scraped the public Facebook status data from both candidates using a Python script. This data included the number of reactions, the number of each type of reaction, date of publication, number of shares, number of comments, and status type. It covers the period from the date each candidate announced their candidacy until November 8, 2016. Trump announced his candidacy on June 16, 2015, and Clinton announced hers on April 12, 2015.
I looked into what made a post popular. To measure popularity, I used the number of reactions. Another possible measure is the number of shares, but the two are generally closely correlated, so the choice shouldn't matter much. In fact, the top 25 posts by shares and the top 25 posts by reactions are exactly the same.
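A minimal Python sketch of the two checks behind this choice: the Pearson correlation between reactions and shares, and the overlap between the top 25 posts ranked each way. It assumes each post is a dict with hypothetical keys `status_id`, `num_reactions` and `num_shares`, modeled on typical scraper output rather than taken from the actual script.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def top_ids(posts, key, n=25):
    """IDs of the n posts with the largest value of `key`."""
    ranked = sorted(posts, key=lambda p: p[key], reverse=True)
    return {p["status_id"] for p in ranked[:n]}


def compare_popularity_measures(posts, n=25):
    """Return (correlation of reactions vs. shares, size of the top-n overlap)."""
    reactions = [p["num_reactions"] for p in posts]
    shares = [p["num_shares"] for p in posts]
    overlap = top_ids(posts, "num_reactions", n) & top_ids(posts, "num_shares", n)
    return pearson(reactions, shares), len(overlap)
```

With `n=25`, an overlap of 25 means the two rankings select exactly the same posts, which is the situation described above.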
I investigated whether the following variables had any effect on the popularity of a post.

### Variables

* The proportion of total reactions that are of a certain type (like, love, wow, haha, sad, angry)
* Status type (status, video, photo, link)
* Time of publication
* Trends in national polling averages
* A key event for each candidate
* Trump’s posts including the words “Crooked Hillary”
In the end, most of the variables didn’t show any significant correlation with popularity. This report offers an analysis of each of these variables, as well as some suggestions for variables that might be better to look at.
Some basic summary facts about the Facebook posts will be important to keep in mind during the rest of the analysis.
Below, we see that on average Trump gets about 3.4 times more reactions per post than Clinton (74,354 compared to 21,668). This is despite the fact that Clinton had about 25% more posts than Trump. Her higher posting frequency is not nearly enough to close the gap: multiplying posts by average reactions gives roughly 88 million total reactions for Clinton over the campaign period, compared with roughly 242 million for Trump.
| user | total_posts |
|---|---|
| Clinton | 4084 |
| Trump | 3254 |
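The headline comparison can be checked with a few lines of arithmetic, using only the averages and post counts quoted in this section. Because the averages are rounded, the totals are approximate.

```python
# Figures quoted in this section.
avg_reactions = {"Clinton": 21_668, "Trump": 74_354}
total_posts = {"Clinton": 4_084, "Trump": 3_254}

# How many times more reactions per post Trump gets than Clinton.
ratio = avg_reactions["Trump"] / avg_reactions["Clinton"]
# How many more posts Clinton made, as a fraction of Trump's count.
extra_posts = total_posts["Clinton"] / total_posts["Trump"] - 1
# Approximate campaign-long totals: posts x average reactions per post.
totals = {user: avg_reactions[user] * total_posts[user] for user in avg_reactions}

print(f"ratio = {ratio:.2f}, extra posts = {extra_posts:.1%}")
print(totals)
```

The ratio comes out at about 3.43 and Clinton's post count is about 25.5% higher, while the approximate totals are 88,492,112 reactions for Clinton and 241,947,916 for Trump.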
The graph below shows the number of reactions per post over time. The smoothed lines show the average trends. Vertical lines of many unpopular posts on the same day correspond to advertisements. On January 1, Trump posted the same advertisement asking for donations to his campaign 35 times. On July 21, Trump shared an advertisement photo about the GOP Convention 14 times.
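One way to pick out these advertisement bursts programmatically is to count how many times the same message text was posted on the same calendar day. This is a sketch under assumptions: the keys `status_published` (an ISO-style timestamp string) and `status_message` are hypothetical, the threshold `min_count=10` is an arbitrary illustrative choice, and only repeats with identical text are caught, which may not hold for every advertisement.

```python
from collections import Counter
from datetime import datetime


def repeated_posts(posts, min_count=10):
    """Map (day, message) -> count for messages posted at least min_count times in one day."""
    counts = Counter(
        (datetime.fromisoformat(p["status_published"]).date(), p["status_message"])
        for p in posts
    )
    return {key: n for key, n in counts.items() if n >= min_count}
```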
I analyzed whether the proportion of total reactions of each type was associated with the popularity of posts. Facebook's reaction options were first made available on February 24, 2016, so any posts from before that date have the value “NA” for the proportion of each reaction type.
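A sketch of how such proportions could be computed, returning `None` (the Python analogue of `NA`) for posts published before the reactions launch. The per-type count keys (`num_likes`, `num_loves`, `num_wows`, `num_hahas`, `num_sads`, `num_angrys`) and `status_published` are hypothetical. It also assumes a post's total is the sum of its six reaction counts, and returns `None` for a post with zero reactions to avoid dividing by zero.

```python
from datetime import date, datetime

REACTION_TYPES = ["like", "love", "wow", "haha", "sad", "angry"]
REACTIONS_LAUNCH = date(2016, 2, 24)  # date reactions became available on Facebook


def reaction_proportions(post):
    """Share of a post's reactions of each type, or None (NA) for every type
    if the post predates reactions or has no reactions at all."""
    published = datetime.fromisoformat(post["status_published"]).date()
    # Hypothetical keys: num_likes, num_loves, num_wows, num_hahas, num_sads, num_angrys.
    counts = {t: post[f"num_{t}s"] for t in REACTION_TYPES}
    total = sum(counts.values())
    if published < REACTIONS_LAUNCH or total == 0:
        return {f"p_{t}": None for t in REACTION_TYPES}
    return {f"p_{t}": counts[t] / total for t in REACTION_TYPES}
```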
Below, you can see that going from all posts to the top 25 posts, the proportion of reactions that are “likes” decreases drastically and the proportion that are “angry” increases drastically. The proportion of “love” reactions actually decreases. It seems that posts that elicit anger may be more popular.
Average reaction proportions, all posts:

| user | avg_p_like | avg_p_love | avg_p_wow | avg_p_haha | avg_p_sad | avg_p_angry |
|---|---|---|---|---|---|---|
| Clinton | 0.8818 | 0.0719 | 0.0065 | 0.0114 | 0.0064 | 0.0218 |
| Trump | 0.9036 | 0.0592 | 0.0059 | 0.0096 | 0.0032 | 0.0184 |

Average reaction proportions, top 25 posts:

| user | avg_p_like | avg_p_love | avg_p_wow | avg_p_haha | avg_p_sad | avg_p_angry |
|---|---|---|---|---|---|---|
| Clinton | 0.6422 | 0.0099 | 0.0522 | 0.0144 | 0.0300 | 0.2514 |
| Trump | 0.6632 | 0.0062 | 0.0503 | 0.0122 | 0.0091 | 0.2590 |
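The comparison between the two tables amounts to averaging each proportion over all posts and over the 25 posts with the most reactions, skipping missing values. A minimal sketch, assuming each post already carries a hypothetical proportion field such as `p_angry` (with `None` for posts before February 24, 2016) and a hypothetical `num_reactions` count:

```python
def mean_skip_na(values):
    """Mean of the values that are not None; None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def avg_proportion(posts, field, top_n=None):
    """Average of `field` over all posts, or over the top_n posts by total reactions."""
    if top_n is not None:
        posts = sorted(posts, key=lambda p: p["num_reactions"], reverse=True)[:top_n]
    return mean_skip_na(p[field] for p in posts)
```

Calling `avg_proportion(posts, "p_angry")` and `avg_proportion(posts, "p_angry", top_n=25)` on one candidate's posts gives the two values being compared.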
However, across all posts there is not much of a correlation. It appears that “prop_angry” is mostly related to the date of publication: closer to the election, and after the primaries, there are more posts with a high “prop_angry”.
Just out of curiosity, I looked at what some of the most angering posts of the election were. Here they are!
Next, I looked at the other reaction types. The plots for all types except “like” are hidden in the report but can be run in the Rmd file; I chose not to include them because they were not very interesting. For likes, however, there is a general (weak) trend that posts with more total reactions have a lower proportion of likes, and thus a greater variety of reaction types, since like is the default reaction.
The following graphics show how the average number of reactions on a post differs between status types.
| user | status_type | mean_rxns | num_posts |
|---|---|---|---|
| Trump | status | 96398.539 | 750 |
| Trump | video | 79032.250 | 584 |
| Trump | photo | 65198.850 | 1277 |
| Trump | link | 62658.510 | 643 |
| Clinton | photo | 27957.806 | 1349 |
| Clinton | status | 25920.932 | 146 |
| Clinton | video | 23037.512 | 1225 |
| Clinton | link | 13823.703 | 1354 |
| Trump | event | 6314.000 | 1 |
| Clinton | event | 3300.000 | 3 |
| Clinton | note | 3132.714 | 7 |
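It is interesting that the most popular status type differs between the candidates: statuses for Trump, photos for Clinton. The table also shows that the more popular status types don’t have the most posts. For example, statuses drew Trump's highest average (about 96,000 reactions) but made up only 750 of his posts, while links, Clinton's most frequent type, drew the fewest reactions of her four main types. It doesn’t appear that the campaigns believed status type had enough of an effect to change their behavior.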
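A grouped average like the status-type table above can be sketched in plain Python. The keys `user`, `status_type` and `num_reactions` are hypothetical; rows come back sorted by mean reactions, matching the ordering of the table.

```python
from collections import defaultdict


def mean_reactions_by_type(posts):
    """Rows of (user, status_type, mean reactions, number of posts), most popular first."""
    groups = defaultdict(list)
    for p in posts:
        groups[(p["user"], p["status_type"])].append(p["num_reactions"])
    rows = [(user, stype, sum(v) / len(v), len(v)) for (user, stype), v in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```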
The number of reactions on a post is mostly unaffected by the time of day it is published, although the pattern is similar for both candidates: posts in the latest hours of the night are the most popular. These results are hard to use, however, because the meaning of the publication time depends on where the candidate is when posting, and the candidates traveled often during the campaign.
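A sketch of the hour-of-day grouping. It assumes a hypothetical `status_published` timestamp string recorded in a single fixed time zone, so the extracted hour is not the candidate's local time, which is exactly why travel makes these results hard to interpret. `num_reactions` is likewise a hypothetical key.

```python
from collections import defaultdict
from datetime import datetime


def mean_reactions_by_hour(posts):
    """Average reactions per post for each publication hour (0-23) present in the data."""
    by_hour = defaultdict(list)
    for p in posts:
        # The hour is read as stored; it is NOT converted to the candidate's local time.
        hour = datetime.fromisoformat(p["status_published"]).hour
        by_hour[hour].append(p["num_reactions"])
    return {hour: sum(v) / len(v) for hour, v in sorted(by_hour.items())}
```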
I compared the trends in reactions to the trends in national polling percentages, which I pulled from USA Today for every fifth day. Looking at the polling percentage trends next to the Facebook reaction trends, it is clear that there isn’t much of a correlation between changes in polling percentage and the number of reactions.
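One way to make this comparison numeric is to correlate the change in polling average between consecutive samples with the mean reactions on posts published in that interval. This is a hypothetical sketch, not the method used above (which compared the trends side by side): `polls` maps a `datetime.date` to a polling percentage, and each post has hypothetical `date` and `num_reactions` keys. It requires Python 3.9+ for the `dict[date, float]` annotation.

```python
from datetime import date
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def poll_change_vs_reactions(polls: dict[date, float], posts: list[dict]) -> float:
    """Correlate the change in polling average between consecutive samples with
    the mean reactions on posts published in that interval [start, end)."""
    days = sorted(polls)
    changes, mean_rxns = [], []
    for start, end in zip(days, days[1:]):
        window = [p["num_reactions"] for p in posts if start <= p["date"] < end]
        if window:  # skip intervals in which no posts were published
            changes.append(polls[end] - polls[start])
            mean_rxns.append(sum(window) / len(window))
    return pearson(changes, mean_rxns)
```

A value near zero would match the visual impression described above; with only a few dozen five-day intervals, even a moderate value could easily be noise.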
I looked at one key event for each candidate to see whether it affected the popularity of posts.

### The events

* Clinton says that half of Trump’s supporters belong in a “basket of deplorables”: September 9, 2016
* A video is released in which Trump says “grab her by the pussy”: October 7, 2016

In both cases, there was a slight increase in the average number of reactions after the candidate’s negative event, which is not what we would expect.
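A before/after comparison of this kind could be sketched as below. The 14-day window is an assumption for illustration (the report does not state the window used), the event day itself is counted as "after", and the `date` and `num_reactions` post keys are hypothetical. The event dates are the ones listed above.

```python
from datetime import date


def before_after(posts, event_day: date, window_days=14):
    """Mean reactions in the window_days before an event vs. the window_days
    starting on the event day. Returns (before, after); None for an empty window."""
    before, after = [], []
    for p in posts:
        offset = (p["date"] - event_day).days
        if -window_days <= offset < 0:
            before.append(p["num_reactions"])
        elif 0 <= offset < window_days:
            after.append(p["num_reactions"])

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(before), mean(after)


EVENTS = {
    "Clinton: basket of deplorables": date(2016, 9, 9),
    "Trump: video released": date(2016, 10, 7),
}
```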
183 of Trump’s posts included the words “Crooked Hillary.” Does his mention of his opponent affect the popularity of a post?
| crooked_hillary | avg_p_like | avg_p_love | avg_p_wow | avg_p_haha | avg_p_sad | avg_p_angry |
|---|---|---|---|---|---|---|
| FALSE | 0.9035 | 0.0616 | 0.0059 | 0.0085 | 0.0032 | 0.0173 |
| TRUE | 0.9045 | 0.0330 | 0.0066 | 0.0226 | 0.0025 | 0.0305 |
Well, posts about “Crooked Hillary” definitely make Trump’s followers angrier: the average proportion of angry reactions rises from 0.0173 to 0.0305. But do they draw more attention? Not really…
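Flagging these posts and comparing mean reactions can be sketched with a regular expression. Matching here is case-insensitive and tolerates extra whitespace between the two words, which is an assumption; an exact-case match could give a count different from 183. The `status_message` and `num_reactions` keys are hypothetical.

```python
import re

CROOKED_HILLARY = re.compile(r"crooked\s+hillary", re.IGNORECASE)


def flag_crooked_hillary(posts):
    """Set a boolean 'crooked_hillary' field on each post (modifies the dicts in place)."""
    for p in posts:
        p["crooked_hillary"] = bool(CROOKED_HILLARY.search(p.get("status_message") or ""))
    return posts


def mean_reactions_by_flag(posts):
    """Mean reactions for flagged (True) and unflagged (False) posts."""
    groups = {True: [], False: []}
    for p in posts:
        groups[p["crooked_hillary"]].append(p["num_reactions"])
    return {flag: (sum(v) / len(v) if v else None) for flag, v in groups.items()}
```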
With so much variation and no strong associations in this dataset, I would conclude that there isn’t as much to be learned from publicly accessible Facebook data as I thought there would be. It is hard to interpret the goal of a post: it may be to have as many people see it as possible, or to have a certain impact on those who do see it. If we don’t know what the goal of a post is, there isn’t much use in analyzing its popularity. Additionally, reactions and shares don’t tell us how many people saw the post, since not everyone who sees a post likes, comments on, or shares it. To analyze the factors that influence the popularity of a post, I would want data on how many times a post is viewed, as well as on the demographics of the users who viewed it. This is important because most posts are targeted towards a specific demographic.