Here I am going to use FiveThirtyEight’s database of President Trump’s approval ratings, together with data on his tweets from the Trump Tweet Archive. The tweet data consists of the president’s tweets from 2017 and 2018.
My method: 1) exploratory analysis, 2) running a model, 3) generalizing the model.
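The section does not show how the two sources were joined, so here is a hedged illustration only. This minimal Python sketch assumes each poll row carries an end date (the `enddate` key is my assumption) and that `total_tweets` means the number of tweets posted in the seven days up to and including that date. The function name `attach_tweet_counts` and the seven-day window are hypothetical, not the author's confirmed method.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def attach_tweet_counts(polls, tweet_dates, window_days=7):
    """Hypothetical join: add a `total_tweets` field to each poll dict.

    Assumes each poll has an `enddate` (datetime.date) and counts tweets
    dated within the `window_days`-day window ending on that date, inclusive.
    """
    sorted_dates = sorted(tweet_dates)  # sort once, then binary-search per poll
    result = []
    for poll in polls:
        end = poll["enddate"]
        start = end - timedelta(days=window_days - 1)
        count = bisect_right(sorted_dates, end) - bisect_left(sorted_dates, start)
        result.append({**poll, "total_tweets": count})
    return result


# Toy example with made-up dates and approval values
tweets = [date(2017, 1, 20), date(2017, 1, 22), date(2017, 1, 30)]
polls = [
    {"approve": 45.0, "enddate": date(2017, 1, 26)},
    {"approve": 44.0, "enddate": date(2017, 2, 5)},
]
for row in attach_tweet_counts(polls, tweets):
    print(row["enddate"], row["total_tweets"])
```

Sorting the tweet dates once and using binary search keeps each lookup logarithmic, which matters when thousands of poll rows are matched against a large tweet archive.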
| Name | Piped data |
|---|---|
| Number of rows | 4929 |
| Number of columns | 21 |
| Column type frequency: | |
| numeric | 2 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| total_tweets | 0 | 1 | 19.73 | 28.41 | 0.0 | 0.0 | 0 | 41 | 109 | ▇▁▂▁▁ |
| approve | 0 | 1 | 41.36 | 4.08 | 23.9 | 38.1 | 41 | 44 | 59 | ▁▂▇▃▁ |
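To show what the summary columns mean, here is a minimal Python sketch (not the R tool that produced the table) computing n_missing, complete_rate, mean, sd and the p0 to p100 percentiles for one column. It assumes missing values are stored as `None`; the function name `skim_numeric` is hypothetical. It uses the sample standard deviation and the "inclusive" quantile method, which as far as I know match R's defaults for `sd()` and `quantile()`.

```python
import statistics


def skim_numeric(values):
    """Hypothetical helper: summarise one numeric column like the table above.

    Missing values are represented as None. Needs at least two present values,
    because statistics.quantiles and statistics.stdev require them.
    """
    present = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "n_missing": len(values) - len(present),
        "complete_rate": len(present) / len(values),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample sd (divides by n - 1)
        "p0": min(present),
        "p25": q1,
        "p50": q2,
        "p75": q3,
        "p100": max(present),
    }


# Toy column with many zero-tweet rows, one missing value
print(skim_numeric([0, 0, 0, 41, 109, None]))
```

A column dominated by zeros, as with total_tweets, gives a median of 0 while the mean is pulled up by a few large values.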
Effect of Number of Tweets and Poll Quality on Reported Approval Rating

Data from FiveThirtyEight and the Trump Tweet Archive

| Variable | Estimate | Lower bound | Upper bound |
|---|---|---|---|
| (Intercept) | 41.60 | 41.46 | 41.74 |
| total_tweets | 0.00 | -0.01 | 0.00 |
| high_q | -2.35 | -2.79 | -1.90 |
The estimated average treatment effect of high_q is -2.347. According to this model, a poll graded A+, A, or A- reports an approval rating 2.347 points lower than other polls with the same number of tweets. Because this is a multiple regression, the causal reading of the high_q coefficient is justified only if the total number of tweets is the only confounding factor affecting the approval rating. That is clearly not true, so this cannot be a true ATE on approval ratings. The two statistical viewpoints also read the coefficients differently: the frequentist view models the approval rating as a weighted combination of tweet count and poll grade, with the weights treated as fixed but unknown numbers, while the Bayesian view treats each coefficient as uncertain and describes it with a posterior probability distribution.
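To make the multiple regression concrete, here is a minimal Python sketch, assuming the poll's letter grade is available as a string. It codes high_q as described above (A+, A, A- become 1, everything else 0) and fits approve ~ total_tweets + high_q by ordinary least squares through the normal equations. This is a plain least-squares illustration, not the R model behind the tables, and the data are made up; the helper names `high_q` and `ols` are hypothetical.

```python
HIGH_QUALITY_GRADES = {"A+", "A", "A-"}


def high_q(grade):
    """Indicator used in the text: 1 for an A+, A or A- poll, else 0."""
    return 1 if grade in HIGH_QUALITY_GRADES else 0


def ols(X, y):
    """Ordinary least squares by solving (X'X) b = X'y.

    X is a list of rows that already include a leading 1 for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up, noise-free data generated from approve = 40 - 0.01*tweets - 2*high_q
tweets = [0, 10, 20, 30, 40, 50]
grades = ["B", "A", "C+", "A-", "A+", "B-"]
X = [[1.0, t, high_q(g)] for t, g in zip(tweets, grades)]
y = [40.0 - 0.01 * t - 2.0 * high_q(g) for t, g in zip(tweets, grades)]
print(ols(X, y))  # recovers intercept, total_tweets and high_q coefficients
```

Holding total_tweets in the model is exactly what "adjusting for" it means here: the high_q coefficient compares polls with the same tweet count, which is why any other confounder left out of X still biases it.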
Effect of Number of Tweets and Poll Quality on Reported Approval Rating

Data from FiveThirtyEight and the Trump Tweet Archive

| Variable | Estimate | Lower bound | Upper bound |
|---|---|---|---|
| (Intercept) | 41.63 | 41.49 | 41.77 |
| total_tweets | -0.01 | -0.01 | 0.00 |
| high_q | -2.70 | -3.23 | -2.18 |
| total_tweets:high_q | 0.02 | 0.00 | 0.04 |
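Plugging in 84 tweets for a high-quality poll (high_q = 1), using only the main-effect coefficients of the interaction model (the total_tweets:high_q term is left out of this hand calculation):

$$
\begin{aligned}
\mathrm{approve} &= 41.629 - 0.0056 \cdot \mathrm{total\_tweets} - 2.701 \cdot \mathrm{high\_q} \\
&= 41.629 - 0.0056 \cdot 84 - 2.701 \cdot 1 \\
&= 38.458
\end{aligned}
$$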
## [1] 38.89575
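The hand calculation can be written as a small Python function, a sketch assuming the coefficient values quoted in the equation above. The name `predict_approve` is hypothetical. With `b_interaction` left at 0 it reproduces the hand calculation; passing a value adds the total_tweets:high_q term that the hand calculation leaves out.

```python
def predict_approve(total_tweets, high_q,
                    b0=41.629, b_tweets=-0.0056, b_high_q=-2.701,
                    b_interaction=0.0):
    """Linear predictor of the interaction model.

    Defaults are the main-effect coefficients from the hand calculation;
    b_interaction defaults to 0, i.e. the interaction term is omitted.
    """
    return (b0
            + b_tweets * total_tweets
            + b_high_q * high_q
            + b_interaction * total_tweets * high_q)


print(round(predict_approve(84, 1), 3))  # hand calculation: 84 tweets, high-quality poll
```

The table rounds the interaction coefficient to 0.02, so a precise prediction that includes it would need the unrounded value from the fitted model.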
The “total_tweets” coefficient measures the average treatment effect of Trump sending an extra tweet a week on his approval rating in polls that are not high quality (high_q = 0). The “high_q” coefficient measures the average treatment effect of a poll being graded A+, A, or A- on Trump’s polled approval rating when total_tweets is 0. The “total_tweets:high_q” coefficient measures how much the effect of an extra tweet changes in high-quality polls, so for those polls the tweet effect is the sum of the “total_tweets” and “total_tweets:high_q” coefficients. This is intended as an explanatory model: under the Rubin Causal Model, these coefficients measure causal effects only given appropriate randomization and accounting for confounding factors.
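In the interaction model from the table (total_tweets, high_q, total_tweets:high_q), neither main effect stands alone: the slope for tweets depends on high_q, and the high_q effect depends on the tweet count. A minimal Python sketch of both derivatives follows; the function names are hypothetical, the main effects are the values from the hand calculation, and the interaction is the table's rounded 0.02, so the printed numbers are only illustrative.

```python
def tweet_slope(high_q, b_tweets, b_interaction):
    """Change in predicted approval per extra tweet, holding high_q fixed.

    For approve = b0 + b1*t + b2*q + b3*t*q, d(approve)/dt = b1 + b3*q.
    """
    return b_tweets + b_interaction * high_q


def high_q_effect(total_tweets, b_high_q, b_interaction):
    """Gap between high-quality and other polls at a given tweet count.

    For the same model, the effect of switching q from 0 to 1 is b2 + b3*t.
    """
    return b_high_q + b_interaction * total_tweets


# Main effects as in the hand calculation; interaction as rounded in the table
B_TWEETS, B_HIGH_Q, B_INTERACTION = -0.0056, -2.701, 0.02
for q in (0, 1):
    print(f"high_q={q}: {tweet_slope(q, B_TWEETS, B_INTERACTION):+.4f} points per tweet")
print(f"high_q effect at 84 tweets: {high_q_effect(84, B_HIGH_Q, B_INTERACTION):+.3f}")
```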