Objective

Here I use 538’s database of President Trump’s approval ratings, together with data on his tweets from the Trump Tweet Archive. The tweet data covers the president’s tweets from 2017 and 2018.

My method: 1) exploratory analysis, 2) running a model, 3) generalizing the model.

Conduct exploratory data analysis

Summary Statistics

Data summary

Name                     Piped data
Number of rows           4929
Number of columns        21
Column type frequency:
  numeric                2
Group variables          None

Variable type: numeric

skim_variable  n_missing  complete_rate  mean   sd     p0    p25   p50  p75  p100  hist
total_tweets   0          1              19.73  28.41  0.0   0.0   0    41   109   ▇▁▂▁▁
approve        0          1              41.36  4.08   23.9  38.1  41   44   59    ▁▂▇▃▁

Bivariate correlations

## Warning: Removed 429 rows containing missing values (geom_point).

Run a multivariate regression

Using lm()

Effect of Number of Tweets and Poll Quality on Reported Approval Rating
Data from fivethirtyeight and Trump Tweet Archive
Variable      Estimate  Lower bound  Upper bound
(Intercept)   41.60     41.46        41.74
total_tweets  0.00      -0.01        0.00
high_q        -2.35     -2.79        -1.90
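
As a rough illustration of how a table like this could be produced, here is a minimal sketch using base R. The data frame `polls` is simulated and purely hypothetical (the real merged poll and tweet data is not shown here), the variable names simply mirror the table above, and the 95% level is the `confint()` default rather than something the original output confirms.

```r
# Hypothetical stand-in for the merged polls/tweets data (simulated, not real).
set.seed(1)
n <- 500
polls <- data.frame(
  total_tweets = rpois(n, lambda = 20),             # weekly tweet count
  high_q       = rbinom(n, size = 1, prob = 0.2)    # 1 = poll graded A+, A, or A-
)
polls$approve <- 41.6 - 2.35 * polls$high_q + rnorm(n, sd = 4)

# Multiple regression of approval on tweet count and poll quality.
fit <- lm(approve ~ total_tweets + high_q, data = polls)

# Point estimates with confidence bounds (95% by default), one row per term.
ci <- confint(fit)
results <- data.frame(
  Variable  = names(coef(fit)),
  Estimate  = unname(coef(fit)),
  Lower     = ci[, 1],
  Upper     = ci[, 2],
  row.names = NULL
)
print(results, digits = 3)
```

The estimates from this simulated data will not match the table above; the point is only the shape of the workflow: fit with `lm()`, then pull the estimates and interval bounds into one table.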

Interpreting results

The estimated coefficient on high_q is -2.347. Read as an average treatment effect, it says that a poll graded A+, A, or A- reports an approval rating 2.347 points lower, on average, than a lower-graded poll with the same total number of tweets. Because this is a multiple regression, that causal reading is justified only if the total number of tweets is the only confounder of the relationship between poll quality and reported approval. This is clearly not the case, so the coefficient cannot be treated as a true ATE on approval ratings. Under the frequentist interpretation, each coefficient is a fixed but unknown weight in a linear combination of total tweet count and poll grade, and the lower and upper bounds form a confidence interval for that fixed value. Under the Bayesian interpretation, each coefficient has a probability distribution, and the interval describes where the coefficient plausibly lies given the data.

Interaction Variables

Effect of Number of Tweets and Poll Quality on Reported Approval Rating
Data from fivethirtyeight and Trump Tweet Archive
Variable             Estimate  Lower bound  Upper bound
(Intercept)          41.63     41.49        41.77
total_tweets         -0.01     -0.01        0.00
high_q               -2.70     -3.23        -2.18
total_tweets:high_q  0.02      0.00         0.04

Estimating Fitted Values

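For a high-quality poll (high_q = 1) in a week with 84 tweets:

approve = 41.629 - 0.0056 * total_tweets - 2.701 * high_q
approve = 41.629 - 0.0056 * 84 - 2.701 * 1
approve = 38.458

This hand calculation uses only the main-effect coefficients and leaves out the total_tweets:high_q interaction term, which is a likely reason it differs from the value R returns below.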

## [1] 38.89575
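
To make the fitted-value arithmetic concrete, the following sketch computes a fitted value by hand, including the interaction term, and checks it against `predict()`. The data is simulated and hypothetical, and the object names (`polls`, `fit_int`, `by_hand`, `by_predict`) are illustrative only, so the numbers will not reproduce the output above.

```r
# Hypothetical simulated data; the coefficients only loosely echo the table above.
set.seed(2)
n <- 500
polls <- data.frame(
  total_tweets = rpois(n, lambda = 20),
  high_q       = rbinom(n, size = 1, prob = 0.2)
)
polls$approve <- 41.6 - 0.005 * polls$total_tweets - 2.7 * polls$high_q +
  0.02 * polls$total_tweets * polls$high_q + rnorm(n, sd = 4)

# total_tweets * high_q expands to both main effects plus their interaction.
fit_int <- lm(approve ~ total_tweets * high_q, data = polls)
b <- coef(fit_int)

# Fitted value by hand for 84 tweets and a high-quality poll.
tweets <- 84
hq <- 1
by_hand <- b[["(Intercept)"]] +
  b[["total_tweets"]] * tweets +
  b[["high_q"]] * hq +
  b[["total_tweets:high_q"]] * tweets * hq

# The same fitted value from predict().
by_predict <- predict(fit_int,
                      newdata = data.frame(total_tweets = tweets, high_q = hq))

print(c(by_hand = by_hand, by_predict = unname(by_predict)))
```

The two values agree only when all four terms are included; dropping the interaction term shifts the hand calculation by the interaction coefficient times 84.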

Multiple Regression and the Rubin Causal Model

Read causally, the “total_tweets” coefficient measures the average treatment effect of Trump sending one extra tweet a week on his approval rating when democrat = 0. The “democrat” coefficient measures the average treatment effect of being a member of the opposing party on Trump’s polled approval rating when total_tweets = 0. The “total_tweets:democrat” coefficient measures how much the effect of an extra tweet differs for Democrats, so the effect of an extra tweet among Democrats is the sum of the “total_tweets” and “total_tweets:democrat” coefficients. This is an explanatory model rather than a predictive one, because the goal is to interpret the coefficients as causal effects. Under the Rubin Causal Model, that interpretation holds only given appropriate randomization of the treatment or adjustment for all confounding factors.

Generalize to many regressions
