Objective

Here I use the database of DW-NOMINATE scores. At its core, DW-NOMINATE uses data on legislators’ choices (in this case, roll-call votes) to “map” legislators’ ideological positions. I use the dataset to practice exploratory data analysis, basic regression concepts, the distinction between correlation and causation, and running regressions.

Conduct exploratory data analysis

Summary Statistics

I am going to examine the hypothesis that older members of Congress are more conservative.

## ── Data Summary ────────────────────────
##                            Values    
## Name                       Piped data
## Number of rows             537       
## Number of columns          25        
## _______________________              
## Column type frequency:               
##   numeric                  2         
## ________________________             
## Group variables            None      
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────
##   skim_variable       n_missing complete_rate  mean    sd    p0   p25   p50
## 1 nominate_percentile         0             1  49.9  29.0     0  24.8    50
## 2 age                         0             1  58.9  11.8    30  50      59
##     p75  p100 hist 
## 1    75   100 ▇▇▇▇▇
## 2    68    86 ▂▅▇▇▂

Visualizing a Single Variable

Bivariate Correlations

The correlation coefficient between age and nominate_percentile in the 116th Congress is -0.159. The two variables are weakly negatively correlated: as age increases, the NOMINATE percentile tends to decrease, but only slightly.
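A correlation of this kind can be computed with base R's cor(). The sketch below is only an illustration on simulated data: the data frame congress_sim and every value in it are invented for the example, and only the column names age and nominate_percentile are taken from the summary above. The number it prints will not match -0.159.

```r
# Simulated stand-in for the real data: all values are invented for illustration.
set.seed(116)
n <- 537
congress_sim <- data.frame(age = round(runif(n, min = 30, max = 86)))

# Build a weak negative relationship with age, then add noise.
raw_score <- -0.3 * congress_sim$age + rnorm(n, sd = 20)

# Convert the raw score to a 0-100 percentile rank, mirroring nominate_percentile.
congress_sim$nominate_percentile <- 100 * (rank(raw_score) - 1) / (n - 1)

# Pearson correlation coefficient between the two variables.
r <- cor(congress_sim$age, congress_sim$nominate_percentile)
print(round(r, 3))
```

The sign of r gives the direction of the linear association and its magnitude gives the strength, so a value close to zero indicates a weak relationship.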

Plotting Bivariate Relationships

## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Run a single regression

Using lm()

Effect of Age on DW Nominate Percentile
Higher percentile suggests more conservative

Variable       Estimate      Lower bound    Upper bound
(Intercept)    45.7469111    37.4458230     54.0479993
age            -0.3322965    -0.4687675     -0.1958254

Interpreting Results

The regression equation is nominate_percentile = b0 + b1 * age. The intercept of 45.75 means that in the hypothetical case where age is 0, the predicted percentile rank is 45.75. The estimated slope is -0.33: each additional year of age is associated with a percentile rank that is 0.33 points lower, on average. This does not represent a definite causal relationship, however, as there can be factors other than age that affect someone’s ideology. A member of Congress might go through more experiences as he or she ages, which can shift his or her ideology from conservative to liberal. The change in ideology, in this case, is caused by the member’s experience rather than directly by age. The confidence interval gives a range of plausible values for the regression coefficient, much as if we had refit the regression on many bootstrapped resamples of the Congress. Here the interval for age (-0.47 to -0.20) does not include zero.
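As a rough sketch of how estimates and confidence bounds like these can be produced, the example below fits a one-variable regression with lm() and extracts 95% intervals with confint(). It runs on simulated data: the data frame sim, the seed, and the coefficients used to generate it are all assumptions made for illustration, so the output will not reproduce the table above.

```r
# Simulated stand-in data: the numbers are invented and will not match the table above.
set.seed(2020)
n <- 537
sim <- data.frame(age = round(runif(n, min = 30, max = 86)))
sim$nominate_percentile <- 70 - 0.33 * sim$age + rnorm(n, sd = 28)

# Fit nominate_percentile = b0 + b1 * age by ordinary least squares.
fit <- lm(nominate_percentile ~ age, data = sim)

# Point estimates with 95% confidence bounds, one row per coefficient.
ci <- confint(fit, level = 0.95)
results <- data.frame(
  variable = names(coef(fit)),
  estimate = unname(coef(fit)),
  lower = ci[, 1],
  upper = ci[, 2],
  row.names = NULL
)
print(results)

# The slope is the change in the predicted outcome per extra year of age,
# so a 10-year age gap corresponds to 10 * b1 percentile points.
print(10 * coef(fit)[["age"]])
```

Reading the slope this way only describes an association in the fitted data; it does not by itself say what would happen to any one person's score as they age.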

Regression and the Rubin Causal Model

The Rubin Causal Model holds that there is no causation without manipulation. If we interpreted the coefficient on military causally, we would say that the slope from our calculation is the average causal effect of military on the percentile rank, that is, the difference between the average outcome of the group that is treated and that of the group that is not. However, since we can never observe both potential outcomes for the same person at the same time, this interpretation would not be truly causal.
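The potential-outcomes idea can be made concrete with a small simulation. This is a hypothetical sketch, not the analysis behind this report: y0, y1, and treated are invented names, the true effect of 5 is chosen arbitrarily, and random assignment is an assumption of the simulation rather than a property of the congressional data. It shows that with a binary regressor, the lm() coefficient equals the difference in observed group means.

```r
# Hypothetical simulation: every value and variable name here is invented.
set.seed(57)
n <- 1000
y0 <- rnorm(n, mean = 50, sd = 10)          # potential outcome without treatment
y1 <- y0 + 5                                # potential outcome with treatment (true effect = 5)
treated <- rbinom(n, size = 1, prob = 0.5)  # random assignment, i.e. manipulation

# In real data only one potential outcome per unit is ever observed.
observed <- ifelse(treated == 1, y1, y0)

# Difference in observed group means ...
diff_means <- mean(observed[treated == 1]) - mean(observed[treated == 0])
# ... equals the regression coefficient on the binary indicator.
slope <- coef(lm(observed ~ treated))[["treated"]]

print(c(diff_means = diff_means, slope = slope))
```

Because treatment is assigned at random in the simulation, the difference in means is a reasonable estimate of the average effect. Without that manipulation, the same coefficient would only describe a difference between two groups that may differ in other ways.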

Generalize to many regressions