Load the data set “anes_2016.csv”
anes <- read.csv("anes_2016.csv")
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'anes_2016.csv'
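The warning above usually just means that the last line of the file does not end with a newline character; the data are typically still read in. It is still worth checking the result after loading. A minimal sketch, assuming a tiny made-up CSV written to a temporary file stands in for anes_2016.csv (the object name `toy` and all values are invented for illustration):

```r
# Hypothetical stand-in for anes_2016.csv: a tiny CSV written to a temp file.
# The values are invented for illustration, not taken from the survey.
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,votein2016,education",
             "34,1,5",
             "52,0,3",
             "61,1,6"), tmp)

toy <- read.csv(tmp)   # same function as above, different file
dim(toy)               # number of rows, then number of columns
head(toy)              # first observations
names(toy)             # variable names
```

Running the same three checks on `anes` should confirm that the variables listed in the table below are present.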
The names and descriptions of variables in the data set
| Name | Description |
|---|---|
| age | Age of individual at time of the survey |
| attentiontopolitics | Some people don’t pay much attention to politics. How about you? 1 Not much / 5 Very much attention |
| attentiontonews | Some people don’t pay much attention to news. How about you? 1 Not much / 5 Very much attention |
| votein2012 | Did you vote for President in 2012? 1 Yes / 0 No |
| votein2016 | Did you vote for President in 2016? 1 Yes / 0 No |
| female | Indicator variable for whether individual identifies as female (1) or not (0) |
| white | Indicator variable for whether individual identifies as white (1) or not (0) |
| latino | Indicator variable for whether individual identifies as latino (1) or not (0) |
| black | Indicator variable for whether individual identifies as black (1) or not (0) |
| asian | Indicator variable for whether individual identifies as asian (1) or not (0) |
| registered | Indicator variable for whether an individual was registered to vote (1) or not (0) |
| gotochurch | Indicator variable for whether an individual goes to church (1) or not (0) |
| internetathome | Indicator variable for whether an individual has internet at home (1) or not (0) |
| homeowner | Indicator variable for whether an individual is a homeowner (1) or not (0) |
| clintonfeel | Self-placement on feeling regarding Clinton from dislike (1) to like a lot (100) |
| trumpfeel | Self-placement on feeling regarding Trump from dislike (1) to like a lot (100) |
| conservatism | Self-placement on the level of conservatism from Low (1) to High (7) |
Operators
- `#` used to comment code
- `<-` assignment operator, used to create new objects
- `$` used to access an element inside an object, such as a variable inside a dataframe (`data$variable`)
- `==` relational operator, used to test whether two values are equal
- `[]` subsetting operator, used to select a subset of the population

Functions
- `read.csv("filename.csv")` reads CSV files
- `head()` or `tail()` shows the first/last observations in a dataframe
- `dim()` provides the dimensions of a dataframe
- `mean(Data$Variable)` calculates the mean of a variable
- `median(Data$Variable)` calculates the median of a variable
- `sd(Data$Variable)` calculates the standard deviation
- `var(Data$Variable)` calculates the variance
- `table(Data$Variable)` creates a frequency table
- `prop.table(table(Data$Variable))` creates a table of proportions
- `table(Data$Variable1, Data$Variable2)` creates a two-way frequency table
- `hist(Data$Variable)` creates a histogram
- `plot(Data$Variable1, Data$Variable2)` creates a scatter plot
- `cor(Data$Variable1, Data$Variable2)` calculates the correlation coefficient
- `lm()` fits a linear model. It requires a formula of the type Y~X, where Y identifies the outcome variable and X identifies the predictor: `lm(data$y_var~data$x_var)` or `lm(y_var~x_var, data=data)`
- `summary(lm())` provides a summary of the fitted linear model.
- `abline()` adds a straight line to a graph. To add the fitted line, we specify as the main argument the object that contains the output of the `lm()` function: `fit<-lm(Y~X); abline(fit)`
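To see several of these operators and functions in action, here is a minimal sketch on a small made-up data frame. The object `toy` and its variables `voted`, `age`, and `news` are invented for illustration and are not the ANES data:

```r
# Made-up data frame for illustration (not the ANES data)
toy <- data.frame(
  voted = c(1, 0, 1, 1, 0, 1),
  age   = c(23, 35, 47, 52, 61, 70),
  news  = c(2, 2, 3, 4, 4, 5)
)

mean(toy$age)                  # mean
median(toy$age)                # median
sd(toy$age)                    # standard deviation
var(toy$age)                   # variance (the square of the standard deviation)
table(toy$voted)               # frequency table
prop.table(table(toy$voted))   # table of proportions
table(toy$voted, toy$news)     # two-way frequency table
cor(toy$age, toy$news)         # correlation coefficient
toy$age[toy$voted == 1]        # [] and ==: ages of those with voted equal to 1
```

The same calls should work on the real data once `toy` is replaced with `anes` and the variable names with those from the table above.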
Step 1 - Fit a Model:
Step 2 - Make a Prediction:
\[\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i\]

### Interpretation of \(\hat{\alpha}\) and \(\hat{\beta}\)
R functions
lm \(\rightarrow\) linear model
lm(data$outcomeYvariable ~ data$predictorXVariable)
or
lm(outcomeYvariable ~ predictorXVariable, data=dataname)
The outcome variable \(Y\) is predicted by the predictor variable \(X\)
Let’s estimate a linear model predicting attention to politics with education
\[\widehat{\text{Attention to Politics}}_i = \hat{\alpha} + \hat{\beta} \, \text{Education}_i\]
#
#
\[\widehat{\text{Attention to Politics}}_i = 3.364 + 0.093 \, \text{Education}_i\]
My friend Jane’s education level is 6. What is her predicted value for attention to politics?
3.364 + 0.093 * 6
## [1] 3.922
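The same arithmetic can be wrapped in a small function so that predictions for other education levels are easy to compute. A minimal sketch, assuming the coefficients reported in the fitted model above; the helper name `predict_attention` is hypothetical and not part of the lab:

```r
# Coefficients reported in the fitted model above
alpha_hat <- 3.364   # estimated intercept
beta_hat  <- 0.093   # estimated slope

# Hypothetical helper (not part of the original lab):
# predicted attention to politics for a given education level
predict_attention <- function(education) {
  alpha_hat + beta_hat * education
}

predict_attention(6)         # Jane's predicted value, same arithmetic as above
predict_attention(c(1, 6))   # predictions for several education levels at once
```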
To see a summary of the results we use
summary(lm(data$Y ~ data$X))
\(R^2\) (R-Squared):
- What proportion of variation in the outcome is explained by the model?
- Ranges from 0 to 1
#summary(lm(anes$attentiontopolitics ~ anes$education))
Note: \(R^2=0.018 \rightarrow\) this model explains 1.8% of the variation of the outcome (attention to politics)
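The value of \(R^2\) can also be pulled out of the summary directly instead of being read off the printed output. A minimal sketch on made-up data; the object `toy` and its values are invented for illustration and are not the ANES data:

```r
# Made-up data for illustration (not the ANES data)
toy <- data.frame(
  education = c(1, 2, 3, 4, 5, 6, 7, 8),
  attention = c(2, 3, 2, 4, 3, 4, 5, 4)
)

fit <- lm(attention ~ education, data = toy)
r2  <- summary(fit)$r.squared   # extract R-squared from the summary
r2

# With a single predictor, R-squared equals the squared correlation
cor(toy$education, toy$attention)^2
```

Multiplying `r2` by 100 gives the percentage of variation in the outcome explained by the model, as in the note above.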
#abline(lm_attention) # adds a fitted line
Estimate a linear model where you predict trumpfeel by
education
FORMAT: plot(data$variable1,data$variable2)
# read.csv
trumpfeel and education
To calculate the correlation coefficient between two variables in R,
we use the function cor(). FORMAT:
cor(data$variable1, data$variable2)
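As an illustration of the format, here is a minimal sketch on made-up data. The object `toy` and its variables `education` and `feeling` are invented for illustration and say nothing about the actual ANES results:

```r
# Made-up data for illustration (not the ANES data)
toy <- data.frame(
  education = c(1, 2, 3, 4, 5, 6),
  feeling   = c(80, 65, 70, 50, 40, 35)
)

r <- cor(toy$education, toy$feeling)
r                                  # negative in this toy data: feeling falls as education rises
cor(toy$feeling, toy$education)    # same value: the order of the variables does not matter
```

The correlation coefficient always lies between -1 and 1; its sign gives the direction of the linear relationship and its absolute value gives the strength.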
# read.csv
Write your interpretation here:
The fitted line: \[\hat Y_i=\hat\alpha+\hat\beta X_i\]
- \(\hat\alpha\) : the estimated intercept
- \(\hat\beta\) : the estimated slope
To estimate the coefficients of the linear model using the least
squares method in R, we use the function lm(), which stands
for linear model. This function requires that we specify as the main
argument a formula of the type: Y~X, where Y identifies the outcome
variable and X identifies the predictor. There are two ways to specify
the model.
FORMAT1:
lm(data$outcomevariable ~ data$predictorvariable)
FORMAT2:
lm(outcomevariable ~ predictorvariable, data=data)
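Both formats fit the same model and return the same estimates; only the labels on the coefficients differ. A minimal sketch on made-up data, where `toy`, `x_var`, and `y_var` are invented names used only for illustration:

```r
# Made-up data for illustration (not the ANES data)
toy <- data.frame(
  x_var = c(1, 2, 3, 4, 5, 6),
  y_var = c(2.1, 2.9, 4.2, 4.8, 6.1, 7.0)
)

fit1 <- lm(toy$y_var ~ toy$x_var)        # FORMAT1
fit2 <- lm(y_var ~ x_var, data = toy)    # FORMAT2

coef(fit1)   # slope is labelled "toy$x_var"
coef(fit2)   # slope is labelled "x_var"; the numbers are identical
```

One practical difference: FORMAT2 keeps plain variable names, which makes the output easier to read.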
variables: trumpfeel and education
# read.csv
The estimated intercept (\(\hat\alpha\)) is:
The estimated slope (\(\hat\beta\)), the coefficient for the variable education, is:
Write the fitted linear model here:
Now write the interpretation:
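As a reminder of what the two coefficients mean, here is a minimal sketch on made-up data (the object `toy` and its values are invented for illustration, not the ANES data). It checks that the intercept is the predicted outcome when X is 0 and that the slope is the change in the predicted outcome when X increases by one unit:

```r
# Made-up data for illustration (not the ANES data)
toy <- data.frame(
  x_var = c(0, 1, 2, 3, 4, 5),
  y_var = c(1.2, 1.9, 3.1, 3.8, 5.2, 5.9)
)

fit <- lm(y_var ~ x_var, data = toy)
alpha_hat <- coef(fit)[["(Intercept)"]]   # estimated intercept
beta_hat  <- coef(fit)[["x_var"]]         # estimated slope

# Intercept: the predicted Y when X = 0
pred0 <- predict(fit, newdata = data.frame(x_var = 0))
# Slope: the change in predicted Y when X goes from 0 to 1
pred1 <- predict(fit, newdata = data.frame(x_var = 1))

pred0           # equals alpha_hat
pred1 - pred0   # equals beta_hat
```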
We can get more detailed information, including \(R^2\), with the function
summary(). FORMAT:
summary(lm(outcomevariable ~ predictorvariable, data=data))
# read.csv
What proportion of variation in the outcome is explained by the model? Answer here
abline() adds a straight line to a graph. To add the
fitted line, we specify as the main argument the object that contains
the output of the lm() function. This adds the fitted line to the
most recently created plot, and R returns an error if no plot has been
created yet. If you get the error message (‘plot.new has not
been called yet’), run the functions plot() and
abline() together.
FORMAT: fit<-lm(Y~X); abline(fit)
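The order of the calls matters: the scatter plot has to exist before the line is added. A minimal sketch on made-up data, where `toy`, `x_var`, and `y_var` are invented names used only for illustration:

```r
# Made-up data for illustration (not the ANES data)
toy <- data.frame(
  x_var = c(1, 2, 3, 4, 5, 6),
  y_var = c(2.1, 2.9, 4.2, 4.8, 6.1, 7.0)
)

plot(toy$x_var, toy$y_var)             # 1. create the scatter plot (X first, then Y)
fit <- lm(y_var ~ x_var, data = toy)   # 2. fit the linear model
abline(fit)                            # 3. add the fitted line to that plot
```

In an R Markdown document, keeping plot() and abline() in the same code chunk is one way to make sure they run together.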
Estimate a linear model predicting attentiontonews with
age. In other words, Y=attentiontonews,
X=age. Please repeat the steps above and interpret the
results.
# read.csv