1. Today’s Data

Load the data set “anes_2016.csv”

anes <- read.csv("anes_2016.csv")

## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'anes_2016.csv'

The names and descriptions of variables in the data set

Name	Description
`age`	Age of individual at time of the survey
`attentiontopolitics`	Some people don’t pay much attention to politics. How about you? 1 Not much / 5 Very much attention
`attentiontonews`	Some people don’t pay much attention to news. How about you? 1 Not much / 5 Very much attention
`votein2012`	Did you vote for President in 2012? 1 Yes / 0 No
`votein2016`	Did you vote for President in 2016? 1 Yes / 0 No
`female`	Indicator variable for whether individual identifies as female (1) or not (0)
`white`	Indicator variable for whether individual identifies as white (1) or not (0)
`latino`	Indicator variable for whether individual identifies as latino (1) or not (0)
`black`	Indicator variable for whether individual identifies as black (1) or not (0)
`asian`	Indicator variable for whether individual identifies as asian (1) or not (0)
`registered`	Indicator variable for whether an individual was registered to vote (1) or not (0)
`gotochurch`	Indicator variable for whether an individual goes to church (1) or not (0)
`internetathome`	Indicator variable for whether an individual has internet at home (1) or not (0)
`homeowner`	Indicator variable for whether an individual is a homeowner (1) or not (0)
`clintonfeel`	Self-placement on feeling regarding Clinton from dislike (1) to like a lot (100)
`trumpfeel`	Self-placement on feeling regarding Trump from dislike (1) to like a lot (100)
`conservatism`	Self-placement on the level of conservatism from Low (1) to High (7)

2. R Cheat sheet

1. Quick review

Operators

# used to comment code
<- Assignment operator used to create new objects
$ used to access an element inside an object, such as a variable inside a dataframe (data$variable)
== relational operator that tests whether two values are equal
[] used to subset a vector or dataframe, for example to keep only some observations
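
As a quick illustration of how these operators combine, here is a minimal sketch. The data frame `toy` and its values are hypothetical and are not taken from the ANES file; only the operators themselves come from the list above.

```r
# Hypothetical toy data frame (not the ANES data)
toy <- data.frame(
  age    = c(25, 40, 33, 58),
  female = c(1, 0, 1, 0)
)

# $ pulls one variable out of the data frame
toy$age

# == tests equality and returns TRUE or FALSE for each observation
is_female <- toy$female == 1
is_female

# [] keeps only the observations where the condition is TRUE
age_female <- toy$age[toy$female == 1]
age_female
```

The same pattern should carry over to the real data, for example subsetting one variable of `anes` by the value of another.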

Functions

read.csv("filename.csv") reads CSV files
head() or tail() shows the first/last observations in a dataframe
dim() provides the dimensions of a dataframe
mean(Data$Variable) calculates the mean of a variable
median(Data$Variable) calculates the median of a variable
sd(Data$Variable) calculates the standard deviation
var(Data$Variable) calculates the variance
table(Data$Variable) creates a frequency table
prop.table(table(Data$Variable)) creates a table of proportions
table(Data$Variable1, Data$Variable2) creates a two-way frequency table
hist(Data$Variable) creates a histogram
plot(Data$Variable1, Data$Variable2) creates a scatter plot
cor(Data$Variable1, Data$Variable2) calculates the correlation coefficient
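
The sketch below runs several of these functions on a small hypothetical data frame so the output is easy to check by hand. The name `toy` and all values are assumptions made for illustration; the variable names only mimic those in the data description.

```r
# Hypothetical toy data (not the ANES data)
toy <- data.frame(
  female     = c(1, 0, 1, 1, 0, 1),
  registered = c(1, 1, 0, 1, 0, 1),
  age        = c(22, 35, 47, 51, 64, 30)
)

dim(toy)                          # number of rows, then number of columns

mean_age   <- mean(toy$age)       # arithmetic mean
median_age <- median(toy$age)     # middle value of the sorted ages

freq  <- table(toy$female)        # counts of 0s and 1s
props <- prop.table(freq)         # the same counts as proportions

# Two-way table: rows are values of female, columns are values of registered
two_way <- table(toy$female, toy$registered)
two_way
```

Note that the two-way table needs two different variables; passing the same variable twice only places its counts on the diagonal.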

R functions for today

lm() fits a linear model. It requires a formula of the type Y~X, where Y identifies the outcome variable and X identifies the predictor variable. For example: lm(data$y_var~data$x_var) or lm(y_var~x_var, data=data)
summary(lm()) provides a summary of the fitted linear model.
abline() adds a straight line to a graph. To add the fitted line, we specify as the main argument the object that contains the output of the lm() function. fit<-lm(Y~X);abline(fit)

2. Prediction

Step 1 - Fit a Model:
- we will use variables (predictors) $X$ $\rightarrow$ to explain/predict the outcome $Y$
Step 2 - Make a Prediction:
- From Step 1, we use our model to get a predicted value for a new observation $\rightarrow$ $\widehat{Y}$

\[\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i\]

Interpretation of $\hat{\alpha}$ and $\hat{\beta}$

$\hat{\alpha}$ is the estimated intercept. It corresponds to the predicted value $\widehat{Y}_i$ when $X_i = 0$.
$\hat{\beta}$ is the estimated slope. A positive $\hat{\beta}$ corresponds to a positive relationship, and a negative $\hat{\beta}$ corresponds to a negative relationship.
It can be interpreted as:
Changing $X_i$ by some amount $\Delta X$ changes the prediction $\widehat{Y}_i$ by $\hat{\beta} \Delta X$
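
To make the two interpretations concrete, here is a minimal sketch with made-up coefficients. The values of `alpha_hat` and `beta_hat` and the helper `predict_y()` are hypothetical and chosen only so the arithmetic is easy to follow.

```r
alpha_hat <- 2     # hypothetical estimated intercept
beta_hat  <- 0.5   # hypothetical estimated slope

# Prediction from the fitted line: Y-hat = alpha-hat + beta-hat * X
predict_y <- function(x) alpha_hat + beta_hat * x

# When X = 0 the prediction is the intercept
at_zero <- predict_y(0)

# Moving X from 4 to 7 (delta X = 3) changes the prediction by beta_hat * 3
change <- predict_y(7) - predict_y(4)
change
beta_hat * 3
```

The last two lines should print the same number, which is the $\hat{\beta} \Delta X$ rule applied to $\Delta X = 3$.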

R functions

lm $\rightarrow$ linear model

lm(data$outcomeYvariable  ~ data$predictorXVariable)

lm(outcomeYvariable  ~ predictorXVariable, data=dataname)

The outcome variable $Y$ is predicted by the predictor variable $X$
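
Both ways of writing the formula should give the same estimates. The sketch below checks this on a small hypothetical data frame; `toy`, `x`, and `y` are illustrative names, not variables from the ANES data.

```r
# Hypothetical toy data (not the ANES data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 2.9, 4.2, 4.8, 6.0)
)

# Format 1: refer to each variable with $
fit_dollar <- lm(toy$y ~ toy$x)

# Format 2: give bare variable names and pass the data frame separately
fit_data <- lm(y ~ x, data = toy)

# The coefficient labels differ, but the estimates are the same
coef(fit_dollar)
coef(fit_data)
```

The second format is usually more convenient for later steps such as predict() with new data, because the model then stores plain variable names rather than `toy$x`.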

2.1 Practice Prediction

Let’s estimate a linear model predicting attention to politics with education

\[\widehat{\text{Attention to Politics}}_i = \hat{\alpha} + \hat{\beta} \, \text{Education}_i\]

\[\widehat{\text{Attention to Politics}}_i = 3.364 + 0.093 \, \text{Education}_i\]

My friend Jane’s education level is 6. What is her predicted value for attention to politics?

3.364 + 0.093 * 6

## [1] 3.922
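
The same calculation can be wrapped in a small function so it is easy to reuse for other education levels. This is only a sketch: the coefficients are the ones reported in the fitted equation above, while the function name `predict_attention()` is a hypothetical helper introduced here.

```r
alpha_hat <- 3.364   # estimated intercept from the fitted model above
beta_hat  <- 0.093   # estimated slope on education from the fitted model above

# Hypothetical helper: predicted attention to politics for a given education level
predict_attention <- function(education) {
  alpha_hat + beta_hat * education
}

jane <- predict_attention(6)
jane                           # matches the hand calculation, 3.922

# The function is vectorized, so several education levels can be passed at once
predict_attention(c(2, 4, 6))
```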

2.1 Summarizing predicted LM & R-Squared

To view a summary of the results we use summary(lm(data$Y ~ data$X))

$R^2$ (R-Squared):
- What proportion of variation in the outcome is explained by the model?
- Ranges from 0 to 1

#summary(lm(anes$attentiontopolitics  ~ anes$education))

Note: $R^2=0.018 \rightarrow$ this model explains 1.8% of the variation of the outcome (attention to politics)
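
If only the $R^2$ value is needed, it can be pulled out of the summary object directly. The sketch below uses a hypothetical data frame `toy` (not the ANES data) and also recomputes $R^2$ by hand as one minus the ratio of residual variation to total variation. With a single predictor, $R^2$ also equals the squared correlation between the two variables.

```r
# Hypothetical toy data (not the ANES data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(3, 2, 4, 3, 5, 4)
)

fit <- lm(y ~ x, data = toy)

# R-squared as stored in the model summary
r2 <- summary(fit)$r.squared

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares)
ss_resid  <- sum(residuals(fit)^2)
ss_total  <- sum((toy$y - mean(toy$y))^2)
r2_manual <- 1 - ss_resid / ss_total

# With one predictor, R-squared equals the squared correlation coefficient
r2_from_cor <- cor(toy$x, toy$y)^2

c(r2, r2_manual, r2_from_cor)
```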

2.2 Visualizing results

#abline(lm_attention) # adds a fitted line

3. Practice

Estimate a linear model where you predict trumpfeel by education

1. Exploration before the prediction

1.1 Scatter plot - create a scatter plot showing the relationship

FORMAT: plot(data$variable1,data$variable2)

# read.csv

1.2 Correlation - calculate the correlation between `trumpfeel` and `education`

To calculate the correlation coefficient between two variables in R, we use the function cor(). FORMAT: cor(data$variable1, data$variable2)
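
Survey variables often contain missing values, in which case cor() returns NA by default. Whether that applies to `trumpfeel` or `education` is not stated here, so the sketch below uses hypothetical data to show the behavior and the `use = "complete.obs"` argument, which drops observations that have a missing value in either variable.

```r
# Hypothetical toy data with one missing value (not the ANES data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(10, 8, 9, 5, 4, 6)
)

# Default behavior: any missing value makes the result NA
r_default <- cor(toy$x, toy$y)
r_default

# Use only the rows where both variables are observed
r_complete <- cor(toy$x, toy$y, use = "complete.obs")
r_complete
```

The sign of the coefficient gives the direction of the relationship, and values closer to -1 or 1 indicate a stronger linear relationship.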

# read.csv

Here write your interpretation:

2. Linear Model

2.1 Estimating a linear model

The fitted line: \[\hat Y_i=\hat\alpha+\hat\beta X_i\]
- $\hat\alpha$ : the estimated intercept
- $\hat\beta$ : the estimated slope

To estimate the coefficients of the linear model using the least squares method in R, we use the function lm(), which stands for linear model. This function requires that we specify as the main argument a formula of the type Y~X, where Y identifies the outcome variable and X identifies the predictor. There are two ways to write this formula.

FORMAT1: lm(data$outcomevariable ~ data$predictorvariable)

FORMAT2: lm(outcomevariable ~ predictorvariable, data=data)
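
Once a model is fitted, coef() returns the estimated intercept and slope by name. The sketch below uses hypothetical data built to lie exactly on the line y = 1 + 2x, so the estimates can be checked by eye; `toy`, `x`, and `y` are illustrative names and not part of the ANES data.

```r
# Hypothetical toy data lying exactly on y = 1 + 2 * x (not the ANES data)
toy <- data.frame(
  x = c(0, 1, 2, 3, 4),
  y = c(1, 3, 5, 7, 9)
)

fit <- lm(y ~ x, data = toy)

# coef() returns a named vector: "(Intercept)" and the predictor's name
alpha_hat <- coef(fit)["(Intercept)"]   # estimated intercept, 1 for this data
beta_hat  <- coef(fit)["x"]             # estimated slope, 2 for this data

alpha_hat
beta_hat
```

For the practice model, the slope would be labeled with the predictor's name in the formula instead of `x`.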

variables: trumpfeel and education

# read.csv

the estimated intercept ($\hat\alpha$) is:
the estimated slope ($\hat\beta$), the coefficient for the variable education, is:

Here write the fitted linear model

Now write the interpretation

(Intercept):
(slope):

2.2 Summarize the linear model estimated

We can get more detailed information, including $R^2$, with the function summary(). FORMAT: summary(lm(outcomevariable ~ predictorvariable, data=data))

# read.csv

What proportion of variation in the outcome is explained by the model? Answer here

2.3 Adding the fitted line to the scatter plot

abline() adds a straight line to a graph. To add the fitted line, we specify as the main argument the object that contains the output of the lm() function. This adds the fitted line to the most recently created plot, so R returns an error if no plot exists yet. If you get the error message ‘plot.new has not been called yet’, run plot() and abline() together.

FORMAT: fit<-lm(Y~X); abline(fit)
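
A minimal sketch of the full sequence, using hypothetical data rather than the ANES variables, might look as follows. The names `toy`, `x`, and `y` and the axis labels are assumptions made for illustration.

```r
# Hypothetical toy data (not the ANES data)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 3, 6, 5, 7)
)

# Fit the model first: the outcome goes on the left of ~, the predictor on the right
fit <- lm(y ~ x, data = toy)

# Draw the scatter plot: plot() takes the X variable first, then the Y variable
plot(toy$x, toy$y,
     xlab = "x (hypothetical predictor)",
     ylab = "y (hypothetical outcome)")

# Add the fitted line to the plot that was just created
abline(fit)
```

Note the difference in argument order: the lm() formula puts the outcome first, while plot() puts the predictor first. Running plot() and abline() in the same chunk avoids the ‘plot.new has not been called yet’ error.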

4. More practice

Estimate a linear model predicting attentiontonews with age. In other words, Y=attentiontonews, X=age. Please repeat the steps from the previous practice and interpret the results.

# read.csv

Week-8 Practice

Zimirah Wilson
