R Code Quick Reference for Assignment 2 (SO648, 2026)
Assignment 2
1500 words (excluding code), due Monday May 11th, 23:59
- Select a dataset from those provided to you on Moodle.
- Identify a dependent variable, and three independent variables to base your assignment on.
- Produce a linear regression model using the variables identified in step 2.
- Interpret the results of your model.
- Give a short discussion, with supporting references, of your results.
- Include the code used to produce your results at the end of your assignment.
This page contains everything you need to answer this. You should use this page by following along in RStudio as you read. Wherever you encounter code, you can copy it over to RStudio and run it yourself. The key to doing the assignment in this way, with this page as a guide, is (1) using the supplied code, do the examples yourself as you do, and (2) substitute the variables, titles, labels, etc with those chosen by you, for your assignment. Ultimately, you can complete this assignment with four variables from any one of the course datasets. All of the steps you need to follow are given below. Please read the text fully and carefully before emailing for assistance (which you are welcome to do - but if the answer is in this page somewhere, I will politely send you back here). The backup video tutorials are available from my YouTube course playlist at this link.
01. Setting up Your Session in R
01.1 Setting your directory
The first thing you should do in any R session is ensure you have saved all of your required data files to a folder, and then set your working directory to point to this folder. Download and save all of the required files from Moodle, put them in a folder titled ‘so648_2026’, then set your working directory using the code below. Remember, you will need to change the folder address to the working directory on your own computer. Mine is in a folder located in a sub folder ‘…Teaching/so648_2026’. Use the code below to set yours.
01.2 Packages and libraries
Next you need to load the packages (libraries) that you will use for your work session. The ones you will need for this assignment can be loaded with the following code.
library(pacman)
pacman::p_load(dplyr, readr, tidyverse, ggplot2, haven, janitor, ggcorrplot, labelled,
Hmisc, pastecs, vtable, broom, GGally, cowplot, ggspatial,
expss, patchwork, gapminder, dendroTools, gtsummary, forcats, xfun)This is the new procedure we introduced in week 8, using the
pacman package. This allows yuou to specify a list of
packages and load them into R. It will detect if any are not installed,
and complete the installation for you. If you ever need to manually
install a package, for example, ggplot, then you can use the following
code.
01.3 Loading data from .csv (spreadsheet) format
Next, you will need some data. Making sure your datasets are saved
into your working directory, as outlined above, run the following code
(which depends on the ‘readr’ library) to read .csv spreadsheet data
into R. The resulting dataset will save to the Environment panel to the
right in RStudio. The code below will get two .csv files from your
working directory, so make sure the data files from Moodle are saved
into the folder set above as your working directory. If you loading in
your own data that you have produced yourself in Excel, ensure that it
is saved in .csv format, and not .xls or .xlsx format if you are using
the commands below as they rely on read_csv. R can handle
other Excel formats however, and you can use the manual Import Dataset
procedure in the Environment window to import your own data if you
want
01.4 Loading data from .dta (Stata) format
The European Social Survey file is supplied in a format native to the
statistical analysis programme Stata. Stata uses a .dta extension to
save its datasets. We can use the haven package from R to
import this, and you have already installed it using the above combined
package installation and library loading process provided by
pacman. The code to import the ESS dataset is as follows -
make sure you have set your working directory already by following the
procedures above.
ess11_ireland <- read_dta("ess11_ireland.dta")
Stata-formatted datasets will be imported to R with a few specific
differences that we will need to take care of with some additional code.
First, Stata has a particular way of assigning different codes for
different kinds of missing data. Whilst we tend to use NA
in R as a universal designation for a missing value, Stata uses codes
such as ‘.a, .b, .c,’ to designate cases where the question is refused,
the respondent doesn’t know, or where data are missing possibly due to
mis-coding. You will notice some additional code in some of the
procedures below, and this is to account for two things: (1) how R has
handled the missing cases when the data were imported from the original
.dta Stata file, and (2) how R assigns a variable type to the imported
variables. The second issue will largely affect how R identifies certain
variables as factors, and sometimes where you see additional code, it
will be to ‘force’ R to treat a variable as a factor, without making
complex or permanent changes to the dataset. This is in line with good
practice in data pipelining, where we want to pass, or ‘pipe’ our data
through certain filters or settings, before applying our commands. This
will leave the original dataset largely untouched, and will allow us to
do many iterations of certain actions using our code, without altering
the original dataset. We will deal with these issues as we encounter
them below, and don’t worry, you won’t have to figure this part out
yourselves. Let’s see how this looks in practice. In the output below,
we query the structure of a specific variable, first in the World Bank
dataset, then in the ESS dataset.
## dbl+lbl [1:2017] 3, 10, 3, 9, 10, 5, 7, 4, ...
## @ label : chr "Trust in the police"
## @ format.stata: chr "%4.0g"
## @ labels : Named num [1:14] 0 1 2 3 4 5 6 7 8 9 ...
## ..- attr(*, "names")= chr [1:14] "No trust at all" "1" "2" "3" ...
## chr [1:47] "Non-EU" "EU" "EU" "Non-EU" "EU" "Non-EU" "Non-EU" "Non-EU" ...
As you can see from the output, the variable eu from the
so648_country dataset is a character variable recording
countries as either EU or Non-EU. The variable trstplt from
the ess11_ireland dataset contains more information. It’s
type is dbl_lbl which is a labelled format variable, and
the assigned labels are listed in the @ labelsrow. We could
query for further information on this, for example using the
unique command before the bracketed dataset and variable
name, but for now all we need to appreciate is that when R imports the
ESS dataset from its native Stata format, it assigns a specific type to
it, and also retains the labelling system created in Stata. Later, we
may want to treat these variables as ‘factors’ in our regression
analysis, and to do so will require some fine-tuning in the code. We
will do this by ‘piping’ the data to the command, applying some
additional settings in the process.
02. Summary Statistics and Basic Visaulisations in R
02.1 Simple summary statistics
If you want to check your variables as you did with assignment 1, you
can use a small selection of commands. The focus of the assignment is on
regression modelling, but it is always necessary to perform some basic
inspections of the data to get familiar with the variables you are
working with. Let’s start with trstplc from the European
Social Survey. Let’s first check the variable structure, and get a look
at the unique values in the variable.
## dbl+lbl [1:2017] 3, 10, 3, 9, 10, 5, 7, 4, ...
## @ label : chr "Trust in the police"
## @ format.stata: chr "%4.0g"
## @ labels : Named num [1:14] 0 1 2 3 4 5 6 7 8 9 ...
## ..- attr(*, "names")= chr [1:14] "No trust at all" "1" "2" "3" ...
## <labelled<double>[12]>: Trust in the police
## [1] 3 10 9 5 7 4 8 0 2 NA(a) 6 1
##
## Labels:
## value label
## 0 No trust at all
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## 7 7
## 8 8
## 9 9
## 10 Complete trust
## NA(a) Refusal
## NA(b) Don't know
## NA(c) No answer
If you want, you could also use
val_labels(ess11_ireland$trstplc) although the output is
slightly less legible for large variables. Next, we take a look at the
summary statistics. This is a little different, since
trstplc is a variable that by definition is ordinal, but
which we can sometimes treat as ratio, and therefore calculate a mean.
The mean
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 5.000 7.000 6.508 8.000 10.000 18
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1999 6.51 2.36 7 6.51 1.48 0 10 10 -0.93 0.54 0.05
##
## No trust at all 1 2 3 4
## 3.2 1.9 2.7 3.6 5.1
## 5 6 7 8 9
## 11.8 11.2 20.0 23.1 10.4
## Complete trust Refusal Don't know No answer
## 6.3 0.0 0.8 0.0
So from the above pieces of output, we can see that
trstplc is a variable coded from 0-10, where 0 = NO trust
at all, and 10 = Complete trust. It has a mean of 6.51, a standard
deviation of 2.36, and a median of 7. 23.26% of respondents recorded
their trust in the police as ‘8’.
02.2 Basic visualisations
The example graph below uses the same variable, this time drawing a
bar chart to show the distribution of cases across each category. This
is the same code you used in assignment 1, but note some additions in
the code, namely %>% filter(!is.na(trstplc) (a filter
that removes any NA categories from the procedure), and
as_factor(trstplc) (a temporary transformation that will
instruct R to treat the variable trstplc as a factor
variable). This will ensure that it recognises the variable as made up
of factors with a fixed set of values.
ggplot(data = ess11_ireland %>% filter(!is.na(trstplc)),
aes(x = as_factor(trstplc))) +
labs(
y = "Percentage",
x = "Trust in the Police"
) +
geom_bar(aes(y = after_stat(count / sum(count) * 100))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))Repeating the process with another of our possible independent
variables, maritalb (marital status), we need to take a
little more care. This is a nominal variable, and therefore calculating
a mean or median would not make sense. Instead, we can see the
percentage of respondents falling into each marital status category, and
graph this. Taking the code from above and adapting it with this
variable, we get the following output.
##
## Legally married
## 51.0
## In a legally registered civil union
## 1.3
## Legally separated
## 2.7
## Legally divorced/Civil union dissolved
## 5.1
## Widowed/Civil partner died
## 8.5
## None of these (NEVER married or in legally registered civil union)
## 30.3
## Refusal
## 0.7
## Don't know
## 0.3
## No answer
## 0.0
ggplot(data = ess11_ireland %>% filter(!is.na(maritalb)),
aes(x = as_factor(maritalb))) +
labs(
y = "Percentage",
x = "Marital status"
) +
geom_bar(aes(y = after_stat(count / sum(count) * 100))) +
theme(axis.text.x = element_text(angle = 35, hjust = 1))03. Linear Regression in R
03.1 Doing linear regression
Once you have chosen your set of variables (three independent and one
dependent), you are ready to implement your regression model. Remember,
the method we have reviewed in class covers linear regression for ratio
dependent variables only. This means your dependent variable should be
measures at the ratio level, or at least be subject to the same
assumptions applied to the trstplc variable above - a
variable with sufficient categories to be meaningfully treated as
numerical, even if by definition it is not.
Taking the example we looked at in class, we tried to model political
placement in relation to education. Graphically, this involved plotting
the distribution of respondent scores on both variables, and fitting a
linear trend to this plot to see the general direction of association.
We found a small propensity toward leftist scores with higher levels of
education, and we implemented this using the code as shown below. In
particualar, notice the inclusion of another layer,
geom-jitter. One of the nice features of graphing in R with
ggplot is that we can layer up our graphs as we like. Without the jitter
function, the graph will show heaping on the individual scores of
lrscale, and make the graph harder to read. Try running the code below
again with this line removed, if you want to compare the difference.
ggplot(data = ess11_ireland) +
geom_point(mapping = aes(x = eduyrs, y = lrscale)) +
theme_gray(base_size = 12) +
geom_jitter(aes(x = eduyrs, y = lrscale), alpha = .5, width = 2, height = .5) +
geom_smooth(aes(x = eduyrs, y = lrscale), method = "lm")At the conclusion of week 10, we fit out model, plotted, and
interpreted the results using the <- lm procedure to
assign a linear model to an object in the global environment. In the
example below, we model placement on the left-right scale (lrscle) as a
function of eduyrs, nwspol, atchctr, and iphppla. By interpreting the
results shown below, we were table to conlcude that education had a
small effect on political placement, whilst attachment to country and
prosocial values also showed some evidence of association. Applying the
‘unit effect interpretation’, each additional year of education is
associated with a change of -.027 units of lrscale - a modest left-ward
driaft, as 0 = Left in the coding of lrscale.
model1 <- lm(formula = lrscale ~ eduyrs + nwspol + atchctr + iphlppla,
data = ess11_ireland)
summary(model1)##
## Call:
## lm(formula = lrscale ~ eduyrs + nwspol + atchctr + iphlppla,
## data = ess11_ireland)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7403 -0.8961 0.0550 0.8566 6.0187
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0640813 0.3146564 12.916 < 2e-16 ***
## eduyrs -0.0269096 0.0119747 -2.247 0.0248 *
## nwspol 0.0005866 0.0004597 1.276 0.2021
## atchctr 0.1029085 0.0261734 3.932 8.77e-05 ***
## iphlppla 0.1949188 0.0478931 4.070 4.92e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.941 on 1695 degrees of freedom
## (317 observations deleted due to missingness)
## Multiple R-squared: 0.02058, Adjusted R-squared: 0.01827
## F-statistic: 8.905 on 4 and 1695 DF, p-value: 4.084e-07
There are instances, however, where our variables can not be
interpreted as either ratio (‘real’ numbers), or ordinal (rank-ordered),
and to demonstrate this, we looked at the case of religiosity as
measured by frequency of religious service attendance. First, examine
the variable using the procedures shown above. We might use
str or unique and then create a table to do
so. This shows us that rlgatnd is an ordinal variable that
measures attendance in the form of frequency, but with intervals that
are not evenly spaced. In other words, we cannot say precisely how many
‘units’ more of attendance ‘less often’ is compared to ‘daily’.
## dbl+lbl [1:2017] 5, 6, 7, 7, 3, 1, 3, 3, ...
## @ label : chr "How often attend religious services apart from special occasions"
## @ format.stata: chr "%4.0g"
## @ labels : Named num [1:10] 1 2 3 4 5 6 7 NA NA NA
## ..- attr(*, "names")= chr [1:10] "Every day" "More than once a week" "Once a week" "At least once a month" ...
## <labelled<double>[8]>: How often attend religious services apart from special occasions
## [1] 5 6 7 3 1 NA(a) 2 4
##
## Labels:
## value label
## 1 Every day
## 2 More than once a week
## 3 Once a week
## 4 At least once a month
## 5 Only on special holy days
## 6 Less often
## 7 Never
## NA(a) Refusal
## NA(b) Don't know
## NA(c) No answer
##
## Every day More than once a week Once a week
## 1.9 4.8 21.0
## At least once a month Only on special holy days Less often
## 12.5 16.3 21.4
## Never Refusal Don't know
## 21.6 0.4 0.1
## No answer
## 0.0
Including this variable in a regression without modification would
give us a result that does not make sense, since what we want to know is
the effect of each category membership (each variety of attendance
frequency) on the dependent variable (political orientation). The
coefficient in the rlgatnd line under Coefficients below
cannot be interpreted. We need to transform the variable first into a
factor variable, before supplying it to the lm command.
model2 <- lm(formula = lrscale ~ eduyrs + nwspol + rlgatnd + iphlppla,
data = ess11_ireland)
summary(model2)##
## Call:
## lm(formula = lrscale ~ eduyrs + nwspol + rlgatnd + iphlppla,
## data = ess11_ireland)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8833 -0.8851 0.0954 0.7868 5.7286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.8558267 0.2476185 23.649 < 2e-16 ***
## eduyrs -0.0149707 0.0119302 -1.255 0.20971
## nwspol 0.0004853 0.0004550 1.067 0.28634
## rlgatnd -0.2157747 0.0282370 -7.642 3.57e-14 ***
## iphlppla 0.1521931 0.0467505 3.255 0.00115 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.92 on 1690 degrees of freedom
## (322 observations deleted due to missingness)
## Multiple R-squared: 0.04463, Adjusted R-squared: 0.04237
## F-statistic: 19.74 on 4 and 1690 DF, p-value: 6.778e-16
This is how we do it for rlgatnd. Although the code
looks lengthy, take a moment to look at the component parts line by
line. We have asked R to mutate the variable into a new
variable called rlgatnd_factor, supply this to the original
dataset ess11_ireland, and to calculate it by treating the
original rlgatnd as a factor with
as_factor.
ess11_ireland <- ess11_ireland %>%
mutate(rlgatnd_factor = as_factor(rlgatnd) %>%
fct_recode(NULL = "Refusal", NULL = "Don't know", NULL = "No answer", NULL = "Not applicable"))Now, we can supply this to the regresion command, and observe what happens with the results.
model3 <- lm(formula = lrscale ~ eduyrs + nwspol + rlgatnd_factor + iphlppla,
data = ess11_ireland)
summary(model3)##
## Call:
## lm(formula = lrscale ~ eduyrs + nwspol + rlgatnd_factor + iphlppla,
## data = ess11_ireland)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0240 -0.8865 0.1001 0.8461 5.8155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.7853799 0.3943382 14.671 < 2e-16
## eduyrs -0.0152520 0.0119738 -1.274 0.202918
## nwspol 0.0005167 0.0004560 1.133 0.257294
## rlgatnd_factorMore than once a week -0.8890253 0.3896850 -2.281 0.022649
## rlgatnd_factorOnce a week -0.4371420 0.3395978 -1.287 0.198189
## rlgatnd_factorAt least once a month -1.0005650 0.3496561 -2.862 0.004268
## rlgatnd_factorOnly on special holy days -0.8911214 0.3443221 -2.588 0.009735
## rlgatnd_factorLess often -1.1756930 0.3391098 -3.467 0.000540
## rlgatnd_factorNever -1.5294429 0.3396458 -4.503 7.16e-06
## iphlppla 0.1563222 0.0466660 3.350 0.000827
##
## (Intercept) ***
## eduyrs
## nwspol
## rlgatnd_factorMore than once a week *
## rlgatnd_factorOnce a week
## rlgatnd_factorAt least once a month **
## rlgatnd_factorOnly on special holy days **
## rlgatnd_factorLess often ***
## rlgatnd_factorNever ***
## iphlppla ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.916 on 1685 degrees of freedom
## (322 observations deleted due to missingness)
## Multiple R-squared: 0.05183, Adjusted R-squared: 0.04676
## F-statistic: 10.23 on 9 and 1685 DF, p-value: 1.737e-15
Now we can interpret the results, because we can see the unique effect of ‘being in’ each group, relative to the reference group of ‘Daily’. A note on how to interpret this. R will always designate the first category of the variable the reference category unless we tell it otherwise. So, the coefficient for ‘Never’ is interpreted as the difference in lrscale scores for those who Never attend relative to those who attend Daily. The effect on lrscale is -1.53 for those in the never group, a considerable leftward effect. Similarly, those who attend At least once a month exhibit a difference of -1 points on lrscale relative to those who attend Daily. The unit effect interpretation still applied, it is just interpreted now as the effect of category membership on dependent variable scores. In plain language, less frequent religious service attendance is associated with more leftist placement on the political continuum.
03.2 Visualising your results
Finally, though not necessarily, you might want to visualise your
results to help with interpretation. There are several ways to do this,
but the broom/tidy method is most straightforward. Taking our model
output from the above regression as assigned to the object
model3, we can tidy up the results and pass them to a plot
in ggplot.
## # A tibble: 10 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 5.78 0.394 14.7 0
## 2 eduyrs -0.015 0.012 -1.27 0.203
## 3 nwspol 0.001 0 1.13 0.257
## 4 rlgatnd_factorMore than once a week -0.889 0.39 -2.28 0.023
## 5 rlgatnd_factorOnce a week -0.437 0.34 -1.29 0.198
## 6 rlgatnd_factorAt least once a month -1.00 0.35 -2.86 0.004
## 7 rlgatnd_factorOnly on special holy days -0.891 0.344 -2.59 0.01
## 8 rlgatnd_factorLess often -1.18 0.339 -3.47 0.001
## 9 rlgatnd_factorNever -1.53 0.34 -4.50 0
## 10 iphlppla 0.156 0.047 3.35 0.001
ggplot(data = filter(out_model3, term != "(Intercept)")) +
geom_point(mapping = aes(x = estimate, y = reorder(term, estimate)))Don’t forget your earlier ggplot annotation code - we can make this a lot neater by adding labels and source details.
## # A tibble: 10 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 5.78 0.394 14.7 0
## 2 eduyrs -0.015 0.012 -1.27 0.203
## 3 nwspol 0.001 0 1.13 0.257
## 4 rlgatnd_factorMore than once a week -0.889 0.39 -2.28 0.023
## 5 rlgatnd_factorOnce a week -0.437 0.34 -1.29 0.198
## 6 rlgatnd_factorAt least once a month -1.00 0.35 -2.86 0.004
## 7 rlgatnd_factorOnly on special holy days -0.891 0.344 -2.59 0.01
## 8 rlgatnd_factorLess often -1.18 0.339 -3.47 0.001
## 9 rlgatnd_factorNever -1.53 0.34 -4.50 0
## 10 iphlppla 0.156 0.047 3.35 0.001
ggplot(data = filter(out_model3, term != "(Intercept)")) +
geom_point(mapping = aes(x = estimate, y = reorder(term, estimate))) +
labs(
x = "Coefficient",
y = "variable / factor level",
title = "Regression Output: DV = lrscale",
subtitle = "0 = Left, 10 = Right",
caption = "Source European Social Survey Round 11, 2023"
)03.3 Categorical dependent variables
Sometimes, you might want to use a dependent variable that is
non-numeric, either nominal or ordinal. This belongs to a different set
of techniques more broadly known as logistic regression. Logistic
regression is very common in the social sciences especially as so much
of what we want to measure or explain does not fit the conditions or
assumptions of ratio measurement. The easiest way to implement this
initially, is to do so with a binary dependent variable. This is a
variable that can only take on two categories. Let’s look at an example
with the vote variable. Inspecting the variable using the
code below, we see how it takes on three categories (Yes, No, and Not
eligible to vote).
## <labelled<double>[4]>: Voted last national election
## [1] 1 2 3 NA(a)
##
## Labels:
## value label
## 1 Yes
## 2 No
## 3 Not eligible to vote
## NA(a) Refusal
## NA(b) Don't know
## NA(c) No answer
##
## Yes No Not eligible to vote
## 75.2 17.4 7.1
## Refusal Don't know No answer
## 0.1 0.2 0.0
What if we wanted to model voting as our dependent variable, with
gender, education, and political orientation as our independent
variables? First, we need to do some recoding to get the
vote variable into two categories.
vote_factor <- to_factor(ess11_ireland$vote)
ess11_ireland$vote_binary <- factor(
ifelse(vote_factor == "Yes", "Yes", "No"),
levels = c("No", "Yes")
)
table(ess11_ireland$vote_binary)##
## No Yes
## 493 1517
What have we done here? In the ifelseline we have
identified, first of all our key category of ‘Yes’, and that the binary
will be composed of those answering ‘Yes’, with others assigned to ‘No’.
The two levels of the variable will be defined as ‘No’ and ‘Yes’
respectively. Looking at this another way, and with a more complex
variable (marital status), we apply the same logic - identifying the key
category (the ‘thing we want to explain’ - in this case, being single),
and designate this as 1, and all others as 0. For marital status, we
could do it like this.
maritalb_factor <- to_factor(ess11_ireland$maritalb)
ess11_ireland$never_married <- factor(
ifelse(maritalb_factor == "None of these (NEVER married or in legally registered civil union)",
"Never Married/Registered",
"Ever Married/Registered"),
levels = c("Ever Married/Registered", "Never Married/Registered")
)
table(ess11_ireland$never_married, useNA = "ifany")##
## Ever Married/Registered Never Married/Registered <NA>
## 1383 611 23
Now, if we want to model this, we can use the logistic regression procedure to do so. The interpretation is slightly different here, as the coefficients must be interpreted as the effect of a unit change in the independent variable on the
model4 <- glm(vote_binary ~ trstplc + eduyrs + rlgatnd_factor,
data = ess11_ireland,
family = binomial)
tidy(model4, exponentiate = TRUE, conf.int = TRUE)## # A tibble: 9 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.43 0.586 2.10 3.56e-2 1.19 12.5
## 2 trstplc 1.12 0.0222 4.94 7.81e-7 1.07 1.17
## 3 eduyrs 1.01 0.0141 0.802 4.22e-1 0.984 1.04
## 4 rlgatnd_factorMore th… 0.422 0.589 -1.46 1.44e-1 0.116 1.23
## 5 rlgatnd_factorOnce a … 0.696 0.549 -0.660 5.09e-1 0.202 1.84
## 6 rlgatnd_factorAt leas… 0.576 0.557 -0.991 3.22e-1 0.165 1.55
## 7 rlgatnd_factorOnly on… 0.331 0.545 -2.03 4.26e-2 0.0964 0.865
## 8 rlgatnd_factorLess of… 0.362 0.543 -1.87 6.10e-2 0.106 0.939
## 9 rlgatnd_factorNever 0.241 0.540 -2.63 8.52e-3 0.0708 0.623
What we have here is a table reporting the results of the model in
odds ratios, given by the line of code
tidy(model, exponentiate = TRUE, conf.int = TRUE), calling
on the broom package that we have used already to tidy up
our regression output. The odds ratios can be interpreted using the same
unit interpretation, only this time the effect is not read in terms of
the score on the dependent variable, but on the odds of being in the
‘Yes’ category. In other words, each line in the term column above shows
the effect of each variable or category on the odds of voting. So, for
trstplc, the odds ratio is 1.116 (with rounding).
Therefore, a one-point or unit increase in trust in the police is
associated with an 11.6% increase in the odds of voting. For religious
attendance, particularly the rlgatnd_factorNever group, the
odds ratio is 0.241. Whenever we have an odds ratio less than 1, we
interpret this as ‘negative’, since the odds ratios in this form can be
read as differences between proportions in a table. Taking 1-OR gives us
.759, and we interpret the odds of voting for those in the ‘Never’ group
as having 75.9% lower odds of voting than those who attend daily. You
can also map the resulting output using the same mapping code as we did
for the linear regression above.
## # A tibble: 9 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 1.23 0.586 2.10 0.036
## 2 trstplc 0.11 0.022 4.94 0
## 3 eduyrs 0.011 0.014 0.802 0.422
## 4 rlgatnd_factorMore than once a week -0.862 0.589 -1.46 0.144
## 5 rlgatnd_factorOnce a week -0.362 0.549 -0.66 0.509
## 6 rlgatnd_factorAt least once a month -0.552 0.557 -0.991 0.322
## 7 rlgatnd_factorOnly on special holy days -1.11 0.545 -2.03 0.043
## 8 rlgatnd_factorLess often -1.02 0.543 -1.87 0.061
## 9 rlgatnd_factorNever -1.42 0.54 -2.63 0.009
ggplot(data = filter(out_model4, term != "(Intercept)")) +
geom_point(mapping = aes(x = estimate, y = reorder(term, estimate))) +
labs(
x = "Odds ratio",
y = "variable / factor level",
title = "Logistic Regression Output: DV = Vote",
subtitle = "1 = Voted, 0 = Did not vote",
caption = "Source European Social Survey Round 11, 2023"
)04. Saving Output
Want to save something? If you have an image you want to save, you can save it manually by going to the plot pane (the lower-right quadrant of RStudio) and clicking Export then Save as Image as shown on the image below. Give it a file name, and it will save to your working directory once you hit save.
Saving Your Graph
If you are happy with your summary statistics table, and want to save that, then you can also use the export option as shown below. You have two options here. By selecting ‘Save as Image…’ you can change the resolution and file type. By selecting ‘Copy to Clipboard’ you can then go straight to a word document and paste it in. Either method is fine.
Saving Your Tables