This document outlines the assessed homework assignments you need to complete for EC3133.
To complete the assessed homework assignments, you will have two
options to choose from. You only need to choose one of these
alternatives. Each option involves exactly the same amount of work. The
only difference is which dataset you use. For Option 1, you will be
using the
Crime Survey for England and Wales, 2017-2018 Teaching Dataset.
For Option 2, you will be using the
Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021.
Choose the data that is related to topics you are most interested in.
This document includes information about Option 2. See the Moodle page
for information on Option 1.
Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021
is an open dataset which means the data are available to download
without the need to register with the UK Data Service. The purpose of
this assignment is to get you started with analyzing data and exploring
potential research questions. The Covid-19 dataset contains two
files:
Cross-sectional: contains data collected in Wave 4 in July 2020 (with some additional variables from other waves)
Longitudinal: Contains mainly data from Waves 1, 4 and 9 with key variables measured at three time points.
This document first outlines the steps that you need to take in order to familiarize yourself with the dataset and with using R. These are Preparation Steps 1-6 and they include links to the data file as well as links to user guides and information documents that provide details about the variables in the dataset.
After Steps 1-6, you can find information on each assessed homework assignment for this class i.e. Graded Homework 1A, 1B, 2A and 2B. The submission deadlines are also included here, but you can also see the same information in the section titled “Assessments and Deadlines” on the Moodle page for EC3133.
To practice some empirical analysis tools in R, a good reference is Data Skills Module: Exploring crime surveys with R. Units 4-7 are especially useful for getting practice with exploring and visualizing variables. If you are a beginner in R, you should also go over Unit 3. Going through this skills module is not mandatory and it is only provided as a reference, but you are strongly advised to at least look through it. The data skills module by UKDS is based on the crime survey, not the Covid-19 survey, but it is still relevant for your homework assignments even if you opt for Option 2, because it introduces general tools for empirical analysis and these tools apply in the same way to the Covid-19 data.
Each graded homework assignment accounts for 10 percent of your total course mark. This document also includes some information about your project assignment, which is due at the end of the term and accounts for 20 percent of your total mark. Each homework is designed to prepare you for your next homework, and ultimately, for the project.
Creating an R Studio Project sets a working directory and keeps all
scripts and data in one place. When the .Rproj file is
open, the current working directory points to the root folder where that
.Rproj file is saved. This means that R Studio will find
all the data and other objects once you open the project file even if
you move the project folder.
To create a .Rproj file, go to
File > New Project> New Directory> New Project,
then choose a directory name and your desired location. For this
example, let’s say that we want the location to be
“/users/myname/ec3133” (if you are a Windows user, then these directories
would be c:\... , for example). Let’s name this project
“covid”. When you create a project file with this name using these
steps, it will automatically create a new directory
/users/myname/ec3133/covid.
Projects are very effective tools for keeping all your related files
in one place. Moreover, while you are working on something, a project
file enables you to have continuity i.e. if you turn off your computer
and stop working on the project, the next day you can always pick up
exactly where you left off by simply opening your project file. It will
bring up the same tabs and files as you left them, similar to the
restore tabs function of a browser.
For more details on setting working directories, please read the
section titled Setting a working directory in the
Data Skills Module: Exploring crime surveys with R. You can
click here or on the title of the data skills module to access this
page.
Understanding Society Teaching Dataset - User Guide.
The Covid-19 dataset comes in two files. One is a cross-sectional
data file and is titled c19teaching_xw.sav and the other is
a longitudinal data file titled c19teaching_lw.sav. You
need to download both files. Note that this does not mean there is more
work to do. The only difference with respect to the Crime Survey is that
it just turns out that the Covid-19 dataset splits information in two
separate files. Click on the below links to download these files:
Cross-sectional data file: c19teaching_xw.sav
Longitudinal data file: c19teaching_lw.sav
Save these data files in your project directory i.e. in “/users/myname/ec3133/covid/”.
(Added Note: If the above data links do NOT work for you, you can
also directly download the dataset from the Moodle page. Please go to
“Download Data for Option 2” in the Assessment Information
section.)
Note that these data files are obtained from the UK Data Service webpage and I am including them here so you can easily access them. However, you could also download them directly from the UK Data Service webpage.
# Load the haven package, which provides read_sav() for SPSS .sav files
library(haven)
# Load each data file and save it into its own object: "covidcross" and "covidlong"
covidcross <- read_sav("c19teaching_xw.sav")
covidlong <- read_sav("c19teaching_lw.sav")
Note that you do not need to include the full file path here since
your code will automatically look for your files in your project
directory /users/myname/ec3133/covid.
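If R reports that it cannot find a file, it can help to check where R is currently looking. The following is a minimal sketch using only base R; it assumes the two files keep the names given above and simply reports whether each one is present in the current working directory.

```r
# Names of the two data files, as given in this document
data_files <- c("c19teaching_xw.sav", "c19teaching_lw.sav")

# Show the folder R is currently working in (should be your project folder)
print(getwd())

# Check whether each file can be found in that folder
found <- file.exists(data_files)
names(found) <- data_files
print(found)

# Print a reminder listing any file that is missing
if (!all(found)) {
  message("Not found in the working directory: ",
          paste(data_files[!found], collapse = ", "))
}
```

If a file shows FALSE, the usual causes are that the .Rproj file is not open or that the file was saved in a different folder.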
The first step of visualizing a dataset is to simply look at it! Just scroll through the rows and columns to get a good idea about the structure of the dataset and what kinds of variables are included in it.
#View(covidcross)
#View(covidlong)
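Besides scrolling through the data with View(), a few base R functions give a quick overview of a dataset. The sketch below uses a small made-up data frame called toy (a hypothetical stand-in, not the real data) so that it runs on its own; with the real data you would replace toy with covidcross or covidlong.

```r
# A small made-up data frame standing in for the real dataset (hypothetical values)
toy <- data.frame(
  id             = 1:5,
  sex            = c(1, 2, 2, 1, 2),
  ca_sclfsato_cv = c(6, 5, -9, 3, 6)
)

dim(toy)      # number of rows and columns
names(toy)    # variable names
str(toy)      # type of each variable and its first few values
head(toy, 3)  # first three rows

# Summary statistics for one variable; note that the -9 code is still counted as a number here
summary(toy$ca_sclfsato_cv)
```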
We can summarize a particular variable of interest by first preparing
a table of summary statistics for that variable. You can do this by
using the table command as below:
table(covidlong$ca_sclfsato_cv)
##
## -9 -8 -2 -1 1 2 3 4 5 6 7
## 1 15 8 4 151 573 1056 808 1272 3578 543
The variable ca_sclfsato_cv is the respondent’s subjective
well-being from Wave 1 (conducted in April 2020). Note that variable names in
the longitudinal file begin with the prefixes ca_, cd_ or
ci_ to indicate which wave of the survey they come from. For
example, the subjective well-being variable from Wave 4 (conducted in
July 2020) is labelled cd_sclfsato_cv and the same variable
from Wave 9 (conducted in September 2021) is labelled
ci_sclfsato_cv. The same respondents are asked the same
question about how they feel about their well-being in April 2020, in
July 2020 and in September 2021. This longitudinal structure of the dataset
allows researchers to look at how various aspects of people’s economic
and social lives have changed over time (in this case, the panel aspect
allows researchers to make all kinds of comparisons between the periods
before and after the Covid-19 outbreak).
Note that the above table is not very informative because we don’t know what those numbers mean! To make it more readable (or to understand what the table means), we can do the following:
# Create a new factor variable from the original variable.
covidlong$ca_sclfsato_cvf<-as_factor(covidlong$ca_sclfsato_cv)
Now if we look at the same table again (but with the newly generated variable “ca_sclfsato_cvf”), we will see that we have added the information about what each value of the variable refers to:
table(covidlong$ca_sclfsato_cvf)
##
## Missing Inapplicable
## 1 15
## Refusal Don't know
## 8 4
## Completely dissatisfied Mostly dissatisfied
## 151 573
## Somewhat dissatisfied Neither satisfied nor dissatisfied
## 1056 808
## Somewhat satisfied Mostly satisfied
## 1272 3578
## Completely satisfied
## 543
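Counts are often easier to compare as percentages. As a sketch, the code below re-enters the counts from the table above by hand and converts them to percentages with prop.table(). With the real data, the same idea would be prop.table(table(covidlong$ca_sclfsato_cvf)); the hand-typed vector here is only so the example runs on its own.

```r
# Counts copied by hand from the labelled table above
counts <- c(
  "Missing"                            = 1,
  "Inapplicable"                       = 15,
  "Refusal"                            = 8,
  "Don't know"                         = 4,
  "Completely dissatisfied"            = 151,
  "Mostly dissatisfied"                = 573,
  "Somewhat dissatisfied"              = 1056,
  "Neither satisfied nor dissatisfied" = 808,
  "Somewhat satisfied"                 = 1272,
  "Mostly satisfied"                   = 3578,
  "Completely satisfied"               = 543
)

# Share of respondents in each category, as a percentage rounded to one decimal
percentages <- round(100 * prop.table(counts), 1)
print(percentages)
```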
For this homework assignment, we will focus on subjective well-being variables i.e. a respondent’s self-report about their mental health and well-being. Strictly speaking, this is an ordinal variable measured on a 7-point scale (from 1, completely dissatisfied, to 7, completely satisfied), but for this assignment we will treat it as continuous. Note that how you analyze a variable is strongly determined by the kind of values that variable takes (i.e. whether it is continuous, ordinal, nominal, etc.). Always take note of this before you begin your analysis and before you decide what tools are appropriate for analyzing a variable. See the user guide and the list of variables (links provided above) for information on the type of each variable in this dataset.
Measure the strength of the association between the dependent/outcome variable subjective well-being and some explanatory variables by doing the following in R:
Compare the mean values of subjective well-being
ca_sclfsato_cv between categorical variables
sex and ethgrp2a. In other words, compare the
mean scores of ca_sclfsato_cv between men and women. Also,
compare the mean scores of ca_sclfsato_cv between people of
different ethnicity groups as specified by ethgrp2a. Using
an appropriate test (e.g. t-test, ANOVA) measure whether these
differences are statistically significant. Write 1-2 sentences about how
we can interpret these results. For example, is there a difference in
men and women’s subjective well-being? Or is there a difference in
subjective well-being between people of different ethnicity? If there
are, why do you think such differences exist? If there are no
differences, what does this mean?
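As a rough illustration of the tools mentioned above, the sketch below compares group means with a t-test (two groups) and a one-way ANOVA (more than two groups). It uses simulated data, so the variable names wellbeing, sex and ethgrp and all the values are hypothetical; with the real data you would use ca_sclfsato_cv, sex and ethgrp2a after dealing with missing values.

```r
# Simulated data for illustration only (values are random, not from the survey)
set.seed(3133)
n <- 200
toy <- data.frame(
  wellbeing = sample(1:7, n, replace = TRUE),
  sex       = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  ethgrp    = factor(sample(c("Group A", "Group B", "Group C"), n, replace = TRUE))
)

# Mean well-being score within each sex
group_means_sex <- aggregate(wellbeing ~ sex, data = toy, FUN = mean)
print(group_means_sex)

# Two groups: t-test of the difference in means
tt <- t.test(wellbeing ~ sex, data = toy)
print(tt)

# More than two groups: one-way ANOVA
anova_fit <- aov(wellbeing ~ ethgrp, data = toy)
print(summary(anova_fit))
```

In both outputs, the p-value is the number to look at when judging whether the difference in means is statistically significant.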
Plot how respondents’ perception of their well-being has changed
over time. You can use any plot that you deem appropriate. Note again
that the time dimension of the variables is indicated by the letters
a, d and i in the prefixes (ca_, cd_ and ci_) of the
variable names.
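One possible way to show change over time is to plot the mean score in each wave. The sketch below does this with simulated data; the column names follow the wave prefixes ca_, cd_ and ci_, but the values are random and the data frame toy_long is a hypothetical stand-in for covidlong. It is only one option among many plots you could choose.

```r
# Simulated scores for three waves (random values, for illustration only)
set.seed(3133)
n <- 100
toy_long <- data.frame(
  ca_sclfsato_cv = sample(c(1:7, NA), n, replace = TRUE),
  cd_sclfsato_cv = sample(c(1:7, NA), n, replace = TRUE),
  ci_sclfsato_cv = sample(c(1:7, NA), n, replace = TRUE)
)

# Mean score in each wave, ignoring missing values
wave_means <- colMeans(toy_long, na.rm = TRUE)
names(wave_means) <- c("Wave 1 (Apr 2020)", "Wave 4 (Jul 2020)", "Wave 9 (Sep 2021)")
print(wave_means)

# Line plot of the wave means, with the wave names on the horizontal axis
plot(seq_along(wave_means), wave_means, type = "b", xaxt = "n",
     ylim = c(1, 7), xlab = "Survey wave", ylab = "Mean well-being score (1-7)")
axis(1, at = seq_along(wave_means), labels = names(wave_means))
```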
Use multiple linear regressions to assess how your dependent/outcome variable for subjective well-being is associated with income and education, controlling for age, sex and other socio-demographic characteristics. For these regressions, you can simply use one of the waves (e.g. either just Wave 1, or just Wave 4 or Wave 9 etc.). Alternatively, you can pool all the waves together.
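A multiple linear regression in R is estimated with lm(). The sketch below runs one on simulated data; every variable name here (wellbeing, income, education, age, sex) and every value is hypothetical, and the use of log(income) is just one common modelling choice, not a requirement of the assignment. You will need to look up the actual variable names in the user guide.

```r
# Simulated data for illustration only (values are random, not from the survey)
set.seed(3133)
n <- 300
toy <- data.frame(
  wellbeing = sample(1:7, n, replace = TRUE),
  income    = round(rlnorm(n, meanlog = 7.5, sdlog = 0.5)),
  education = factor(sample(c("No degree", "Degree"), n, replace = TRUE)),
  age       = sample(18:85, n, replace = TRUE),
  sex       = factor(sample(c("Male", "Female"), n, replace = TRUE))
)

# Regress well-being on income and education, controlling for age and sex
fit <- lm(wellbeing ~ log(income) + education + age + sex, data = toy)

# Coefficients, standard errors and p-values
print(summary(fit))
```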
Important!!! When analyzing the data, make sure that you pay
attention to missing values for variables. For example, some variables
use negative codes such as -9 (missing) or -8 (inapplicable), and others
contain NA (R’s marker for a value that is not available). You
need to recode the negative codes as NA before
preparing a graph or running a regression; otherwise they will be treated
as valid scores. There is information on how
to handle missing values in the Data Skills
Module: Exploring crime surveys with R. Please make sure that you
look through this information before you begin your analysis.
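To see why this matters, the sketch below rebuilds the Wave 1 well-being scores from the counts in the table shown earlier and compares the mean before and after the negative codes are set to NA. The vector x is reconstructed by hand so the example runs on its own. With the real data, which read_sav() imports as labelled vectors, the same idea should apply, but that is an assumption you should check, for example with table(..., useNA = "ifany").

```r
# Rebuild the scores from the counts in the earlier table of ca_sclfsato_cv
x <- rep(c(-9, -8, -2, -1, 1:7),
         times = c(1, 15, 8, 4, 151, 573, 1056, 808, 1272, 3578, 543))

# Copy the variable, then set all negative codes to NA
x_clean <- x
x_clean[x_clean < 0] <- NA

# How many values are now treated as missing?
print(sum(is.na(x_clean)))

# The mean is biased downwards if the negative codes are left in
print(mean(x))
print(mean(x_clean, na.rm = TRUE))
```

Keeping the cleaned scores in a new object, rather than overwriting the original, makes it easy to check the recoding afterwards.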
Continue exploring other variables in the dataset. Follow similar steps as outlined in GRADED HOMEWORK 1A. In other words, choose some outcome variables that interest you and then think about how they are related to potential explanatory variables. Create comparisons of mean scores, scatter plots or regression results that demonstrate these relationships. You do not need to use all of these tools at once. The purpose of this exercise is for you to explore various aspects of the dataset in order to find an interesting research question. If you are unsure about where to start, use your analysis in Graded Homework 1A as a guide: simply do exactly what you did in parts a through c but with some different outcome and explanatory variables.
Note that this homework is simply asking you to explore the data on your own and to show me some of your findings. It is perfectly ok to look at different variables at different parts of your analysis. Your analysis does not need to be coherent nor does it need to have one specific research question in mind. The purpose of this exercise is for you to become more familiar with the data so that you can choose some potential research questions that can be answered using this dataset.
For this homework, you will again need to submit your R markdown file, which will include your R code together with the output.
In class, you will do a 5-minute presentation about your research question, why you chose it and what variables you will use in the dataset you chose. The presentation can include one or more graphs and/or tables from your data, but this is not mandatory. In the presentation, you should briefly talk about the existing literature on the subject. You do not need to read all the papers on the subject but it will suffice if you give a brief summary of what’s been done on this subject so far.
In class, you will do a 5-minute presentation of your empirical findings. This will just be a summary of what you have analyzed in the data so far.
This will be 20 percent of your total mark. Your project is a cumulative exercise, where the assessed homework assignments throughout the semester allow you to get prepared for your final project submission. I will post a separate document that includes information about what your project submission should include. In summary, it will be another R Markdown file that includes your data analysis, your research question, a brief literature review and a brief discussion of your empirical findings.
All homework submissions will need to be in the form of an R Markdown file. This will allow you to share your text, code and output with me in a single document and is an effective way to communicate your data analysis results.
In the YAML header of your .Rmd document, make sure you
write your name and date of submission and replace the output with
output: pdf_document instead of
output: html_document. This allows you to render the Rmd
file as a pdf document instead of html. For the first homework, write
Graded Homework 1A in the title.
If you are a beginner with creating markdown files, please read Tutorial for Creating R
Markdown files. I have also posted the source file of all my slides
and this current file to help you see examples of R markdown documents.
These are all the .Rmd files that create the slides and documents I have
been posting. If you open these .Rmd files in R, it will give you a good
idea about how to get started writing .Rmd documents. Once you write
them, all you need to do is click on Knit or
Run to render them as pdf, html files or interactive
documents.
Citation of the data:
Proper citation is an important part of academic research. All works
which use various sources need to acknowledge these sources by means of
citation. The citation for this Covid-19 survey data collection is
included below. Please make sure you include a References
section at the end of your report and include the below citation.
University of Essex, Institute for Social and Economic Research, University of Manchester, Cathie Marsh Institute for Social Research (CMIST). (2022). Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021. [data collection]. University of Essex, Institute for Social and Economic Research, [original data producer(s)]. University of Essex, Institute for Social and Economic Research. SN: 9019, DOI: 10.5255/UKDA-SN-9019-1