Option 2: Covid-19 Survey

This document outlines the assessed homework assignments you need to complete for EC3133.

To complete the assessed homework assignments, you have two options and you only need to choose one of them. Both options involve exactly the same amount of work; the only difference is which dataset you use. For Option 1, you will be using the Crime Survey for England and Wales, 2017-2018 Teaching Dataset. For Option 2, you will be using the Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021. Choose the dataset that relates to the topics you are most interested in. This document covers Option 2. See the Moodle page for information on Option 1.

Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021 is an open dataset, which means the data are available to download without the need to register with the UK Data Service. The purpose of this assignment is to get you started with analyzing data and exploring potential research questions. The Covid-19 dataset contains two files:

Cross-sectional: contains data collected in Wave 4 in July 2020 (with some additional variables from other waves)

Longitudinal: contains mainly data from Waves 1, 4 and 9, with key variables measured at three time points.

This document first outlines the steps that you need to take in order to familiarize yourself with the dataset and with using R. These are Preparation Steps 1-6 and they include links to the data file as well as links to user guides and information documents that provide details about the variables in the dataset.

After Steps 1-6, you can find information on each assessed homework assignment for this class i.e. Graded Homework 1A, 1B, 2A and 2B. The submission deadlines are also included here, but you can also see the same information in the section titled “Assessments and Deadlines” on the Moodle page for EC3133.

To practice some empirical analysis tools in R, a good reference is Data Skills Module: Exploring crime surveys with R. Units 4-7 are especially useful for getting practice with exploring and visualizing variables. If you are a beginner in R, you should also go over Unit 3. Going through this skills module is not mandatory and it is only provided as a reference, but you are strongly advised to at least look through it. The data skills module by UKDS is based on the crime survey, not the Covid-19 survey, but it is still relevant for your homework assignments if you opt for Option 2 because it introduces general tools for empirical analysis, and these tools apply in the same way to the Covid-19 data.

Each graded homework assignment accounts for 10 percent of your total course mark. This document also includes some information about your project assignment, which is due at the end of the term and accounts for 20 percent of your total mark. Each homework is designed to prepare you for your next homework and, ultimately, for the project.

Prep Step 1: Create a project file

Creating an R Studio Project sets a working directory and keeps all scripts and data in one place. When the .Rproj file is open, the current working directory points to the root folder where that .Rproj file is saved. This means that R Studio will find all the data and other objects once you open the project file even if you move the project folder.

To create a .Rproj file, go to File > New Project > New Directory > New Project, then choose a directory name and your desired location. For this example, let’s say that we want the location to be “/users/myname/ec3133” (if you are a Windows user, then these directories would be c:\... , for example). Let’s name this project “covid”. When you create a project file with this name using these steps, it will automatically create a new directory /users/myname/ec3133/covid.

Projects are very effective tools for keeping all your related files in one place. Moreover, while you are working on something, a project file enables you to have continuity i.e. if you turn off your computer and stop working on the project, the next day you can always pick up exactly where you left off by simply opening your project file. It will bring up the same tabs and files as you left them, similar to the restore tabs function of a browser.

For more details on setting working directories, please read the section titled Setting a working directory in the Data Skills Module: Exploring crime surveys with R. You can click here or on the title of the data skills module to access this page.
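As a quick check that the project is set up as described, the following minimal sketch prints the current working directory and lists any .sav files in it. It uses only base R; the object names wd and sav_files are chosen here for illustration.

```r
# Print the current working directory. With the project open, this should be
# the folder that contains the .Rproj file.
wd <- getwd()
print(wd)

# List any SPSS (.sav) data files saved in that folder.
# This will be empty until the data files have been downloaded.
sav_files <- list.files(wd, pattern = "\\.sav$")
print(sav_files)
```

If the printed folder is not your project folder, the project file is probably not open.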

Prep Step 2: Familiarize yourself with the data and the list of variables

The below user guide is a helpful resource if you are unsure about the meaning of a certain variable or if you want to read more about the data structure. You do not need to carefully read this user guide but you can use it as reference whenever you need it.

Understanding Society Teaching Dataset - User Guide.

The below file has information about all the variables included in this dataset. Look through it before you start loading the data and use it as reference whenever you want to check what the data entails.

Understanding Society Teaching Dataset - List of Variables.

Prep Step 3: Download the data file

The Covid-19 dataset comes in two files. One is a cross-sectional data file titled c19teaching_xw.sav and the other is a longitudinal data file titled c19teaching_lw.sav. You need to download both files. Note that this does not mean there is more work to do. The only difference from the Crime Survey is that the Covid-19 dataset splits its information across two separate files. Click on the below links to download these files:

Cross-sectional data file: c19teaching_xw.sav

Longitudinal data file: c19teaching_lw.sav

Save these data files in your project directory i.e. in “/users/myname/ec3133/covid/”.

(Added Note: If the above data links do NOT work for you, you can also directly download the dataset from the Moodle page. Please go to “Download Data for Option 2” in the Assessment Information section.)

Note that these data files are obtained from the UK Data Service webpage and I am including them here so you can easily access them. However, you could also directly download them from the UK Data Service webpage.

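Prep Step 4: Load the data

# The read_sav() and as_factor() functions come from the haven package
library(haven)

# Load the data files and save them into objects called "covidcross" and "covidlong"
covidcross <- read_sav("c19teaching_xw.sav")
covidlong <- read_sav("c19teaching_lw.sav")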

Note that you do not need to include the full file path here since your code will automatically look for your files in your project directory /users/myname/ec3133/covid.
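If loading fails, the most common reason is that the files are not in the project directory. The sketch below, which is only one possible way to check, looks for both files before reading them. The object names data_files and missing_files are illustrative, and haven::read_sav() is called with its package prefix so the block works even if library(haven) has not been run yet.

```r
# The two file names given in Prep Step 3
data_files <- c("c19teaching_xw.sav", "c19teaching_lw.sav")

# Which of them cannot be found in the current working directory?
missing_files <- data_files[!file.exists(data_files)]

if (length(missing_files) > 0) {
  # Report the problem instead of stopping with a read error
  message("Not found in ", getwd(), ": ",
          paste(missing_files, collapse = ", "))
} else {
  # Both files are present, so read them in
  covidcross <- haven::read_sav(data_files[1])
  covidlong  <- haven::read_sav(data_files[2])
}
```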

Prep Step 5: View the data

The first step of visualizing a dataset is to simply look at it! Just scroll through the rows and columns to get a good idea about the structure of the dataset and what kinds of variables are included in it.

#View(covidcross)
#View(covidlong)
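View() opens an interactive window, so it is not useful inside a knitted document. The sketch below shows some base R alternatives that print to the console instead. It runs on a small toy data frame with made-up values so that it works on its own; with the real data you would apply the same functions to covidcross or covidlong.

```r
# Toy stand-in for the real data (values are made up for illustration only)
toy <- data.frame(
  sex = c(1, 2, 2),
  ca_sclfsato_cv = c(6, 5, -9)
)

dim(toy)      # number of rows and number of columns
names(toy)    # variable names
head(toy, 2)  # first two rows
str(toy)      # type of each column
```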

Prep Step 6: Summary statistics

We can summarize a particular variable of interest by first preparing a frequency table for that variable. You can do this by using the table() function as below:

table(covidlong$ca_sclfsato_cv)

## 
##   -9   -8   -2   -1    1    2    3    4    5    6    7 
##    1   15    8    4  151  573 1056  808 1272 3578  543

The variable ca_sclfsato_cv is the respondent’s subjective well-being from Wave 1 (conducted in April 2020). Note that variables in the longitudinal file are labelled with the prefixes ca_, cd_ and ci_ to indicate which wave of the survey they come from. For example, the subjective well-being variable from Wave 4 (conducted in July 2020) is labelled cd_sclfsato_cv and the same variable from Wave 9 (conducted in September 2021) is labelled ci_sclfsato_cv. The same respondents are asked the same question about how they feel about their well-being in April 2020, in July 2020 and in September 2021. This longitudinal structure of the dataset allows researchers to look at how various aspects of people’s economic and social lives have changed over time (in this case, the panel aspect allows researchers to make all kinds of comparisons between the period before and after the Covid-19 outbreak).

Note that the above table is not very informative because we don’t know what those numbers mean! To make it more readable (or to understand what the table means), we can do the following:

# Create a new factor variable from the original variable.
covidlong$ca_sclfsato_cvf<-as_factor(covidlong$ca_sclfsato_cv)

Now if we look at the same table again (but with the newly generated variable “ca_sclfsato_cvf”), we will see that we have added the information about what each value of the variable refers to:

table(covidlong$ca_sclfsato_cvf)

## 
##                            Missing                       Inapplicable 
##                                  1                                 15 
##                            Refusal                         Don't know 
##                                  8                                  4 
##            Completely dissatisfied                Mostly dissatisfied 
##                                151                                573 
##              Somewhat dissatisfied Neither satisfied nor dissatisfied 
##                               1056                                808 
##                 Somewhat satisfied                   Mostly satisfied 
##                               1272                               3578 
##               Completely satisfied 
##                                543
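Counts are often easier to read as percentages. As a minimal sketch, the code below re-enters the seven valid counts from the table above by hand and converts them to shares with prop.table(). The object names counts and shares are illustrative. With the real data you would instead apply prop.table() to the output of table(), after setting the missing categories aside.

```r
# Counts of valid responses, copied from the labelled table above
counts <- c(
  "Completely dissatisfied"            = 151,
  "Mostly dissatisfied"                = 573,
  "Somewhat dissatisfied"              = 1056,
  "Neither satisfied nor dissatisfied" = 808,
  "Somewhat satisfied"                 = 1272,
  "Mostly satisfied"                   = 3578,
  "Completely satisfied"               = 543
)

# Share of each answer among valid responses, in percent
shares <- round(100 * prop.table(counts), 1)
print(shares)
```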

Graded Homework 1A (Due October 31)

For this homework assignment, we will focus on subjective well-being variables i.e. a respondent’s self-report about their mental health and well-being. This variable is recorded on a 7-point scale and we will treat it as a continuous variable. Note that how you analyze a variable is strongly determined by the kind of values that variable takes (i.e. whether it is continuous, ordinal, nominal, etc.). Always take note of this before you begin your analysis and before you decide what tools are appropriate for analyzing a variable. See the user guide and the list of variables (links provided above) for information on the type of each variable in this dataset.

Measure the strength of the association between the dependent/outcome variable subjective well-being and some explanatory variables by doing the following in R:

a. Compare the mean values of subjective well-being ca_sclfsato_cv between categorical variables sex and ethgrp2a. In other words, compare the mean scores of ca_sclfsato_cv between men and women. Also, compare the mean scores of ca_sclfsato_cv between people of different ethnicity groups as specified by ethgrp2a. Using an appropriate test (e.g. t-test, ANOVA), assess whether these differences are statistically significant. Write 1-2 sentences about how we can interpret these results. For example, is there a difference in men and women’s subjective well-being? Or is there a difference in subjective well-being between people of different ethnicity? If there are, why do you think such differences exist? If there are no differences, what does this mean?
b. Plot how respondents’ perception of their well-being has changed over time. You can use any plot that you deem appropriate. Note again that the time dimension of the variables is indicated by the a, d and i letters in the variable prefixes (ca_, cd_ and ci_).
c. Use multiple linear regressions to assess how your dependent/outcome variable for subjective well-being is associated with income and education, controlling for age, sex and other socio-demographic characteristics. For these regressions, you can simply use one of the waves (e.g. just Wave 1, just Wave 4 or just Wave 9). Alternatively, you can pool all the waves together.
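The tools named in the three parts above can be sketched in base R as follows. This is only an illustration on randomly generated toy data, not a solution: the values are made up, income and degree are hypothetical variable names that are not taken from the dataset, and your own analysis should use the real variables after handling missing values.

```r
set.seed(3133)  # make the toy data reproducible
n <- 300

# Toy data with made-up values; income and degree are hypothetical names
toy <- data.frame(
  sex            = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  ethgrp2a       = factor(sample(c("Group A", "Group B", "Group C"), n,
                                 replace = TRUE)),
  age            = sample(18:80, n, replace = TRUE),
  income         = round(runif(n, 500, 5000)),
  degree         = rbinom(n, 1, 0.4),
  ca_sclfsato_cv = sample(1:7, n, replace = TRUE),
  cd_sclfsato_cv = sample(1:7, n, replace = TRUE),
  ci_sclfsato_cv = sample(1:7, n, replace = TRUE)
)

# (a) Group means, then a t-test (two groups) and an ANOVA (three groups)
means_by_sex <- tapply(toy$ca_sclfsato_cv, toy$sex, mean)
t_res     <- t.test(ca_sclfsato_cv ~ sex, data = toy)
anova_res <- aov(ca_sclfsato_cv ~ ethgrp2a, data = toy)
print(means_by_sex)
print(t_res)
print(summary(anova_res))

# (b) Mean well-being at each of the three time points
wave_vars  <- c("ca_sclfsato_cv", "cd_sclfsato_cv", "ci_sclfsato_cv")
wave_means <- colMeans(toy[, wave_vars])
print(wave_means)
# One possible plot of these means (uncomment to draw it):
# plot(c(1, 4, 9), wave_means, type = "b", xlab = "Wave", ylab = "Mean score")

# (c) Multiple linear regression using Wave 1 only
reg <- lm(ca_sclfsato_cv ~ income + degree + age + sex, data = toy)
print(summary(reg))
```

Because the toy values are random, the output itself carries no meaning; the point is only the shape of each call.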

Important!!! When analyzing the data, make sure that you pay attention to missing values for variables. For example, some variables take negative codes such as -9 (missing) or -8 (inapplicable), while R itself marks missing values as NA (not available). You need to recode these negative codes as NA before preparing a graph or running a regression, otherwise they will be treated as valid scores. There is information on how to handle missing values in the Data Skills Module: Exploring crime surveys with R. Please make sure that you look through this information before you begin your analysis.
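To see why this matters, here is a minimal sketch on a short made-up vector of scores. It shows one simple way of recoding in base R; the Data Skills Module may present a different approach, and with the real data a column read by read_sav() may first need converting with as.numeric(). The object names codes and clean are illustrative.

```r
# Made-up scores that include four missing-value codes (-9, -8, -2, -1)
codes <- c(6, -9, 5, -8, 3, -1, 7, -2)

# Copy the vector and replace every negative code with NA
clean <- codes
clean[clean < 0] <- NA

# The mean of the raw vector is distorted by the negative codes
print(mean(codes))

# The mean of the cleaned vector uses the four valid scores only
print(mean(clean, na.rm = TRUE))
```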

Graded Homework 1B (Due November 14)

Continue exploring other variables in the dataset. Follow similar steps as outlined in Graded Homework 1A. In other words, choose some outcome variables that interest you and then think about how they are related to potential explanatory variables. Create comparisons of mean scores, scatter plots or regression results that demonstrate these relationships. You do not need to use all of these tools at once. The purpose of this exercise is for you to explore various aspects of the dataset in order to find an interesting research question. If you are unsure about where to start, use your analysis in Graded Homework 1A as a guide: simply do exactly what you did in parts a through c but with some different outcome and explanatory variables.

Note that this homework is simply asking you to explore the data on your own and to show me some of your findings. It is perfectly ok to look at different variables at different parts of your analysis. Your analysis does not need to be coherent nor does it need to have one specific research question in mind. The purpose of this exercise is for you to become more familiar with the data so that you can choose some potential research questions that can be answered using this dataset.

For this homework, you will again need to submit your R markdown file, which will include your R code together with the output.

Graded Homework 2A (Due November 28)

In class, you will do a 5-minute presentation about your research question, why you chose it and what variables you will use in the dataset you chose. The presentation can include one or more graphs and/or tables from your data, but this is not mandatory. In the presentation, you should briefly talk about the existing literature on the subject. You do not need to read all the papers on the subject but it will suffice if you give a brief summary of what’s been done on this subject so far.

Graded Homework 2B (Due December 12)

In class, you will do a 5-minute presentation of your empirical findings. This will just be a summary of what you have analyzed in the data so far.

Project (Due December 19)

This will be 20 percent of your total mark. Your project is a cumulative exercise, where the assessed homework assignments throughout the semester allow you to get prepared for your final project submission. I will post a separate document that includes information about what your project submission should include. In summary, it will be another R Markdown file that includes your data analysis, your research question, a brief literature review and a brief discussion of your empirical findings.

How to submit homework assignments

All homework submissions will need to be in the form of an R Markdown file. This will allow you to share your text, code and output with me in a single document and is an effective way to communicate your data analysis results.

In the YAML header of your .Rmd document, make sure you write your name and date of submission and replace the output with output: pdf_document instead of output: html_document. This allows you to render the Rmd file as a pdf document instead of html. For the first homework, write Graded Homework 1A in the title.
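As a sketch of what such a header might look like, with the author and date shown as placeholders for you to replace:

```yaml
---
title: "Graded Homework 1A"
author: "Your Name"          # placeholder: write your own name
date: "Date of submission"   # placeholder: write the submission date
output: pdf_document
---
```

Note that rendering to pdf requires a LaTeX installation on your computer.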

If you are a beginner with creating markdown files, please read Tutorial for Creating R Markdown files. I have also posted the source files of all my slides and of this current file to help you see examples of R Markdown documents. These are all the .Rmd files that create the slides and documents I have been posting. If you open these .Rmd files in RStudio, it will give you a good idea about how to get started writing .Rmd documents. Once you write them, all you need to do is click on Knit or Run to render them as pdf, html files or interactive documents.

Citation of the data:

Proper citation is an important part of academic research. All works which use various sources need to acknowledge these sources by means of citation. The citation for this Covid-19 survey data collection is included below. Please make sure you include a References section at the end of your report and include the below citation.

University of Essex, Institute for Social and Economic Research, University of Manchester, Cathie Marsh Institute for Social Research (CMIST). (2022). Understanding Society: COVID-19 Study Teaching Dataset, 2020-2021. [data collection]. University of Essex, Institute for Social and Economic Research, [original data producer(s)]. University of Essex, Institute for Social and Economic Research. SN: 9019, DOI: 10.5255/UKDA-SN-9019-1