Cheema (2014) evaluates different methods for treating missing data in surveys by comparing their effect on estimation accuracy. The paper conducts the analysis on three datasets: (1) a simulated dataset (n = 10,000 cases); (2) Empirical sample 1: the US portion of PISA 2003 (n = 456 cases); and (3) Empirical sample 2: the Population and Housing portion of the 2000 US Census. The paper examines how the different methods perform when the sample size is small, medium, or large, and when 5% or 10% of the data is missing.
In this project, I propose to reproduce Cheema (2014)'s findings from the US portion of the PISA 2003 dataset (Empirical sample 1) with 5% missing data, using five different treatment methods: listwise deletion, mean imputation, regression imputation, EM imputation, and multiple imputation. These are the results reported in Table 2.
This paper is highly cited in the context of handling missing data, but to my knowledge it has not been replicated. This project would also give me an opportunity to work with large-scale assessment datasets like PISA and to understand their underlying structure. The paper is relevant to my research interests, as I plan to work with large education datasets in the context of international assessments. By replicating the results, I would learn methods for handling missing data that will be useful in my future work.
Do you anticipate running into any challenges when attempting to reproduce these result(s)? If so, please list them here.
1. Student achievement variables used in Cheema (2014): Cheema (2014) conducts student-level imputations for PISA 2003. The dependent variable is math achievement, and reading achievement is one of the explanatory variables. The paper states that both achievement variables range between 200 and 800.
However, PISA reports five plausible values each for math and reading achievement for every student, and advises that plausible values should not be averaged at the student level. It recommends instead that any estimation be conducted separately for each plausible value and the resulting estimates averaged. Each of these plausible values has a minimum in the range 141-189, lower than the minimum reported by Cheema. Since the PISA dataset I downloaded has the same number of observations as reported by Cheema, it is possible that the paper averaged the five plausible values. I will be able to determine this once I run the imputations.
To address this, I plan to: (1) first run the imputations using the average of the plausible values, to see whether Cheema (2014) can be replicated; (2) if the results are not replicated, conduct the estimation separately for each plausible value and then average the estimates (see the sketch below).
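If the plausible-value route is needed, the procedure is the standard one PISA recommends: fit the model once per plausible value and average the point estimates. Below is a minimal sketch in Python, assuming the PISA 2003 naming convention PV1MATH-PV5MATH and PV1READ-PV5READ; the covariate list is a placeholder for gender, home educational resources, and math anxiety.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def pv_regression(df: pd.DataFrame, covariates: list) -> pd.Series:
    """Fit the model once per plausible value, then average the estimates."""
    coefs = []
    for i in range(1, 6):
        y = df[f"PV{i}MATH"]
        X = sm.add_constant(df[[f"PV{i}READ"] + covariates])
        coefs.append(sm.OLS(y, X, missing="drop").fit().params.values)
    # PISA convention: the point estimate is the mean over the five fits
    names = ["const", "READ"] + covariates
    return pd.Series(np.mean(coefs, axis=0), index=names)
```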
2. Figuring out how to code the methods used in the paper: Cheema (2014) first performs imputation on a simulated dataset, and then on PISA. In the simulated dataset, 5% of the data is randomly dropped five times, so that the results do not depend on any single random draw. However, the PISA section does not state whether values were also dropped five times. To address this, I plan to drop 5% of the data only once; if that is not sufficient to reproduce the results, I will drop values five times, perform the imputation and run the regression on each draw, and then average the results (a sketch of this procedure follows).
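As a concrete version of this plan, the sketch below injects 5% missingness completely at random (MCAR) into the analysis variables, with an optional loop over seeds for the five-draw variant. The paper does not say whether 5% is dropped per variable or over all cells; dropping 5% per variable is an assumption here.

```python
import numpy as np
import pandas as pd

def inject_mcar(df: pd.DataFrame, cols, frac: float = 0.05, seed: int = 0):
    """Return a copy of df with `frac` of each column in `cols` set to NaN."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for c in cols:
        mask = rng.random(len(out)) < frac
        out.loc[mask, c] = np.nan
    return out

# One draw for the main plan; five independent draws if averaging is needed:
# draws = [inject_mcar(pisa, analysis_cols, seed=s) for s in range(5)]
```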
The PISA 2003 dataset for the US was downloaded from the NCES website. The SPSS macros provided by NCES were run to prepare the data for analysis, and the relevant variables were retained (student achievement in math and reading, student gender, home educational resources, and math anxiety).
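A minimal loading sketch, assuming the prepared file is in SPSS format; the file name and the non-PV variable names (ST03Q01 for gender, HEDRES for home educational resources, ANXMAT for math anxiety) are assumptions to be checked against the PISA 2003 codebook.

```python
import pandas as pd

# pd.read_spss requires the pyreadstat package; the file name is a placeholder
pisa = pd.read_spss("pisa2003_usa_student.sav")
keep = ([f"PV{i}MATH" for i in range(1, 6)]
        + [f"PV{i}READ" for i in range(1, 6)]
        + ["ST03Q01", "HEDRES", "ANXMAT"])  # gender, home ed. resources, anxiety
pisa = pisa[keep]
```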
For the analysis, math achievement was predicted from the other variables using a multiple linear regression equation. 5% of the data was randomly discarded, and each of the five treatment methods (listwise deletion, mean imputation, regression imputation, EM imputation, and multiple imputation) was applied to the reduced dataset; the regression analysis was then run on each treated dataset.
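The sketch below shows one way the five treatments could be wired up in Python; it is not the paper's SPSS pipeline. Listwise deletion and mean imputation are direct; scikit-learn's IterativeImputer stands in for regression imputation and, lacking a direct Python equivalent of SPSS's EM routine, serves as a rough stand-in for EM imputation as well; multiple imputation uses statsmodels' MICE with pooling across imputations. The column names in the formula (math, read, gender, hedres, anxmat) are illustrative placeholders.

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from statsmodels.imputation.mice import MICE, MICEData

FORMULA = "math ~ read + gender + hedres + anxmat"  # placeholder names

def fit_ols(df: pd.DataFrame):
    return sm.OLS.from_formula(FORMULA, data=df).fit()

def treat_and_fit(reduced: pd.DataFrame) -> dict:
    """Apply each treatment to the reduced dataset and fit the regression."""
    results = {}
    # (1) Listwise deletion: drop any row with a missing value
    results["listwise"] = fit_ols(reduced.dropna())
    # (2) Mean imputation; (3) regression imputation;
    # (4) the same regression-based imputer reused as an EM stand-in
    imputers = {
        "mean": SimpleImputer(strategy="mean"),
        "regression": IterativeImputer(sample_posterior=False, random_state=0),
        "em_approx": IterativeImputer(sample_posterior=False, random_state=1),
    }
    for name, imputer in imputers.items():
        filled = pd.DataFrame(imputer.fit_transform(reduced),
                              columns=reduced.columns)
        results[name] = fit_ols(filled)
    # (5) Multiple imputation: 5 imputations, estimates pooled automatically
    mice = MICE(FORMULA, sm.OLS, MICEData(reduced))
    results["multiple"] = mice.fit(n_burnin=10, n_imputations=5)
    return results
```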
Explicitly describe known differences in the analysis pipeline between the original paper and yours (e.g., computing environment). The goal, of course, is to minimize those differences, but differences may occur. Also, note whether such differences are anticipated to influence your ability to reproduce the original results.
The analyses as specified in the analysis plan.
A side-by-side graph with the original graph is ideal here.
### Exploratory analyses
Any follow-up analyses desired (not required).
Open the discussion section with a paragraph summarizing the primary result from the key analysis and assess whether you successfully reproduced it, partially reproduced it, or failed to reproduce it.
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis of the dataset, (b) assessment of the meaning of the successful or unsuccessful reproducibility attempt - e.g., for a failure to reproduce the original findings, are the differences between original and present analyses ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the reproducibility attempt (if you contacted them). None of these need to be long.