14 The data analysis checklist
This checklist provides a condensed look at the information in this book. It can be used as a guide during the process of a data analysis, as a rubric for grading data analysis projects, or as a way to evaluate the quality of a reported data analysis.
14.1 Answering the question
- Did you specify the type of data analytic question (e.g. exploration, association, causality) before touching the data?
- Did you define the metric for success before beginning?
- Did you understand the context for the question and the scientific or business application?
- Did you record the experimental design?
- Did you consider whether the question could be answered with the available data?
14.2 Checking the data
- Did you plot univariate and multivariate summaries of the data?
- Did you check for outliers?
- Did you identify the missing data code?
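As a minimal sketch of the outlier and missing-data checks above, the following uses only the Python standard library. The readings, the `-999` missing-data code, and the 1.5 × IQR rule are assumptions chosen for illustration; your data set's code book should say what the real missing-data code is.

```python
import statistics

# Hypothetical raw readings; -999 is ASSUMED to be the missing-data code.
MISSING_CODE = -999
raw = [4.1, 3.9, -999, 4.3, 4.0, 12.5, 4.2, -999, 3.8]

# Replace the missing-data code with None so it cannot leak into summaries.
values = [None if x == MISSING_CODE else x for x in raw]
observed = [x for x in values if x is not None]
n_missing = len(values) - len(observed)

# Flag outliers with the 1.5 * IQR rule (one common convention, not the only one).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in observed if x < low or x > high]

print(f"missing: {n_missing}, outliers: {outliers}")
```

Had the `-999` entries been left in, they would have dominated every summary, which is why the missing-data code is identified before anything else is computed.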
14.3 Tidying the data
- Is each variable one column?
- Is each observation one row?
- Does each different type of data appear in its own table?
- Did you record the recipe for moving from raw to tidy data?
- Did you create a code book?
- Did you record all parameters, units, and functions applied to the data?
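The tidy-data questions above can be illustrated with a small reshaping sketch. The table, the column names (`patient`, `visit1_mg`, `visit2_mg`), and the code book entries are hypothetical; the point is only that each variable ends up in one column, each observation in one row, and the units are written down next to the recipe.

```python
# Hypothetical "wide" table: one row per patient, one column per visit.
wide = [
    {"patient": "A", "visit1_mg": 5.0, "visit2_mg": 5.5},
    {"patient": "B", "visit1_mg": 4.2, "visit2_mg": 4.0},
]

def tidy(rows):
    """Reshape so each variable is one column and each observation is one row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == "patient":
                continue
            # "visit1_mg" -> 1: strip the assumed "visit" prefix and "_mg" suffix.
            visit = int(column[len("visit"):-len("_mg")])
            long_rows.append(
                {"patient": row["patient"], "visit": visit, "dose_mg": value}
            )
    return long_rows

# A minimal code book: variable name -> description and units.
code_book = {
    "patient": "patient identifier (no units)",
    "visit": "visit number, counted from 1",
    "dose_mg": "dose administered at the visit, in milligrams",
}

tidy_rows = tidy(wide)
for r in tidy_rows:
    print(r)
```

Keeping the reshaping in a function, rather than editing the table by hand, is what makes the raw-to-tidy recipe recordable.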
14.4 Exploratory analysis
- Did you identify missing values?
- Did you make univariate plots (histograms, density plots, boxplots)?
- Did you consider correlations between variables (scatterplots)?
- Did you check the units of all data points to make sure they are in the right range?
- Did you try to identify any errors or miscoding of variables?
- Did you consider plotting on a log scale?
- Would a scatterplot be more informative?
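One way to check that values are in the right range is to compare each variable against limits you consider plausible. The sketch below assumes hypothetical variables and limits (`height_cm`, `income_usd`); it also computes a log-scale version of a skewed variable, which is often easier to plot.

```python
import math

# Hypothetical plausible ranges per variable (assumptions for illustration).
PLAUSIBLE = {"height_cm": (40.0, 250.0), "income_usd": (0.0, 1e7)}

records = [
    {"height_cm": 172.0, "income_usd": 52000.0},
    {"height_cm": 1.68, "income_usd": 48000.0},  # likely recorded in metres
    {"height_cm": 181.0, "income_usd": 1250000.0},
]

def out_of_range(rows, limits):
    """Return (row index, variable, value) for every value outside its limits."""
    problems = []
    for i, row in enumerate(rows):
        for name, (low, high) in limits.items():
            if not low <= row[name] <= high:
                problems.append((i, name, row[name]))
    return problems

problems = out_of_range(records, PLAUSIBLE)
print(problems)

# Incomes span several orders of magnitude, so a log10 scale may plot better.
log_income = [math.log10(r["income_usd"]) for r in records]
```

A flagged value is a prompt to look at the record, not proof of an error; the limits themselves are a judgment call that belongs in the code book.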
14.5 Inference
- Did you identify the larger population you are trying to describe?
- Did you clearly identify the quantities of interest in your model?
- Did you consider potential confounders?
- Did you identify and model potential sources of correlation such as measurements over time or space?
- Did you calculate a measure of uncertainty for each estimate on the scientific scale?
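A measure of uncertainty can be attached to an estimate in many ways. As one hedged sketch, the percentile bootstrap below gives an approximate 95% interval for a mean, on the same scale as the data. The sample values are made up, and the method assumes independent observations, so it would not be appropriate as written for measurements correlated over time or space.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for `stat`, assuming independent observations."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return stat(sample), lower, upper

# Hypothetical heights in centimetres (the "scientific scale" here is cm).
heights_cm = [172.1, 168.4, 180.2, 175.0, 169.9, 177.3, 171.5, 174.8]
estimate, lower, upper = bootstrap_ci(heights_cm)
print(f"mean = {estimate:.1f} cm, approx. 95% interval = ({lower:.1f}, {upper:.1f}) cm")
```

Reporting the interval in centimetres, rather than on a transformed or standardized scale, is what the last checklist item asks for.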
14.6 Prediction
- Did you identify in advance your error measure?
- Did you immediately split your data into training and validation?
- Did you use cross validation, resampling, or bootstrapping only on the training data?
- Did you create features using only the training data?
- Did you estimate parameters only on the training data?
- Did you fix all features, parameters, and models before applying to the validation data?
- Did you apply only one final model to the validation data and report the error rate?
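The workflow in this list can be sketched end to end with the standard library. Everything below is illustrative: the synthetic data, the two candidate models, and the choice of mean squared error as the error measure are assumptions. What matters is the order of operations: split first, cross-validate on the training data only, then touch the validation data exactly once.

```python
import random
import statistics

def mse(model, xs, ys):
    """Error measure, chosen in advance: mean squared error."""
    return statistics.mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def fit_mean(xs, ys):
    m = statistics.mean(ys)
    return lambda x: m  # ignores x, always predicts the training mean

def fit_line(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)  # ordinary least squares line

def cv_error(fit, rows, k=5):
    """k-fold cross-validation error, computed on the rows passed in."""
    folds = [rows[i::k] for i in range(k)]
    errors = []
    for i, held_out in enumerate(folds):
        rest = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit(*zip(*rest))
        errors.append(mse(model, *zip(*held_out)))
    return statistics.mean(errors)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(60)]

# Split immediately; the validation rows are not used again until the end.
rng.shuffle(data)
train, validation = data[:45], data[45:]

# Model selection uses cross-validation on the training data only.
candidates = {"mean": fit_mean, "line": fit_line}
cv_scores = {name: cv_error(fit, train) for name, fit in candidates.items()}
best = min(cv_scores, key=cv_scores.get)

# Fix the model, then apply that single final model to the validation data.
final_model = candidates[best](*zip(*train))
validation_error = mse(final_model, *zip(*validation))
print(best, round(validation_error, 3))
```

If the validation error were used to go back and pick a different model, it would stop being an honest estimate of out-of-sample error.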
14.7 Causality
- Did you identify whether your study was randomized?
- Did you identify potential reasons that causality may not be appropriate such as confounders, missing data, non-ignorable dropout, or unblinded experiments?
- If your study was not randomized, did you avoid using language that would imply cause and effect?
14.8 Written analyses
- Did you describe the question of interest?
- Did you describe the data set, experimental design, and question you are answering?
- Did you specify the type of data analytic question you are answering?
- Did you specify in clear notation the exact model you are fitting?
- Did you explain on the scale of interest what each estimate and measure of uncertainty means?
- Did you report a measure of uncertainty for each estimate on the scientific scale?
14.9 Figures
- Does each figure communicate an important piece of information or address a question of interest?
- Do all your figures include plain language axis labels?
- Is the font size large enough to read?
- Does every figure have a detailed caption that explains all axes, legends, and trends in the figure?
14.10 Presentations
- Did you lead with a brief statement of your problem that everyone can understand?
- Did you explain the data, measurement technology, and experimental design before you explained your model?
- Did you explain the features you will use to model the data before you explained the model?
- Did you make sure all legends and axes were legible from the back of the room?
14.11 Reproducibility
- Did you avoid doing calculations manually?
- Did you create a script that reproduces all your analyses?
- Did you save the raw and processed versions of your data?
- Did you record all versions of the software you used to process the data?
- Did you try to have someone else run your analysis code to confirm they got the same answers?
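Recording software versions is easy to automate inside the analysis script itself. The sketch below is one possible approach using the Python standard library; the helper name `session_info` and the package names passed to it are hypothetical, and a package that is not installed is recorded as `None` rather than raising an error.

```python
import json
import platform
import random
import sys
from importlib import metadata

def session_info(packages=()):
    """Collect interpreter, platform, and package versions for the analysis log."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = None  # not installed in this environment
    return info

# Hypothetical package names; list whatever your analysis actually imports.
info = session_info(["pip", "a-package-that-is-not-installed"])
print(json.dumps(info, indent=2))

# Fixing the random seed makes any resampling step repeatable by someone else.
first = random.Random(42).sample(range(100), 5)
second = random.Random(42).sample(range(100), 5)
print(first == second)
```

Saving this output alongside the raw and processed data gives a second analyst what they need to rerun the script under comparable conditions.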
14.12 R packages
- Did you make your package name “Googleable”?
- Did you write unit tests for your functions?
- Did you write help files for all functions?
- Did you write a vignette?
- Did you try to reduce dependencies to actively maintained packages?
- Have you eliminated all errors and warnings from R CMD check?
From the book: Jeff Leek’s The Elements of Data Analytic Style.