State your research question clearly and specifically here. A good research question is answerable with the available data and has implications for action.
Example: Which early indicators best predict student withdrawal in distance learning modules?
Who is the audience for this analysis? What decision are they trying to make? Why does this question matter to them?
Example: This analysis is intended for the advising office at a distance learning institution. Understanding early withdrawal predictors would allow advisors to proactively reach out to at-risk students before they disengage.
Before looking at the data, state what you expect to find and why. This anchors your analysis in theory rather than data dredging.
Describe the dataset you are using. Where does it come from? What does it contain? What are its limitations?
Document every variable you plan to use. Explain what it measures, its data type, and any known issues or caveats.
| Variable | Description | Type | Notes |
|---|---|---|---|
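You can generate the mechanical columns of this table (Variable and Type, plus a missing-value count that can seed Notes) instead of typing them. The Description column still has to be written by hand. Below is a minimal base R sketch. It assumes your cleaned data lives in a data frame called `my_data`, and the toy columns are hypothetical.

```r
# Hypothetical toy data frame standing in for your cleaned dataset.
my_data <- data.frame(
  id_student   = c(101L, 102L, 103L),
  final_result = factor(c("Pass", "Withdrawn", "Fail")),
  avg_score    = c(71.5, NA, 48.0)
)

# One row per variable: its name, its R class, and how many values are missing.
# Descriptions and caveats must still be added by hand.
data_dictionary <- data.frame(
  Variable  = names(my_data),
  Type      = vapply(my_data, function(col) class(col)[1], character(1)),
  Missing   = colSums(is.na(my_data)),
  row.names = NULL
)

data_dictionary
```

The result can be printed as a table, or pasted into the dictionary above and then extended with descriptions.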
How was the data collected? Is it anonymized? Are there any ethical concerns about how it was gathered or how your analysis might be used?
```r
# Load your data files here.
# Replace the file paths with your actual file locations.
# read_csv() is from the readr package, which library(tidyverse) attaches.
# Example:
# my_data <- read_csv("data/my_data.csv")
```

```r
# Check dimensions, column names, and data types before doing anything else.
# This catches problems early: wrong column names, unexpected data types,
# columns that should be numeric but were read as character, etc.
# Example:
# glimpse(my_data)
# names(my_data)
# nrow(my_data)
```

```r
# If your data comes from multiple tables, join them here step by step.
# Document each join: why you used left_join(), what the join keys are,
# and what you expect the row count to be after each step.
```
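Here is a minimal sketch of that documentation habit, using two small hypothetical tables. The names `students`, `registrations`, and `days_registered_early` are illustrative, not taken from any real dataset. The sketch uses base R's `merge(..., all.x = TRUE)`, which behaves like a left join: every row of the central table is kept. The `stopifnot()` checks fail loudly in two cases: a duplicated key that would multiply rows, or a row count that changes unexpectedly.

```r
# Hypothetical toy tables: a central table (one row per student) and one table to add.
students <- data.frame(
  id_student   = c(1, 2, 3, 4),
  final_result = c("Pass", "Withdrawn", "Pass", "Fail")
)
registrations <- data.frame(
  id_student            = c(1, 2, 4),
  days_registered_early = c(30, 5, 12)
)

# Step 1: the central table defines the expected row count.
expected_rows <- nrow(students)

# Before joining, confirm the key is unique in the table being added;
# duplicated keys would silently multiply rows.
stopifnot(anyDuplicated(registrations$id_student) == 0)

# Step 2: left join (all.x = TRUE keeps every student, matched or not).
step2 <- merge(students, registrations, by = "id_student", all.x = TRUE)

# A one-to-one left join should leave the row count unchanged.
stopifnot(nrow(step2) == expected_rows)

# Students without a registration record get NA; report how many.
sum(is.na(step2$days_registered_early))
```

The same checks work after `dplyr::left_join()`. The point is to state the expected count before each join and verify it afterwards.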
```r
# Step 1: Start with the central table
# Step 2: Add ...
# Step 3: Add ...
```

```r
# Handle missing values, recode variables, and create derived variables.
# Document every decision: why you dropped NAs, what a recoded variable means,
# and why you created each new variable.
```

Explore the data before trying to answer the research question. Look at distributions, missingness, outliers, and relationships between variables. The goal is to understand what you have and refine your hypotheses.
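Below is a minimal base R sketch of these first checks on hypothetical toy data; the columns `final_result`, `avg_score`, and `clicks` are illustrative. It covers four things:

- the share of missing values per variable
- whether missingness is concentrated in one outcome group
- a comparison of one variable across groups
- the correlation between two predictors

```r
# Hypothetical toy data: one row per student.
students <- data.frame(
  final_result = c("Pass", "Pass", "Withdrawn", "Withdrawn", "Fail", "Pass"),
  avg_score    = c(72, 65, NA, NA, 48, 80),
  clicks       = c(310, 250, 40, NA, 120, 400)
)

# 1. Share of missing values in each column.
missing_share <- colMeans(is.na(students))

# 2. Share of missing avg_score within each outcome group. A much higher share
#    among withdrawn students suggests the missingness is not random.
missing_by_group <- tapply(is.na(students$avg_score), students$final_result, mean)

# 3. Mean clicks by outcome group, ignoring missing values.
clicks_by_group <- tapply(students$clicks, students$final_result, mean, na.rm = TRUE)

# 4. Correlation between two predictors, using only rows where both are observed.
score_click_cor <- cor(students$avg_score, students$clicks, use = "complete.obs")

missing_share
missing_by_group
clicks_by_group
score_click_cor
```

In this toy example, every withdrawn student is missing `avg_score`. If you see that pattern in real data, dropping rows with missing scores would remove most of the students the analysis is about.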
```r
# How much missing data is there? Which variables? Does missingness follow
# a pattern (e.g., are missing values concentrated in withdrawn students)?
```

```r
# Plot the distribution of your key variables.
# Are they normally distributed? Skewed? Are there outliers?
```

```r
# Compare your outcome variable across groups.
# Example: average score by final result, VLE engagement by withdrawal status.
```

```r
# Examine relationships between variables before modeling.
# Are your predictors correlated with each other? With the outcome?
```

Answer your research question here using appropriate methods. The method should follow from the question, not the other way around.
Describe the analytical approach you chose and why it is appropriate for your research question.
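For a binary outcome such as withdrawal, logistic regression is one common and interpretable choice. It is an assumption here, not a method this template prescribes. The sketch fits `glm()` with `family = binomial` to simulated data, then exponentiates the coefficients to get odds ratios. The predictor `early_clicks` and its effect size are invented for illustration.

```r
set.seed(42)
n <- 500

# Simulated (not real) data: early-module clicks per student.
early_clicks <- rpois(n, lambda = 20)

# Assumed data-generating process: withdrawal becomes less likely as clicks rise.
p_withdraw <- plogis(1 - 0.1 * early_clicks)
withdrawn  <- rbinom(n, size = 1, prob = p_withdraw)
sim <- data.frame(withdrawn, early_clicks)

# Logistic regression: models the log-odds of withdrawal as linear in clicks.
fit <- glm(withdrawn ~ early_clicks, data = sim, family = binomial)

summary(fit)$coefficients  # estimates on the log-odds scale
exp(coef(fit))             # odds ratios: multiplicative change in odds per click
```

With real data, replace `sim` with your cleaned table. Justify each predictor from your hypotheses rather than adding every available column.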
```r
# Your main analysis goes here.
```

```r
# Present key results in a clean, formatted table.
# gt() from the gt package is a good option for polished tables.
```

Translate your statistical findings into plain language. What do the numbers actually mean for your stakeholder? Avoid jargon: write as if explaining to a dean, not a statistician.
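Odds ratios often confuse non-statistical audiences, because doubling the odds does not double the probability. The small helper below converts a baseline probability and an odds ratio into the implied new probability, which is usually easier to say out loud. It is a hypothetical function, not part of any package.

```r
# Convert a baseline probability and an odds ratio into the implied probability.
apply_odds_ratio <- function(p_baseline, odds_ratio) {
  odds_new <- p_baseline / (1 - p_baseline) * odds_ratio  # baseline odds, scaled
  odds_new / (1 + odds_new)                               # back to a probability
}

# Doubling the odds from a 20% baseline gives about 33%, not 40%.
apply_odds_ratio(0.20, 2)
# → 0.3333333
```

So "twice the odds of withdrawing" becomes, for a 20% baseline, "about one in three students instead of one in five".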
Every analysis has limitations. Be honest about yours. What can’t you conclude from this analysis? What alternative explanations exist? What data would you need to be more confident?
What should the stakeholder actually do based on your findings? Be specific and actionable.
Briefly summarize the research question, what you found, and why it matters. This section should stand alone — a busy stakeholder should be able to read just this section and understand the key takeaway.
```r
# Always include session info in a reproducible document.
# It records the R version, platform, and package versions used, so others
# can recreate your computing environment.
sessionInfo()
```

```
R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_1.3.0 gt_1.3.0 lubridate_1.9.4 forcats_1.0.0
[5] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
[9] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_1.9.1 compiler_4.4.3 tidyselect_1.2.1
[5] xml2_1.3.8 yaml_2.3.10 fastmap_1.2.0 R6_2.6.1
[9] generics_0.1.3 knitr_1.50 htmlwidgets_1.6.4 munsell_0.5.1
[13] pillar_1.11.0 tzdb_0.5.0 rlang_1.1.7 stringi_1.8.4
[17] xfun_0.51 fs_1.6.5 timechange_0.3.0 cli_3.6.4
[21] withr_3.0.2 magrittr_2.0.3 digest_0.6.37 grid_4.4.3
[25] rstudioapi_0.17.1 hms_1.1.3 lifecycle_1.0.4 vctrs_0.6.5
[29] evaluate_1.0.3 glue_1.8.0 colorspace_2.1-1 rmarkdown_2.29
[33] tools_4.4.3 pkgconfig_2.0.3 htmltools_0.5.8.1
```
Any supplementary material that supports the analysis but would interrupt the flow of the main document goes here.