---
title: "Melanie’s R Grab‑Bag: Useful Code & Tips for Future Me"
author: "Melanie Holden"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
output: html_notebook
---

> **Purpose:** A living notebook of short, reliable R snippets I actually use. Keep it short. Keep it runnable. Update as I learn.

# 1. Quick Setup

```{r setup, message=FALSE, warning=FALSE}
# Install-once (uncomment as needed)
# install.packages(c("tidyverse", "janitor", "lubridate", "readr", "readxl", "openxlsx",
#                    "skimr", "here", "fs", "glue", "knitr", "rmarkdown", "ggthemes",
#                    "patchwork", "gt", "gtExtras", "stringr", "forcats"))

# Load every session
suppressPackageStartupMessages({
  library(tidyverse)
  library(janitor)
  library(lubridate)
  library(here)
  library(glue)
  library(skimr)
  library(gt)
})

# Reproducibility
set.seed(42)
```

**Project tip:** Use an RStudio Project and `here::here()` for paths. Never hard‑code `"C:/Users/..."`.

# 2. Reading & Writing Data (fast + safe)

```{r io}
# CSV (robust defaults)
df_csv <- readr::read_csv(here("data", "my_data.csv"))

# Excel (first sheet)
df_xlsx <- readxl::read_excel(here("data", "my_data.xlsx"), sheet = 1)

# Write outputs with timestamps
out_path <- here("output", glue("cleaned_{format(Sys.Date(), '%Y%m%d')}.csv"))
# readr::write_csv(df_csv, out_path)
```

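**Gotcha:** `read_csv()` already assumes UTF-8, so garbled characters usually mean the file is in some other encoding. Check with `readr::guess_encoding()` and pass the real one, e.g. `locale = locale(encoding = "latin1")`.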
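
**Sketch:** a small, self-contained illustration of passing an encoding through `locale()`. The temp file and its contents are made up for the demo, and Latin-1 is only an assumed example of a non-UTF-8 source.

```{r sketch-encoding}
library(readr)

# Hypothetical Latin-1 file: "café" with é stored as the single byte 0xE9
latin1_path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name\ncaf"), as.raw(0xE9), charToRaw("\n")), latin1_path)

# Ask readr for its best guess before choosing an encoding
guess_encoding(latin1_path)

# Declare the source encoding so the text is converted to UTF-8 on import
latin1_df <- read_csv(
  latin1_path,
  locale = locale(encoding = "latin1"),
  show_col_types = FALSE
)
latin1_df$name
```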

# 3. Inspecting Data (what is this?)

```{r inspect}
# High‑level skim
skimr::skim(df_csv)

# Structure & types
str(df_csv)

# Column names, then column types compared across both inputs
names(df_csv)
janitor::compare_df_cols(df_csv, df_xlsx)
```

**Rule of thumb:** If a column should be a date, convert it *immediately* with `lubridate`.

# 4. Cleaning Columns & Rows

```{r cleaning}
# Consistent names
clean <- df_csv %>%
  janitor::clean_names() %>%                 # snake_case column names
  mutate(across(where(is.character), trimws)) # trim leading/trailing spaces

# Remove complete duplicate rows
clean <- distinct(clean)

# Handle blanks as NA (character columns only; na_if() errors on type mismatch)
clean <- mutate(clean, across(where(is.character), ~na_if(.x, "")))
```

**Tip:** To de‑dupe by a subset of columns, name them and keep the rest: `distinct(id, .keep_all = TRUE)` keeps the first row for each `id`.
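
**Sketch:** de‑duping on one key column while keeping the other columns. `toy_visits` and its values are invented for illustration.

```{r sketch-distinct}
library(dplyr)

# Hypothetical data: id 1 appears twice with different values
toy_visits <- tibble(
  id    = c(1, 1, 2),
  value = c(10, 99, 20)
)

# Keep the first row seen for each id, along with every other column
first_per_id <- toy_visits %>% distinct(id, .keep_all = TRUE)
first_per_id
```

Which duplicate survives depends on row order, so `arrange()` first if "first" should mean "most recent" or similar.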

# 5. dplyr Cheatsheet (minimal set)

```{r dplyr}
result <- clean %>%
  filter(!is.na(id)) %>%
  mutate(
    date = lubridate::ymd(date),
    category = forcats::fct_lump_n(as.factor(category), n = 5)
  ) %>%
  group_by(category) %>%
  summarize(
    n = n(),
    mean_val = mean(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(n))
```

**Mnemonic:** *Select–Filter–Mutate–Summarize–Arrange* covers 80% of wrangling.
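
**Sketch:** the five verbs in mnemonic order on a toy table. `toy_sales` and its columns are placeholders, not columns from my real data.

```{r sketch-verbs}
library(dplyr)

# Hypothetical sales rows; one has a missing unit count
toy_sales <- tibble(
  region = c("N", "S", "N", "S", "E"),
  units  = c(5, NA, 7, 3, 2),
  price  = c(2, 2, 3, 1, 4),
  note   = c("a", "b", "c", "d", "e")
)

verb_summary <- toy_sales %>%
  select(region, units, price) %>%          # Select: drop columns not needed
  filter(!is.na(units)) %>%                 # Filter: drop incomplete rows
  mutate(revenue = units * price) %>%       # Mutate: derive a new column
  group_by(region) %>%
  summarize(revenue = sum(revenue), .groups = "drop") %>%  # Summarize
  arrange(desc(revenue))                    # Arrange: largest first
verb_summary
```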

# 6. Joins (I always mix these up)

```{r joins}
# left_join: keep all rows from x, bring matches from y
joined <- df_csv %>% left_join(df_xlsx, by = "id")

# anti_join: rows in x with no match in y (great for QA)
missing_keys <- df_csv %>% anti_join(df_xlsx, by = "id")
```

**QA trick:** Run `anti_join()` first to see which keys won’t match before any heavy processing.
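
**Sketch:** a pre-join audit in both directions, plus a duplicate-key check on the lookup side. The two toy tables are invented; the duplicate check is my own addition to the trick above, since repeated keys in `y` multiply rows in a `left_join()`.

```{r sketch-antijoin}
library(dplyr)

# Hypothetical tables sharing an `id` key
toy_orders    <- tibble(id = c(1, 2, 3), amount = c(10, 20, 30))
toy_customers <- tibble(id = c(2, 3, 4), name = c("Bo", "Cy", "Di"))

# Rows on each side that would find no partner in a join
orders_without_customer <- anti_join(toy_orders, toy_customers, by = "id")
customers_without_order <- anti_join(toy_customers, toy_orders, by = "id")

# Keys that appear more than once on the lookup side
dup_keys <- toy_customers %>% count(id) %>% filter(n > 1)

orders_without_customer
customers_without_order
dup_keys
```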

# 7. Dates & Times (lubridate)

```{r dates}
# Parse and standardize
clean_dates <- clean %>%
  mutate(
    date = ymd(date),
    year = year(date),
    month = month(date, label = TRUE, abbr = TRUE),
    wk = isoweek(date)
  )
```

**Tip:** If parsing fails, inspect with `parse_date_time(x, orders = c("ymd", "mdy", "dmy"))`.
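
**Sketch:** one way to find the values that failed to parse before retrying them with other orders. The `raw_dates` vector is made up, and retrying only the failures is an assumption about workflow rather than a lubridate requirement.

```{r sketch-date-failures}
library(lubridate)

# Hypothetical input with mixed formats, junk, and a genuine NA
raw_dates <- c("2024-01-15", "01/15/2024", "not a date", NA)

# quiet = TRUE suppresses the "failed to parse" warning
parsed <- ymd(raw_dates, quiet = TRUE)

# A failure is a value that was present but came back NA
failed <- is.na(parsed) & !is.na(raw_dates)
raw_dates[failed]

# Retry just the failures with alternative orders
retry <- parse_date_time(raw_dates[failed], orders = c("mdy", "dmy"), quiet = TRUE)
retry
```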

# 8. Strings (stringr)

```{r strings}
text_clean <- clean %>%
  mutate(
    email = str_to_lower(email),
    domain = str_extract(email, "(?<=@)[^@]+$")  # text after the "@", without the "@" itself
  )
```

**Regex sanity:** Test patterns at <https://regex101.com/> before committing.
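
**Sketch:** the same sanity check can be done inside R by running a pattern against a few hand-written cases, including one that should fail. The sample addresses and the lookbehind pattern are illustrative choices, not a full email validator.

```{r sketch-regex}
library(stringr)

# Hypothetical test cases: two valid addresses and one without "@"
sample_emails <- c("ana@example.com", "bob@mail.example.org", "no-at-sign")

# Everything after the "@" up to the end of the string
domain_pattern <- "(?<=@)[^@]+$"

str_detect(sample_emails, domain_pattern)   # TRUE TRUE FALSE
str_extract(sample_emails, domain_pattern)  # "example.com" "mail.example.org" NA
```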

# 9. Factors (forcats)

```{r factors}
fac <- clean %>%
  mutate(
    status = fct_relevel(as.factor(status), c("new", "active", "inactive")),
    top_cat = fct_lump_n(as.factor(category), n = 6)
  )
```

**Plotting tip:** Relevel factors to control ggplot ordering.
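
**Sketch:** how level order changes with `fct_relevel()` and `fct_infreq()`; ggplot draws discrete axes and legends in level order. `toy_status` is invented for the demo.

```{r sketch-factor-order}
library(forcats)

# Hypothetical status column; default levels are alphabetical
toy_status <- factor(c("inactive", "new", "active", "new"))
levels(toy_status)       # "active" "inactive" "new"

# Explicit order for a lifecycle-style axis
ordered_status <- fct_relevel(toy_status, "new", "active", "inactive")
levels(ordered_status)   # "new" "active" "inactive"

# Or order by how often each level occurs (most frequent first)
by_frequency <- fct_infreq(toy_status)
levels(by_frequency)
```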

# 10. Plotting (ggplot2: minimal patterns)

```{r plots, fig.width=7, fig.height=4}
# Bar (counts)
clean %>%
  ggplot(aes(x = category)) +
  geom_bar(fill = "#2E86AB") +
  theme_minimal(base_size = 12) +
  labs(title = "Counts by Category", x = NULL, y = "Count")

# Line (time series)
clean_dates %>%
  group_by(date) %>% summarize(n = n(), .groups = "drop") %>%
  ggplot(aes(date, n)) +
  geom_line(color = "#7D3C98", linewidth = 0.9) +
  theme_minimal(base_size = 12) +
  labs(title = "Daily Counts", x = NULL, y = NULL)
```

**Small multiples:** Use `+ facet_wrap(~group)` when categories are many.
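
**Sketch:** small multiples on ggplot2's built-in `mpg` data, used here only because it ships with the package; swap in your own grouping column.

```{r sketch-facets, fig.width=7, fig.height=4}
library(ggplot2)

# One panel per vehicle class instead of one crowded, colour-coded plot
facet_plot <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~class) +
  theme_minimal(base_size = 12) +
  labs(title = "Highway MPG by Engine Size", x = "Displacement (L)", y = "Highway MPG")

facet_plot
```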

# 11. Tables (gt quick pattern)

```{r gt}
result %>%
  gt::gt() %>%
  gt::fmt_number(columns = mean_val, decimals = 2) %>%  # leave the integer count `n` unformatted
  gt::tab_header(title = gt::md("**Summary by Category**"))
```

**Export:** Pipe the table into `gt::gtsave("table.html")`; the file extension picks the format. PNG and PDF output also need the `webshot2` package.
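
**Sketch:** saving a gt table to HTML. The toy table and the temp-directory location are placeholders; in a project I would point `path` at `here("output")` instead.

```{r sketch-gtsave}
library(gt)

# Hypothetical summary table
toy_summary <- data.frame(
  category = c("a", "b"),
  mean_val = c(1.234, 5.678)
)

toy_table <- gt(toy_summary) %>%
  fmt_number(columns = mean_val, decimals = 2)

# gtsave() takes the table first, then the file name; HTML needs no extra packages
gtsave(toy_table, filename = "toy_table.html", path = tempdir())

html_path <- file.path(tempdir(), "toy_table.html")
file.exists(html_path)
```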

# 12. Modeling (tidymodels *tiny* starter)

```{r modeling, message=FALSE}
# install.packages("tidymodels")  # once
# library(tidymodels)
# set.seed(42)
# split <- initial_split(clean, prop = 0.8)
# train <- training(split); test <- testing(split)
# rec <- recipe(target ~ ., data = train) %>% step_dummy(all_nominal_predictors())
# mod <- linear_reg() %>% set_engine("lm")
# wf  <- workflow() %>% add_model(mod) %>% add_recipe(rec)
# wf_fit <- fit(wf, data = train)  # not `fit`, which would mask the fit() generic
# model_metrics <- predict(wf_fit, test) %>% bind_cols(test) %>% metrics(truth = target, estimate = .pred)
```

**Reality check:** Always baseline with a simple model (e.g., `lm`) before anything fancy.
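
**Sketch:** a baseline comparison in base R on the built-in `mtcars` data. The predictors (`wt`, `hp`), the 80/20 split, and the mean-only floor are illustrative assumptions, not a recommended model.

```{r sketch-baseline}
set.seed(42)

# 80/20 split by row index
train_rows <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
train_cars <- mtcars[train_rows, ]
test_cars  <- mtcars[-train_rows, ]

# Root mean squared error on held-out rows
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Floor: predict the training mean for every test row
mean_rmse <- rmse(test_cars$mpg, mean(train_cars$mpg))

# Simple linear baseline
lm_fit  <- lm(mpg ~ wt + hp, data = train_cars)
lm_rmse <- rmse(test_cars$mpg, predict(lm_fit, newdata = test_cars))

c(mean_only = mean_rmse, lm = lm_rmse)
```

Anything fancier has to beat both numbers on the same test rows to be worth the extra complexity.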

# 13. Debugging & Safety Nets

-   **Common errors:** missing packages, wrong column names, bad joins, factor levels not set.
-   **Tactics:**
    -   `rlang::last_error()` to see context
    -   `dplyr::glimpse()` before/after key steps
    -   `stopifnot()` for assumptions (e.g., unique keys)
    -   Use `tryCatch()` around fragile I/O

```{r guardrails}
stopifnot(!anyDuplicated(clean$id))  # ids should be unique
```
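
**Sketch:** wrapping a fragile read in `tryCatch()` so a missing file yields `NULL` and a message instead of stopping the notebook. `safe_read_csv()` is a hypothetical helper name; treating warnings as failures is a deliberately strict choice that may be too aggressive for some files.

```{r sketch-trycatch}
# Hypothetical helper: return NULL instead of failing when a read goes wrong
safe_read_csv <- function(path) {
  tryCatch(
    read.csv(path),
    warning = function(w) {
      message("Problem reading ", path, ": ", conditionMessage(w))
      NULL
    },
    error = function(e) {
      message("Could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Failure path: the file does not exist
missing_result <- safe_read_csv(file.path(tempdir(), "does_not_exist.csv"))
is.null(missing_result)

# Success path: a small file written just for this demo
ok_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2), ok_path, row.names = FALSE)
ok_result <- safe_read_csv(ok_path)
ok_result
```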

# 14. Reproducible Paths & Projects

-   Use RStudio Projects; root paths with `here::here()`.
-   Keep folders: `data/`, `R/`, `output/`, `figs/`, `docs/`.
-   Save session info with outputs.

```{r session-info}
sessionInfo()
```
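
**Sketch:** writing session info to a dated text file next to the outputs. The temp directory is a stand-in so the chunk has no side effects; in a project the target would be something like `here("output", ...)`.

```{r sketch-session-file}
# Dated file name, matching the timestamp pattern used for other outputs
session_path <- file.path(
  tempdir(),
  paste0("session_info_", format(Sys.Date(), "%Y%m%d"), ".txt")
)

# capture.output() turns the printed session info into character lines
writeLines(capture.output(sessionInfo()), session_path)

file.exists(session_path)
```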

# 15. Handy Snippets I Reuse

```{r snippets}
# Percent of total
percent_of_total <- function(x) round(100 * x / sum(x, na.rm = TRUE), 1)

# Not-in operator
`%nin%` <- function(x, y) !(x %in% y)

# Wrap a function so it captures output, messages, and warnings instead of printing them
quietly <- purrr::quietly
```
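
**Sketch:** quick usage of the helpers, redefined here so the chunk stands alone. The inputs are arbitrary, and `quiet_log` is just an example name for a wrapped function.

```{r sketch-helpers}
percent_of_total <- function(x) round(100 * x / sum(x, na.rm = TRUE), 1)
`%nin%` <- function(x, y) !(x %in% y)

percent_of_total(c(1, 1, 2))     # 25 25 50
c("a", "z") %nin% c("a", "b")    # FALSE TRUE

# purrr::quietly() wraps a function; the wrapper returns a list with
# result, output, warnings, and messages instead of printing them
quiet_log  <- purrr::quietly(log)
log_result <- quiet_log(-1)
log_result$result
log_result$warnings
```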

# 16. Checklist Before You Ship

-   [ ] Column names are clean & consistent
-   [ ] Dates parsed and in correct timezone/format
-   [ ] Joins audited with `anti_join()`
-   [ ] NAs handled intentionally
-   [ ] Figures have titles, labels, units
-   [ ] Code chunks are deterministic (set seeds)
-   [ ] Save artifacts with versioned filenames

# 17. Appendix: swirl (learn by doing)

-   Install once: `install.packages("swirl")`
-   Each session: `library(swirl); swirl()`
-   Navigate with: `skip()`, `play()` → `nxt()`, `main()`, `info()`, `bye()`

# 18. Appendix: Keyboard Macros (RStudio)

-   Run line/selection: **Ctrl/Cmd + Enter**
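-   Run all chunks above: **Ctrl + Alt + P** (Windows/Linux) or **Cmd + Option + P** (macOS). In recent RStudio versions **Ctrl + Shift + P** opens the Command Palette; check Tools → Keyboard Shortcuts Help if your bindings differ.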
-   Insert chunk: **Ctrl + Alt + I** / **Cmd + Option + I**

# 19. To Do / Parking Lot

-   [ ] Add a `targets` or `renv` section when projects grow
-   [ ] Add unit tests with `testthat` for key helpers
-   [ ] Add a style guide decision (lintr/styler)
