Tools needed for Cleaning, Organizing and Transforming Data in R

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Cleaning Data Tools

  • rename()
  • rename_with()
  • glimpse()
  • select()
  • clean_names() (janitor package)
  • skim_without_charts() (skimr package)
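A minimal sketch of three of the cleaning functions above, using only dplyr so that janitor and skimr are not required. The toy data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with inconsistent column names
raw <- tibble(
  `Hotel Name` = c("A", "B", "C"),
  ADR = c(100, 85.5, 120),
  Country = c("PRT", "GBR", "FRA")
)

cleaned <- raw %>%
  rename(hotel_name = `Hotel Name`) %>%  # rename one column: new_name = old_name
  rename_with(tolower) %>%               # apply a function to every column name
  select(hotel_name, adr)                # keep only the columns we need

glimpse(cleaned)                         # compact view of column types and values
```

rename() suits one-off fixes, while rename_with() applies the same rule (here tolower) to many names at once.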

Organizing Data Tools

  • filter()
  • max()
  • mean()
  • summarize()
  • arrange()
  • group_by()
  • drop_na()
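The organizing functions are usually chained together. A minimal sketch with a hypothetical sales table: drop missing values, filter rows, group, summarize with mean() and max(), then sort.

```r
library(dplyr)
library(tidyr)

# Hypothetical sales data; one row has a missing amount
sales <- tibble(
  region = c("north", "north", "south", "south", "east"),
  amount = c(120, 80, NA, 250, 50)
)

by_region <- sales %>%
  drop_na(amount) %>%              # drop rows where amount is NA
  filter(amount >= 60) %>%         # keep only rows meeting a condition
  group_by(region) %>%             # later summaries are computed per region
  summarize(total = sum(amount),
            avg = mean(amount),
            biggest = max(amount)) %>%
  arrange(desc(total))             # largest total first

by_region
```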

Transforming Data Tools

  • %>%
  • unite()
  • separate()
  • mutate()
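A minimal sketch of the transforming functions, piped with %>%. The names and heights are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical people data
people <- tibble(
  first = c("Jane", "John"),
  last = c("Doe", "Smith"),
  height_cm = c(165, 178)
)

# unite(): combine two columns into one, separated by a space
combined <- people %>%
  unite(full_name, first, last, sep = " ")

# separate(): split that column back into two
split_again <- combined %>%
  separate(full_name, into = c("first", "last"), sep = " ")

# mutate(): add a column derived from an existing one
with_m <- split_again %>%
  mutate(height_m = height_cm / 100)

with_m
```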

Checking for Bias Tools

library(SimDesign)
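  • bias(actual, predicted): the average amount by which the actual outcomes exceed the predicted outcomes; a value near 0 suggests the predictions are not systematically too high or too low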
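SimDesign documents bias() as bias(estimate, parameter, ...). Assuming that with the default settings it reduces to the mean of estimate minus parameter, bias(actual, predicted) is simply the mean of the differences. The sketch below computes that quantity in base R with hypothetical values; check ?SimDesign::bias before relying on the equivalence.

```r
# Hypothetical actual vs. predicted values
actual    <- c(68.3, 70, 72.4, 71, 67, 70)
predicted <- c(67.9, 69, 71.5, 70, 67, 69)

# Average amount by which the actual outcome exceeds the prediction.
# Assumed to match SimDesign::bias(actual, predicted) with default arguments.
bias_by_hand <- mean(actual - predicted)
bias_by_hand
```

Here the result is positive (about 0.72), meaning the predictions tend to come in below the actual values.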

Summary statistics and visualization

Anscombe’s quartet has four datasets with nearly identical summary statistics that nevertheless look very different when plotted.

library(Tmisc)
data(quartet)
quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x,y))
## # A tibble: 4 × 6
##   set   `mean(x)` `sd(x)` `mean(y)` `sd(y)` `cor(x, y)`
##   <fct>     <dbl>   <dbl>     <dbl>   <dbl>       <dbl>
## 1 I             9    3.32      7.50    2.03       0.816
## 2 II            9    3.32      7.50    2.03       0.816
## 3 III           9    3.32      7.5     2.03       0.816
## 4 IV            9    3.32      7.50    2.03       0.817

The standard deviation measures the spread of values in a dataset: roughly, how far a typical value lies from the mean.
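A minimal sketch of what sd() computes, on a hypothetical sample: take each value's distance from the mean, square it, sum, divide by n - 1 (the sample standard deviation), and take the square root.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical sample

deviations <- x - mean(x)                                  # distance of each value from the mean
sd_by_hand <- sqrt(sum(deviations^2) / (length(x) - 1))   # sample SD divides by n - 1

sd_by_hand
sd(x)   # base R's sd() uses the same n - 1 formula
```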

ggplot(quartet, aes(x,y)) + geom_point() + geom_smooth(method=lm, se=FALSE) + facet_wrap(~set)
## `geom_smooth()` using formula 'y ~ x'
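The fitted lines in the four panels are also nearly the same. A minimal sketch using base R's built-in anscombe data frame, which holds the same four datasets in wide format (columns x1..x4 and y1..y4), fitting one linear model per dataset:

```r
# Fit y_i ~ x_i for each of the four datasets
fits <- lapply(1:4, function(i) {
  xi <- anscombe[[paste0("x", i)]]
  yi <- anscombe[[paste0("y", i)]]
  coef(lm(yi ~ xi))   # c(intercept, slope) for dataset i
})

coefs <- do.call(rbind, fits)
colnames(coefs) <- c("intercept", "slope")
round(coefs, 2)
```

Every row comes out at roughly intercept 3 and slope 0.5, which is why the summary statistics and regression lines alone cannot tell the datasets apart.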

search() shows the attached datasets and packages (the search path).

detach() removes an attached dataset or package from the search path. It is the opposite of attach().
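A minimal sketch of attach(), search() and detach() together, using the built-in mtcars dataset:

```r
attach(mtcars)                              # put mtcars' columns on the search path
on_path_after_attach <- "mtcars" %in% search()
avg_mpg <- mean(mpg)                        # columns are usable without mtcars$

detach(mtcars)                              # remove it from the search path again
on_path_after_detach <- "mtcars" %in% search()

c(on_path_after_attach, on_path_after_detach)
```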

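The Datasaurus Dozen makes the same point as Anscombe’s quartet: its datasets share nearly identical summary statistics but have very different shapes.

library(datasauRus)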
ggplot(datasaurus_dozen, aes(x=x, y=y, colour=dataset)) + geom_point() + 
                theme_void() + theme(legend.position = "none") + facet_wrap(~dataset, ncol = 3)