library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(SimDesign)
Anscombe’s quartet consists of four datasets that have nearly identical summary statistics yet look very different when plotted.
library(Tmisc)
data(quartet)
quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x,y))
## # A tibble: 4 × 6
## set `mean(x)` `sd(x)` `mean(y)` `sd(y)` `cor(x, y)`
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 I 9 3.32 7.50 2.03 0.816
## 2 II 9 3.32 7.50 2.03 0.816
## 3 III 9 3.32 7.5 2.03 0.816
## 4 IV 9 3.32 7.50 2.03 0.817
The standard deviation describes the spread of values in a dataset: roughly, how far the values typically lie from the mean.
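To make the definition concrete, here is a minimal sketch that computes the sample standard deviation by hand and compares it with sd(). The vector x is a made-up toy example, not part of the quartet data.

```r
# Hypothetical toy vector (an assumption for illustration, not quartet data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Deviation of each value from the mean
deviations <- x - mean(x)

# Sample standard deviation: square root of the sum of squared
# deviations divided by n - 1, which is the definition sd() uses
sd_manual <- sqrt(sum(deviations^2) / (length(x) - 1))

sd_manual
sd(x)
```

Both calls should print the same number, since sd() uses the same n - 1 denominator.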
ggplot(quartet, aes(x,y)) + geom_point() + geom_smooth(method=lm, se=FALSE) + facet_wrap(~set)
## `geom_smooth()` using formula 'y ~ x'
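The fitted lines in the four panels are also nearly identical. A rough way to check this, assuming the base R anscombe data frame (columns x1..x4 and y1..y4) instead of the long-format quartet object from Tmisc:

```r
# anscombe ships with base R (package 'datasets'): columns x1..x4, y1..y4
coefs <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  unname(coef(lm(y ~ x)))   # intercept and slope for set i
})
rownames(coefs) <- c("intercept", "slope")
colnames(coefs) <- c("I", "II", "III", "IV")

# One column per set; all four are close to intercept 3 and slope 0.5
round(coefs, 2)
```

The near-identical coefficients are why the summary table alone cannot tell the four sets apart.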
search() shows the attached datasets and packages.
detach() removes an attached dataset or package. It is the opposite of attach().
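A small sketch of how the three functions fit together, using the built-in mtcars data frame as a stand-in. The variable names are only for illustration.

```r
# mtcars ships with base R (package 'datasets')
attach(mtcars)                              # put mtcars columns on the search path
attached_now <- "mtcars" %in% search()      # TRUE while attached
mean_hp <- mean(hp)                         # hp is found without writing mtcars$hp

detach(mtcars)                              # remove it from the search path again
attached_after <- "mtcars" %in% search()    # FALSE once detached

attached_now
attached_after
```

Attaching a data frame while packages are loaded may print a message about masked objects when a column shares its name with an existing object.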
library(datasauRus)
ggplot(datasaurus_dozen, aes(x=x, y=y, colour=dataset)) + geom_point() +
theme_void() + theme(legend.position = "none") + facet_wrap(~dataset, ncol = 3)