2024-11-16


Statistic Topic: Standard Deviation

Definition: a measure of how spread out the values in a data set are around the mean (roughly, the typical distance of a value from the mean)

What does this mean?

  • How far are values from the mean?
  • Normal distribution: data are symmetrically distributed about the mean
  • Bell-shaped curve, tapering evenly on either side of the peak

Calculating Standard Deviation for Population

Population Standard Deviation - Understanding terms \[ \sigma = \text{Standard Deviation} \\ \mu = \text{Population Mean} \\ \bar {x} = \text {Sample Mean} \\ X= \text{Value} \\ N = \text{Total Number of Values of the Population} \\ n = \text{Number of Values in sample} \]

Equation: \[ \sigma = \sqrt{\frac {\sum(X-\mu)^2}{N}} \]
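
The population formula can be sketched in R as below. This is a minimal illustration, not code from the original slides: `pop_sd` is a hypothetical helper name and `values` is made-up data treated as a complete population. Base R's `sd()` divides by n - 1, so the population version is written out by hand as the square root of the average squared deviation from the mean.

```r
# Population standard deviation: square root of the mean squared deviation.
# `pop_sd` is a hypothetical helper; base R's sd() uses the n - 1 denominator instead.
pop_sd <- function(x) {
  mu <- mean(x)                        # population mean
  sqrt(sum((x - mu)^2) / length(x))    # divide by N, then take the square root
}

# Illustrative values (an assumption), treated as the whole population
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

pop_sd(values)   # mean is 5, squared deviations sum to 32, sqrt(32 / 8) = 2
```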

Calculating Standard Deviation for Samples

Sample Standard Deviation - Understanding terms \[ s = \text{Sample Standard Deviation} \\ \bar {x} = \text {Sample Mean} \\ X= \text{Value} \\ n = \text{Number of Values in sample} \]

Equation: \[ s = \sqrt{\frac {\sum(X-\bar{x})^2}{n-1}} \]
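
As a rough sketch, the sample formula can be written out by hand and compared with R's built-in `sd()`, which uses the same n - 1 denominator. The helper name `sample_sd` and the `values` vector are assumptions made for illustration.

```r
# Sample standard deviation written out term by term.
# `sample_sd` is a hypothetical helper; it should agree with stats::sd().
sample_sd <- function(x) {
  x_bar <- mean(x)                             # sample mean
  sqrt(sum((x - x_bar)^2) / (length(x) - 1))   # divide by n - 1
}

# Illustrative values (an assumption), treated as a sample
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

sample_sd(values)                           # sqrt(32 / 7), about 2.138
all.equal(sample_sd(values), sd(values))    # TRUE: same result as sd()
```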

Generic Standard Deviation Graph
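
The figure for this slide did not survive in the text. One possible way to draw a generic curve of this kind in base R is sketched below. It assumes a standard normal distribution (mean 0, SD 1) and marks each whole standard deviation with a dashed line. The original plot may have been produced differently.

```r
# Standard normal density on a grid from -4 to +4 standard deviations
x <- seq(-4, 4, length.out = 401)
density <- dnorm(x)   # mean = 0, sd = 1 by default

# Bell curve with dashed reference lines at -3, -2, ..., +3 SD
plot(x, density, type = "l",
     xlab = "Standard deviations from the mean",
     ylab = "Density",
     main = "Normal distribution")
abline(v = -3:3, lty = 2)
```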

Let’s use some Data

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
##  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
##  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
##  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
##  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
##  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
##        Species  
##  setosa    :50  
##  versicolor: 0  
##  virginica : 0  
##                 
##                 
## 

Calculating SD for a Sample

##    n
## 1 50
##   mean.setosadata.Petal.Length.
## 1                         1.462
## [1] 0.173664

Sample Equation:

\[ s = \sqrt{\frac {\sum(X-\bar{x})^2}{n-1}} \\ = \sqrt{\frac {\sum(X-1.462)^2}{50-1}} \\ \approx 0.173664 \]
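
The numbers in this calculation can be reproduced with a short sketch. It assumes that `setosadata` (the name visible in the output above) is the setosa subset of R's built-in `iris` data set, which matches the summary shown earlier. The exact code behind the slides is not preserved, so the steps below are one plausible reconstruction.

```r
# Assumption: setosadata is the setosa subset of the built-in iris data
setosadata <- subset(iris, Species == "setosa")

n     <- nrow(setosadata)               # number of values in the sample: 50
x_bar <- mean(setosadata$Petal.Length)  # sample mean: 1.462

# Sample standard deviation from the formula, step by step
s <- sqrt(sum((setosadata$Petal.Length - x_bar)^2) / (n - 1))
s                                       # 0.173664

# The built-in function gives the same value
sd(setosadata$Petal.Length)
```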

Visualizing SD
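
The plot for this slide is not preserved. A minimal base R sketch of one way to visualize the SD is given below, assuming the same setosa petal lengths from the built-in `iris` data: a histogram with a solid line at the mean and dashed lines one standard deviation either side.

```r
# Setosa petal lengths from the built-in iris data (assumed to match the slides)
petal <- iris$Petal.Length[iris$Species == "setosa"]
m <- mean(petal)   # sample mean
s <- sd(petal)     # sample standard deviation

hist(petal, breaks = 10,
     main = "Setosa petal length",
     xlab = "Petal length (cm)")
abline(v = m, lwd = 2)                  # mean
abline(v = m + c(-1, 1) * s, lty = 2)   # mean minus and plus one SD
```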

Importance of SD

  • Understanding distribution spread
  • Estimating the probability that a value falls within a given range
  • Assessing reliability of data
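
To make the link between SD and probability concrete, the sketch below computes how much of a normal distribution lies within 1, 2, and 3 standard deviations of the mean, then checks the 1 SD share on the setosa petal lengths. The choice of `iris` data is an assumption carried over from the earlier example, and real data will only roughly follow the normal figures.

```r
# Share of a normal distribution within k standard deviations of the mean
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
round(coverage, 4)   # 0.6827 0.9545 0.9973

# Observed share of setosa petal lengths within 1 SD of their mean
petal <- iris$Petal.Length[iris$Species == "setosa"]
within_1sd <- mean(abs(petal - mean(petal)) <= sd(petal))
within_1sd
```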

Population vs Sample

Population vs Sample Continued

Last notes: Visualizing the Bigger Picture

  • Changes in population diversity over time
  • Distribution of specific characteristics across related species
  • Outliers