The Normal distribution

This is a summary of my page: https://dataZ4s.com/statistics/normal-distribution-characteristics/

Characteristics of the normal distribution

  • Symmetric around its mean
  • Mean = median = mode
  • The area under the normal curve = 1.0
  • The normal distribution has a higher peak and lighter tails than the t-distribution
  • The normal distribution is defined by the mean (μ) and the variance (σ²)
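A few of these properties can be checked numerically in R. This is a minimal sketch, not a proof: the area check uses the standard normal, and the variable names (mu, sigma, total_area and so on) are only illustrative.

```r
mu <- 65
sigma <- 4

# Total area under the standard normal density (numerical integration)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Symmetry: the density is the same 3 units below and 3 units above the mean
symmetric <- all.equal(dnorm(mu - 3, mean = mu, sd = sigma),
                       dnorm(mu + 3, mean = mu, sd = sigma))

# Median: the 50th percentile coincides with the mean
median_x <- qnorm(0.5, mean = mu, sd = sigma)

total_area  # approximately 1
symmetric   # TRUE
median_x    # 65
```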

As explained by the Empirical Rule, approximately 68% of the area of the normal distribution is within one standard deviation of the mean and approximately 95% of the area is within two standard deviations of the mean:

[Figure: areas under the normal curve within one and two standard deviations of the mean]
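The Empirical Rule percentages can be reproduced with pnorm as the difference between two cumulative probabilities. A small sketch, using a mean of 65 and a standard deviation of 4 as illustrative values (the result is the same for any normal distribution):

```r
# Area within k standard deviations of the mean for X ~ N(65, 4^2)
mu <- 65
sigma <- 4
k <- 1:3
within_k <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)
round(within_k, 4)
## [1] 0.6827 0.9545 0.9973
```

The third value extends the rule to three standard deviations, which cover about 99.7% of the area.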

The parameters and estimators

\[\mu = \text{Population mean}\]
\[\sigma^2=\text{Population variance}\]
\[\sigma=\text{Population standard deviation}\]
\[\bar x=\text{Sample mean}\]
\[s^2=\text{Sample variance}\]
\[s=\text{Sample standard deviation}\]
\[SE=\text{Standard error}\]

The higher n, the “peakier” the curve

The higher n, the higher the proportion of data that center around the mean

This can also be deduced from the formula of the sample variance: \[s^2_{n-1}=\frac{\sum^n_{i=1} (x_i-\bar x)^2}{n-1}\]
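One way to see the effect of n is to simulate the sample mean. The sketch below rests on an assumption that is not stated above: that the "peakier" curve is the distribution of the sample mean, whose spread is the standard error σ/√n. The seed, the sample sizes and the 2000 repetitions are arbitrary choices.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
mu <- 65
sigma <- 4
sample_sizes <- c(5, 30, 200)

# For each n, draw 2000 samples and record the spread of their means
sd_of_means <- sapply(sample_sizes, function(n) {
  sample_means <- replicate(2000, mean(rnorm(n, mean = mu, sd = sigma)))
  sd(sample_means)
})

# Simulated spread of the sample mean next to the theoretical sigma / sqrt(n)
data.frame(n = sample_sizes,
           simulated_sd = sd_of_means,
           theoretical_se = sigma / sqrt(sample_sizes))
```

The simulated spread shrinks as n grows, so a larger share of the sample means falls close to the mean.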

The normal distribution vs the t-distribution

[Figure: 95% confidence interval examples]

For the same confidence level, the critical value of the t-distribution is greater than that of the normal distribution. The difference shrinks as the degrees of freedom increase.
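The critical values can be compared directly with qnorm and qt. A short sketch for a two-sided 95% confidence interval; the degrees of freedom are arbitrary illustrative values:

```r
# Two-sided 95% critical values: 2.5% in each tail
z_crit <- qnorm(0.975)
df <- c(5, 10, 30, 100)
t_crit <- qt(0.975, df = df)
round(z_crit, 3)
## [1] 1.96
round(t_crit, 3)
## [1] 2.571 2.228 2.042 1.984
```

Every t critical value is above 1.96, and the values approach 1.96 as the degrees of freedom grow.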

The normal distribution with R

This section covers calculating probabilities and percentiles, and taking random samples, for a normally distributed variable.

Example: I will follow the example of X being normally distributed with a mean of 65 and a standard deviation of 4:

\[X \sim N\big(\mu=65, \sigma^2 = 4^2\big)\]

pnorm

The pnorm function can be used to calculate probabilities for a normal random variable:

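# P(X <= 60):
pnorm(q = 60, mean = 65, sd = 4, lower.tail = TRUE)
## [1] 0.1056498
# Can also be written with positional arguments (lower.tail defaults to TRUE):
pnorm(60, 65, 4)
## [1] 0.1056498
# P(X >= 75): upper tail, so lower.tail = FALSE
pnorm(75, mean = 65, sd = 4, lower.tail = FALSE)
## [1] 0.006209665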
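The probability that X falls between two values is the difference between two pnorm calls. A minimal sketch; the bounds 60 and 70 are chosen only for illustration:

```r
# P(60 <= X <= 70) for X ~ N(65, 4^2)
p_between <- pnorm(70, mean = 65, sd = 4) - pnorm(60, mean = 65, sd = 4)
p_between
## [1] 0.7887005

# The upper tail is the complement of the lower tail
p_upper <- 1 - pnorm(75, mean = 65, sd = 4)
all.equal(p_upper, pnorm(75, mean = 65, sd = 4, lower.tail = FALSE))
## [1] TRUE
```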

pnorm can also be used to calculate probabilities for Z, the standard normal variable:

# P(Z >= 1.5)
pnorm(q = 1.5, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.0668072
# Same call with positional arguments
pnorm(1.5, 0, 1, FALSE)
## [1] 0.0668072
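Standardizing links the two uses of pnorm: converting x to a z-score and looking it up on the standard normal gives the same probability as calling pnorm with the original mean and standard deviation. A small sketch using the x = 60 example:

```r
# z-score of x = 60 when X ~ N(65, 4^2)
z <- (60 - 65) / 4
z
## [1] -1.25
# P(Z <= -1.25) equals P(X <= 60)
pnorm(z)
## [1] 0.1056498
```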

qnorm

The qnorm function can be used to calculate quantiles or percentiles for a normal random variable:

# Find first quartile (Q1)
qnorm(p = 0.25, mean = 65, sd = 4, lower.tail = TRUE)
## [1] 62.30204
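qnorm is the inverse of pnorm, and it accepts a vector of probabilities. As an illustration, the sketch below finds the bounds that enclose the central 95% of X; the name bounds is only illustrative:

```r
# Bounds that enclose the central 95% of X ~ N(65, 4^2)
bounds <- qnorm(c(0.025, 0.975), mean = 65, sd = 4)
bounds
## [1] 57.16014 72.83986

# qnorm undoes pnorm: feeding the quantile back returns the probability
pnorm(qnorm(0.25, mean = 65, sd = 4), mean = 65, sd = 4)
## [1] 0.25
```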

dnorm

The dnorm function can be used to find and/or plot the probability density function:

# First, we create a sequence and assign this to x
x <- seq(from=50, to=80, by=0.25)
x
##   [1] 50.00 50.25 50.50 50.75 51.00 51.25 51.50 51.75 52.00 52.25 52.50
##  [12] 52.75 53.00 53.25 53.50 53.75 54.00 54.25 54.50 54.75 55.00 55.25
##  [23] 55.50 55.75 56.00 56.25 56.50 56.75 57.00 57.25 57.50 57.75 58.00
##  [34] 58.25 58.50 58.75 59.00 59.25 59.50 59.75 60.00 60.25 60.50 60.75
##  [45] 61.00 61.25 61.50 61.75 62.00 62.25 62.50 62.75 63.00 63.25 63.50
##  [56] 63.75 64.00 64.25 64.50 64.75 65.00 65.25 65.50 65.75 66.00 66.25
##  [67] 66.50 66.75 67.00 67.25 67.50 67.75 68.00 68.25 68.50 68.75 69.00
##  [78] 69.25 69.50 69.75 70.00 70.25 70.50 70.75 71.00 71.25 71.50 71.75
##  [89] 72.00 72.25 72.50 72.75 73.00 73.25 73.50 73.75 74.00 74.25 74.50
## [100] 74.75 75.00 75.25 75.50 75.75 76.00 76.25 76.50 76.75 77.00 77.25
## [111] 77.50 77.75 78.00 78.25 78.50 78.75 79.00 79.25 79.50 79.75 80.00
# Find the value of the probability density function for each of these x-values
dens <- dnorm(x, mean=65, sd=4)

# Plot the density curve, then add a vertical line at mu with abline()
plot(x, dens, type = "l", main = "Normal dist for X: Mean=65, SD=4", xlab = "x", ylab = "Probability density", las = 1)
abline(v = 65)
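The values returned by dnorm follow the normal density formula, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)). The sketch below writes the formula out by hand and compares it with dnorm; normal_pdf is a hypothetical helper defined only for this comparison, not a built-in R function.

```r
# Hand-written normal density (illustrative helper)
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(from = 50, to = 80, by = 0.25)
all.equal(normal_pdf(x, 65, 4), dnorm(x, mean = 65, sd = 4))
## [1] TRUE

# The density peaks at the mean
dnorm(65, mean = 65, sd = 4)
## [1] 0.09973557
```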

rnorm

The rnorm function can be used to draw a random sample from a normally distributed population:

rand30 <- rnorm(n=30, mean=65, sd=4)
rand30
##  [1] 64.24202 63.90724 63.18491 69.22113 60.49885 65.74011 62.94402
##  [8] 54.97123 66.95889 64.87745 63.71434 62.96648 63.28726 66.32949
## [15] 64.39445 68.28887 62.09151 67.42206 62.19808 65.99918 58.14922
## [22] 67.31560 65.19037 58.80577 61.65911 62.14693 73.73459 62.77703
## [29] 53.21033 66.37164
hist(rand30)

Though the sample is taken from a normally distributed population, the sample might not look normally distributed, especially with small sample sizes like this.
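To illustrate the effect of sample size, the sketch below draws a small and a large sample from the same population and plots them side by side. The seed and the sample size of 3000 are arbitrary choices; set.seed only makes the draws reproducible.

```r
set.seed(123)  # arbitrary seed so the draws can be reproduced
small <- rnorm(n = 30, mean = 65, sd = 4)
large <- rnorm(n = 3000, mean = 65, sd = 4)

# Side-by-side histograms of the two samples
old_par <- par(mfrow = c(1, 2))
hist(small, main = "n = 30")
hist(large, main = "n = 3000")
par(old_par)

# Sample estimates of the mean and standard deviation for the large sample
c(mean = mean(large), sd = sd(large))
```

With the larger sample, the histogram typically follows the bell shape much more closely, and the sample mean and standard deviation land near 65 and 4.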

Carsten Grube
Sharing and freelancing from my site: https://dataZ4s.com