R Notebook

Probability Distributions

Probability distributions fall into two broad types, depending on whether the random variable is discrete or continuous:
Probability distributions for discrete random variables:

- Binomial distribution: number of successes in a fixed number of binary (success/failure) trials

- Poisson distribution: count data, where the variance equals the mean

- Negative binomial distribution: count data like the Poisson, but more flexible because it allows the variance to exceed the mean (overdispersion)

Probability distributions for continuous random variables:

- Normal distribution: symmetric, bell-shaped curve
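As a rough illustration, base R exposes each of these distributions through a density or mass function (`dbinom`, `dpois`, `dnbinom`, `dnorm`). The sketch below evaluates each one; the parameter values (10 trials with p = 0.5, a mean count of 3, a negative binomial dispersion `size` of 2) are arbitrary choices for demonstration. It also shows what "more flexible" means for the negative binomial: under R's `size`/`mu` parameterization its variance is mu + mu^2/size, which exceeds the Poisson variance (equal to the mean) at the same mean.

```r
# Illustrative parameters (arbitrary, chosen only for demonstration)
n_trials  <- 10   # binomial: number of trials
p_success <- 0.5  # binomial: probability of success on each trial
mu        <- 3    # Poisson / negative binomial: mean count
size      <- 2    # negative binomial: dispersion parameter (smaller = more spread)

# Discrete distributions: P(X = k) from the probability mass functions
binom_pmf <- dbinom(0:n_trials, size = n_trials, prob = p_success)
pois_pmf  <- dpois(0:10, lambda = mu)
nb_pmf    <- dnbinom(0:10, size = size, mu = mu)

# The binomial PMF over its full support 0..n_trials sums to 1
sum(binom_pmf)

# Same mean, different variance
pois_var <- mu                # Poisson: variance equals the mean
nb_var   <- mu + mu^2 / size  # negative binomial: variance exceeds the mean
c(poisson = pois_var, negbin = nb_var)   # poisson = 3, negbin = 7.5

# Simulation check of the negative binomial variance
set.seed(123)                 # arbitrary seed for reproducibility
sim_nb <- rnbinom(1e5, size = size, mu = mu)
var(sim_nb)                   # should be close to nb_var

# Continuous distribution: normal density at the mean of N(20, 4^2)
dnorm(20, mean = 20, sd = 4)
```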

Checking whether a continuous random variable is normally distributed

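# Simulate a continuous random variable: 1000 draws from a normal distribution with mean 20 and sd 4
x <- rnorm(1000, mean = 20, sd = 4)

# rainbow() expects the number of colours, not the data vector
hist(x, xlab = "Values", main = "Histogram", col = rainbow(10))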
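One way to confirm that the simulated values behave as intended is to compare the sample statistics with the parameters passed to `rnorm` (mean 20, sd 4) and to overlay the theoretical normal density on a density-scaled histogram. A minimal sketch follows; the seed value is an arbitrary addition for reproducibility.

```r
set.seed(42)                          # arbitrary seed for reproducible draws
x <- rnorm(1000, mean = 20, sd = 4)   # 1000 draws from N(20, 4^2)

# Sample statistics should be close to the true parameters
mean(x)
sd(x)

# Plot the histogram on the density scale so it is comparable to dnorm()
hist(x, freq = FALSE, xlab = "Values", main = "Simulated N(20, 4^2)",
     col = "lightblue")
# Overlay the theoretical density; inside curve(), `x` is curve's own grid variable
curve(dnorm(x, mean = 20, sd = 4), add = TRUE, lwd = 2)
```

If the data are close to normal, the bars should track the overlaid curve, with small deviations due to sampling noise.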

# Working with real-world data: the built-in ToothGrowth dataset

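df <- ToothGrowth   # 60 observations of tooth length (len) by supplement (supp) and dose
# Check normality visually with a histogram of tooth length
hist(df$len, xlab = "Tooth length", main = "Histogram of ToothGrowth$len")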

# Check normality with a normal Q-Q plot: `qqnorm` plots sample quantiles against theoretical normal quantiles, and `qqline` adds a reference line

qqnorm(df$len)
qqline(df$len)
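To make the Q-Q plot less of a black box: `qqnorm` pairs each observation with a theoretical standard normal quantile taken at the plotting positions from `ppoints`, and `qqline` by default draws the line through the first and third quartiles of the sample and of the normal distribution. The sketch below reproduces both by hand for `ToothGrowth$len`; points lying close to the line suggest approximate normality.

```r
len <- ToothGrowth$len
n <- length(len)

# Theoretical quantiles used by qqnorm(): qnorm() at the ppoints() positions
theoretical <- qnorm(ppoints(n))
sample_sorted <- sort(len)

# qqnorm() returns the same pairs (in original data order) without plotting
q <- qqnorm(len, plot.it = FALSE)
all.equal(sort(q$x), theoretical)   # should be TRUE

# Reference line through the 1st and 3rd quartiles, as qqline() does by default
probs <- c(0.25, 0.75)
y_q <- quantile(len, probs, names = FALSE)
x_q <- qnorm(probs)
slope <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

plot(theoretical, sample_sorted, xlab = "Theoretical quantiles",
     ylab = "Sample quantiles", main = "Manual normal Q-Q plot")
abline(a = intercept, b = slope, col = "red")
```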

# Check normality formally with the Shapiro-Wilk test (`shapiro.test`)

## Hypotheses:

### H0: the data are normally distributed; Ha: the data are not normally distributed
shapiro.test(df$len) # p-value = 0.1091 > 0.05, so we fail to reject H0: no significant evidence that the data depart from normality

## 
##  Shapiro-Wilk normality test
## 
## data:  df$len
## W = 0.96743, p-value = 0.1091
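The test result is returned as an `htest` object, so the p-value can be extracted and compared with a significance level in code rather than read off the printout. The sketch below assumes the conventional alpha = 0.05 (a choice, not something fixed by the test). For contrast, it also runs the test on clearly skewed simulated data (exponential draws, added only for illustration), where the test should reject normality.

```r
alpha <- 0.05   # conventional significance level (an assumption)

res <- shapiro.test(ToothGrowth$len)
res$statistic   # W
res$p.value

# Decision rule: reject H0 (normality) only if p < alpha
if (res$p.value < alpha) {
  message("Reject H0: evidence that the data are not normally distributed")
} else {
  message("Fail to reject H0: no significant evidence against normality")
}

# Contrast: strongly right-skewed data should yield a very small p-value
set.seed(1)               # arbitrary seed for reproducibility
skewed <- rexp(200)       # exponential draws, clearly non-normal
shapiro.test(skewed)$p.value
```

Note that failing to reject H0 does not prove normality; it only means the test found no significant departure from it at the chosen level.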

The standard normal distribution is the special case of the normal distribution with mean μ = 0 and standard deviation σ = 1. It is also called the z-distribution.
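Any normal variable can be mapped onto the standard normal by the z-transformation z = (x - μ) / σ. A minimal sketch, reusing the N(20, 4^2) setup from the simulation earlier with an arbitrary seed; `scale()` performs the same standardization using the sample mean and standard deviation instead of the true parameters.

```r
set.seed(42)                          # arbitrary seed for reproducibility
x <- rnorm(1000, mean = 20, sd = 4)

# Standardize with the known population parameters
z_true <- (x - 20) / 4

# Standardize with the sample mean and sd (what scale() does)
z_sample <- as.numeric(scale(x))
mean(z_sample)   # 0 up to floating-point error
sd(z_sample)     # 1 up to floating-point error

# Probabilities on the z scale come from pnorm(), the standard normal CDF
pnorm(1.96)                   # P(Z <= 1.96), about 0.975
pnorm(24, mean = 20, sd = 4)  # same as pnorm((24 - 20) / 4), i.e. pnorm(1)
```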