## Videos I’ve used for this: https://www.youtube.com/watch?v=UrOXRvG9oYE https://www.youtube.com/watch?v=BPlmjp2ymxw https://www.youtube.com/watch?v=v1uUgTcInQk https://www.youtube.com/watch?v=CEVELIz4WXM
## Other references: https://www.zoology.ubc.ca/~bio301/Bio301/Lectures/Lecture25/Overheads.html
What distribution are your data likely from?
Why do you think this?
Using the distribution you’ve chosen, change its parameters. What happens to the distribution when you do? Describe how the shape changes.
Can you simulate data that look like what you expect your data to look like? What parameters produce that?
Get into a group with a peer. Listen to them describe their response variable. What do you think the appropriate distribution would be? Can you simulate data that resemble your partner’s data?
I want to talk about a bunch of distributions and simulate them as we go, so people can see what I’m talking about.
I’m going to organize this by continuous vs. discrete distributions.
## Continuous Distributions
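height<-rnorm(1000, mean=170, sd=6.35) ## normal: e.g., adult heights in cm, mean 170, sd 6.35
hist(height)
plot(density(height)) ## kernel density estimate of the simulated sample
## pdf() opens a PDF graphics device; it does not compute a probability density.
## For the theoretical normal density, use dnorm():
curve(dnorm(x, mean=170, sd=6.35), from=145, to=195)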
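To get at the question of how the parameters change the shape, here is a minimal sketch that overlays the theoretical normal density on a histogram of simulated heights and then compares curves with different standard deviations. The seed, the extra sd values (3 and 12), the plotting range, and the assumption that heights are in cm are arbitrary illustrative choices.

```r
## Normal distribution: overlay the theoretical density, then vary sd
set.seed(1)  ## arbitrary seed so the sketch is reproducible
height <- rnorm(1000, mean = 170, sd = 6.35)

## freq = FALSE puts the histogram on the density scale so dnorm() lines up with it
hist(height, freq = FALSE, main = "Simulated heights with the N(170, 6.35) density")
curve(dnorm(x, mean = 170, sd = 6.35), add = TRUE, lwd = 2)

## Changing sd changes the spread (and peak height); changing mean only shifts the curve
sds <- c(3, 6.35, 12)
curve(dnorm(x, mean = 170, sd = sds[1]), from = 130, to = 210,
      xlab = "height (cm, assumed)", ylab = "density", lty = 1)
for (i in 2:length(sds)) {
  curve(dnorm(x, mean = 170, sd = sds[i]), add = TRUE, lty = i)
}
legend("topright", legend = paste("sd =", sds), lty = seq_along(sds))

c(mean = mean(height), sd = sd(height))  ## should be close to 170 and 6.35
```

Try changing `mean` instead of `sd` in the loop: the curves keep the same shape and just slide along the x-axis.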
exp_time<-rexp(1000, rate=0.5) ## exponential: time until death when the instantaneous death rate (lambda, R's rate) is constant at 0.5; mean = 1/lambda = 2
hist(exp_time)
plot(density(exp_time))
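A minimal sketch of how the rate parameter changes the exponential: the mean waiting time should be about 1/rate, and larger rates pile the density up near zero. The three rates and the seed are arbitrary choices for illustration.

```r
## Exponential: the mean waiting time is 1/rate; larger rates squash the curve toward 0
set.seed(2)
rates <- c(0.25, 0.5, 2)
sims <- lapply(rates, function(r) rexp(1000, rate = r))
sim_means <- sapply(sims, mean)
rbind(rate = rates, simulated_mean = sim_means, expected_mean = 1 / rates)

curve(dexp(x, rate = rates[1]), from = 0, to = 10, ylim = c(0, 2),
      xlab = "time until death", ylab = "density", lty = 1)
for (i in 2:length(rates)) {
  curve(dexp(x, rate = rates[i]), add = TRUE, lty = i)
}
legend("topright", legend = paste("rate =", rates), lty = seq_along(rates))
```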
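gamma_time<-rgamma(1000, shape=100, scale=0.5) ## gamma: for integer shape, the waiting time until the shape-th event (here the 100th) when events occur at rate 1/scale = 2; mean = shape*scale = 50, variance = shape*scale^2 = 25. With shape = 1 it reduces to the exponential.
hist(gamma_time)
plot(density(gamma_time))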
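The gamma comment above can be checked by simulation. This sketch, with arbitrary seed and sample sizes, compares the simulated mean and variance with shape*scale and shape*scale^2, shows that an integer-shape gamma matches a sum of `shape` exponential waiting times with rate 1/scale, and confirms that shape = 1 gives the exponential density.

```r
## Gamma: check mean = shape*scale and variance = shape*scale^2,
## and that an integer-shape gamma is a sum of exponential waiting times
set.seed(3)
shape <- 100
scale <- 0.5
g <- rgamma(10000, shape = shape, scale = scale)
c(simulated_mean = mean(g), expected_mean = shape * scale)  ## both near 50
c(simulated_var = var(g), expected_var = shape * scale^2)   ## both near 25

## Sum of 'shape' exponential waiting times, each with rate 1/scale
sum_exp <- replicate(10000, sum(rexp(shape, rate = 1 / scale)))
qqplot(g, sum_exp, xlab = "rgamma draws", ylab = "sums of rexp draws")
abline(0, 1)  ## points near this line mean the two samples share a distribution

## With shape = 1 the gamma is the exponential: these densities are identical
all.equal(dgamma(1:5, shape = 1, rate = 0.5), dexp(1:5, rate = 0.5))
```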
## Discrete Distributions
Bernoulli - the distribution of the number of successes (0 or 1) in a single Bernoulli trial, for example survival at a single time point. The outcome is one 0 or 1, not a count over a bunch of 0’s and 1’s (that would be binomial). If we toss a coin once, what’s the probability that it comes up heads? The Bernoulli is the special case of the binomial with n = 1.
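A minimal sketch of the Bernoulli in R, using `rbinom()` with `size = 1`: one trial returns a single 0 or 1, and over many trials the mean approaches p and the variance approaches p(1 - p). The value p = 0.3, the seed, and the number of trials are arbitrary.

```r
## Bernoulli: one trial gives a single 0 or 1; over many trials the mean is p
## and the variance is p * (1 - p)
set.seed(4)
p <- 0.3                                   ## arbitrary success probability
one_toss <- rbinom(1, size = 1, prob = p)  ## a single Bernoulli trial
many <- rbinom(10000, size = 1, prob = p)  ## 10,000 independent Bernoulli trials
c(simulated_mean = mean(many), expected_mean = p)
c(simulated_var = var(many), expected_var = p * (1 - p))
```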
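Binomial - the number of successes in n independent Bernoulli trials, each with the same probability of success p. If a coin is tossed 20 times, what is the probability that heads comes up x times out of 20? How many patients in this ER have COVID? What about in this classroom? The trials are discrete units (discrete space/time), and the count can never exceed n.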
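The coin question can be answered directly with `dbinom()`. This sketch picks x = 12 heads as an arbitrary example, checks it against the formula choose(n, x) p^x (1 - p)^(n - x), plots the whole distribution, and then shows by simulation (arbitrary seed and replicate count) that a binomial count is a sum of n Bernoulli trials.

```r
## Probability of exactly 12 heads in 20 fair tosses
dbinom(12, size = 20, prob = 0.5)
## The same number from the formula choose(n, x) * p^x * (1 - p)^(n - x)
choose(20, 12) * 0.5^12 * 0.5^8

## The whole distribution of heads out of 20
barplot(dbinom(0:20, size = 20, prob = 0.5), names.arg = 0:20,
        xlab = "heads out of 20", ylab = "probability")

## A binomial count is the sum of n Bernoulli trials
set.seed(5)
heads <- replicate(5000, sum(rbinom(20, size = 1, prob = 0.5)))
c(simulated_mean = mean(heads), expected_mean = 20 * 0.5)
```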
#### Eryn - note to yourself: the fun thing to do here would be to change the different variables for each simulation and see how different they look in the plots!
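## 1000 Bernoulli trials: a binomial with size=1, so each draw is a single 0 or 1
hist(rbinom(1000, size=1, prob=0.5))
plot(density(rbinom(1000, size=1, prob=0.5))) ## density() smooths across the gaps between integers; barplot(table(...)) shows discrete data more faithfully
## 100 draws from a binomial: the number of successes out of 6 trials, each with p=0.5
hist(rbinom(100, size=6, prob=0.5))
plot(density(rbinom(100, size=6, prob=0.5)))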
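Following the note above about changing the variables for each simulation, here is a minimal sketch that loops over a small grid of `size` and `prob` values and draws a bar chart for each. The grid values, the seed, and the 2 x 2 layout are arbitrary choices for illustration.

```r
## Binomial: vary size (number of trials) and prob, then compare the shapes
set.seed(6)
settings <- expand.grid(size = c(6, 50), prob = c(0.1, 0.5))
sim_means <- numeric(nrow(settings))
old_par <- par(mfrow = c(2, 2))  ## 2 x 2 grid of plots; restored at the end
for (i in seq_len(nrow(settings))) {
  n <- settings$size[i]
  pr <- settings$prob[i]
  x <- rbinom(1000, size = n, prob = pr)
  sim_means[i] <- mean(x)
  ## factor() with all levels 0:n keeps empty counts visible as zero-height bars
  barplot(table(factor(x, levels = 0:n)), main = paste0("size = ", n, ", prob = ", pr))
}
par(old_par)
cbind(settings, simulated_mean = sim_means, expected_mean = settings$size * settings$prob)
```

With a small `prob` the bars bunch up against zero (right skew); with `prob = 0.5` and a large `size` the shape looks close to a normal curve.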
rgeom(100, 0.05) # geometric: the number of failures before the first success, for each of 100 independent sequences of trials with success probability 0.05. In our chick example, this could be how many chicks each of 100 sets of parents need to have before they successfully recruit one.
## [1] 26 21 5 31 37 3 17 26 10 6 43 0 96 25 7 25 0 6 20 4 16 21 1 14 61
## [26] 3 12 17 26 49 3 14 28 8 12 4 30 9 13 26 6 55 10 19 12 8 22 35 16 0
## [51] 9 19 1 55 14 0 10 12 35 45 2 24 38 23 5 1 3 14 14 12 34 2 5 7 7
## [76] 45 28 5 3 45 34 25 9 0 2 3 20 56 13 43 4 1 22 24 47 10 13 9 4 16
hist(rgeom(100, 0.05))
plot(density(rgeom(100, 0.05)))
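To connect `rgeom()` to its definition, this sketch literally runs Bernoulli trials until the first success and counts the failures, then compares the result with `rgeom()`. The helper `failures_before_success()` is hypothetical, written only for this illustration; the seed and replicate count are arbitrary. Both means should be near (1 - p)/p = 19 for p = 0.05.

```r
## Geometric: count failures before the first success by running Bernoulli trials
failures_before_success <- function(p) {  ## hypothetical helper, for illustration only
  n_fail <- 0
  while (rbinom(1, size = 1, prob = p) == 0) {
    n_fail <- n_fail + 1
  }
  n_fail
}
set.seed(7)
p <- 0.05
by_hand <- replicate(5000, failures_before_success(p))
built_in <- rgeom(5000, prob = p)
## both means should be near (1 - p) / p = 19
c(by_hand = mean(by_hand), built_in = mean(built_in), expected = (1 - p) / p)
qqplot(by_hand, built_in)
abline(0, 1)  ## points near this line mean the two samples share a distribution
```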
Poisson - the probability of a given number of events occurring in a fixed interval of time or space. It closely approximates the binomial when n is very large and the probability of success, p, is very small (with lambda = np). Counts, like the binomial, but with no fixed number of trials (so no upper limit): independent events occurring at a constant average rate in continuous space/time. The mean and the variance are the same (both lambda), which is why, when you know your data are overdispersed (variance larger than the mean), you’re encouraged to use the negative binomial. The Poisson only makes sense for count data.
hist(rpois(1000, 6.5))
plot(density(rpois(1000, 6.5)))
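Two claims in the Poisson paragraph can be checked numerically: the mean equals the variance, and the binomial with large n and small p (with np = lambda) is close to a Poisson. This is a minimal sketch; lambda = 6.5 matches the simulation above, while n = 10000, the seed, and the 0:20 range of counts compared are arbitrary.

```r
## Poisson: mean and variance are both lambda
set.seed(8)
lambda <- 6.5
counts <- rpois(10000, lambda)
c(mean = mean(counts), variance = var(counts))  ## both near 6.5

## Poisson as a limit of the binomial: large n, small p, with n * p = lambda
n <- 10000
p <- lambda / n
max(abs(dbinom(0:20, size = n, prob = p) - dpois(0:20, lambda)))  ## a small number
```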
library(MASS)
# To carry our flycatcher example forward, a negative binomial could describe the number of singing males we see over a week
# (as opposed to alarm calling or otherwise not singing). n = number of draws (here 100, e.g., the sites or site-weeks we visit), mu = mean,
# theta = dispersion parameter: the variance is mu + mu^2/theta, so smaller theta means more overdispersion.
# As theta gets very large, the variance approaches mu and the distribution approaches the Poisson
# (see here: https://stats.stackexchange.com/questions/10419/what-is-theta-in-a-negative-binomial-regression-fitted-with-r)
MASS::rnegbin(100, mu=5, theta=10)
## [1] 9 5 4 4 7 6 7 4 5 2 4 4 2 7 6 7 10 8 7 6 5 8 13 10 3
## [26] 14 1 2 3 8 4 9 9 13 6 7 2 5 5 1 4 4 7 5 3 10 4 8 4 4
## [51] 4 4 6 4 3 5 3 6 1 0 8 4 6 5 9 2 3 2 7 2 11 8 8 3 3
## [76] 5 4 3 3 2 8 17 5 3 5 2 4 4 5 4 5 6 6 0 1 10 1 2 8 4
hist(MASS::rnegbin(100, mu=5, theta=10))
plot(density(MASS::rnegbin(100, mu=5, theta=10)))
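A minimal sketch of what theta does, assuming the MASS parameterization in which the variance is mu + mu^2/theta: simulate with a small, medium, and very large theta and compare the sample variances with that formula. The theta values, sample size, and seed are arbitrary. The last lines show that base R's `rnbinom()` with `size = theta` and `mu` describes the same distribution.

```r
library(MASS)
## Negative binomial: variance = mu + mu^2 / theta, so small theta means lots of overdispersion
set.seed(9)
mu <- 5
thetas <- c(1, 10, 1000)
sim_vars <- sapply(thetas, function(th) var(MASS::rnegbin(20000, mu = mu, theta = th)))
rbind(theta = thetas, simulated_var = sim_vars, expected_var = mu + mu^2 / thetas)
## (compare: a Poisson with mean 5 has variance 5)

## The same distribution from base R, using rnbinom()'s mean parameterization (size = theta)
x_base <- rnbinom(20000, size = 10, mu = mu)
c(mean = mean(x_base), variance = var(x_base))  ## near 5 and 7.5
```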
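Binomial vs. Negative Binomial
Binomial - the number of trials is fixed (n); the number of successes is the random variable.
Negative Binomial - the number of successes is fixed (r); the number of trials needed to reach r successes is the random variable (x). (R’s rnbinom() returns the number of failures before the r-th success, i.e., x - r; MASS::rnegbin() above uses a mean/dispersion parameterization instead.)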
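The contrast above can be simulated directly. This sketch uses a hypothetical helper, `trials_until_r_successes()`, that keeps running Bernoulli trials until it reaches r successes, and compares it with `rnbinom()` plus r (since `rnbinom()` counts failures). The values r = 3, p = 0.4, the seed, and the replicate count are arbitrary; the expected number of trials is r/p = 7.5. A binomial with a fixed 10 trials is shown for contrast.

```r
## Negative binomial (trials version): keep running Bernoulli trials until r successes
trials_until_r_successes <- function(r, p) {  ## hypothetical helper, for illustration only
  successes <- 0
  trials <- 0
  while (successes < r) {
    trials <- trials + 1
    successes <- successes + rbinom(1, size = 1, prob = p)
  }
  trials
}
set.seed(10)
r <- 3
p <- 0.4
by_hand <- replicate(5000, trials_until_r_successes(r, p))
## rnbinom() counts failures before the r-th success, so add r to get trials
built_in <- rnbinom(5000, size = r, prob = p) + r
c(by_hand = mean(by_hand), built_in = mean(built_in), expected = r / p)  ## all near 7.5

## Contrast: binomial with a fixed 10 trials, where the number of successes varies
successes <- rbinom(5000, size = 10, prob = p)
range(successes)  ## can never exceed 10
```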
## My chick example isn't working as well for me right now. But something like the distribution of admixture in hybrids (from two populations) is well approximated by a beta distribution. Say we have two interbreeding populations and we want to know the proportion of species A ancestry in each individual. The beta is great for that: it is a continuous distribution bounded between 0 and 1, so it fits proportions.
hist(rbeta(100, 5, 5))
plot(density(rbeta(100, 5, 5)))
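A minimal sketch of how the two beta shape parameters (a and b) move the mass around: the mean is a/(a + b), equal shapes give a symmetric curve, unequal shapes skew it, and shapes below 1 push the mass toward 0 and 1 (in the hybrid example, mostly pure individuals of either species). The three shape pairs and the seed are arbitrary choices.

```r
## Beta: mean = a / (a + b); the two shape parameters control where the mass sits
set.seed(11)
shapes <- list(c(5, 5), c(2, 8), c(0.5, 0.5))
sim_means <- sapply(shapes, function(s) mean(rbeta(10000, s[1], s[2])))
expected <- sapply(shapes, function(s) s[1] / (s[1] + s[2]))
rbind(simulated_mean = sim_means, expected_mean = expected)

## Plot from 0.01 to 0.99 because the a = b = 0.5 density is infinite at 0 and 1
curve(dbeta(x, 5, 5), from = 0.01, to = 0.99, ylim = c(0, 4.5),
      xlab = "proportion of species A ancestry", ylab = "density", lty = 1)
curve(dbeta(x, 2, 8), add = TRUE, lty = 2)
curve(dbeta(x, 0.5, 0.5), add = TRUE, lty = 3)
legend("top", legend = c("a = 5, b = 5", "a = 2, b = 8", "a = 0.5, b = 0.5"), lty = 1:3)
```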
#library(dirmult)
#library(ggplot2)
#library(Compositional)
#bivt.contour(rdirichlet(n=100, alpha=rep(1,3)) )
## Dirichlet: the multivariate generalization of the beta. Examples include relative abundances from metabarcoding or RNA-seq: anything where the proportions must sum to one.
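If the packages in the commented-out lines aren't available, a Dirichlet can be simulated in base R with the standard construction: draw independent gamma variables with shapes alpha and divide each set by its total. This is a minimal sketch; `rdirichlet_base()` is a hypothetical helper (not from any package), and alpha = (2, 5, 3), the seed, and the sample size are illustrative. It also checks that a single component is beta distributed, which links back to the beta above.

```r
## Dirichlet via normalized gamma draws (base R only)
rdirichlet_base <- function(n, alpha) {  ## hypothetical helper, not from any package
  k <- length(alpha)
  ## column j holds n gamma draws with shape alpha[j] (matrix() fills column by column)
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)  ## divide each row by its total so every row sums to one
}
set.seed(12)
alpha <- c(2, 5, 3)  ## e.g., three taxa in a metabarcoding sample (illustrative)
props <- rdirichlet_base(1000, alpha)
head(rowSums(props))  ## every row sums to 1
rbind(simulated = colMeans(props), expected = alpha / sum(alpha))

## Each component on its own is a beta: component 1 is Beta(alpha[1], sum(alpha) - alpha[1])
qqplot(props[, 1], rbeta(1000, alpha[1], sum(alpha) - alpha[1]))
abline(0, 1)
```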