Discussion 3-distribution

Author

Allison Shrivastava

Please explain each of the 3 distributions in less than 4 sentences.

Normal: a bell-shaped distribution of continuous data that clusters around the mean. It is symmetric, so values are spread equally on both sides of the mean, with the spread set by the standard deviation.

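Binomial: the distribution of the number of successes in a fixed, finite number of independent binary (yes/no) trials, each with the same probability of success. It gives us, for example, the probability of getting X events out of a given number of trials.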

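Poisson: the distribution of the count of events in a fixed interval of time or space, when events occur independently at a constant average rate and the count has no fixed upper limit, for example, how many mutations there are in a genome.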

Explain what the pdf and cdf of a distribution measures. Pick any of the three distributions (or a distribution from the list above that we have not covered in class), and provide some intuition as to if the pdf formula makes sense or not.

The PDF (probability density function) describes the relative likelihood of a continuous random variable across its range. The probability of any single exact value is zero; probabilities come from the area under the curve over an interval (for example, 80% of the population are between 5 and 6.5 feet tall). Because it measures the density of a continuous variable, it's a good tool for normal distributions, describing what portion of the population falls under a given part of the bell curve.
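To make the difference between density and probability concrete, here is a minimal sketch using the standard normal (mean 0, sd 1, chosen only for illustration). It shows that the height of the curve at a point is not itself a probability, while the area under the curve over an interval is.

```r
# Density (curve height) at a single point: not a probability
height_at_zero <- dnorm(0)

# Probability of falling between -1 and 1: area under the density curve
area <- integrate(dnorm, lower = -1, upper = 1)$value

# The same probability obtained from the CDF
area_cdf <- pnorm(1) - pnorm(-1)

print(c(height = height_at_zero, area = area, area_cdf = area_cdf))
```

The two area values should agree at roughly 0.68, the familiar share of a normal population within one standard deviation of the mean.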

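The CDF (cumulative distribution function) gives the probability that a variable falls at or below a given threshold, for example, the probability that a patient with a given disease is cured within 30 days of starting a medication. It is defined for both continuous and discrete distributions, which makes it a handy tool for Poisson and binomial questions such as "what is the chance of x or fewer events?"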
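As a sketch of how the CDF relates to the individual probabilities of a discrete distribution, the example below checks that `pbinom()` is the running total of `dbinom()`. The values size = 20 and prob = 0.12 are illustrative assumptions.

```r
size <- 20      # number of trials (illustrative)
prob <- 0.12    # probability of success on each trial (illustrative)
k    <- 0:size  # every possible number of successes

pmf <- dbinom(k, size, prob)   # P(X = k)
cdf <- pbinom(k, size, prob)   # P(X <= k)

# The CDF is the cumulative sum of the probability mass function
print(all.equal(cumsum(pmf), cdf))

# Probability of 3 or fewer successes
print(pbinom(3, size, prob))
```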

What are the key parameters that define the 3 distributions above (or a distribution from the list above)? Does R require these key parameters to be declared ? Type the “?distribution” command in R to find out.

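The key parameters differ by distribution. The binomial requires size (number of trials) and prob (probability of success), and both must be supplied to the R functions. The Poisson requires lambda (the mean number of events), which must also be supplied. The normal is defined by mean and sd, but R does not require them to be declared because they default to 0 and 1.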
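A quick way to see which parameters R insists on is to call each density function with only its first argument. The sketch below uses a small hypothetical helper, `runs_ok()`, which is not part of R, to report whether a call succeeds.

```r
# Hypothetical helper (not part of R): TRUE if the expression evaluates without error
runs_ok <- function(expr) {
  tryCatch({ expr; TRUE }, error = function(e) FALSE)
}

runs_ok(dnorm(0))                            # TRUE: mean and sd default to 0 and 1
runs_ok(dbinom(3))                           # FALSE: size and prob have no defaults
runs_ok(dpois(3))                            # FALSE: lambda has no default
runs_ok(dbinom(3, size = 20, prob = 0.12))   # TRUE
runs_ok(dpois(3, lambda = 5))                # TRUE
```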

```
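?Distributions   # overview of the distribution functions in the stats package
?dbinom          # parameters for the binomial; see also ?dpois and ?dnorm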
```
Give a few examples of situations that can be modeled with each of the 3 distributions above. You can try to read Chapter 1.3, Parametric Families of Distributions, in Introduction to Statistical Thought by Michael Lavine (recommended textbook).

A Poisson distribution could be used to model the number of customers visiting a store in a given time, or how many earthquakes occur in an area. Examples of binomial distribution usage are a series of coin tosses or quality control of a product (how many units out of a batch will fail). Examples of normally distributed quantities are things like weight, height, or test scores.

Plot the distribution in part B

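## let's start with a binomial distribution
x<-20      # number of trials (size)
n<-0:20    # possible numbers of successes
pr<-0.12   # probability of success on each trial

plot(n, dbinom(n, x, pr),
     type='h',
     main="Binomial Distribution",
     ylab="Probability",
     xlab="Number of successes",
     lwd=5)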

### now plot a probability mass function for a Poisson distribution
plot(n, dpois(n, lambda = 5),
     type='h',
     main="Poisson Distribution",
     ylab="Probability",
     xlab="Number of events",
     lwd=5)

## lastly, a normal distribution

x2<-seq(-5,5,length.out=88)
y<-dnorm(x2)   # standard normal density (mean 0, sd 1)
plot(x2, y, type="l", lwd=5,
     main="Normal Distribution",
     xlab="Value",
     ylab="Density")

Let’s assume that a hospital’s neurosurgical team performed N procedures for in-brain bleeding last year, and x of these procedures resulted in death within 30 days. If the national proportion for death in these cases is p, then is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?

Pick your own values of N, x, and p, where x is necessarily less than or equal to N, and p is a fixed probability of success. The probability should be greater than or equal to x.

Then model both as a binomial and a Poisson, and provide your R code solutions.
Do you get similar answers or not under the two different distributional assumptions, and can you guess why?

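These answers are comparable: both tests reject the national proportion of 0.19 and give nearly identical estimates and confidence intervals. The binomial test is exact for a fixed number of trials, while the Poisson is an approximation to the binomial that works best when the number of trials is large and the probability is small. The p-values differ (9.303e-14 versus 4.767e-12) because at a proportion of 0.19 the Poisson assumes a larger variance than the binomial.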

### starting with a binomial distribution
n<-300     # number of procedures
x<-13      # deaths within 30 days
pr<-0.19   # national proportion of deaths
# exact binomial test (two-sided: does the hospital's proportion differ from the national one?)
binom.test(x,n,p=pr)


    Exact binomial test

data:  x and n
number of successes = 13, number of trials = 300, p-value = 9.303e-14
alternative hypothesis: true probability of success is not equal to 0.19
95 percent confidence interval:
 0.02327184 0.07296146
sample estimates:
probability of success 
            0.04333333

## next, Poisson: x events over a time base of n procedures, tested against rate pr
poisson.test(x, T = n, r = pr)


    Exact Poisson test

data:  x time base: n
number of events = 13, time base = 300, p-value = 4.767e-12
alternative hypothesis: true event rate is not equal to 0.19
95 percent confidence interval:
 0.02307317 0.07410132
sample estimates:
event rate 
0.04333333
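
One way to see why the two tests above reach the same conclusion but report different p-values is to compare the two models directly. The sketch below reuses the values from the tests (300 procedures, 13 deaths, null proportion 0.19) under new variable names, and adds a smaller probability, 0.01, purely as an illustrative assumption.

```r
n_proc <- 300    # number of procedures (as in the tests above)
deaths <- 13     # observed deaths
p_null <- 0.19   # null (national) proportion

# Both models share the same expected count under the null
expected <- n_proc * p_null

# The Poisson has the larger variance, so its tails are heavier
var_binom <- n_proc * p_null * (1 - p_null)
var_pois  <- expected

# Lower-tail probability of seeing 13 or fewer deaths under each model
tail_binom <- pbinom(deaths, n_proc, p_null)
tail_pois  <- ppois(deaths, expected)
print(c(binomial = tail_binom, poisson = tail_pois))

# With a small probability (illustrative 0.01) the two PMFs nearly coincide
k <- 0:15
max_gap <- max(abs(dbinom(k, n_proc, 0.01) - dpois(k, n_proc * 0.01)))
print(max_gap)
```

The heavier Poisson tail is consistent with its larger p-value, and the small gap at 0.01 suggests the two models would agree even more closely for a rarer event.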