Preliminaries

aliens <- read.csv("aliens.csv", header = TRUE, stringsAsFactors = TRUE)
library(skimr)
source('special_functions.R') 
my_sample <- suppressWarnings(make.my.sample(33243684, 100, aliens))

Question 1

Compute the probability of the following outcomes:

• a rat that has a probability of .2 of successfully completing a maze does so exactly 4 times out of 10 trials

dbinom(4, 10, .2)
## [1] 0.08808038

• a class with 20 students contains exactly 10 girls, assuming that the probability that each student is a girl is .5

dbinom(10, 20, .5)
## [1] 0.1761971

• a class with 20 students contains exactly three students born in January, assuming that the probability that any student is born in January is 1/12

dbinom(3, 20, 1/12)
## [1] 0.1502988
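
The three dbinom() results above can be checked against the binomial formula, P(X = k) = choose(n, k) × p^k × (1 − p)^(n − k). The following is a minimal sketch; binom_by_hand is a hypothetical helper name introduced here for illustration and is not part of the lab's special_functions.R.

```r
# Hypothetical helper (not from the lab): probability of exactly k successes
# in n independent trials, each with success probability p
binom_by_hand <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Each comparison should report TRUE if the formula agrees with dbinom()
all.equal(binom_by_hand(4, 10, .2), dbinom(4, 10, .2))
all.equal(binom_by_hand(10, 20, .5), dbinom(10, 20, .5))
all.equal(binom_by_hand(3, 20, 1/12), dbinom(3, 20, 1/12))
```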

Question 2

# 10 trials, so the possible number of successes runs from 0 to 10
x.var <- seq(0, 10, 1)
probs <- dbinom(x.var, 10, .2)
plot(x.var, probs, type = "h")

• the number of girls in a class of 20 students, if the probability that each student is a girl is .5

x.var <- seq(0, 20, 1)
probs <- dbinom(x.var, 20, .5)
plot(x.var, probs, type = "h")

• the number of students in a class of 20 who are born in January, if the probability that any student is born in January is 1/12

x.var <- seq(0, 20, 1)
probs <- dbinom(x.var, 20, 1/12)
plot(x.var, probs, type = "h")

Question 3

x.var <- seq(0, 20, 1)
probs <- dbinom(x.var, 20, 1/12)
plot(x.var, probs, type = "l")

It is better to use type = “h” or type = “p” because the binomial distribution is discrete: the number of successes can only be a whole number. type = “l” joins the probabilities with one continuous, jagged line, which suggests that there are probabilities for values between the whole numbers, such as 2.5 students. type = “h” and type = “p” instead draw a separate vertical line or point at each possible count, which makes each probability easier to read.
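
One way to see why the points should not be joined is to ask dbinom() for the probability of a count that cannot happen. This is a minimal sketch; the variable names are made up for illustration, and suppressWarnings() is used only because R warns when the count is not a whole number.

```r
# A binomial count can only be a whole number, so there is no probability
# "between" the points that type = "l" would join with a line.
p_at_2 <- dbinom(2, 20, 1/12)                        # a possible outcome
p_at_2.5 <- suppressWarnings(dbinom(2.5, 20, 1/12))  # impossible outcome: returns 0
p_at_3 <- dbinom(3, 20, 1/12)                        # a possible outcome

c(p_at_2, p_at_2.5, p_at_3)
```

The middle value is 0 even though the line drawn by type = “l” passes well above 0 at that position.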

Question 4

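Does the number of trials matter? Yes, but only when the probability of success is not .5. In that case the distribution is skewed, and the skew becomes less noticeable as the number of trials increases. Does the probability of success matter? Yes. The distribution is exactly symmetrical only when the probability of success is .5, because success and failure are then equally likely. Do these things work together in determining whether the distribution is symmetrical? Yes. With a probability of .5 the distribution is symmetrical for any number of trials; the further the probability is from .5, the more trials are needed before the distribution looks roughly symmetrical.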
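
One way to check these answers numerically is with the skewness of a binomial distribution, (1 − 2p) / sqrt(n × p × (1 − p)), where 0 means perfectly symmetrical. The sketch below is illustrative only; binom_skew and the other names are hypothetical and do not come from the lab.

```r
# Hypothetical helper: skewness of a binomial distribution with n trials
# and success probability p. Zero means symmetrical.
binom_skew <- function(n, p) {
  (1 - 2 * p) / sqrt(n * p * (1 - p))
}

skew_half <- binom_skew(20, .5)       # p = .5: no skew
skew_small_n <- binom_skew(10, .2)    # p = .2 with few trials: clearly skewed
skew_large_n <- binom_skew(1000, .2)  # p = .2 with many trials: much less skewed

c(skew_half, skew_small_n, skew_large_n)

# Direct check for p = .5: the probabilities read the same forwards and backwards
probs_half <- dbinom(0:20, 20, .5)
all.equal(probs_half, rev(probs_half))
```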

pbinom(5, 20, .1667)
## [1] 0.8980818
probs <- dbinom(x.var, 20, .1667)
plot(x.var, probs, type = "h")

If you look again at the graph you made of the binomial distribution for this situation, you can see why the probability is so high: there’s very little probability of rolling 1 six times, seven times, or eight times, etc.

Question 5

Determine the cumulative probability in the following situations:

• A mouse that can successfully complete 20% of mazes will complete 5 or fewer out of 10.

pbinom(5, 10, .20)
## [1] 0.9936306

• A new schizophrenia drug that improves outcomes for 80% of patients will help 70 or fewer in a sample of 100 patients.

pbinom(70, 100, .80)
## [1] 0.01124898
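
A cumulative probability is the sum of the individual probabilities up to and including the cutoff, so pbinom() can be checked by adding up dbinom() values. This is a minimal sketch with made-up variable names:

```r
# P(X <= 5) is P(X = 0) + P(X = 1) + ... + P(X = 5)
by_sum <- sum(dbinom(0:5, 10, .20))
by_pbinom <- pbinom(5, 10, .20)
all.equal(by_sum, by_pbinom)

# The same check for the drug example: 70 or fewer out of 100
drug_by_sum <- sum(dbinom(0:70, 100, .80))
drug_by_pbinom <- pbinom(70, 100, .80)
all.equal(drug_by_sum, drug_by_pbinom)
```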

Question 6

Based on what you got in question 5, what’s the probability of

• The mouse completing 6 or more out of 10 mazes.

1 - pbinom(5, 10, .20)
## [1] 0.006369382

• The drug helping more than 70 out of the 100 patients.

1- pbinom(70, 100, .80)
## [1] 0.988751
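
Because counts are whole numbers, "6 or more" is the complement of "5 or fewer", so the cutoff passed to pbinom() is 5. The sketch below shows three equivalent ways to get that probability; the variable names are illustrative only.

```r
# "6 or more" is everything except "5 or fewer"
six_or_more <- 1 - pbinom(5, 10, .20)

# lower.tail = FALSE returns P(X > 5), which is the same event for whole numbers
six_or_more_upper <- pbinom(5, 10, .20, lower.tail = FALSE)

# Adding up the individual probabilities for 6, 7, ..., 10 gives the same answer
six_or_more_sum <- sum(dbinom(6:10, 10, .20))

c(six_or_more, six_or_more_upper, six_or_more_sum)
```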

Question 7

summary(my_sample$antennae) 
##    Curly Straight 
##       21       79
pbinom(21, 100, .50)
## [1] 2.168683e-09

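The reports from previous explorers were probably not correct. The result 2.168683e-09 is in scientific notation and means about 0.000000002, which is far below 1. If the probability of curly antennae were .50, as used in pbinom() above, a sample of 100 aliens with 21 or fewer curly antennae would almost never occur, so the sample does not support the reports.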
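
R prints very small numbers in scientific notation, which is easy to misread. A minimal sketch of how to display the value in ordinary decimal form, assuming the value is copied by hand from the pbinom() output above (p_curly is a made-up name):

```r
# Value copied by hand from the pbinom() output; p_curly is a made-up name
p_curly <- 2.168683e-09

# Show the number without scientific notation to see how small it is
format(p_curly, scientific = FALSE)

# A probability always lies between 0 and 1
p_curly >= 0 && p_curly <= 1
```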