Examples:
Takes 1 random sample from the vector (in this case, c(“heads”, “tails”) you supplied in the first argument.
sample(c("heads", "tails"), 1)
[1] "tails"
Takes ten random samples from the vector you supplied. Note that you have to specify replace=TRUE or there wouldn’t be ten outcomes available for sampling in your vector. Here, we’ve saved the outcome as ‘flips’. You are likely to get a different sample each time you run this code.
flips<-sample(c("heads", "tails"), 10, replace = TRUE)
flips
[1] "heads" "heads" "tails" "heads" "tails" "tails" "heads" "heads" "heads"
[10] "heads"
#flips<-sample(c("heads", "tails"), 10, replace = FALSE)
#flips
Yields a count of the number of heads in ‘flips’. The sum function works because R treats true (heads) as equaling 1 and false (tails) as equaling 0.
sum(flips=="heads")
[1] 5
We are telling R that this function requires one argument, k, which is the number of coin flips. The code within the { brackets } dictates what the function will do. In this case, it will automatically simulate k number of coin flips and calculate the sum the number of heads.
head_count<-function(k){
flips <- sample(c("heads", "tails"), k, replace = TRUE)
sum(flips=="heads")
}
head_count(10)
[1] 5
Using our function ‘head_count’, we can conduct a single experiment with 10 coin flips. We can repeat this experiment 100 times using the replicate function. Note that we are calling this data ‘heads’. *Check: Look at the results of your 100 replicates of the experiment. What does the output mean?
heads <- replicate(100, head_count(10))
Displays a histogram of the frequency of how many heads there were in each of the 100 replicates of the coin-tossing experiment saved as ‘heads’.
hist(heads)

hist(heads, freq=FALSE)

Displays a summary table with the frequencies of how many heads there were across the 100 replicates of the coin-tossing experiment saved as ‘heads’.
table(heads)
heads
0 1 2 3 4 5 6 7 8
1 2 5 15 24 21 15 11 6
- If we flip a coin 20 times, what is the expected probability of getting six heads? Conduct a simulation using 1000 replicates.
head_count <- function(k){
flips <- sample(c("heads", "tails"), k, replace = TRUE)
sum(flips=="heads")
}
table(replicate(1000, head_count(20)))/100
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
0.03 0.04 0.15 0.41 0.80 1.31 1.42 1.76 1.63 1.23 0.69 0.32 0.11 0.07 0.03
- Simulating coin flips: Generate 10 separate random flips with probability 0.7 of producing heads. What kind of values do you see and what do they represent?
rr rbinom(10, 10, 0.7)
[1] 7 8 7 8 7 8 7 8 6 9
Getting to know the binomial distribution functions in R: Simulates a random draw from a binomial distribution. This function has three arguments: the number of random draws, the number of coins being flipped on each draw, and the probability of a heads as the outcome.As before, heads will be assigned a value of 1 and tails a value of 0.
rbinom(1,1,.5)
Simulates 10 flips of a single coin.
rbinom(10,1,.5)
[1] 0 0 0 0 1 0 1 1 0 1
Simulates one draw of ten coins. This counts the number of heads out of 10 flips.
rbinom(1,10,.5)
[1] 4
Simulates 10 draws with each having 10 coins and reports the number of heads from each draw. You can think of the number of draws as the number of replicates and the number of coins as the sample size in each replicate.
rbinom(10,10,.5)
[1] 6 3 7 4 5 4 5 7 4 4
Simulates 10 draws of 10 coins, each with an 80% chance of producing a head instead of a 50% chance. These biased coins are still considered random variables because the outcome is random.
rbinom(10,10,.8)
[1] 9 8 9 9 7 8 10 8 10 6
Simulates 100000 draws, each with 10 fair coins, and saves the outcome as ‘flips’.
flips<-rbinom(100000, 10, .5)
Finds the fraction of draws where 5 heads occurred, which reflects the probability that this outcome will occur.
mean(flips==5)
[1] 0.24626
Finds the cumulative probability that heads will occur 4 or fewer times during each draw. This is referred to as the cumulative density because it sums up the probability that the number of heads is 0, 1, 2, 3, or 4 out of 10 flips.
mean(flips<=4)
[1] 0.37734
Estimates the exact probability density at a given point. The arguments here are the density being estimated (5 heads), the number of coins (10), and the probability of producing a head (0.5).
dbinom(5,10, .5)
[1] 0.2460938
Estimates the exact cumulative probability density for a given range. The arguments here are the density being estimated (4 or fewer heads), the number of coins (10), and the probability of producing a head (0.5).
pbinom(4,10,.5)
[1] 0.3769531
- Simulating coin flips: Generate 10 separate random flips with probability 0.7 of producing heads. What kind of values do you see and what do they represent?
rbinom(10, 1, .7)
[1] 0 1 1 0 0 1 0 1 1 1
- Simulating draws from a binomial distribution: Generate 100 occurrences of flipping 10 coins, each with 70% probability of producing heads. What kind of values do you produce and what do they represent? Produce a plot of the probability distribution that you generate and describe its shape.
flips<-rbinom(100, 10, 0.7)
table(flips)/100
flips
3 4 5 6 7 8 9 10
0.03 0.02 0.14 0.21 0.25 0.15 0.15 0.05
hist(flips, freq = F)

- Calculate the exact probability that 2 heads will arise from 10 coin flips with a 70% probability of coming up tails. Compare your answer with a simulation of 10,000 trials. Do the two approaches yield similar results?
dbinom(2, 10, 0.3) #Note that the word problem lists probability of getting tails
[1] 0.2334744
mean(rbinom(10000, 10, 0.3)==2)
[1] 0.2269
- Calculate the cumulative probability that at least five coins out of 10 are heads with a 30% probability of coming up heads. Compare your answer with a simulation of 10,000 trials. Do the two approaches yield similar results?
1-pbinom(4, 10, .3) #Note that we have to subtract from 1 to get the right range
[1] 0.1502683
pbinom(4, 10, .3, lower.tail = FALSE) #same as the line above
[1] 0.1502683
mean(rbinom(10000, 10, .3)>= 5)
[1] 0.1498
- Repeat the simulation you ran in exercise (4) with 10, 100, 1,000, 10,000, and 100,000 trials. Which simulation yields a result most similar to the exact probability? Produce a plot depicting the number of trails on the x-axis and the associated probabilities you calculated on the y-axis. Make sure to adjust your axis labels appropriately. What pattern do you see? Produce the plot again but this time log-transform the number of trials. What does this graph reveal that your first graph did not?
r10<-mean(rbinom(10, 10, 0.3)==2)
r100<-mean(rbinom(100, 10, 0.3)==2)
r1000<-mean(rbinom(1000, 10, 0.3)==2)
r10000<-mean(rbinom(10000, 10, 0.3)==2)
r100000<-mean(rbinom(100000, 10, 0.3)==2)
r<-c(r10, r100, r1000, r10000, r100000)
n<-c(10, 100, 1000, 10000, 100000)
plot(n, r, xlab="Number of replicates", ylab="Probability of 70% flips coming up tails")

plot(log(n), r, xlab="Log of number of replicates", ylab="Probability of 70% flips coming up tails")

- If events A and B are independent, and A has a 50% chance of happening and B has a 30% chance of happening, what is the probability that they will both happen? Use simulations to answer this question.
A<-rbinom(1000, 1, 0.5)
B<-rbinom(1000, 1, 0.3)
mean(A & B) #Answer is around 15.2%
[1] 0.162
- Expanding on exercise (7), event C has a 70% chance of happening. What is the probability that events A, B, and C all happen?
C<-rbinom(1000, 1, 0.7)
mean(A & B & C) #Answer is around 9.5%
[1] 0.105
- If events A and B are independent, and A has a 40% chance of coming up heads and B has a 75% chance of coming up heads, what is the probability that either A or B will come up heads? Use simulations to answer this question.
A<-rbinom(1000, 1, 0.4)
B<-rbinom(1000, 1, 0.75)
mean(A|B) #Answer is around 86.1%
[1] 0.866
- Suppose X is a random binomially distributed variable (10, 0.3) and Y is another random binomially distributed variable (10, 0.65), and that they are independent. What is the probability that either of these variables is less than or equal to 5? Estimate this probability both using simulations of 10,000 trials and by calculating exact cumulative densities. How do these two approaches compare?
X<-rbinom(100000, 10, 0.3)
Y<-rbinom(100000, 10, 0.65)
mean(X<=5|Y<=5)
[1] 0.96442
prob_X_less<-pbinom(5, 10, 0.3)
prob_Y_less<-pbinom(5, 10, 0.65)
prob_X_less + prob_Y_less - prob_X_less*prob_Y_less
[1] 0.9644174
- Write a function called ‘dice’ containing no arguments that simulates a roll of two, six-sided dice and calculates the sum of the sides of the dice. Use the replicate function to repeat this process 100 times and create a plot of the resulting probability distribution. What is the probability of that your two dice will produce a sum of 2 (i.e. that your dice come up ‘snake eyes’)?
Repeat this process using 1,000 and 10,000 replicates. Does the shape of the probability distribution change as the number of replicates increases? Does the probability of rolling ‘snake eyes’ depend on the number of replicates used to estimate probabilities?
dice <- function() {
tosses <- sample(x=1:6, size=2, replace=T)
sum(tosses)
}
rep_1K<-replicate(1000, dice())
barplot(table(rep_1K)/1000)

table(rep_1K)/1000
rep_1K
2 3 4 5 6 7 8 9 10 11 12
0.027 0.045 0.079 0.114 0.164 0.166 0.115 0.122 0.081 0.066 0.021
rep_1H<-replicate(100, dice())
barplot(table(rep_1H)/100)

table(rep_1H)/100
rep_1H
2 3 4 5 6 7 8 9 10 11 12
0.01 0.07 0.08 0.13 0.12 0.17 0.17 0.15 0.03 0.04 0.03
rep_10K<-replicate(10000, dice())
barplot(table(rep_10K)/10000)

table(rep_10K)/10000
rep_10K
2 3 4 5 6 7 8 9 10 11 12
0.0260 0.0532 0.0867 0.1108 0.1335 0.1705 0.1364 0.1150 0.0820 0.0560 0.0299
---
title: "R Notebook"
output:
  html_notebook: default
  pdf_document: default
---

Examples:

Takes 1 random sample from the vector (in this case, c(“heads”, “tails”) you supplied in the first argument.

```{r}
sample(c("heads", "tails"), 1)
```

Takes ten random samples from the vector you supplied.  Note that you have to specify replace=TRUE or there wouldn’t be ten outcomes available for sampling in your vector.  Here, we’ve saved the outcome as ‘flips’.  You are likely to get a different sample each time you run this code.  

```{r}
flips<-sample(c("heads", "tails"), 10, replace = TRUE)
flips
#flips<-sample(c("heads", "tails"), 10, replace = FALSE) #produces error b/c sample space not large enoguh without replacement
#flips
```

Yields a count of the number of heads in ‘flips’. The sum function works because R treats true (heads) as equaling 1 and false (tails) as equaling 0.

```{r}
sum(flips=="heads")
```

We are telling R that this function requires one argument, k, which is the number of coin flips.  The code within the { brackets } dictates what the function will do.  In this case, it will automatically simulate k number of coin flips and calculate the sum the number of heads.

```{r}
head_count<-function(k){
  flips <- sample(c("heads", "tails"), k, replace = TRUE)
  sum(flips=="heads")
}
head_count(10)
```

Using our function ‘head_count’, we can conduct a single experiment with 10 coin flips.  We can repeat this experiment 100 times using the replicate function. Note that we are calling this data ‘heads’.
*Check: Look at the results of your 100 replicates of the experiment. What does the output mean?

```{r}
heads <- replicate(100, head_count(10))
```

Displays a histogram of the frequency of how many heads there were in each of the 100 replicates of the coin-tossing experiment saved as ‘heads’.  

```{r}
hist(heads)
hist(heads, freq=FALSE)
```

Displays a summary table with the frequencies of how many heads there were across the 100 replicates of the coin-tossing experiment saved as ‘heads’.

```{r}
table(heads)
```
1) If we flip a coin 20 times, what is the expected probability of getting six heads?  Conduct a simulation using 1000 replicates.

```{r}
head_count <- function(k){
  flips <- sample(c("heads", "tails"), k, replace = TRUE)
  sum(flips=="heads")
}
table(replicate(1000, head_count(20)))/100
```

2) Simulating coin flips:  Generate 10 separate random flips with probability 0.7 of producing heads.  What kind of values do you see and what do they represent?

```{r}
rbinom(10, 10, 0.7)
```

Getting to know the binomial distribution functions in R:
Simulates a random draw from a binomial distribution.  This function has three arguments: the number of random draws, the number of coins being flipped on each draw, and the probability of a heads as the outcome.As before, heads will be assigned a value of 1 and tails a value of 0.

```{r}
rbinom(1,1,.5)
```

Simulates 10 flips of a single coin.
```{r}
rbinom(10,1,.5)
```

Simulates one draw of ten coins.  This counts the number of heads out of 10 flips.

```{r}
rbinom(1,10,.5)
```

Simulates 10 draws with each having 10 coins and reports the number of heads from each draw.  You can think of the number of draws as the number of replicates and the number of coins as the sample size in each replicate.

```{r}
rbinom(10,10,.5)
```

Simulates 10 draws of 10 coins, each with an 80% chance of producing a head instead of a 50% chance.  These biased coins are still considered random variables because the outcome is random.

```{r}
rbinom(10,10,.8)
```

Simulates 100000 draws, each with 10 fair coins, and saves the outcome as ‘flips’. 

```{r}
flips<-rbinom(100000, 10, .5)
```

Finds the fraction of draws where 5 heads occurred, which reflects the probability that this outcome will occur.

```{r}
mean(flips==5)
```

Finds the cumulative probability that heads will occur 4 or fewer times during each draw.  This is referred to as the cumulative density because it sums up the probability that the number of heads is 0, 1, 2, 3, or 4 out of 10 flips.

```{r}
mean(flips<=4)
```

Estimates the exact probability density at a given point.  The arguments here are the density being estimated (5 heads), the number of coins (10), and the probability of producing a head (0.5).

```{r}
dbinom(5,10, .5)
```

Estimates the exact cumulative probability density for a given range.  The arguments here are the density being estimated (4 or fewer heads), the number of coins (10), and the probability of producing a head (0.5).

```{r}
pbinom(4,10,.5)
```

2) Simulating coin flips:  Generate 10 separate random flips with probability 0.7 of producing heads.  What kind of values do you see and what do they represent?

```{r}
rbinom(10, 1, .7)
```

3) Simulating draws from a binomial distribution:  Generate 100 occurrences of flipping 10 coins, each with 70% probability of producing heads.  What kind of values do you produce and what do they represent?  Produce a plot of the probability distribution that you generate and describe its shape.

```{r}
flips<-rbinom(100, 10, 0.7)
table(flips)/100
hist(flips, freq = F)
```

4) Calculate the exact probability that 2 heads will arise from 10 coin flips with a 70% probability of coming up tails.  Compare your answer with a simulation of 10,000 trials.  Do the two approaches yield similar results?


```{r}
dbinom(2, 10, 0.3)  #Note that the word problem lists probability of getting tails
mean(rbinom(10000, 10, 0.3)==2)
```

5) Calculate the cumulative probability that at least five coins out of 10 are heads with a 30% probability of coming up heads.  Compare your answer with a simulation of 10,000 trials.  Do the two approaches yield similar results?

```{r}
1-pbinom(4, 10, .3) #Note that we have to subtract from 1 to get the right range
pbinom(4, 10, .3, lower.tail = FALSE) #same as the line above
mean(rbinom(10000, 10, .3)>= 5)
```

6) Repeat the simulation you ran in exercise (4) with 10, 100, 1,000, 10,000, and 100,000 trials.  Which simulation yields a result most similar to the exact probability?  Produce a plot depicting the number of trails on the x-axis and the associated probabilities you calculated on the y-axis.  Make sure to adjust your axis labels appropriately.  What pattern do you see?  Produce the plot again but this time log-transform the number of trials.  What does this graph reveal that your first graph did not?

```{r}
r10<-mean(rbinom(10, 10, 0.3)==2)
r100<-mean(rbinom(100, 10, 0.3)==2)
r1000<-mean(rbinom(1000, 10, 0.3)==2)
r10000<-mean(rbinom(10000, 10, 0.3)==2)
r100000<-mean(rbinom(100000, 10, 0.3)==2)

r<-c(r10, r100, r1000, r10000, r100000)
n<-c(10, 100, 1000, 10000, 100000)

plot(n, r, xlab="Number of replicates", ylab="Probability of 70% flips coming up tails")
plot(log(n), r, xlab="Log of number of replicates", ylab="Probability of 70% flips coming up tails")

```

7) If events A and B are independent, and A has a 50% chance of happening and B has a 30% chance of happening, what is the probability that they will both happen?  Use simulations to answer this question.

```{r}
A<-rbinom(1000, 1, 0.5)
B<-rbinom(1000, 1, 0.3)
mean(A & B)  #Answer is around 15.2%

```

8) Expanding on exercise (7), event C has a 70% chance of happening.  What is the probability that events A, B, and C all happen?

```{r}
C<-rbinom(1000, 1, 0.7)
mean(A & B & C)  #Answer is around 10.5%

```

9) If events A and B are independent, and A has a 40% chance of coming up heads and B has a 75% chance of coming up heads, what is the probability that either A or B will come up heads?  Use simulations to answer this question.

```{r}
A<-rbinom(1000, 1, 0.4)
B<-rbinom(1000, 1, 0.75)
mean(A|B)  #Answer is around 86.1%
```

10) Suppose X is a random binomially distributed variable (10, 0.3) and Y is another random binomially distributed variable (10, 0.65), and that they are independent.  What is the probability that either of these variables is less than or equal to 5?  Estimate this probability both using simulations of 10,000 trials and by calculating exact cumulative densities.  How do these two approaches compare?

```{r}
X<-rbinom(100000, 10, 0.3)
Y<-rbinom(100000, 10, 0.65)
mean(X<=5|Y<=5)

prob_X_less<-pbinom(5, 10, 0.3)
prob_Y_less<-pbinom(5, 10, 0.65)
prob_X_less + prob_Y_less - prob_X_less*prob_Y_less

```

11) Write a function called ‘dice’ containing no arguments that simulates a roll of two, six-sided dice and calculates the sum of the sides of the dice. Use the replicate function to repeat this process 100 times and create a plot of the resulting probability distribution.  What is the probability of that your two dice will produce a sum of 2 (i.e. that your dice come up ‘snake eyes’)?

Repeat this process using 1,000 and 10,000 replicates.  Does the shape of the probability distribution change as the number of replicates increases?  Does the probability of rolling ‘snake eyes’ depend on the number of replicates used to estimate probabilities?

```{r}
dice <- function() {
                        tosses <- sample(x=1:6, size=2, replace=T)
                        sum(tosses)
}
 
 
rep_1K<-replicate(1000, dice())
barplot(table(rep_1K)/1000)
table(rep_1K)/1000
 
rep_1H<-replicate(100, dice())
barplot(table(rep_1H)/100)
table(rep_1H)/100
 
rep_10K<-replicate(10000, dice())
barplot(table(rep_10K)/10000)
table(rep_10K)/10000
```


