HW2


## Possible Protocol 1 **(PP1)**: roll once; if you get a 6, conclude the die is not fair; if you roll any other number, conclude it is fair. Analyze PP1: if the die were fair, what is the probability it would be judged to be unfair? Conversely, if the die were unfair, what is the probability that it would be judged to be fair?
 
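If the die were fair, the probability of rolling a 6, and hence of wrongly judging the die unfair, is P(6)=1/6.

If the die were unfair, it would be judged fair whenever the roll is not a 6, which happens with probability P(1,2,3,4,5)=1−P(6). This depends on how the die is loaded: it equals 5/6 only when P(6) is still 1/6, and it gets smaller the more the die favors 6.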
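The two error rates of PP1 can be tabulated with a short R sketch. The values in `p_six_loaded` are hypothetical, chosen only for illustration; the assignment does not say how an unfair die would be weighted.

```r
# PP1 judges the die unfair exactly when the single roll is a 6.
p_six_fair <- 1 / 6
false_alarm <- p_six_fair            # P(fair die is judged unfair)

# Hypothetical values of P(6) for a loaded die (assumed for illustration).
p_six_loaded <- c(0.20, 0.30, 0.50)
missed <- 1 - p_six_loaded           # P(unfair die is judged fair)

print(false_alarm)
print(data.frame(p_six = p_six_loaded, judged_fair = missed))
```

Even a die that shows 6 half of the time would be judged fair in half of all single-roll experiments, which is why one roll is a weak protocol.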
  
  

## **PP2**: roll the dice 30 times. Group can specify a decision rule to judge that dice is fair or unfair. Consider the stats question: if fair dice are rolled 30 times, what is likely number of 6 resulting? How unusual is it, to get 1 more or less than that? How unusual is it, to get 2 more or less? 3? At least one member of the group should be able to do this with theory; at least one member of the group should be able to write a little program in R to simulate this. Analyze PP2 including the question: if the dice were fair, what is the chance it could be judged as unfair?
  
In this multinomial experiment, we will use a chi-square goodness-of-fit test to evaluate whether the distribution of rolls we obtain from the simulation matches the theoretical distribution of a fair die.
 
Let X be the number of 6s in n=30 rolls of a fair die, with p=1/6.

E(X)=np=30×(1/6)=5

standard deviation σ=[np(1−p)]^(1/2)=(30×1/6×5/6)^(1/2)≈2.04
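The theoretical mean and standard deviation above can be checked by simulation. The sketch below repeats the 30-roll experiment many times and counts the 6s in each run; the seed and the number of repetitions (`n_sims`) are arbitrary choices, not part of the assignment.

```r
set.seed(1)        # arbitrary seed, only for reproducibility
n_rolls <- 30
n_sims <- 10000    # assumed number of repeated experiments

# Number of sixes in each simulated set of 30 fair rolls.
sixes <- replicate(n_sims, sum(sample(1:6, n_rolls, replace = TRUE) == 6))

print(mean(sixes))   # should be close to the theoretical 5
print(sd(sixes))     # should be close to the theoretical 2.04

# Share of experiments whose count of sixes is within k of 5, for k = 1, 2, 3.
within_k <- sapply(1:3, function(k) mean(abs(sixes - 5) <= k))
print(within_k)
```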

Now we set up the hypotheses as follows:

**Step1**: Null Hypothesis: the observed values match the expected values: H0: p1=p2=p3=p4=p5=p6=1/6

Alternative Hypothesis Ha: the observed values don't match the expected values (at least one probability differs from 1/6).

**Step2**: we set α=0.10 as the level of significance; if the p-value is below α, we reject H0 and claim that the die is unfair.

**Step3**: gather and interpret data.

rolls_pp2<-sample(1:6,30,replace=TRUE)
print(rolls_pp2)

##  [1] 2 1 6 2 5 5 5 6 5 4 6 1 2 3 5 6 4 3 3 1 2 3 4 3 6 3 2 2 1 6

library(ggplot2)
qplot(rolls_pp2,binwidth=1)

chisq.test(table(rolls_pp2), p = rep(1/6, 6))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(rolls_pp2)
## X-squared = 1.6, df = 5, p-value = 0.9012

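p-value=0.9012>α=0.10, so we fail to reject the null hypothesis. At the 0.10 level of significance, there is not sufficient evidence to support the claim that the observed values don't match the expected values, so we have no grounds to call the die unfair. Under this decision rule, a fair die would be judged unfair in roughly α=10% of experiments, since that is about how often the p-value falls below 0.10 when H0 is true.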
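The chance that a fair die is judged unfair under the chi-square rule can also be estimated by simulation. This is a minimal sketch, assuming the rule "reject when the p-value is below α=0.10"; the seed and the number of repetitions are arbitrary.

```r
set.seed(2)      # arbitrary seed
alpha <- 0.10
n_sims <- 5000   # assumed number of repeated 30-roll experiments

reject <- replicate(n_sims, {
  # factor() keeps all six faces in the table even if one never appears
  rolls <- factor(sample(1:6, 30, replace = TRUE), levels = 1:6)
  p_val <- suppressWarnings(chisq.test(table(rolls), p = rep(1 / 6, 6)))$p.value
  p_val < alpha
})

# Estimated probability that a fair die is judged unfair.
false_alarm_rate <- mean(reject)
print(false_alarm_rate)
```

The estimate should land near 0.10, though not exactly on it, because the chi-square distribution is only an approximation with 5 expected rolls per face.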


1) the probability of getting at most five 6s, P(X≤5)

x<-pbinom(5,30,1/6)
print(x)

## [1] 0.616447

2) how unusual it is to get 1, 2, or 3 fewer 6s than the expected value: P(X≤4), P(X≤3), P(X≤2)

x<-pbinom(4,30,1/6)
print(x)

## [1] 0.4243389

x<-pbinom(3,30,1/6)
print(x)

## [1] 0.2396195

x<-pbinom(2,30,1/6)
print(x)

## [1] 0.1027904

The probability of getting fewer and fewer sixes becomes smaller, and vice versa: getting more and more sixes above the expected value also becomes less likely.
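The lower-tail values above cover only "fewer than expected". A short sketch of the matching upper-tail and two-sided probabilities, using the same Binomial(30, 1/6) model, might look like this (the variable names are illustrative):

```r
n <- 30
p <- 1 / 6
expected <- round(n * p)   # 5 sixes expected

k <- 1:3
# P(X <= expected - k): same quantities as pbinom(4, ...), pbinom(3, ...), pbinom(2, ...)
lower_tail <- pbinom(expected - k, n, p)
# P(X >= expected + k): at least k more sixes than expected
upper_tail <- pbinom(expected + k - 1, n, p, lower.tail = FALSE)
# P(|X - expected| >= k): at least k away from 5 in either direction
two_sided <- lower_tail + upper_tail

print(data.frame(k, lower_tail, upper_tail, two_sided))
```

A decision rule for PP2 based on the count of 6s alone could pick the smallest `k` whose `two_sided` probability is below the chosen α.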

## **PP3**: roll 100 times and specify decision rules. Some cases are easy: if every roll comes to 6 then might quickly conclude. But what about the edge cases? Is it fair to say that every conclusion has some level of confidence attached? Where do you set boundaries for decisions? Analyze PP3.

Now roll the die 100 times.

Repeat the steps in PP2.

Step1: Null Hypothesis: the observed values match the expected values: H0: p1=p2=p3=p4=p5=p6=1/6

Alternative Hypothesis Ha: the observed values don’t match the expected values.

Step2: we set α=0.05 as the level of significance; if the p-value is below α, we reject H0 and claim that the die is unfair.

rolls_dice<-sample(1:6,100,replace=TRUE)
print(rolls_dice)

##   [1] 2 3 4 1 6 6 4 2 6 6 5 5 1 5 5 1 2 5 5 6 3 2 2 1 6 1 3 1 1 2 4 4 1 2 1 4 1
##  [38] 1 4 3 5 5 1 5 4 4 1 1 3 1 4 5 2 5 4 2 2 2 3 4 4 2 5 6 2 4 4 6 1 6 3 3 3 4
##  [75] 4 1 2 4 3 3 2 6 1 2 1 5 3 3 5 1 2 2 6 6 3 6 4 1 4 1

library(ggplot2)
qplot(rolls_dice,binwidth=1)

Step3: run the chi-square test on the simulated sample of 100 die rolls:

chisq.test(table(rolls_dice),p=rep(1/6,6))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(rolls_dice)
## X-squared = 3.8, df = 5, p-value = 0.5786

Interpretation: α=0.05, p-value=0.5786. Since the p-value>α, we fail to reject the null hypothesis that the die is fair.
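To make the decision boundaries for PP3 explicit, the sketch below computes the critical value of the chi-square statistic at α=0.05. It also shows a simpler rule based only on the count of 6s in 100 rolls. The count-based rule is an assumption added for illustration, not the rule used in the test above.

```r
alpha <- 0.05
n <- 100

# Chi-square rule: reject H0 when X-squared exceeds this critical value (df = 6 - 1).
crit <- qchisq(1 - alpha, df = 5)
print(crit)

# Illustrative count-based rule: central range of Binomial(100, 1/6) that a
# fair die falls into roughly 95% of the time.
lower <- qbinom(alpha / 2, n, 1 / 6)
upper <- qbinom(1 - alpha / 2, n, 1 / 6)
print(c(lower = lower, upper = upper))

# Exact chance that a fair die gives a count of sixes outside [lower, upper].
outside <- pbinom(lower - 1, n, 1 / 6) +
  pbinom(upper, n, 1 / 6, lower.tail = FALSE)
print(outside)
```

The observed X-squared of 3.8 is well below the critical value, which agrees with the large p-value. Wherever the boundary is placed, a fair die still lands beyond it with some small probability, so every conclusion carries a level of confidence.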

Roll it 1000 times:

roll_dice<-sample(1:6,1000,replace=TRUE)

library(ggplot2)
qplot(roll_dice,binwidth=1)

As the sample size grows, the observed frequencies of the six faces become more and more nearly equal, so the histogram looks increasingly flat.
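The flattening can be quantified by the largest gap between an observed face frequency and 1/6. The sketch below does this for several sample sizes; the sizes and the seed are arbitrary choices.

```r
set.seed(3)   # arbitrary seed
sizes <- c(30, 100, 1000, 10000)

# For each sample size, the largest gap between a face's observed
# relative frequency and the fair value 1/6.
max_dev <- sapply(sizes, function(n) {
  rolls <- factor(sample(1:6, n, replace = TRUE), levels = 1:6)
  max(abs(table(rolls) / n - 1 / 6))
})

print(data.frame(n = sizes, max_dev))
```

The gap tends to shrink as `n` grows, although it need not decrease at every single step because each sample is random.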

## **EP1**: Testing for the mean value

Scenario: if the die is fair, each number has an equal probability, so the population mean is 3.5. Now we claim that the mean may be higher than 3.5. To test this claim, we simulate 30 die rolls and set α=0.05.

Step1: H0: μ≤3.5; Ha: μ>3.5

Step2: we use the data from PP2, which is 30 rolls, and run a one-sample t-test on the population mean.

Step3: calculation

boxplot(rolls_pp2)

 #Ho: mu<=3.5
  #one-side 95% confidence interval for mu
  t.test(rolls_pp2,mu=3.5,alternative="greater",conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  rolls_pp2
## t = 0.20801, df = 29, p-value = 0.4183
## alternative hypothesis: true mean is greater than 3.5
## 95 percent confidence interval:
##  3.022096      Inf
## sample estimates:
## mean of x 
##  3.566667

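The p-value is 0.4183>0.05, so the p-value is greater than α and we fail to reject the null hypothesis. At the 0.05 level of significance, there is not sufficient evidence to support the claim that the mean is greater than 3.5. This does not show that the mean is at most 3.5; the data are simply consistent with it.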
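As a check on the `t.test` output, the statistic and p-value can be recomputed by hand from the 30 rolls printed in PP2. The vector below is copied from that printed output.

```r
# The 30 rolls printed in PP2.
rolls_pp2 <- c(2, 1, 6, 2, 5, 5, 5, 6, 5, 4, 6, 1, 2, 3, 5,
               6, 4, 3, 3, 1, 2, 3, 4, 3, 6, 3, 2, 2, 1, 6)
n <- length(rolls_pp2)

# One-sample t statistic: (sample mean - hypothesized mean) / standard error.
t_stat <- (mean(rolls_pp2) - 3.5) / (sd(rolls_pp2) / sqrt(n))

# One-sided p-value for Ha: mu > 3.5.
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

print(c(t = t_stat, p_value = p_value))
```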

## **EP2**: Testing for the population proportion

1) According to the data from the 1000 rolls above, at least 16% of the outcomes turned out to be 6.

2) We claim that there is less than a 16% chance of rolling a 6. To test this claim, we simulate a random sample of 50 die rolls. We wish to be 95% confident in the conclusion.

Step1: H0: p≥16%; Ha: p<16%

Step2: check the conditions: n=50, p=16%, c=0.95, np=8>5, n(1−p)=42>5, α=1−c=0.05

Step3: simulation and calculation

rolls_EP2<-sample(1:6,50,replace=TRUE)
library(ggplot2)
qplot(rolls_EP2,binwidth=1)

In this simulation, 6 of the 50 rolls came up six.

prop.test(6,50,p=0.16,alternative="less")

## 
##  1-sample proportions test with continuity correction
## 
## data:  6 out of 50, null probability 0.16
## X-squared = 0.33482, df = 1, p-value = 0.2814
## alternative hypothesis: true p is less than 0.16
## 95 percent confidence interval:
##  0.0000000 0.2275205
## sample estimates:
##    p 
## 0.12

We have 0.2814>0.05. The p-value is greater than α, thus we fail to reject the null hypothesis. At the 0.05 level of significance, there is not enough evidence to support our claim that the probability of rolling a 6 is less than 16%.
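The `prop.test` result can be reproduced by hand with the continuity-corrected normal approximation, and compared with an exact binomial test. The sketch uses the same numbers as above (6 sixes in 50 rolls, null proportion 0.16). The use of `binom.test` is an added cross-check, not part of the original analysis.

```r
x <- 6       # observed number of sixes
n <- 50      # number of rolls
p0 <- 0.16   # null proportion

# Continuity-corrected z statistic for the lower-tailed test.
z <- (x - n * p0 + 0.5) / sqrt(n * p0 * (1 - p0))
p_value <- pnorm(z)   # P(Z <= z)

print(c(X_squared = z^2, p_value = p_value))

# Exact alternative: P(X <= 6) under Binomial(50, 0.16).
exact_p <- binom.test(x, n, p = p0, alternative = "less")$p.value
print(exact_p)
```

With only 50 rolls and an expected count of 8, the exact test is a reasonable cross-check on the approximation.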

Joyce, Yameili, Tamires

9/20/2020
