Let the discrete random variable X represent the number of Heads (successes). Since this experiment fits the Binomial Setting, the distribution of X is binomial \(B(n, p)=B(4, 0.5)\). We can use the plotDist command to plot this distribution (Note: we can just give R Studio the value of the two parameters n and p, and it will know which number is n and which is p. This is because n is an integer and p is always between 0 and 1).
plotDist("binom", 4, 0.5)
To make the plot look nicer, we can ask to plot a histogram using the optional command kind=“histogram”
plotDist("binom", 4, 0.5, kind="histogram")
We can use the command dbinom(x, n, p) to get the probability that X is equal the input x. For example, the probability that we get three heads, P(X=3) can be found with the following command:
dbinom(3, 4, 0.5)
## [1] 0.25
Similarly, P(X=2)=probability that there are two heads can be found by
dbinom(2, 4, 0.5)
## [1] 0.375
We can use the command pbinom(x, n, p) to get the probability that X is less than or equal to the input x. For example, the probability of getting three of fewer heads, i.e. P(X \(\le\) 3) can be found by
pbinom(3, 4, 0.5)
## [1] 0.9375
The following are the R commands used on the Binomial Distribution handout from class (posted on Blackboard).
The financial records of businesses may be audited by state tax authorities to test compliance with tax laws. It is too time-consuming to examine all sales and purchases made by a company during the period covered by the audit. Suppose the auditor examines an SRS of 150 sales records out of 10000 available. One issue is whether each sale was correctly classified as subject to sales tax or not. Suppose that 800 of the 10000 sales are incorrectly classified.
Let \(S_i=1\) if the \(i^{\text{th}}\) observation in the sample is incorrectly classified, and \(S_i=0\) if the \(i^{\text{th}}\) observation in the sample is correctly classified. Then, for each \(S_i\), \(P(S_i=0)=.92\) and \(P(S_i=1)=.08\). These probabilities are approximate because the sample doesn’t quite fit the binomial setting (refer to your class notes or the textbook for a more detailed explanation).
Create a smaple of 150 observations with these probabilities:
S<-sample(0:1, 150, prob=c(.92, .08), replace=T)
Let X represent the number of incorrectly classified observations in the sample. We can find X using the following command
X<-sum(S)
X
## [1] 8
Now, generate 5000 samples of size 150 and record the number of incorrectly classified observations in each sample using the command
counts<-do(5000)*sum(sample(0:1, 150, prob=c(.92, .08), replace=T))
Rename the column name in counts using the command
names(counts)[1]<-"X"
Create a histogram of the number of incorrectly classified records using the command
histogram(~X, data=counts)
Plot the histogram of the binomial distribution \(B(150, .08)\) on top of the histogram you just created using the command
plotDist("binom", 150, .08, col=rgb(0, 0.7, 0, 0.4), kind="histogram", add=T)
This shows that the distribution of X is indeed approximately Binomial.
Find the mean number of incorrectly classified records, \(\mu_X\), using the command
mean(~X, data=counts)
## [1] 11.916
Find the standard deviation of the number of incorrectly classified records, \(\sigma_X\), using the command
sd(~X, data=counts)
## [1] 3.357261
Find what proportion of each of the 5000 samples was incorrectly classified, and create a second column in counts with these values using the command
counts<-transform(counts, phat=X/150)
Find the mean of the 5000 sample proportions using the command
mean(~phat, data=counts)
## [1] 0.07944
Find the standard deviation of the 5000 sample proportions using the command
sd(~phat, data=counts)
## [1] 0.02238174
Recreate the histogram from question 2(c):
histogram(~X, data=counts)
Add the Normal curve \(N(np,\sqrt{np(1-p)})=N((150)(.08),\sqrt{(150)(.08)(.92)})=N(12, 3.32265)\) on top of the histogram:
plotDist("norm", mean=12, sd=3.32265, add=T, col="red")
Notice that the Normal curve captures the shape of the histogram very well.
Create a histogram of the sample proportions:
histogram(~phat, data=counts)
Add the Normal curve \(N(p, \sqrt{\frac{p(1-p)}{n}})=N(.08, \sqrt{\frac{(.08)(.92)}{150}})=N(.8,.022151)\) to the histogram
plotDist("norm", mean=.08, sd=.022151, add=T, col="red")
Notice that the Normal curve captures the shape of the histogram very well.