Lab 5 R Markdown

Robert W. Walker

2014-09-26

## Loading required package: splines
## Loading required package: RcmdrMisc
## Loading required package: car
## Loading required package: sandwich
## The Commander GUI is launched only in interactive sessions

Our organization has been offered a meeting space with capacity of 72. We need to know the distribution of attendance because this entirely determines the adequacy of the space. Attendance is a binomial random variable; we know how many possible attendees and we know the probability that each attends. Let’s plot attendance.

> local({
+   .x <- 44:76
+   plotDistr(.x, dbinom(.x, size=100, prob=0.6), xlab="Number of Successes", 
+   ylab="Probability Mass", 
+   main="Binomial Distribution:  Binomial trials=100, Probability of success=0.6",
+    discrete=TRUE)
+ })

plot of chunk unnamed-chunk-2

Though the graph is instructive, the numerical probabilities are far more useful. Because the probability of each specific number of attendees is given by the binomial, we can examine the entire distribution – the probability of attendance at every possible value ranging from 0 to 100 attendees. These are the binomial probabilities.

> local({
+   .Table <- data.frame(Probability=dbinom(0:100, size=100, prob=0.6))
+   rownames(.Table) <- 0:100 
+   print(.Table)
+ })
                                      Probability
0   0.0000000000000000000000000000000000000001607
1   0.0000000000000000000000000000000000000241041
2   0.0000000000000000000000000000000000017897272
3   0.0000000000000000000000000000000000876966351
4   0.0000000000000000000000000000000031899651015
5   0.0000000000000000000000000000000918709949233
6   0.0000000000000000000000000000021819361294284
7   0.0000000000000000000000000000439504277499144
8   0.0000000000000000000000000007663855838891529
9   0.0000000000000000000000000117512456196333931
10  0.0000000000000000000000001604045027079957665
11  0.0000000000000000000000019686007150526630794
12  0.0000000000000000000000219006829549607889189
13  0.0000000000000000000002223761653888350474590
14  0.0000000000000000000020728635416602217318716
15  0.0000000000000000000178266264582777207692819
16  0.0000000000000000001420559295894027374788232
17  0.0000000000000000010528851251920400012565202
18  0.0000000000000000072824554492449748953342148
19  0.0000000000000000471443168556383824553679407
20  0.0000000000000002864017248980014826095774616
21  0.0000000000000016365812851314242525307268705
22  0.0000000000000088152219221852040117320958856
23  0.0000000000000448426506476380200413509036217
24  0.0000000000002158052562417572030947082728503
25  0.0000000000009840719684624048833529452728897
26  0.0000000000042580037096931031012983503991620
27  0.0000000000175051263620718727115344037770228
28  0.0000000000684575477373878963846531853221222
29  0.0000000002549453501944109151504391785891812
30  0.0000000009050559931901489001300487036161257
31  0.0000000030655122349988794018538218466574108
32  0.0000000099150161350746598750566240809689589
33  0.0000000306464135084122346434909456291961760
34  0.0000000905871928704548350867048478463061656
35  0.0000002562323455478555586477262817624023228
36  0.0000006939626025254391182301577645219481383
37  0.0000018005516173633084209271260078821796924
38  0.0000044776875747587547003415731072806238444
39  0.0000106775626782708722272569046296553096909
40  0.0000244249246265447276247120833581050192151
41  0.0000536156882046103183767776778623215250263
42  0.0001129759144311438785008780416596607665269
43  0.0002285791757095217155773547723640604090178
44  0.0004441708982537368247123232833928341278806
45  0.0008291190100736257591676481304432400065707
46  0.0014870069202407333597282246273607597686350
47  0.0025627140540319397372936993662051463616081
48  0.0042444951519903595729688028370674146572128
49  0.0067565433031684029793750845271915750345215
50  0.0103375112538475772555601750468667887616903
51  0.0152022224321287785508971523995569441467524
52  0.0214877567069513483732912106916046468541026
53  0.0291909147717074879402332499012118205428123
54  0.0381103609519514580084020849426451604813337
55  0.0478111801033572828001361187943984987214208
56  0.0576295474460109882763880762013286584988236
57  0.0667289496743285470703455075636156834661961
58  0.0742071940343825980912484396867512259632349
59  0.0792381902401034649008337851228134240955114
60  0.0812191449961060657480871327607019338756800
61  0.0798876836027272441143054493295494467020035
62  0.0753778950122507146458517013343225698918104
63  0.0681990478682268502774732610305363778024912
64  0.0591413618232279753028635127520828973501921
65  0.0491328236685278571527213387071242323145270
66  0.0390829279181471728188412839699594769626856
67  0.0297496914003806643689298283561583957634866
68  0.0216560253576300043576452480920124799013138
69  0.0150650611183513125079791450389166129752994
70  0.0100075048857619329500945326572036719880998
71  0.0063427847867505368628648909634648589417338
72  0.0038320991419951225265272398701199563220143
73  0.0022047693693670487521951706355594069464132
74  0.0012066643170184467373506898013602040009573
75  0.0006274654448495952134209896478012069565011
76  0.0003096046602876301808045245156364444483188
77  0.0001447502307838269123597962906302427654737
78  0.0000640241405390002733420337643899244994827
79  0.0000267442612378102977620242253209426053218
80  0.0000105305528623879291513342620900672841344
81  0.0000039002047638473022676645907136361302037
82  0.0000013555589728006032860770438408515303763
83  0.0000004409649670556138715410421369256255275
84  0.0000001338643649990266671755828609136074192
85  0.0000000377969971761954790564347339554274186
86  0.0000000098887492612139870282184084260279633
87  0.0000000023869394768447759979344835468140218
88  0.0000000005289240886190120313585516509391482
89  0.0000000001069734111813723689550330309128157
90  0.0000000000196117920499185028719579815259522
91  0.0000000000032327129752612767561725792830885
92  0.0000000000004743654909350743366022729041731
93  0.0000000000000612084504432348030199262378659
94  0.0000000000000068371141452550154844584884284
95  0.0000000000000006477266032346843997735685861
96  0.0000000000000000506036408777089375624763656
97  0.0000000000000000031301221161469874908808014
98  0.0000000000000000001437300971700139834040175
99  0.0000000000000000000043554574900005497417599
100 0.0000000000000000000000653318623500068971360

The specific space that we were offered accomodates 72 people. Because R treats the value that we specify as a member/part of the lower tail for discrete distributions, we want to know the probability that our meeting space for 72 is adequate and that involves summing up all of the probabilities of attendance at 72 or less. How often is attendance 72 or less? R will answer this with a tail probability [Binomial, n=100, p=0.6]

> pbinom(c(72), size=100, prob=0.6, lower.tail=TRUE)
[1] 0.9954

So that means that our proposed space seating 72 people is too small with probability one minus the above. I calculate that as the exact same tail probability but now looking in the upper tail [because that is attendance greater than 72].

> pbinom(c(72), size=100, prob=0.6, lower.tail=FALSE)
[1] 0.0046

We can also plan a look for alternative spaces by knowing how often a space of a given size is adequate. We can formulate that problem as follows. What is the 95th percentile of attendance? If we knew this, we would know how many people we need to seat so that 95% of our meetings can accomodate everyone that shows up. Perhaps 90% of meetings is a better standard because space costs money. We can quantify all of the tradeoffs by calculating this for varied percentages. In either case, when we specify a probability and ask R to recover the value given that probability [in the distribution’s given tail], this is a quantile. In this case, a binomial (100, p=0.6) quantile. Or we could characterize the proportion of meetings accomodated given a meeting space of a certain size [tail probabilities].

> qbinom(c(0.90,0.95), size=100, prob=0.6, lower.tail=TRUE)
[1] 66 68

The campus survey says 54 students are not satisfied. The chef claims that people are generally satisfied [the probability of a No is not only less than 0.5 but is 0.46]. If he is right, what is the probability of 54 or more No responses by chance alone?

> local({
+   .x <- 30:62
+   plotDistr(.x, dbinom(.x, size=100, prob=0.46), xlab="Number of No Responses",
+    ylab="Probability Mass", 
+   main="Binomial Distribution:  n=100, Pr(No)=0.46",
+    discrete=TRUE)
+ })

plot of chunk unnamed-chunk-7 That’s also a tail probability in a binomial. The probability above 53 students is 54 or more. So I need the upper tail probability of 53 [representing 54 or more] given a binomial with p=0.46 and n=100.

> pbinom(c(53), size=100, prob=0.46, lower.tail=FALSE)
[1] 0.06642

Turning to a Poisson problem, the Poisson is defined by a rate of arrivals in units of time. People arrive randomly in the mortgage bank queue at a rate of 7 per hour. Quantity per unit time. What I want to know are things abour how many actual people to expect with a given probability or what is the probability of a given number of people arriving?

A Plot of Mortgage Bank Arrivals

> local({
+   .x <- 0:17
+   plotDistr(.x, dpois(.x, lambda=7), xlab="Mortgage Clients", ylab="Probability Mass", 
+   main="Poisson Distribution:  Mean=7", discrete=TRUE)
+ })

plot of chunk unnamed-chunk-9

What are the chances that more than 11 arrive? Again, the specified number for a Poisson tail probability will be a part of the lower tail. So the upper tail of 11 will give me 12 and more than 12.

> ppois(c(11), lambda=7, lower.tail=FALSE)
[1] 0.05335

What is the most that I should expect 90% of the time? That’s a Poisson quantile, I specify the probability and R returns the number of events for that probability.

> qpois(c(0.9), lambda=7, lower.tail=TRUE)
[1] 10

Air Traffic Control Problems

If we know events per unit time, we can alter time by known conversion factors. For example, 21 events per year is 21 events per 52 weeks or 21/52 per week. 21/52 is 0.4038462. The arrival of errors in a week for the Cleveland air traffic control center follows a Poisson(0.4038462). What is the probability of more than 2 errors in one week?

> ppois(c(2), lambda=0.4038462, lower.tail=FALSE)
[1] 0.008134

Later we learn that there are 6 errors in the first two weeks. What are the chances of that? The arrival rate needs to be altered. We know of 21 errors per year. There are 26 2-weeks per year. Thus, events arrive at a rate of 21/26 2-weeks [0.8076924]. What is the probability of 6 or more given an arrival rate of 21/26 per 2-week period.

> ppois(c(5), lambda=2*0.4038462, lower.tail=FALSE)
[1] 0.000194

The first two weeks of August strongly suggest staffing shortages. This number of errors should happen just under 2 times in 10,000, assuming a Poisson(0.8077).

The Normal: Flour

> local({
+   .x <- seq(53.355, 56.645, length.out=1000)  
+   plotDistr(.x, dnorm(.x, mean=55, sd=0.5), cdf=FALSE, xlab="x", 
+   ylab="Density", 
+   main=paste("Normal Distribution:  Mean=55, Standard deviation=0.5"))
+ })

plot of chunk unnamed-chunk-14

With continuous distributions, these tail membership issues vanish. Now we just need to make sure we ask for the correct things in the correct tails. What is the probability of a bag above 54.5 kg? Upper tail above 54.5 of a normal (55, 0.5).

> pnorm(c(54.5), mean=55, sd=0.5, lower.tail=FALSE)
[1] 0.8413

What about jelly? What is the probability of a noncompliant jar: outside of 15.95oz and 16.05oz?

First, the probability below 15.95. The errors that are too small/underfills.

> pnorm(c(15.95), mean=16.004, sd=0.028, lower.tail=TRUE)
[1] 0.02689

Now I need the errors that are overfilled. The overfills are in the upper tail above 16.05.

> pnorm(c(16.05), mean=16.004, sd=0.028, lower.tail=FALSE)
[1] 0.05021

Now I will just add the two above together to get a sum. This is the total probability of failures above and below.

> pnorm(c(15.95), mean=16.004, sd=0.028, lower.tail=TRUE)+pnorm(c(16.05), 
+   mean=16.004, sd=0.028, lower.tail=FALSE)
[1] 0.0771

The last question on the exercise asks about averaging over two jars as opposed to looking at one. From the lecture on Monday, the average has the same mean – as a distribution – but the standard deviation/standard error is smaller [divided by square root of the number of things averaged]. So averaging over two jars will have a smaller variance and standard deviation that the individual jars. In this example, the probability of an average over 2 falling outside of 15.95 to 16.05 drops by a huge amount.

> pnorm(c(15.95), mean=16.004, sd=0.028/sqrt(2), lower.tail=TRUE)+pnorm(c(16.05), 
+   mean=16.004, sd=0.028/sqrt(2), lower.tail=FALSE)
[1] 0.01327