Here are the things that you absolutely need to have down cold:
- Logical operators (&& ‘and’; || ‘or’; == ‘is equal to’).
- Conditionals (if (condition1) {...} else if (condition2) {...} else {...}).
- for- and while-loops.

There’s lots of freely available help on the web - e.g., Cookbook for R - if you need a refresher on any of these, or help figuring out their syntax in R.
Continuing from last time:
Suppose you want to estimate the prevalence of a disease in a certain city. (Example from Hoff, A First Course in Bayesian Statistical Methods.) You select 30 people from the city at random and test them. Assume the test is perfectly accurate, and let the true rate of the disease be \(\pi\). How many of the people you test do you expect to have the disease?
If you’re really sampling randomly from the population, and the true rate of infection is \(\pi\), then the distribution of \(T^+\) - the number of positive test results in your sample - is \[ P(T^+|\pi) = {30 \choose T^+} \pi^{T^+} (1-\pi)^{30-T^+} \] In R:
total.sampled = 30
num.infected = 0:30 # vector listing possible numbers of positive test results in sample
true.rate = .3
binomial = function(total, diseased, proportion) {
return(choose(total, diseased) * proportion^diseased * (1-proportion)^(total-diseased))
}
probs = sapply(num.infected, function(t.plus) {
binomial(total.sampled, t.plus, proportion=true.rate)
})
probs
## [1] 2.253934e-05 2.897915e-04 1.800847e-03 7.203389e-03 2.083838e-02
## [6] 4.643981e-02 8.292823e-02 1.218537e-01 1.501412e-01 1.572908e-01
## [11] 1.415617e-01 1.103078e-01 7.485173e-02 4.441751e-02 2.311524e-02
## [16] 1.056697e-02 4.245656e-03 1.498467e-03 4.638111e-04 1.255429e-04
## [21] 2.959225e-05 6.039234e-06 1.058827e-06 1.578375e-07 1.972969e-08
## [26] 2.029340e-09 1.672533e-10 1.061925e-11 4.876188e-13 1.441238e-14
## [31] 2.058911e-16
plot(num.infected, probs, pch=20, type='b', col='blue', xlab='Number infected in sample, out of 30', ylab='Probability that this number will be infected', main='')
On the other hand, we could have used sampling to get an approximate picture. There will be more on this below, but to preview, here’s a way to use rbinom() to randomly generate data from a specified distribution, which we can then inspect.
set.seed(0) # set a seed for random number generator, so we can precisely replicate simulations
rbinom(n=1, size=30, prob=.3) # a single data set we could have collected
## [1] 12
rbinom(n=1, size=30, prob=.3) # another one
## [1] 7
rbinom(n=20, size=30, prob=.3) # 20 such data sets
## [1] 8 9 12 7 12 13 10 10 5 7 7 10 8 11 9 10 15 8 11 13
par(mfrow=c(2,3))
hist(rbinom(n=20, size=30, prob=.3), xlab='number infected', main='20 sims', xlim=c(0,30))
hist(rbinom(n=200, size=30, prob=.3), xlab='number infected', main='200 sims', xlim=c(0,30))
hist(rbinom(n=2000, size=30, prob=.3), xlab='number infected', main='2,000 sims', xlim=c(0,30))
hist(rbinom(n=20000, size=30, prob=.3), xlab='number infected', main='20,000 sims', xlim=c(0,30))
hist(rbinom(n=200000, size=30, prob=.3), xlab='number infected', main='200,000 sims', xlim=c(0,30))
two.million.samples = rbinom(n=2000000, size=30, prob=.3)
hist(two.million.samples, xlab='number infected', main='2,000,000 sims', xlim=c(0,30))
summary(two.million.samples)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 7.000 9.000 8.998 11.000 23.000
With a large number of simulated data sets, the histogram of the sample distribution gives a very close approximation to the shape of the true distribution (see above); the sample mean of roughly 9 likewise matches the theoretical mean \(30 \times .3 = 9\). However, notice that there are zero samples with \(T^+ > 23\) even in 2 million samples. This is not because \(P(T^+ > 23|\pi) = 0\), but simply because the probability \(P(T^+ > 23|\pi)\) is so tiny that we didn’t sample enough to find any such data sets! Specifically, this probability is about \(2 \times 10^{-8}\), or 2 in 100 million.
which(num.infected > 23) # make sure you understand which() usage
## [1] 25 26 27 28 29 30 31
probs[which(num.infected > 23)]
## [1] 1.972969e-08 2.029340e-09 1.672533e-10 1.061925e-11 4.876188e-13
## [6] 1.441238e-14 2.058911e-16
sum(probs[which(num.infected > 23)])
## [1] 2.19374e-08
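As a quick cross-check (pbinom() is not used elsewhere in these notes), R’s built-in cumulative binomial function should give the same upper-tail probability directly:
pbinom(23, size=30, prob=.3, lower.tail=FALSE) # P(T+ > 23); should agree with the sum above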
Earlier we introduced the concept of cumulative probability by considering a situation where you want to flip a weighted coin until it comes up ‘heads’. Let the weight of the coin (its probability of coming up heads on a single toss) be \(\pi \in [0,1]\), and assume that tosses are all independent of each other. If \(Z\) is the random variable/question “How many flips until we see the first heads?”, we noted that the probability distribution on answers \(z\) was \[ P(z) = (1 - \pi)^{z-1} \times \pi \] since the coin has to come up tails \(z-1\) times, and each of these events has probability \(1 - \pi\); then it has to come up heads the last time, which happens with probability \(\pi\).
We then defined the cumulative probability: \[ \begin{aligned} F(z) =& \sum_{i \leq z} P(i)\\ =& \sum_{i \leq z} (1 - \pi)^{i-1} \times \pi \end{aligned} \]
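Incidentally, this sum is a finite geometric series, so it has a simple closed form: \[ F(z) = 1 - (1 - \pi)^{z} \] This makes intuitive sense - the only way not to have stopped within \(z\) flips is to get \(z\) tails in a row - and it gives an independent check on the code below.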
First, create a variable with the value of the binomial parameter. This is so that we can easily adapt the code to explore the effects of different values, without going through and changing it in a number of places (which would be both time-consuming and error-prone).
binomial.parameter = .1
Now let’s write a function that calculates the probability of stopping within \(z\) flips for, say, \(z \in \{1, 2, ..., 99, 100\}\). Pay close attention to R’s function syntax here - you’ll be doing a lot of this in coming weeks!
P = function(z) (1-binomial.parameter)^(z-1) * binomial.parameter
Note that I could also have written that function equivalently as follows:
P = function(z) {
(1-binomial.parameter)^(z-1) * binomial.parameter
}
Or, even more verbosely, like this:
P = function(z) {
return((1-binomial.parameter)^(z-1) * binomial.parameter)
}
In general, you have to surround the body of a more complex, multiline function with braces. Also, it never hurts to be explicit in marking out the return value using return().
Anyway, we can use R’s sapply() function to create a vector showing the result of applying P() to each of a vector of values. Here, we’ll try it out on all the numbers between 1 and 100, inclusive.
1:100
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
probs = sapply(1:100, FUN=P)
probs
## [1] 1.000000e-01 9.000000e-02 8.100000e-02 7.290000e-02 6.561000e-02
## [6] 5.904900e-02 5.314410e-02 4.782969e-02 4.304672e-02 3.874205e-02
## [11] 3.486784e-02 3.138106e-02 2.824295e-02 2.541866e-02 2.287679e-02
## [16] 2.058911e-02 1.853020e-02 1.667718e-02 1.500946e-02 1.350852e-02
## [21] 1.215767e-02 1.094190e-02 9.847709e-03 8.862938e-03 7.976644e-03
## [26] 7.178980e-03 6.461082e-03 5.814974e-03 5.233476e-03 4.710129e-03
## [31] 4.239116e-03 3.815204e-03 3.433684e-03 3.090315e-03 2.781284e-03
## [36] 2.503156e-03 2.252840e-03 2.027556e-03 1.824800e-03 1.642320e-03
## [41] 1.478088e-03 1.330279e-03 1.197252e-03 1.077526e-03 9.697737e-04
## [46] 8.727964e-04 7.855167e-04 7.069650e-04 6.362685e-04 5.726417e-04
## [51] 5.153775e-04 4.638398e-04 4.174558e-04 3.757102e-04 3.381392e-04
## [56] 3.043253e-04 2.738927e-04 2.465035e-04 2.218531e-04 1.996678e-04
## [61] 1.797010e-04 1.617309e-04 1.455578e-04 1.310021e-04 1.179018e-04
## [66] 1.061117e-04 9.550050e-05 8.595045e-05 7.735540e-05 6.961986e-05
## [71] 6.265787e-05 5.639209e-05 5.075288e-05 4.567759e-05 4.110983e-05
## [76] 3.699885e-05 3.329896e-05 2.996907e-05 2.697216e-05 2.427494e-05
## [81] 2.184745e-05 1.966271e-05 1.769643e-05 1.592679e-05 1.433411e-05
## [86] 1.290070e-05 1.161063e-05 1.044957e-05 9.404611e-06 8.464150e-06
## [91] 7.617735e-06 6.855961e-06 6.170365e-06 5.553329e-06 4.997996e-06
## [96] 4.498196e-06 4.048377e-06 3.643539e-06 3.279185e-06 2.951267e-06
Now check that these results are sane. For example, it should be pretty damn close to probability 1 that we’ll get a head within 100 flips, even with a coin heavily weighted to tails.
sum(probs)
## [1] 0.9999734
probs[1]
## [1] 0.1
sum(probs[1:5])
## [1] 0.40951
sum(probs[1:10])
## [1] 0.6513216
sum(probs[1:30])
## [1] 0.9576088
sum(probs[1:50])
## [1] 0.9948462
Now let’s write a function that calculates the cumulative probability of a value z, i.e., the probability that \(Z \leq z\). Pay close attention to the for-loop syntax!
F.loop = function(z) {
total = 0
for (i in 1:z) {
total = total + P(i)
}
return(total)
}
F.loop(1)
## [1] 0.1
F.loop(1) == P(1)
## [1] TRUE
F.loop(5)
## [1] 0.40951
F.loop(30) #etc.
## [1] 0.9576088
F.loop(30) == sum(probs[1:30])
## [1] FALSE
Wait, F.loop(30) and sum(probs[1:30]) weren’t equal. Did we make a mistake?!?
F.loop(30) - sum(probs[1:30])
## [1] 1.110223e-16
Nope: the difference is less than \(10^{-15}\). This is an artifact of using floating-point numbers to approximate real numbers.
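(An aside: the idiomatic way to compare floating-point results in R is all.equal(), which tolerates tiny rounding differences of exactly this kind.)
isTRUE(all.equal(F.loop(30), sum(probs[1:30]))) # TRUE: equal up to numerical tolerance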
We could also have calculated the cumulative probability in recursive style:
F.recursive = function(z) {
if (z == 0) 0
else F.recursive(z-1) + P(z)
}
F.recursive(1)
## [1] 0.1
F.recursive(1) == P(1)
## [1] TRUE
F.recursive(5)
## [1] 0.40951
F.recursive(30)
## [1] 0.9576088
F.recursive(50)
## [1] 0.9948462
F.recursive(30) == sum(probs[1:30]) # floating-point issue again
## [1] FALSE
F.recursive(30) == F.loop(30) # Interestingly, this one works ...
## [1] TRUE
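As another cross-check (not part of the code above), this waiting-time distribution is built into R as dgeom()/pgeom(); note that R parameterizes it by the number of failures before the first success, i.e., by \(z - 1\) rather than \(z\).
pgeom(30 - 1, prob=binomial.parameter) # P(at most 29 tails before the first head), i.e., F(30)
F.loop(30) # should agree, up to floating point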
But actually, by far the best way to do this - at least if we’re only calculating the values once - would be to use R’s vector operations. The reason is that recursion and explicit loops are both generally slow in R, due to details of its implementation. We’ll use sapply() here, so let’s look more closely at how it works:
sapply(1:10, FUN=P)
## [1] 0.10000000 0.09000000 0.08100000 0.07290000 0.06561000 0.05904900
## [7] 0.05314410 0.04782969 0.04304672 0.03874205
vec = rep(NA, 10)
for (i in 1:10) {
vec[i] = P(i)
}
vec
## [1] 0.10000000 0.09000000 0.08100000 0.07290000 0.06561000 0.05904900
## [7] 0.05314410 0.04782969 0.04304672 0.03874205
vec == sapply(1:10, FUN=P) # the loop above does the same thing as sapply(), just more verbosely and more slowly
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Now we write a function to calculate the cumulative probability, this time using sapply() instead of a loop or recursion.
F.vector = function(z) sum(sapply(1:z, FUN=P))
# does the same thing as the above
F.vector(1)
## [1] 0.1
F.vector(5)
## [1] 0.40951
F.vector(30)
## [1] 0.9576088
F.vector(30) == F.loop(30)
## [1] FALSE
F.vector(30) - F.loop(30)
## [1] -1.110223e-16
Which is fastest?
system.time(F.loop(1000))
## user system elapsed
## 0.002 0.000 0.001
system.time(F.recursive(1000))
## user system elapsed
## 0.002 0.001 0.003
system.time(F.vector(1000))
## user system elapsed
## 0.002 0.000 0.001
Here, the math involved is so trivial that it doesn’t matter much. In general, though, when using R you should avoid loops and recursion whenever possible, opting for vector/matrix operations instead.
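As one more illustration of the vectorized style (an extra example, not needed for what follows): cumsum() computes every running total of a vector in a single pass, so the whole cumulative distribution can be obtained at once.
all.cumulative = cumsum(sapply(1:100, FUN=P)) # F(1), F(2), ..., F(100) in one vectorized step
all.cumulative[30] # same value as F.loop(30), up to floating point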
Note, however, that all of these functions are re-computing the probabilities of low values of z every time you call them. If you’re going to be computing with the same numbers repeatedly, the best thing to do is store them as a variable and have your function access that.
z.probs = sapply(1:10000, FUN=P)
F.cached = function(z) sum(z.probs[1:z])
F.cached(1)
## [1] 0.1
F.cached(5)
## [1] 0.40951
F.cached(30)
## [1] 0.9576088
F.cached(50)
## [1] 0.9948462
system.time(F.cached(1000))
## user system elapsed
## 0 0 0
F.cached is the fastest of all.
Let’s make some plots. First, let’s plot the value of \(P(Z=z)\) for some reasonable range, say, 1 to 100:
upper.limit = 100
z.probs = sapply(1:upper.limit, P)
plot(1:upper.limit, z.probs,
xlab='z', ylab='P(z)', main='Probability', pch=20)
Now let’s plot \(F(z)\), i.e., \(\sum_{i \leq z} P(i)\):
z.cumulative = sapply(1:upper.limit, FUN=F.cached)
plot(1:upper.limit, z.cumulative,
xlab='z', ylab='F(z)', main='Cumulative probability')
It would be nice to have those plots side-by-side in a single image. You can use par(mfrow=c(nrow,ncol)) for this purpose. (There are also lots of fancier plotting methods you may want to explore eventually, e.g., using the lattice or ggplot2 packages.)
par(mfrow=c(1,2)) # 1 row, 2 columns
plot(1:upper.limit, z.probs,
xlab='z', ylab='P(z)', main='Probability', pch=20)
plot(1:upper.limit, z.cumulative,
xlab='z', ylab='F(z)', main='Cumulative probability', pch=20)
When using simulations, always set a seed so that you can reproduce results precisely if needed.
set.seed(0)
Random number generation is what makes simulation possible and useful. Let’s generate some random floating point numbers in the \([0,1]\) interval:
runif(n=1,min=0,max=1)
## [1] 0.8966972
runif(1,0,1)
## [1] 0.2655087
runif(1)
## [1] 0.3721239
We get a different value each time, of course, but these three calls are all doing the same thing: parameter labels are optional, and 0 and 1 are predefined as the default min and max for runif().
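And because labeled arguments are matched by name, they can be supplied in any order (a small extra illustration):
runif(max=1, min=0, n=1) # the same call as runif(n=1, min=0, max=1), just with the arguments reordered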
runif(200)
## [1] 0.57285336 0.90820779 0.20168193 0.89838968 0.94467527 0.66079779
## [7] 0.62911404 0.06178627 0.20597457 0.17655675 0.68702285 0.38410372
## [13] 0.76984142 0.49769924 0.71761851 0.99190609 0.38003518 0.77744522
## [19] 0.93470523 0.21214252 0.65167377 0.12555510 0.26722067 0.38611409
## [25] 0.01339033 0.38238796 0.86969085 0.34034900 0.48208012 0.59956583
## [31] 0.49354131 0.18621760 0.82737332 0.66846674 0.79423986 0.10794363
## [37] 0.72371095 0.41127443 0.82094629 0.64706019 0.78293276 0.55303631
## [43] 0.52971958 0.78935623 0.02333120 0.47723007 0.73231374 0.69273156
## [49] 0.47761962 0.86120948 0.43809711 0.24479728 0.07067905 0.09946616
## [55] 0.31627171 0.51863426 0.66200508 0.40683019 0.91287592 0.29360337
## [61] 0.45906573 0.33239467 0.65087047 0.25801678 0.47854525 0.76631067
## [67] 0.08424691 0.87532133 0.33907294 0.83944035 0.34668349 0.33377493
## [73] 0.47635125 0.89219834 0.86433947 0.38998954 0.77732070 0.96061800
## [79] 0.43465948 0.71251468 0.39999437 0.32535215 0.75708715 0.20269226
## [85] 0.71112122 0.12169192 0.24548851 0.14330438 0.23962942 0.05893438
## [91] 0.64228826 0.87626921 0.77891468 0.79730883 0.45527445 0.41008408
## [97] 0.81087024 0.60493329 0.65472393 0.35319727 0.27026015 0.99268406
## [103] 0.63349326 0.21320814 0.12937235 0.47811803 0.92407447 0.59876097
## [109] 0.97617069 0.73179251 0.35672691 0.43147369 0.14821156 0.01307758
## [115] 0.71556607 0.10318424 0.44628435 0.64010105 0.99183862 0.49559358
## [121] 0.48434952 0.17344233 0.75482094 0.45389549 0.51116978 0.20754511
## [127] 0.22865814 0.59571200 0.57487220 0.07706438 0.03554058 0.64279549
## [133] 0.92861520 0.59809242 0.56090075 0.52602772 0.98509522 0.50764182
## [139] 0.68278808 0.60154122 0.23886868 0.25816593 0.72930962 0.45257083
## [145] 0.17512677 0.74669827 0.10498764 0.86454495 0.61464497 0.55715954
## [151] 0.32877732 0.45313145 0.50044097 0.18086636 0.52963060 0.07527575
## [157] 0.27775593 0.21269952 0.28479048 0.89509410 0.44623532 0.77998489
## [163] 0.88061903 0.41312421 0.06380848 0.33548749 0.72372595 0.33761533
## [169] 0.63041412 0.84061455 0.85613166 0.39135928 0.38049389 0.89544543
## [175] 0.64431576 0.74107865 0.60530345 0.90308161 0.29373016 0.19126011
## [181] 0.88645094 0.50333949 0.87705754 0.18919362 0.75810305 0.72449889
## [187] 0.94372482 0.54764659 0.71174387 0.38890510 0.10087313 0.92730209
## [193] 0.28323250 0.59057316 0.11036060 0.84050703 0.31796368 0.78285134
## [199] 0.26750821 0.21864528
Now we define a function flip(), which simulates a single flip of a coin with weight weight and returns either TRUE (‘heads’) or FALSE (‘tails’). It has the default argument weight = .5, so that calling flip() is equivalent to calling flip(weight=.5) or flip(.5).
flip = function(weight=.5) runif(1,0,1) < weight
flip(.5)
## [1] FALSE
flip()
## [1] TRUE
This is essentially the same thing as calling rbinom() with n=1, size=1, and prob=.5 (except that rbinom() returns 1/0 rather than TRUE/FALSE):
rbinom(n=1, size=1, prob=.5)
## [1] 0
Suppose we know that someone flipped a .1-weighted coin 10 times, and want to know what would be a reasonable number of heads to expect. (Intuitions?)
n.flips = 10
flips = rep(NA, n.flips) # create a length-10 vector to store results
for (i in 1:10) {
flips[i] = flip(.1)
}
flips
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Much faster, at least when doing complicated simulations, would be to use sapply(). Note that we have to give the function a dummy argument i in order to use sapply() here.
n.flips = 10
flips = sapply(1:n.flips, FUN=function(i) {
return(flip(.1))
})
flips
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
length(which(flips == TRUE))
## [1] 0
length(which(flips == T)) # same thing
## [1] 0
length(which(flips)) # this too
## [1] 0
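An alternative worth knowing (an extra illustration, not needed for what follows): replicate() repeats an expression a given number of times, which avoids the dummy argument, and sum() on a logical vector counts the TRUEs directly.
flips = replicate(n.flips, flip(.1)) # evaluate flip(.1) n.flips times
sum(flips) # number of heads; each TRUE counts as 1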
The vector-based method using sapply(), in particular, is useful when a sampling function isn’t predefined in R. But in this case, because the binomial distribution is so common, this sampling function comes predefined as rbinom(). Let’s call it with n=1, size=10, prob=.1:
rbinom(n=1, size=10, prob=.1)
## [1] 1
In fact rbinom() is even more powerful, since we can run many such simulations at once. Let’s run 10 simulations, each consisting of 10 flips of a \(.1\)-weighted coin, and look at the distribution of the number of heads.
sims = rbinom(n=10, size=10, prob=.1)
table(sims)
## sims
## 0 1 2
## 4 5 1
summary(sims)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 1.0 0.7 1.0 2.0
plot(table(sims), xlim=c(0,10), main='', xaxt='n')
axis(side=1, at=0:10)
Does the distribution change much if we make it 10000 sims instead of 10?
sims = rbinom(n=10000, size=10, prob=.1)
table(sims)
## sims
## 0 1 2 3 4 5 6
## 3510 3791 1961 614 114 8 2
summary(sims)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.006 2.000 6.000
plot(table(sims), xlim=c(0,10), main='', xaxt='n')
axis(side=1, at=0:10)
What if we change the coin weight - say, to .2 or to .8?
par(mfrow=c(1,2))
plot(table(rbinom(n=10000, size=10, prob=.2)),
xlim=c(0,10), main='', xaxt='n', ylim=c(0, 10000),
xlab='number of heads', ylab='freq. in 10k')
axis(side=1, at=0:10)
plot(table(rbinom(n=10000, size=10, prob=.8)),
xlim=c(0,10), main='', xaxt='n', ylim=c(0, 10000),
xlab='number of heads', ylab='freq. in 10k')
axis(side=1, at=0:10)
Let’s try for a range of proportions in \([0,1]\).
par(mfrow=c(3,4))
proportions = seq(from=0, to=1, length.out = 11)
proportions
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
for (i in 1:length(proportions)) {
plot(table(rbinom(n=10000, size=10, prob=proportions[i])),
xlim=c(0,10), main=paste('proportion =', proportions[i]), xaxt='n', ylim=c(0, 10000),
xlab='number of heads', ylab='freq. in 10k')
axis(side=1, at=0:10)
}
Continuous random variables are those whose possible values are some interval in the real line - frequently, but not always, \((-\infty,\infty)\), \([0,\infty)\), \((0,\infty)\), or \([0,1]\). Most of the complications arise because there are so many possible values that the probability that any particular value is the true one is generally zero, even though the probability that the value falls into some range \([a,b]\) will generally be non-zero. This is counter-intuitive, but hey, most everything about calculus is counter-intuitive. Our task here is to learn to live with it, and to develop intuitions about what it all means.
To avoid ambiguity, we use a different symbol - lower-case \(p\) - for the probability density functions (PDFs) associated with continuous random variables. Densities have a lot of funny properties, including that they can be greater than 1! But they actually have a surprisingly simple interpretation (perhaps familiar from a long-ago calculus course): \[ p(x) = \lim_{\epsilon \to 0} \frac{P(X \in [x, x + \epsilon])}{\epsilon} \] In other words, it’s the limit, as the interval you’re looking at above \(x\) gets infinitesimally small, of the ratio of two things: the probability that the true value of \(X\) is in this interval, and the width of the interval. This function has various nice properties, e.g.: \[ \begin{aligned} \mathbf{normalization:}\ &\int_{-\infty}^\infty\ p(x) dx = 1\\ \mathbf{additivity:}\ &\int_a^c\ p(x)dx = \int_a^b\ p(x)dx + \int_b^c\ p(x)dx \end{aligned} \] In general, you can gain - merely approximate, but helpful - intuitions about CRVs by imagining them as very, very fine grids on continuous intervals.
We can use normal probability-talk to talk about the probability that a CRV falls into a range. For example, imagine we have variable \(X \sim U(0,1)\) - one that is uniformly distributed over \([0,1]\). Then the cumulative probability \(F(a)\) is given as follows: \[ \begin{aligned} F(a) =& \int_{-\infty}^a\ p(x) dx\\ =& \begin{cases} 0 \text{ if } a < 0\\ a \text{ if } 0 \leq a \leq 1\\ 1 \text{ otherwise} \end{cases} \end{aligned} \] Thinking of the definite integral as describing the area under a curve, a picture should make it obvious why this is so.
In addition, for any \(a\) and \(b\) in \([0,1]\) with \(b > a\), the probability that \(X\) is in \([a,b]\) is just \(b - a\). This is because, under these conditions, \[ \begin{aligned} P(X \in [a,b]) =& F(b) - F(a)\\ =& b - a \end{aligned} \] Of course, \(P(X \in [a,b]) = 0\) if \(a\) and \(b\) are both less than 0, or both greater than 1.
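To connect this with the ‘fine grid’ intuition mentioned above, here is a small illustrative check (it uses dunif(), the uniform density function introduced just below): summing the density over a fine grid of points in \([a,b]\) and multiplying by the grid spacing approximates the integral, and so recovers roughly \(b - a\).
grid.width = .001
grid = seq(from=0, to=1, by=grid.width) # a very fine grid on [0,1]
in.interval = grid[grid >= .2 & grid < .5] # the grid points falling in [.2, .5)
sum(dunif(in.interval)) * grid.width # approximately .5 - .2 = .3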
In R, the family of uniform distribution functions is given by dunif() (Density), runif() (Random sampling), punif() (cumulative Probability), and qunif() (Quantiles). When these names evade you, type ?runif (or whichever of these function names you can remember) and R will give you a helpful reminder of all of them. If you can’t remember any, type ??uniform into the R console (note the double question marks!), or ‘uniform distribution R’ into Google.
punif(.7, min=0, max=1)
## [1] 0.7
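For concreteness, here is what each member of the family does for the uniform distribution on \([0,1]\) (these four calls are an extra illustration):
dunif(.7) # density at .7 (constant at 1 for the uniform on [0,1])
punif(.7) # cumulative probability P(X <= .7), i.e., .7
qunif(.7) # quantile function: the value x such that P(X <= x) = .7
runif(3) # three random draws from U(0,1)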
interval.probability = function(lower, upper) {
if (lower > upper) {
stop('undefined interval!')
} else if (upper > 1 && lower < 0) {
return(1)
} else if (upper > 1) {
return(1 - punif(lower))
} else if (lower < 0) {
return(punif(upper))
} else {
return(punif(upper) - punif(lower))
}
}
interval.probability(.3, .7)
## [1] 0.4
interval.probability(-40, .6)
## [1] 0.6
interval.probability(.4, 14)
## [1] 0.6
Note that the order in which the conditional statements are evaluated is critical to getting the interval.probability() function to work correctly: each else if branch is reached only when all of the earlier tests have failed.
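To exercise the remaining branches, here are a couple of extra calls beyond those shown above:
interval.probability(-40, 14) # lower < 0 and upper > 1, so all of [0,1] is covered: returns 1
# interval.probability(.7, .3) # lower > upper, so this would stop with the ‘undefined interval!’ error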