STA 308: LAB FIELDWORK FOR SURVEY METHODS & SAMPLING THEORY

A STEP TO THE CONCEPT OF PROBABILITY DISTRIBUTION IN SURVEY

##Introduction:

Probability theory is the foundation of statistics, and R has plenty of machinery for working with probability, probability distributions, and random variables. The examples in this topic show how to calculate probabilities from quantiles, calculate quantiles from probabilities, generate random variates drawn from distributions, plot distributions, and so forth.

####Names of Distributions####

R has an abbreviated name for every probability distribution. This name is used to identify the functions associated with the distribution. For example, the name of the Normal distribution is “norm”, which, for instance, is the root of these function names:

- dnorm stands for Normal density
- pnorm stands for Normal distribution function
- qnorm stands for Normal quantile function
- rnorm stands for Normal random variates
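As a minimal sketch of this naming convention, the four Normal functions can be tried on the standard Normal distribution (mean 0 and standard deviation 1, which are R's defaults). The seed value 234 and the variable names are illustrative choices.

```r
# d: density at a point
d0 <- dnorm(0)            # height of the standard Normal curve at 0

# p: cumulative probability P(X <= q)
p0 <- pnorm(0)            # 0.5, by symmetry

# q: quantile for a given cumulative probability (inverse of pnorm)
q975 <- qnorm(0.975)      # about 1.96

# r: random variates
set.seed(234)             # illustrative seed
z <- rnorm(5)             # five standard Normal draws

print(c(d0 = d0, p0 = p0, q975 = q975))
print(z)
```

The same four prefixes (d, p, q, r) apply to the other distribution names used in this topic, such as binom, pois, geom, hyper and nbinom.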

As time goes on, we shall discuss applications of some of these distributions in detail. Let us deal with the following sub-topics under PROBABILITY:

1. Counting the Number of Combinations
2. Generating Combinations
3. Generating Random Numbers
4. Generating Reproducible Random Numbers
5. Generating a Random Sample
6. Generating Random Sequences
7. Randomly Permuting a Vector
8. Calculating Probabilities for Discrete Distributions
9. Calculating Probabilities for Continuous Distributions
10. Converting Probabilities to Quantiles
11. Plotting a Density Function

NUMBER 1: COUNTING THE NUMBER OF COMBINATIONS

Problem: You want to calculate the number of combinations of n items taken r at a time.

Solution: Use the choose function:

    choose(n, r)

Example 1:

How many ways can we select 3 items from 5 items?

Here, we are dealing with combinations. Do you remember the formula for computing a combination? It is nCr = n! / (r! (n-r)!). You can calculate 5C3 manually; the equivalent R code is:

choose(5,3)

## [1] 10

The answer is 10, which means there are 10 ways in which 3 items can be selected out of 5.
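The value can be cross-checked against the factorial formula nCr = n! / (r! (n - r)!). The sketch below is only a check of choose; the variable names are illustrative.

```r
n <- 5
r <- 3

# Factorial formula for the number of combinations
by_formula <- factorial(n) / (factorial(r) * factorial(n - r))

# Built-in function
by_choose <- choose(n, r)

print(c(by_formula = by_formula, by_choose = by_choose))
```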

Example 2: How many ways can four boys be chosen from 7 boys?

Solution: Here, n=7, r=4. Thus, the command is:

choose(7,4)

## [1] 35

The answer is 35, meaning that there are 35 ways. You should be getting the same result.

Example 3: How many ways can 30 girls be selected from 50 girls?

Solution: Note that some questions are impractical to do manually, which is where computer software helps. Here, we have the code for the question.

choose(50,30)

## [1] 4.712921e+13

The answer is 4.712921e+13, that is, about 4.7 x 10^13. A result of this size is impractical to obtain by manual computation, which is why we need computer software for problems arising from day-to-day activities.

###Exercises:###

1. Compute 14C8 (14 Combination 8)
2. In how many ways can 6 guys be selected from 11 guys?
3. In how many ways can four boys and three girls be selected from 6 boys and 9 girls?
4. In how many ways can 21 men and 23 women be chosen from 100 men and 87 women?

Meanwhile, let’s move ahead to the next topic …

NUMBER 2: GENERATING COMBINATIONS

This topic is quite different from the first one. Here, if we want to generate all combinations of n items taken k at a time, we use the ‘combn’ function. The command is:

combn(items, k)

We can use combn(1:5,3) to generate all combinations of the numbers 1 through 5 taken three at a time. That is, we list every way of choosing three of the numbers 1, 2, 3, 4, and 5. The code to use is:

combn(1:5, 3)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    1    1    1    1    1    2    2    2     3
## [2,]    2    2    2    3    3    4    3    3    4     4
## [3,]    3    4    5    4    5    5    4    5    5     5

Note that 1:5 means 1,2,3,4,5

Note that each column of the output is one combination, so the answers should be read in blocks, that is, 123, 124, 125, etc.

If we want to do this manually, we can say that since we have 5 numbers from which 3 numbers are to be taken at a time, we first find the number of ways in which 3 numbers can be selected from 5 numbers, which is 5C3 (5 Combination 3). In R, this is choose(5, 3), and the answer is 10.

So we will have 10 different combinations, and we can obtain them manually as:

123, 124, 125, 134, 135, 145, 234, 235, 245, 345

That is, from 12345, we combine three numbers at a time to obtain 123, 124, 125, 134, and so on.
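The same list can be produced in R. As a sketch, combn accepts a FUN argument that is applied to each combination; here paste with collapse = "" glues the three numbers into one label. The variable name blocks is illustrative.

```r
# Apply paste() to every combination so that each one becomes a single label
blocks <- combn(1:5, 3, FUN = paste, collapse = "")
print(blocks)

# One label per combination, so the count matches choose(5, 3)
length(blocks) == choose(5, 3)
```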

Where does this type of analysis appear in Experimental Design? It is used in the construction of a Balanced Incomplete Block Design (BIBD), when we are to construct or obtain treatment combinations. We are not dealing with Experimental Designs here, but it is useful to point out the areas of application of what we are doing.

However, recall that there are three basic methods to construct a BIBD, namely:

- Unreduced Method
- Cyclic Method
- Lattice Method

We can’t go further than this because we are dealing with PROBABILITY IN SURVEY not EXPERIMENTAL DESIGNS.

That covers the scenario in which we select 3 numbers at a time out of five numbers.

Now, to select four numbers at a time out of six, the code is:

combn(1:6,4)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]    1    1    1    1    1    1    1    1    1     1     2     2     2     2
## [2,]    2    2    2    2    2    2    3    3    3     4     3     3     3     4
## [3,]    3    3    3    4    4    5    4    4    5     5     4     4     5     5
## [4,]    4    5    6    5    6    6    5    6    6     6     5     6     6     6
##      [,15]
## [1,]     3
## [2,]     4
## [3,]     5
## [4,]     6

Look at the results obtained: there are 15 columns because 6C4 gives us 15.

As a further example, suppose we are interested in selecting three letters from five letters, say ABCDE. Use the code:

combn(c("A", "B", "C", "D", "E"), 3)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] "A"  "A"  "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  
## [2,] "B"  "B"  "B"  "C"  "C"  "D"  "C"  "C"  "D"  "D"  
## [3,] "C"  "D"  "E"  "D"  "E"  "E"  "D"  "E"  "E"  "E"

Here, what we are saying is that we want to select three letters at a time from five letters, that is, ABC, ABD, ABE, and so on. From the above results, we read our answers column by column, so that we have ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE.

We will have 10 different combinations because 5C3 gives us 10.

EXERCISES

1. Generate all combinations of four letters at a time from ABCDEF
2. Generate all combinations of five letters at a time from BCDEGHILN
3. Generate all combinations of six letters at a time from ABCDEFGH

Let’s move to the next topic….

## GENERATING RANDOM NUMBERS (SIMULATION OF NUMBERS) FROM PROBABILITY DISTRIBUTIONS/DENSITIES

  ##### PROBABILITY DISTRIBUTIONS/DENSITIES - DISCRETE TYPE #####

This is one of the key application areas of R. R can generate random numbers from many probability distributions, be they discrete or continuous. Because these numbers are pseudorandom, it is advisable to set a seed before you start generating them: the seed fixes the starting state of the random number generator, so the same code reproduces the same numbers. For instance, I usually use Nigeria's country code, 234, to set my seed. You may choose any other number, unless you are required to use a particular seed. Look at the code below:

set.seed(234)

What it means is that every run starting from this seed produces the same sequence of simulated data.
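A short sketch of what the seed does: resetting the same seed before a call reproduces the same draws, while continuing without resetting gives different ones. The function runif (uniform random numbers) is used here only as a convenient illustration.

```r
set.seed(234)
first <- runif(3)

set.seed(234)          # same seed again
second <- runif(3)

third <- runif(3)      # generator has moved on, no reset

identical(first, second)   # TRUE: same seed, same numbers
identical(first, third)    # FALSE: a later part of the sequence
```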

We will take the distributions one by one, classified as either discrete or continuous. Numbers can be generated from each of these distributions. Let's start with the Bernoulli distribution.

BERNOULLI DISTRIBUTION

Details

The Bernoulli distribution with prob = p has density

P(x) = p^x (1-p)^(1-x)

for x = 0 or 1.

If an element of x is not 0 or 1, the result of dbern is zero, without a warning.

The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function.
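The density above can be checked without any extra package, since a Bernoulli(p) variable is a Binomial variable with a single trial. In this sketch, bern_pmf is a hypothetical helper written for illustration; it is not part of Rlab.

```r
# Hypothetical helper implementing P(x) = p^x (1-p)^(1-x) for x = 0 or 1,
# and returning zero for any other value of x
bern_pmf <- function(x, p) {
  ifelse(x %in% c(0, 1), p^x * (1 - p)^(1 - x), 0)
}

x <- c(0, 1, 2)
manual <- bern_pmf(x, 0.45)
base_r <- dbinom(x, size = 1, prob = 0.45)   # Binomial with one trial

print(rbind(x, manual, base_r))
```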

Before making use of this distribution, install the Rlab package (this is needed only once; remove the # to run the line):

# install.packages('Rlab')

After that, load it as follows:

library(Rlab)

## Rlab 4.0 attached.

## 
## Attaching package: 'Rlab'

## The following objects are masked from 'package:stats':
## 
##     dexp, dgamma, dweibull, pexp, pgamma, pweibull, qexp, qgamma,
##     qweibull, rexp, rgamma, rweibull

## The following object is masked from 'package:datasets':
## 
##     precip

Description

The density, distribution function, quantile function and random generation for the Bernoulli distribution with parameter prob are as follows:

dbern(x, prob, log = FALSE)

pbern(q, prob, lower.tail = TRUE, log.p = FALSE)

qbern(p, prob, lower.tail = TRUE, log.p = FALSE)

rbern(n, prob)

Meanings of the Arguments:

x, q: vector of quantiles.

p: vector of probabilities.

n: number of observations. If length(n) > 1, the length is taken to be the number required.

prob: probability of success on each trial.

log, log.p: logical; if TRUE, probabilities p are given as log(p).

lower.tail: logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].

Number 1:

The function ‘dbern’ is used to obtain the exact probability using Bernoulli distribution, i.e. P(X=x).

The syntax is dbern(x, prob, log = FALSE)

Example:

Compute P(X=1) for X ~ Bernoulli(0.45)

dbern(1, 0.45)     ## Answer = 0.45

## [1] 0.45

Alternatively, we can code it as follows:

dbern(x=1, prob=0.45, log = FALSE)

## [1] 0.45

Look at another example:

dbern(x=1, prob=0.45, log = TRUE)

## [1] -0.7985077

This means that the natural log of the probability has been taken, that is, log(0.45) = -0.7985077.

Number 2:

The function ‘pbern’ is used to obtain the cumulative probability using Bernoulli distribution, i.e. P(X<=x).

The syntax is pbern(q, prob, lower.tail = TRUE, log.p = FALSE)

Example:

Compute P(X<=1) for X ~ Bernoulli(0.68)

Solution is:

pbern(q=1, prob=0.68)       ## Answer = 1

## [1] 1

pbern(q=1, prob=0.68, lower.tail=TRUE)              ## Answer = 1

## [1] 1

pbern(q=1, prob=0.68, lower.tail=TRUE, log.p=TRUE)       ## Answer = 0 {This is log(1), the log of P(X<=1)}

## [1] 0

pbern(q=1, prob=0.68, lower.tail=FALSE)               ## Answer = 0 (This means P(X>x))

## [1] 0

Note:

lower.tail: logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].

log.p: logical; if TRUE, probabilities p are given as log(p).
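The two arguments can be illustrated with base R, using pbinom with size = 1 as a stand-in for pbern (an assumption made so that the sketch runs without Rlab). The variable names are illustrative.

```r
p_le0 <- pbinom(0, size = 1, prob = 0.68)                      # P(X <= 0) = 0.32
p_gt0 <- pbinom(0, size = 1, prob = 0.68, lower.tail = FALSE)  # P(X > 0)  = 0.68
log_p <- pbinom(0, size = 1, prob = 0.68, log.p = TRUE)        # log of P(X <= 0)

all.equal(p_le0 + p_gt0, 1)     # the two tails add up to 1
all.equal(log_p, log(p_le0))    # log.p returns the log of the probability
```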

Number 3:

The function ‘qbern’ is used to obtain a particular quantile, say the 0.70 quantile (70th percentile), of the Bernoulli distribution.

The syntax is qbern(p, prob, lower.tail = TRUE, log.p = FALSE)

Example:

Compute the 0.60 quantile (60th percentile) of Bernoulli(0.22)

Solution:

qbern(p=0.60, prob=0.22, lower.tail = TRUE, log.p = FALSE)

## [1] 0

Number 4:

The function ‘rbern’ is used to generate (simulate) Bernoulli pseudorandom numbers.

The code is rbern(n, prob)

Example:

Simulate 20 Bernoulli pseudorandom numbers with probability of 0.58, setting the seed to be 7845.

set.seed(7845)
rbern(n=20, prob=0.58)

##  [1] 0 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 1 1

## BINOMIAL DISTRIBUTION ##

Number 1:

The function ‘dbinom’ is used to obtain the exact probability using Binomial distribution, i.e. P(X=x).

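The syntax is dbinom(x,n,p), where x is the number of successes, n is the number of trials (the size argument in R) and p is the probability of success (the prob argument).

Example:

Compute the probability of getting four heads in six tosses of a coin.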

Here, n=6,x=4,p=0.5, the code is:

dbinom(4,6,0.5)

## [1] 0.234375
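This agrees with the Binomial formula P(X = x) = nCx p^x (1-p)^(n-x). A minimal cross-check, with illustrative variable names:

```r
n <- 6; x <- 4; p <- 0.5

# Binomial formula: 15 * (1/2)^6 = 15/64
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
by_dbinom  <- dbinom(x, n, p)

print(c(by_formula = by_formula, by_dbinom = by_dbinom))
```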

What about the question below?

In an examination, 25 students sat for a Maths exam. The probability of passing the exam is 0.34. What is the probability that: (a) exactly 4 students will pass? (b) exactly 8 students will pass?

Solutions:

(a) The code is:

dbinom(4,25,0.34)

## [1] 0.02744283

(b) The command is:

dbinom(8,25,0.34)

## [1] 0.1652475

Number 2:

The function ‘pbinom’ is used to obtain the cumulative probability using Binomial distribution, i.e. P(X<=x).

The syntax is pbinom(x,n,p)

Example:

Compute the probability of getting at most four heads in six tosses of a coin.

Here, n=6, x=0,1,2,3,4, p=0.5, the code is:

pbinom(4,6,0.5)

## [1] 0.890625
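The cumulative probability is the sum of the exact probabilities for x = 0, 1, 2, 3, 4. A short sketch of that relationship, together with the complement used for "at least" questions (variable names are illustrative):

```r
# P(X = 0) + ... + P(X = 4)
at_most_4 <- sum(dbinom(0:4, 6, 0.5))
all.equal(at_most_4, pbinom(4, 6, 0.5))   # same as the cumulative function

# "At least 5" is the complement of "at most 4"
at_least_5 <- 1 - pbinom(4, 6, 0.5)
all.equal(at_least_5, sum(dbinom(5:6, 6, 0.5)))
```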

Look at this question again; it is very similar to the previous one but NOT the same.

In an examination, 25 students sat for a Maths exam. The probability of passing the exam is 0.34. What is the probability that:

(a) at least 4 students will pass?
(b) at least 8 students will pass?
(c) at most 4 students will pass?
(d) at most 18 students will pass?

Solution to (a) alone:

1-pbinom(3,25,0.34)

## [1] 0.9874344

You can solve the remaining parts in the same way: use 1-pbinom(x-1,n,p) for "at least x" and pbinom(x,n,p) for "at most x".

Number 3:

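The function ‘qbinom’ is used to obtain a particular quantile, say the 0.89 quantile (89th percentile), of the Binomial distribution.

The syntax is qbinom(p, size, prob), where p is the cumulative probability, size is the number of trials and prob is the probability of success.

Example:

Compute the 0.60 quantile (60th percentile) of the number of heads in six tosses of a coin.

Here, p=0.60, size=6, prob=0.5, the code is: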

qbinom(0.60,6,0.5)

## [1] 3
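The result can be read as the smallest x whose cumulative probability reaches 0.60. A sketch that tabulates the cumulative probabilities to show this (variable names are illustrative):

```r
# Cumulative probabilities P(X <= x) for x = 0, 1, ..., 6
cdf <- pbinom(0:6, 6, 0.5)
names(cdf) <- 0:6
print(round(cdf, 5))

# Smallest x with P(X <= x) >= 0.60
smallest_x <- min(which(cdf >= 0.60)) - 1   # subtract 1 because x starts at 0
smallest_x == qbinom(0.60, 6, 0.5)
```

P(X <= 2) is below 0.60 while P(X <= 3) is above it, which is why the quantile is 3.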

Number 4:

The function ‘rbinom’ is used to generate (simulate) binomial pseudorandom numbers.

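The code is rbinom(n, size, prob), where n is the number of values to generate, size is the number of trials and prob is the probability of success.

Example:

Out of 30 students, 10 failed. The probability of failure is 0.65. Generate the scores of those who failed the exam. In R terms, this draws 10 values from a Binomial distribution with 30 trials and probability 0.65.

(Set seed to be 234)

Here, n=10, size=30, prob=0.65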

set.seed(234)
rbinom(10,30,0.65)

##  [1] 18 17 25 18 23 19 16 18 16 21

Example:

Set the seed to 234, generate 15 values from a Binomial distribution with 18 trials and probability 0.75, and store them in q. Then find some basic statistics using mean, var, sd, IQR, table, barplot, hist, qqnorm, qqline, mad, median, rank, range and order:

set.seed(234)
q=rbinom(15,18,0.75)

mean(q)

## [1] 13.4

var(q)

## [1] 4.542857

sd(q)

## [1] 2.131398

IQR(q)

## [1] 2

table(q)

## q
## 11 12 13 15 16 17 18 
##  2  4  5  1  1  1  1

barplot(q)

hist(q)

qqnorm(q)
qqline(q)

mad(q)

## [1] 1.4826

median(q)

## [1] 13

round(rank(q),0)

##  [1]  4  4 14  4 13  9  2  4  2 12  9  9  9  9 15

range(q)

## [1] 11 18

order(q)

##  [1]  7  9  1  2  4  8  6 11 12 13 14 10  5  3 15
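For comparison, a Binomial(n, p) variable has theoretical mean np and variance np(1-p). The sketch below places these next to the sample values; with only 15 simulated values, some difference between sample and theory is expected. The names theo_mean and theo_var are illustrative.

```r
set.seed(234)
q <- rbinom(15, 18, 0.75)

theo_mean <- 18 * 0.75            # np = 13.5
theo_var  <- 18 * 0.75 * 0.25     # np(1-p) = 3.375

print(c(sample_mean = mean(q), theoretical_mean = theo_mean))
print(c(sample_var = var(q), theoretical_var = theo_var))
```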

PRACTICAL QUESTION:

Set your seed to be 1234 in the R environment and simulate 20 binomial pseudorandom numbers with parameters 16 and 0.65, assigning them to a vector K.

(i) Find the mean of K
(ii) Find the variance of K
(iii) Find the standard deviation of K
(iv) Find the interquartile range of K
(v) Convert K into a table and draw a beautiful, labelled bar chart of K

SOLUTION:

set.seed(1234)
K=rbinom(20,16,0.65)        
mean(K)

## [1] 10.75

var(K)

## [1] 3.039474

sd(K)

## [1] 1.743409

IQR(K)

## [1] 2

C = table(K)             
                    

barplot(C, main="BAR CHART", col.main="blue", xlab="Simulated Data", col.lab=2, col=c(10:19), ylab="Frequency", sub="Figure I", col.sub="green")

barplot(C, main="BAR CHART\n( n = 20, p = 0.65)", col.main="blue", xlab="Simulated Data", col.lab=2, col=c(10:19), ylab="Frequency", sub="Figure I", col.sub="green")

POISSON DISTRIBUTION

Number 1:

The function ‘dpois’ is used to obtain the exact probability using Poisson distribution, i.e. P(X=x).

The syntax is dpois(x,lambda)

Example:

According to the Poisson model, what is the probability of five arrivals at an automatic bank teller in the next minute, where the average number of arrivals per minute is 0.45?

Here, x=5, lambda=0.45, the code is:

dpois(x=5, lambda=0.45) #(Answer = 9.805027e-05)

## [1] 9.805027e-05
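The same value follows from the Poisson formula P(X = x) = e^(-lambda) lambda^x / x!. A minimal cross-check, with illustrative variable names:

```r
lambda <- 0.45
x <- 5

# Poisson formula evaluated directly
by_formula <- exp(-lambda) * lambda^x / factorial(x)

all.equal(by_formula, dpois(x, lambda))
print(by_formula)
```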

Number 2:

The function ‘ppois’ is used to obtain the cumulative probability using Poisson distribution, i.e. P(X<=x).

The syntax is ppois(x,lambda)

Example:

According to the Poisson model, what is the probability of at most three arrivals at an automatic bank teller in the next minute, where the average number of arrivals per minute is 0.45?

Here, x=0,1,2,3, lambda=0.45, the code is:

ppois(3,lambda=0.45)

## [1] 0.9988046

Number 3:

The function ‘qpois’ is used to obtain a particular quantile, say the 0.89 quantile (89th percentile), of the Poisson distribution.

The syntax is qpois(p,lambda), where p is the cumulative probability.

Example (p=0, lambda=0.45):

qpois(0, lambda=.45)

## [1] 0

Number 4:

The function ‘rpois’ is used to generate (simulate) Poisson pseudorandom numbers.

The code is rpois(n,lambda)

Example:

Suppose traffic accidents occur at an intersection with an average rate of 6 per year.

Simulate the annual number of accidents for a period of 15 years. (Set seed to be 234)

Here, n=15, lambda=6

set.seed(234)
rpois(15,6)

##  [1]  8  8  2  8  3  7 10  7 10  4  6  6  6  6  0

Example:

Set the seed to 234, generate 18 values from a Poisson distribution with lambda = 3, and store them in u. Then find some basic statistics using mean, var, sd, IQR, table, barplot, hist, qqnorm, qqline, mad, median, rank, range and order:

set.seed(234)
u=rpois(18,3)
# Finding some basic statistics such as:
mean(u)

## [1] 3.055556

var(u)

## [1] 2.761438

sd(u)

## [1] 1.661757

IQR(u)

## [1] 1.75

table(u)

## u
## 0 1 2 3 4 6 
## 2 1 2 6 5 2

barplot(u)

hist(u)

qqnorm(u)
qqline(u)

mad(u)

## [1] 1.4826

median(u)

## [1] 3

round(rank(u),0)

##  [1] 14 14  2 14  3  8 18 14 18  4  8  8  8  8  2  8  4 14

range(u)

## [1] 0 6

order(u)

##  [1]  3 15  5 10 17  6 11 12 13 14 16  1  2  4  8 18  7  9

PRACTICAL QUESTION

Suppose traffic accidents occur at an intersection with a mean of 3.5 per year. If you set seed to be 1234 in the R environment, perform the following statistical tasks:

(i) Simulate the annual number of accidents for a 20-year period, assuming a Poisson model (2 Marks)
(ii) Suppose the simulated data is represented by H. Compute the average number of accidents within the stipulated period, correct to one decimal place (2 Marks)
(iii) Using your representation in (ii) above, calculate the median absolute deviation of H, as returned by R's mad function, correct to zero decimal place (2 Marks)
(iv) By converting H into tabular form, obtain a colourful bar chart, with labelling (3 Marks)

##SOLUTION:

set.seed(1234)
H=rpois(20,3.5)
H               ## Answer = 1 4 4 4 6 4 0 2 4 3 4 4 2 6 2 5 2 2 2 2

##  [1] 1 4 4 4 6 4 0 2 4 3 4 4 2 6 2 5 2 2 2 2

round(mean(H), 1)       ## Answer = 3.1

## [1] 3.1

mad(H)          ## Answer = 2.2239

## [1] 2.2239
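Note that R's mad function computes the median absolute deviation about the median, multiplied by the constant 1.4826; it is not the mean absolute deviation. The sketch below reproduces mad(H) step by step from the values of H printed above and, for contrast, computes the mean absolute deviation about the mean. The names by_hand and mean_abs_dev are illustrative.

```r
H <- c(1, 4, 4, 4, 6, 4, 0, 2, 4, 3, 4, 4, 2, 6, 2, 5, 2, 2, 2, 2)

# Median absolute deviation, as computed by mad():
# median(H) = 3.5, median of |H - 3.5| = 1.5, and 1.4826 * 1.5 = 2.2239
by_hand <- 1.4826 * median(abs(H - median(H)))
all.equal(by_hand, mad(H))

# Mean absolute deviation about the mean, a different quantity
mean_abs_dev <- mean(abs(H - mean(H)))
print(c(mad = by_hand, mean_abs_dev = mean_abs_dev))
```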

A=table(H)          ## Answer = 0   1   2   3   4   5   6 
                ##      1   1   7   1   7   1   2
barplot(A, main="BAR CHART", col.main="purple", xlab="Simulated Data", col.lab=5, col=c(1:9), ylab="Frequency", sub="Figure II", col.sub="blue")

GEOMETRIC DISTRIBUTION

Description

Density, distribution function, quantile function and random generation for the geometric distribution with parameter prob.

Usage:

dgeom(x, prob, log = FALSE)

pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)

qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)

rgeom(n, prob)

Arguments:

x, q: vector of quantiles representing the number of failures in a sequence of Bernoulli trials before success occurs.

p: vector of probabilities.

n: number of observations. If length(n) > 1, the length is taken to be the number required.

prob: probability of success in each trial. 0 < prob <= 1.

log, log.p: logical; if TRUE, probabilities p are given as log(p).

lower.tail: logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].

Details

The geometric distribution with prob = p has density

p(x) = p (1-p)^x

for x = 0, 1, 2, …, 0 < p <= 1.

If an element of x is not integer, the result of dgeom is zero, with a warning.

The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function.

Number 1:

The function ‘dgeom’ is used to obtain the exact probability using Geometric distribution, i.e. P(X=x).

The syntax is dgeom(x, prob, log = FALSE)

dgeom(x=5, prob=0.65)   #(Answer = 0.003413922)

## [1] 0.003413922

dgeom(x=5, prob=0.65, log=FALSE)    #(Answer = 0.003413922)

## [1] 0.003413922

dgeom(x=5, prob=0.65, log=TRUE) #(Answer = -5.679894)

## [1] -5.679894

That means we have taken the natural log of the probability, that is, log(0.003413922) = -5.679894.
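These values follow from the formula p(x) = p (1-p)^x given in the Details. A minimal cross-check, with illustrative variable names:

```r
p <- 0.65
x <- 5

# Five failures followed by one success
by_formula <- p * (1 - p)^x

all.equal(by_formula, dgeom(x, p))
all.equal(log(by_formula), dgeom(x, p, log = TRUE))
```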

Number 2:

The function ‘pgeom’ is used to obtain the cumulative probability using Geometric distribution, i.e. P(X<=q).

The syntax is pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)

Example:

pgeom(q=5,prob=0.65)

## [1] 0.9981617
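Because X counts the failures before the first success, P(X <= q) has the closed form 1 - (1-p)^(q+1). A sketch of that check, with illustrative variable names:

```r
p <- 0.65
q <- 5

# X <= q fails only if the first q + 1 trials are all failures
closed_form <- 1 - (1 - p)^(q + 1)

all.equal(closed_form, pgeom(q, p))
all.equal(closed_form, sum(dgeom(0:q, p)))   # sum of the exact probabilities
```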

Number 3:

The function ‘qgeom’ is used to obtain a particular quantile, say the 0.90 quantile (90th percentile), of the Geometric distribution.

The syntax is qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)

Example:

qgeom(p=0.40,prob=0.55)

## [1] 0

Number 4:

The function ‘rgeom’ is used to generate (simulate) Geometric pseudo-random numbers.

The code is rgeom(n,prob)

Example:

Set the seed to 234, generate 15 values from a Geometric distribution with prob = 0.6, store them in u, and print them:

set.seed(234)
u=rgeom(n=15,prob=0.6)
print(u)

##  [1] 1 4 2 0 0 0 4 1 2 0 0 0 1 0 2

# Finding some basic statistics such as:
mean(u)

## [1] 1.133333

var(u)

## [1] 1.980952

sd(u)

## [1] 1.407463

IQR(u)

## [1] 2

table(u)

## u
## 0 1 2 4 
## 7 3 3 2

barplot(u)

hist(u)

qqnorm(u)
qqline(u,lwd=3)

mad(u)

## [1] 1.4826

median(u)

## [1] 1

round(rank(u),0)

##  [1]  9 14 12  4  4  4 14  9 12  4  4  4  9  4 12

range(u)

## [1] 0 4

order(u)

##  [1]  4  5  6 10 11 12 14  1  8 13  3  9 15  2  7

HYPERGEOMETRIC DISTRIBUTION

The Hypergeometric distribution is used to calculate probabilities when sampling without replacement from a finite population that contains two kinds of items. It describes the number of items of the first kind obtained in a sample of fixed size.

In R, there are 4 built-in functions for the Hypergeometric distribution:

- dhyper(x, m, n, k)
- phyper(q, m, n, k)
- qhyper(p, m, n, k)
- rhyper(nn, m, n, k)

where,

- x, q: number of white balls (items of the first kind) drawn without replacement from the urn
- m: number of white balls in the urn
- n: number of black balls in the urn
- k: number of balls drawn from the urn
- p: probability
- nn: number of values to generate (written as N in the example below)

Number 1:

The function ‘dhyper’ is used to obtain the exact probability (density) using Hypergeometric distribution, i.e. P(X=x).

Syntax:

dhyper(x, m, n, k)

Example 1:

# Specify x-values for dhyper function
x = seq(10, 86, by = 2)
print(x)    # This is used to detect the set of values

##  [1] 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
## [26] 60 62 64 66 68 70 72 74 76 78 80 82 84 86

length(x)

## [1] 39

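Apply the dhyper function. Values of x greater than k = 20 have probability zero, since no more than 20 white balls can be drawn: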

y <- dhyper(x, m = 45, n = 30, k = 20)   
print(y)

##  [1] 1.193389e-01 2.095829e-01 1.233665e-01 2.206362e-02 9.293320e-04
##  [6] 3.946710e-06 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [11] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [16] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [21] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [26] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [31] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [36] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
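Each non-zero value comes from the formula P(X = x) = C(m, x) C(n, k-x) / C(m+n, k), where m and n are the numbers of white and black balls and k is the number drawn. A cross-check for the first few x-values, with illustrative variable names:

```r
m <- 45; n <- 30; k <- 20
x <- c(10, 12, 14)

# Ways to pick x white and k - x black balls, over all ways to pick k balls
by_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

all.equal(by_formula, dhyper(x, m, n, k))
print(by_formula)
```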

Number 2:

The function ‘phyper’ is used to obtain the cumulative probability using Hypergeometric distribution, i.e. P(X<=x).

Syntax:

phyper(q, m, n, k)

Example:

Specify x-values for the phyper function

x_phyper <- seq(0, 22, by = 1)    
y_phyper <- phyper(x_phyper, m = 40, n = 20, k = 31)  
print(y_phyper)

##  [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
##  [6] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [11] 0.000000e+00 2.019930e-08 9.964987e-07 2.097309e-05 2.521307e-04
## [16] 1.954992e-03 1.046930e-02 4.051979e-02 1.173155e-01 2.618127e-01
## [21] 4.641088e-01 6.760380e-01 8.424287e-01

Number 3:

The function ‘qhyper’ is used to obtain quantiles of the Hypergeometric distribution for given probabilities between 0 and 1.

Syntax:

qhyper(p, m, n, k)

Example:

Specify a sequence of probabilities for the qhyper function

x_qhyper <- seq(0, 1, by = 0.02)        
y_qhyper <- qhyper(x_qhyper, m = 49, n = 18, k = 30)    
print(y_qhyper)

##  [1] 12 18 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21 22 22 22 22
## [26] 22 22 22 22 22 22 23 23 23 23 23 23 23 23 23 23 24 24 24 24 24 24 25 25 26
## [51] 30

Number 4:

The function ‘rhyper’ is used to generate (simulate) Hypergeometric pseudo-random numbers, given a sample size. A seed is set first for reproducibility.

Syntax:

rhyper(nn, m, n, k)

Example:

# Set seed for reproducibility
# Specify sample size
set.seed(400)                                 
N <- 10000   

  
# Draw N hypergeometrically distributed values
y_rhyper <- rhyper(N, m = 50, n = 20, k = 30) 
#y_rhyper         
#print(y_rhyper)
  
# Plot of randomly drawn hyper density
hist(y_rhyper,                                          
     breaks = 50,
     main = "")
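The simulated values can be compared with the theoretical mean k m / (m + n). The sketch below uses the same seed and parameters as above; the names y and theo_mean are illustrative.

```r
set.seed(400)
y <- rhyper(10000, m = 50, n = 20, k = 30)

theo_mean <- 30 * 50 / (50 + 20)     # about 21.43
print(c(sample_mean = mean(y), theoretical_mean = theo_mean))

# Every value lies between max(0, k - n) = 10 and min(k, m) = 30
range(y)
```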

NEGATIVE BINOMIAL DISTRIBUTION

Number 1:

The function ‘dnbinom’ is used to obtain the direct/exact probability using Negative Binomial distribution, i.e. P(X=x), where X counts the failures that occur before the n-th success.

The syntax is dnbinom(x,n,p)

Example:

# Here, n=6,x=4,p=0.75, the code is:
dnbinom(4,6,0.75)   #(Answer = 0.0875988)

## [1] 0.0875988
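Since X counts the failures before the n-th success, the probability is P(X = x) = C(x+n-1, x) p^n (1-p)^x. A minimal cross-check, with illustrative variable names:

```r
n <- 6; x <- 4; p <- 0.75

# C(9, 4) * 0.75^6 * 0.25^4
by_formula <- choose(x + n - 1, x) * p^n * (1 - p)^x

all.equal(by_formula, dnbinom(x, n, p))
print(by_formula)
```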

Number 2:

The function ‘pnbinom’ is used to obtain the cumulative probability using Negative Binomial distribution, i.e. P(X<=x).

The syntax is pnbinom(x,n,p)

Example:

Here, n=6,x=0,1,2,3,4,p=0.87, the code is:

pnbinom(4,6,0.87)

## [1] 0.9947033

Number 3:

The function ‘qnbinom’ is used to obtain a particular quantile, say the 0.60 quantile (60th percentile), of the Negative Binomial distribution.

The syntax is qnbinom(p, size, prob), where p is the cumulative probability.

Example:

Here, p=0.60, size=6, prob=0.5, the code is:

qnbinom(0.60,6,0.5) #(Answer = 6)

## [1] 6

Number 4:

The function ‘rnbinom’ is used to generate (simulate) Negative Binomial pseudo-random numbers.

The code is rnbinom(n, size, prob), where n is the number of values to generate.

Example:

(Set seed to be 234)

Here, n=25, size=12, prob=0.35

set.seed(234)
k=rnbinom(25,12,0.35)
k

##  [1] 39 23 15 24 23 18 19 27 20 10 18 24 21 33 11 12 32 33 13 24 15 18 12 26 26

# Finding some basic statistics such as:
mean(k)

## [1] 21.44

var(k)

## [1] 58.34

sd(k)

## [1] 7.638063

IQR(k)

## [1] 11

table(k)

## k
## 10 11 12 13 15 18 19 20 21 23 24 26 27 32 33 39 
##  1  1  2  1  2  3  1  1  1  2  3  2  1  1  2  1

barplot(k)

hist(k)

qqnorm(k)
qqline(k)

mad(k)

## [1] 7.413

median(k)

## [1] 21

round(rank(k),0)

##  [1] 25 14  6 17 14  9 11 21 12  1  9 17 13 24  2  4 22 24  5 17  6  9  4 20 20

range(k)

## [1] 10 39

order(k)

##  [1] 10 15 16 23 19  3 21  6 11 22  7  9 13  2  5  4 12 20 24 25  8 17 14 18  1

SIMULATION OF SURVEY DATASETS - PART 2

Developed by Timothy A. OGUNLEYE

June - August, 2022
