4 Probability

4.1 Introduction to Probability

Classical Method of Assigning Probabilities

This method involves an experiment, which is a process that produces outcomes, and an event, which is an outcome of an experiment. When we assign probabilities using the classical method, the probability of an individual event occurring is determined as the ratio of the number of items in a population containing the event (\(n_e\)) to the total number of items in the population (N). That is, \(P(E) = {\dfrac{n_e}{N}}\). For example, if a company has 200 workers and 70 are female, the probability of randomly selecting a female from this company is 70/200 = .35.
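The arithmetic in this example can be checked directly in R. This is only a minimal sketch; the variable names are illustrative and not part of the original notes.

```r
# Classical method: P(E) = n_e / N
n.e <- 70      # workers who are female (from the example above)
N.total <- 200 # total workers in the company

p.female <- n.e / N.total
p.female
```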

Relative Frequency of Occurrence

The relative frequency of occurrence method of assigning probabilities is based on accumulated historical data. With this method, the probability of an event occurring is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred.
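As a small illustration, assume a hypothetical record of ten past shipments in which 1 marks a rejected batch and 0 an accepted one. The data are invented for this sketch only.

```r
# Hypothetical history (invented for illustration): 1 = rejected, 0 = accepted
history <- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)

# Relative frequency = times the event occurred / total opportunities
p.reject <- sum(history) / length(history)
p.reject
```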

Subjective Probability

The subjective method of assigning probability is based on the feelings or insights of the person determining the probability. Subjective probability comes from the person’s intuition or reasoning. Although not a scientific approach to probability, the subjective method often is based on the accumulation of knowledge, understanding, and experience stored and processed in the human mind. At times it is merely a guess. At other times, subjective probability can yield accurate probabilities. Subjective probability can be used to capitalize on the background of experienced workers and managers in decision making.

4.2 Structure of Probability

Experiment

As previously stated, an experiment is a process that produces outcomes. Examples of business-oriented experiments with outcomes that can be statistically analyzed might include the following.

  • Interviewing 20 randomly selected consumers and asking them which brand of appliance they prefer
  • Sampling every 200th bottle of ketchup from an assembly line and weighing the contents
  • Testing new pharmaceutical drugs on samples of cancer patients and measuring the patients' improvement
  • Auditing every 10th account to detect any errors
  • Recording the Dow Jones Industrial Average on the first Monday of every month for 10 years

Event

Because an event is an outcome of an experiment, the experiment defines the possibilities of the event. If the experiment is to sample five bottles coming off a production line, an event could be to get one defective and four good bottles. In an experiment to roll a die, one event could be to roll an even number and another event could be to roll a number greater than two.

Events are denoted by uppercase letters; italic capital letters (e.g., A and \(E_1\), \(E_2\), …) represent the general or abstract case, and roman capital letters (e.g., H and T for heads and tails) denote specific things and people.

Elementary Events

Events that cannot be decomposed or broken down into other events are called elementary events.

Elementary events are denoted by lowercase letters (e.g., \(e_1\), \(e_2\), \(e_3\), …). Suppose the experiment is to roll a die.

The elementary events for this experiment are to roll a 1 or roll a 2 or roll a 3, and so on. Rolling an even number is an event, but it is not an elementary event because the even number can be broken down further into events 2, 4, and 6.

Sample Space

A sample space is a complete roster or listing of all elementary events for an experiment.

Unions and Intersections

Set notation, the use of braces to group numbers, is used as a symbolic tool for unions and intersections in this chapter.

The union of X and Y is formed by combining elements from each of the sets. It is denoted \(X\cup Y\) and can be read as “X or Y.”

The intersection of X and Y contains the elements common to both sets. It is denoted \(X\cap Y\) and can be read as “X and Y.”
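Base R has set functions that mirror these definitions. The sketch below reuses the die events mentioned earlier (rolling an even number, rolling a number greater than two); the object names are illustrative.

```r
# Sample space for one roll of a die: the elementary events
S <- 1:6

X <- c(2, 4, 6)     # event: roll an even number
Y <- c(3, 4, 5, 6)  # event: roll a number greater than two

x.or.y  <- sort(union(X, Y))  # "X or Y": 2 3 4 5 6
x.and.y <- intersect(X, Y)    # "X and Y": 4 6

x.or.y
x.and.y
```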

Mutually Exclusive Events

Two or more events are mutually exclusive events if the occurrence of one event precludes the occurrence of the other event(s). This characteristic means that mutually exclusive events cannot occur simultaneously and therefore can have no intersection.

Independent Events

Two or more events are independent events if the occurrence or nonoccurrence of one of the events does not affect the occurrence or nonoccurrence of the other event(s).

Collectively Exhaustive Events

A list of collectively exhaustive events contains all possible elementary events for an experiment.

Complementary Events

\(X'\) is the complement of event \(X\), read as “not \(X\).” It contains every elementary event in the sample space that is not in \(X\).
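An event and its complement are a convenient way to see the last few definitions together: they are mutually exclusive and, taken as a pair, collectively exhaustive. A minimal sketch on the die example, with illustrative names:

```r
S <- 1:6            # sample space for one roll of a die
X <- c(2, 4, 6)     # event: roll an even number

X.comp <- setdiff(S, X)   # complement of X: 1 3 5

# Mutually exclusive: the two events share no elementary event
mutually.exclusive <- length(intersect(X, X.comp)) == 0

# Collectively exhaustive: together they cover the whole sample space
collectively.exhaustive <- setequal(union(X, X.comp), S)

mutually.exclusive
collectively.exhaustive
```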

Counting the Possibilities

The mn Counting Rule

For an operation that can be done m ways and a second operation that can be done n ways, the two operations then can occur, in order, in \(mn\) ways. This rule can be extended to cases with three or more operations.
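One way to see the rule is to enumerate the ordered pairs. The sketch below assumes a hypothetical first operation with three outcomes and a second with two; the labels are invented.

```r
# Hypothetical operations (labels invented for illustration)
first.op  <- c("a", "b", "c")  # m = 3 ways
second.op <- c("x", "y")       # n = 2 ways

# expand.grid lists every (first, second) pair exactly once
outcomes <- expand.grid(first = first.op, second = second.op)

n.outcomes <- nrow(outcomes)   # equals m * n
n.outcomes
```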

Sampling from a Population with Replacement

In the second counting method, sampling n items from a population of size N with replacement would provide \(N^n\) possibilities.

Combinations: Sampling from a Population Without Replacement

The third counting method uses combinations: sampling n items from a population of size N without replacement provides \(_NC_n={N \choose n}=\dfrac{N!}{n!(N-n)!}\) possibilities.
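Both sampling counts are one-liners in R. The sketch below assumes a hypothetical population of 6 items and a sample of 2; `choose()` is the base R function for combinations.

```r
pop.size  <- 6  # N (hypothetical)
samp.size <- 2  # n (hypothetical)

# With replacement: N^n possibilities
with.replacement <- pop.size^samp.size

# Without replacement (combinations): N! / (n! (N - n)!)
without.replacement <- choose(pop.size, samp.size)

# The same count written out with factorials
by.formula <- factorial(pop.size) /
  (factorial(samp.size) * factorial(pop.size - samp.size))

with.replacement     # 36
without.replacement  # 15
```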

4.3 Marginal, Union, Joint, and Conditional Probabilities

  • Marginal ~: \(P(E)\)
  • Union ~: \(P(E_1 \cup E_2)\)
  • Joint ~: \(P(E_1 \cap E_2)\)
  • Conditional ~: \(P(E_1|E_2)\)

4.4 Addition Laws

General Law of Addition

\(P(X \cup Y)=P(X)+P(Y)-P(X \cap Y)\)

Joint Probability Tables

Table to be added

Complement of a Union

\(P((X \cup Y)')=P(X' \cap Y')=1-P(X\cup Y)\)

Special Law of Addition

If X, Y are mutually exclusive, \(P(X\cup Y)=P(X)+P(Y)\)
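The addition laws in this section can be checked numerically on a single roll of a fair die. This is a sketch under that assumption; `prob.of` is a hypothetical helper that counts equally likely outcomes.

```r
S <- 1:6                                            # sample space, equally likely
prob.of <- function(event) length(event) / length(S)

X <- c(2, 4, 6)      # even number
Y <- c(3, 4, 5, 6)   # greater than two

# General law: P(X or Y) = P(X) + P(Y) - P(X and Y)
general.lhs <- prob.of(union(X, Y))
general.rhs <- prob.of(X) + prob.of(Y) - prob.of(intersect(X, Y))

# Complement of a union: P(neither X nor Y) = 1 - P(X or Y)
neither.direct <- prob.of(intersect(setdiff(S, X), setdiff(S, Y)))
neither.by.law <- 1 - general.lhs

# Special law: V and W are mutually exclusive, so nothing is subtracted
V <- c(1, 2)
W <- c(5, 6)
special.lhs <- prob.of(union(V, W))
special.rhs <- prob.of(V) + prob.of(W)

c(general.lhs, general.rhs)
c(neither.direct, neither.by.law)
c(special.lhs, special.rhs)
```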

4.9

D E F
A 5 8 12
B 10 6 4
C 8 2 5
D <- c(5,10,8)
E <- c(8,6,2)
F <- c(12,4,5)

df = data.frame(D, E, F)
row.names(df) <- c("A", "B", "C")
df
##    D E  F
## A  5 8 12
## B 10 6  4
## C  8 2  5

a.

\(P(A\cup D)\)

(sum(df["D"]) + sum(df["A",]) - df["A", "D"]) / sum(df)
## [1] 0.7166667

b.

\(P(E\cup B)\)

(sum(df["E"]) + sum(df["B",]) - df["B", "E"]) / sum(df)
## [1] 0.5

c.

\(P(D\cup E)\)

# D and E are mutually exclusive columns, so there is no intersection to subtract
(sum(df["E"]) + sum(df["D"])) / sum(df)
## [1] 0.65

d.

\(P(C\cup F)\)

(sum(df["F"]) + sum(df["C",]) - df["C", "F"]) / sum(df)
## [1] 0.5166667

4.11

According to a survey conducted by Netpop Research, 65% of new car buyers use online search engines as part of their car-buying experience. Another study reported that 11% of new car buyers skip the test drive. Suppose 7% of new car buyers use online search engines as part of their car-buying experience and skip the test drive. If a new car buyer is randomly selected, what is the probability that:

p1 = .65
p2 = .11
p.1n2 = .07

a.

the buyer used an online search engine as part of the car-buying experience or skipped the test drive?

p1+p2-p.1n2
## [1] 0.69

b.

the buyer did not use an online search engine as part of the car-buying experience or did skip the test drive?

1-p1+p.1n2
## [1] 0.42

c.

the buyer used an online search engine as part of the car-buying experience or did not skip the test drive?

1-p2+p.1n2
## [1] 0.96

4.13

According to Nielsen Media Research, approximately 86% of all U.S. households have High-definition television (HDTV). In addition, 49% of all U.S. households own Digital Video Recorders (DVR). Suppose 40% of all U.S. households have HDTV and have DVR. A U.S. household is randomly selected.

p1=.86
p2=.49
p.1n2=.4

a.

What is the probability that the household has HDTV or has DVR?

p1+p2-p.1n2
## [1] 0.95

b.

What is the probability that the household does not have HDTV or does have DVR?

1-p1+p.1n2
## [1] 0.54

c.

What is the probability that the household does have HDTV or does not have DVR?

1-p2+p.1n2
## [1] 0.91

d.

What is the probability that the household does not have HDTV or does not have DVR?

1-p.1n2
## [1] 0.6

4.5 Multiplication Laws

General Law of Multiplication

\(P(X\cap Y)=P(X)\cdot P(Y|X)=P(Y)\cdot P(X|Y)\)

Special Law of Multiplication

If X, Y are independent, \(P(X\cap Y) = P(X)\cdot P(Y)\)
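A short sketch of both laws, using hypothetical examples that are not from the exercises: drawing two aces from a standard 52-card deck without replacement (dependent draws), and rolling two sixes with a fair die (independent rolls).

```r
# General law: P(X and Y) = P(X) * P(Y | X)
p.ace.first        <- 4 / 52   # four aces in a full deck
p.ace.second.given <- 3 / 51   # one ace and one card already removed
p.two.aces <- p.ace.first * p.ace.second.given

# Special law: independent events, P(X and Y) = P(X) * P(Y)
p.six <- 1 / 6
p.two.sixes <- p.six * p.six

p.two.aces
p.two.sixes
```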

4.15

Use the values in the cross-tabulation table to solve the equations given.

C <- c(5,2)
D <- c(11,3)
E <- c(16,5)
F <- c(8,7)

df = data.frame(C, D, E, F)
row.names(df) <- c("A", "B")
df
##   C  D  E F
## A 5 11 16 8
## B 2  3  5 7

a.

\(P(A\cap E)\)

df["A", "E"]/sum(df)
## [1] 0.2807018

b.

\(P(D\cap B)\)

df["B", "D"]/sum(df)
## [1] 0.05263158

c.

\(P(D\cap E)\)

0
## [1] 0

d.

\(P(A\cap B)\)

0
## [1] 0

4.17

a.

A batch of 50 parts contains six defects. If two parts are drawn randomly one at a time without replacement, what is the probability that both parts are defective?

p1=6/50 # .12
p1*(5/49)
## [1] 0.0122449

b.

If this experiment is repeated, with replacement, what is the probability that both parts are defective?

p1^2
## [1] 0.0144

4.19

A study by Peter D. Hart Research Associates for the Nasdaq Stock Market revealed that 43% of all American adults are stockholders. In addition, the study determined that 75% of all American adult stockholders have some college education. Suppose 37% of all American adults have some college education. An American adult is randomly selected.

p.stock=.43;p.college.on.stock=.75;p.college=.37

a.

What is the probability that the adult owns stock and has some college education? \(P(Stock\cap College)=P(Stock)\cdot P(College|Stock)\)

p.stock.and.college = p.stock*p.college.on.stock
p.stock.and.college
## [1] 0.3225

b.

What is the probability that the adult owns no stock and has some college education?

p.no.stock.and.college = p.college - p.stock.and.college
p.no.stock.and.college
## [1] 0.0475

c.

What is the probability that the adult owns stock and has no college education?

p.stock.and.no.college = p.stock - p.stock.and.college
p.stock.and.no.college
## [1] 0.1075

d.

What is the probability that the adult neither owns stock nor has some college education?

1 - (p.stock + p.college - p.stock.and.college)
## [1] 0.5225

4.21

A study by Becker Associates, a San Diego travel consultant, found that 30% of the traveling public said that their flight selections are influenced by perceptions of airline safety. Thirty-nine percent of the traveling public wants to know the age of the aircraft. Suppose 87% of the traveling public who say that their flight selections are influenced by perceptions of airline safety wants to know the age of the aircraft.

p.safety = .3
p.age = .39
p.age.when.safety = .87

a.

What is the probability of randomly selecting a member of the traveling public and finding out that she says that flight selection is influenced by perceptions of airline safety and she does not want to know the age of the aircraft?

p.no_age.and.safety = (1 - p.age.when.safety) * p.safety
p.no_age.and.safety
## [1] 0.039

b.

What is the probability of randomly selecting a member of the traveling public and finding out that she says that flight selection is neither influenced by perceptions of airline safety nor does she want to know the age of the aircraft?

p.age.and.safety = p.age.when.safety * p.safety
p.no_age.no_safety = 1 - p.age - p.safety + p.age.and.safety
p.no_age.no_safety
## [1] 0.571

c.

What is the probability of randomly selecting a member of the traveling public and finding out that he says that flight selection is not influenced by perceptions of airline safety and he wants to know the age of the aircraft?

p.no_safety = 1 - p.safety
p.age.and.no_safety = p.age - p.age.and.safety
p.age.and.no_safety
## [1] 0.129

4.6 Conditional Probability

Law of Conditional Probability

\(P(X|Y)=\dfrac{P(X\cap Y)}{P(Y)}=\dfrac{P(X)\cdot P(Y|X)}{P(Y)}\)
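Both forms of the law can be checked on a fair die, taking X as "even number" and Y as "greater than two". This is a sketch with illustrative names; `prob.of` is a hypothetical helper that counts equally likely outcomes.

```r
S <- 1:6
prob.of <- function(event) length(event) / length(S)

X <- c(2, 4, 6)      # even number
Y <- c(3, 4, 5, 6)   # greater than two

# First form: P(X | Y) = P(X and Y) / P(Y)
p.x.given.y <- prob.of(intersect(X, Y)) / prob.of(Y)

# Second form: P(X | Y) = P(X) * P(Y | X) / P(Y)
p.y.given.x <- prob.of(intersect(X, Y)) / prob.of(X)
p.x.given.y.alt <- prob.of(X) * p.y.given.x / prob.of(Y)

p.x.given.y
p.x.given.y.alt
```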

4.23

Use the values in the cross-tabulation table to solve the equations given.

E <- c(15, 11, 21, 18)
F <- c(12, 17, 32, 13)
G <- c( 8, 19, 27, 12)

df = data.frame(E, F, G)
row.names(df) <- c("A", "B", "C", "D")
df
##    E  F  G
## A 15 12  8
## B 11 17 19
## C 21 32 27
## D 18 13 12

a.

\(P(G|A)\)

df["A", "G"]/sum(df["A",])
## [1] 0.2285714

b.

\(P(B|F)\)

df["B", "F"]/sum(df["F"])
## [1] 0.2297297

c.

\(P(C|E)\)

df["C", "E"]/sum(df["E"])
## [1] 0.3230769

d.

\(P(E|G)\)

0
## [1] 0

4.27

A national survey of small-business owners was conducted to determine the challenges for growth for their businesses. The top challenge, selected by 46% of the small-business owners, was the economy. A close second was finding qualified workers (37%). Suppose 15% of the small-business owners selected both the economy and finding qualified workers as challenges for growth. A small-business owner is randomly selected.

a.

What is the probability that the owner believes the economy is a challenge for growth if the owner believes that finding qualified workers is a challenge for growth? (.4054)

b.

What is the probability that the owner believes that finding qualified workers is a challenge for growth if the owner believes that the economy is a challenge for growth? (.3261)

c.

Given that the owner does not select the economy as a challenge for growth, what is the probability that the owner believes that finding qualified workers is a challenge for growth? (.4074)

d.

What is the probability that the owner believes neither that the economy is a challenge for growth nor that finding qualified workers is a challenge for growth? (.32)
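The answers in parentheses for 4.27 can be reproduced with the conditional probability and addition laws. A sketch in the style of the earlier exercises; the variable names are illustrative.

```r
p.econ    <- .46  # economy is a challenge
p.workers <- .37  # finding qualified workers is a challenge
p.both    <- .15  # both selected

# a. P(economy | workers)
ans.a <- p.both / p.workers

# b. P(workers | economy)
ans.b <- p.both / p.econ

# c. P(workers | not economy) = P(workers and not economy) / P(not economy)
ans.c <- (p.workers - p.both) / (1 - p.econ)

# d. P(neither) = 1 - P(economy or workers)
ans.d <- 1 - (p.econ + p.workers - p.both)

round(c(a = ans.a, b = ans.b, c = ans.c, d = ans.d), 4)
```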

4.29

According to a survey of restaurant owners in the U.S. by Must-HaveMenus, 77% of restaurant owners believe that they need to use social media as a marketing tool. A different survey by the National Restaurant owners revealed that 80% of restaurant owners started their careers at entry-level positions. Suppose that 83% of restaurant owners who started their careers at entry-level positions believe that they need to use social media as a marketing tool. Assuming that these percentages apply to all restaurant owners, if a restaurant owner is randomly selected,

a.

What is the probability that the owner does believe that he/she needs to use social media as a marketing tool and did start his/her career at an entry-level position? (.664)

b.

What is the probability that an owner either believes that he/she needs to use social media as a marketing tool or he/she did start his/her career at an entry-level position? (.906)

c.

What is the probability that the owner does not believe that he/she needs to use social media as a marketing tool given that he/she did start his/her career at an entry-level position? (.17)

d.

What is the probability that the owner does believe that he/she needs to use social media as a marketing tool given that he/she did not start his/her career at an entry-level position? (.53)

e.

What is the probability that the owner did not start his/her career at an entry-level position given that he/she does not believe he/she needs to use social media as a marketing tool? (.4087)
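The answers in parentheses for 4.29 follow from the multiplication, addition, and conditional probability laws. A sketch with illustrative variable names:

```r
p.social             <- .77  # believes social media is needed
p.entry              <- .80  # started at an entry-level position
p.social.given.entry <- .83

# a. P(social and entry) = P(entry) * P(social | entry)
p.social.and.entry <- p.entry * p.social.given.entry

# b. P(social or entry)
p.social.or.entry <- p.social + p.entry - p.social.and.entry

# c. P(not social | entry)
ans.c <- 1 - p.social.given.entry

# d. P(social | not entry) = P(social and not entry) / P(not entry)
ans.d <- (p.social - p.social.and.entry) / (1 - p.entry)

# e. P(not entry | not social) = P(neither) / P(not social)
ans.e <- (1 - p.social.or.entry) / (1 - p.social)

round(c(a = p.social.and.entry, b = p.social.or.entry,
        c = ans.c, d = ans.d, e = ans.e), 4)
```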

4.7 Revision of Probabilities: Bayes’ Rule

Bayes’ Rule

\(P(X_i|Y)=\dfrac{P(X_i)\cdot P(Y|X_i)}{P(X_1)\cdot P(Y|X_1)+P(X_2)\cdot P(Y|X_2)+\cdots+P(X_n)\cdot P(Y|X_n)}=\dfrac{P(X_i)\cdot P(Y|X_i)}{P(Y)}\)

4.31

In a manufacturing plant, machine A produces 10% of a certain product, machine B produces 40% of this product, and machine C produces 50% of this product. Five percent of machine A products are defective, 12% of machine B products are defective, and 8% of machine C products are defective. The company inspector has just sampled a product from this plant and has found it to be defective. Determine the revised probabilities that the sampled product was produced by machine A, machine B, or machine C. (.0538, .5161, .4301)
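Bayes' rule for 4.31 can be written as a vector calculation: multiply each prior by its conditional probability, then divide by the total. A sketch with illustrative names that reproduces the answers in parentheses:

```r
prior       <- c(A = .10, B = .40, C = .50)  # share of production per machine
defect.rate <- c(A = .05, B = .12, C = .08)  # P(defective | machine)

joint     <- prior * defect.rate   # P(machine and defective)
p.defect  <- sum(joint)            # P(defective), the denominator in Bayes' rule
posterior <- joint / p.defect      # P(machine | defective)

round(posterior, 4)
```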

4.33

In a small town, two lawn companies fertilize lawns during the summer. Tri-State Lawn Service has 72% of the market. Thirty percent of the lawns fertilized by Tri-State could be rated as very healthy one month after service. Greenchem has the other 28% of the market. Twenty percent of the lawns fertilized by Greenchem could be rated as very healthy one month after service. A lawn that has been treated with fertilizer by one of these companies within the last month is selected randomly. If the lawn is rated as very healthy, what are the revised probabilities that Tri-State or Greenchem treated the lawn? (.7941, .2059)
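The same vector form of Bayes' rule reproduces the answers for 4.33. A sketch with illustrative names:

```r
market.share <- c(TriState = .72, Greenchem = .28)  # prior probabilities
healthy.rate <- c(TriState = .30, Greenchem = .20)  # P(very healthy | company)

joint.healthy <- market.share * healthy.rate        # P(company and very healthy)
revised <- joint.healthy / sum(joint.healthy)       # P(company | very healthy)

round(revised, 4)
```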

5 Discrete Distributions

5.1 Discrete Versus Continuous Distributions

5.2 Describing a Discrete Distribution

Mean, Variance, and Standard Deviation of Discrete Distributions

Mean or Expected Value of a Discrete Distribution

\(\mu=E(x)=\sum[x\cdot P(x)]\)

Variance and Standard Deviation of a Discrete Distribution

Variance of a discrete distribution

\(\sigma^2=\sum[(x-\mu)^2\cdot P(x)]\)

Standard Deviation of a Discrete Distribution

\(\sigma=\sqrt{\sum[(x-\mu)^2\cdot P(x)]}\)
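The mean, variance, and standard deviation formulas translate directly into vectorized R. The distribution below is hypothetical and invented only to illustrate the arithmetic.

```r
# Hypothetical discrete distribution (invented for illustration)
x.vals <- c(0, 1, 2, 3)
p.x    <- c(.1, .3, .4, .2)   # probabilities must sum to 1

mu.x  <- sum(x.vals * p.x)             # mean / expected value
var.x <- sum((x.vals - mu.x)^2 * p.x)  # variance
sd.x  <- sqrt(var.x)                   # standard deviation

c(mean = mu.x, variance = var.x, sd = sd.x)
```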

5.3 Binomial Distribution


References:

  • https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Binomial.html
  • http://www.r-tutor.com/elementary-statistics/probability-distributions/binomial-distribution

5.15

In the past few years, outsourcing overseas has become more frequently used than ever before by U.S. companies. However, outsourcing is not without problems. A recent survey by Purchasing indicates that 20% of the companies that outsource overseas use a consultant. Suppose 15 companies that outsource overseas are randomly selected.

n <- 15
p <- .2

a.

What is the probability that exactly five companies that outsource overseas use a consultant?

Manual calculation

count_combn <- function(n, x){
  # Number of ways to choose x items from n: n! / (x! (n-x)!)
  return(choose(n, x))
}

x <- 5

dbinom.wly <- function(n, x, p){#wly: Weilong You
  count_combn(n, x) * p^x * (1-p)^(n-x)
} 

dbinom.wly(n, x, p)
## [1] 0.1031823
Built-in R function
dbinom(x, size=n, prob=p)
## [1] 0.1031823

b.

What is the probability that more than nine companies that outsource overseas use a consultant?

My own function

pbinom.wly <- function(n, x, p, lower.tail=TRUE){
  # Lower tail sums P(X = 0), ..., P(X = x); upper tail sums P(X = x+1), ..., P(X = n)
  if (lower.tail) {
    nums2calc <- seq(0, x)
  }
  else {
    nums2calc <- seq(x+1, n)
  }

  # Evaluate the manual pmf at each count and add the results
  return(sum(sapply(nums2calc, dbinom.wly, n=n, p=p)))
}

# "More than nine" is the upper tail above 9, so x must be set to 9 before either call
x <- 9
pbinom.wly(n, x, p, lower.tail=FALSE)
## [1] 0.0001132257
Built-in R function
pbinom(x, size=n, prob=p, lower.tail=FALSE)
## [1] 0.0001132257

c.

What is the probability that none of the companies that outsource overseas use a consultant?

My own function

dbinom.wly(n, 0, p)
## [1] 0.03518437
Built-in R function
dbinom(0, size=n, prob=p)
## [1] 0.03518437

d.

What is the probability that between four and seven (inclusive) companies that outsource overseas use a consultant?

My own function

# P(4 <= X <= 7) = P(X <= 7) - P(X <= 3)
a <- 4-1
b <- 7
pbinom.wly(n, b, p) - pbinom.wly(n, a, p)
## [1] 0.3475981
Built-in R function
pbinom(b, size=n, prob=p) - pbinom(a, size=n, prob=p)
## [1] 0.3475981

e.

Construct a graph for this binomial distribution. In light of the graph and the expected value, explain why the probability results from parts (a) through (d) were obtained.

1. Each bar shows the probability that a given number of the 15 companies use a consultant. The expected value is \(n\cdot p = 15 \times 0.2 = 3\), so the tallest bars sit around 3 and the probabilities shrink quickly for counts far above it.
2. “More than nine” means 10, 11, …, up to 15 companies, so the result is the sum of the bars from 10 through 15. These bars lie far from the expected value, which is why the probability is so small.
3. Similarly, “between four and seven (inclusive)” is the sum of the bars from 4 through 7.
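x <- seq(0, n)
y <- dbinom(x, n, p)
# names.arg labels each bar with its count; a bare second argument would set bar widths instead
barplot(y, names.arg=x, xlab="Number of companies using a consultant", ylab="Probability")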

5.4 Poisson Distribution

Poisson Formula

\(P(x)=\dfrac{\lambda^xe^{-\lambda}}{x!}\)
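The formula can be compared with R's built-in `dpois()`. The rate and count below are hypothetical values chosen only for illustration.

```r
lambda.hyp <- 3.2  # hypothetical average number of occurrences per interval
k <- 5             # hypothetical number of occurrences of interest

# Manual calculation from the Poisson formula
manual.pois <- lambda.hyp^k * exp(-lambda.hyp) / factorial(k)

# Built-in R function
builtin.pois <- dpois(k, lambda = lambda.hyp)

c(manual = manual.pois, builtin = builtin.pois)
```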

5.5 Hypergeometric Distribution

In summary, the hypergeometric distribution should be used instead of the binomial distribution when the following conditions are present:

1. Sampling is being done without replacement.
2. n ≥ 5%N.

Hypergeometric Formula

\(P(x)=\dfrac{_AC_x\cdot _{N-A}C_{n-x}}{_NC_n}\)
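The formula can be checked against R's built-in `dhyper()`. Here N is the population size, A the number of successes in the population, n the sample size, and x the number of successes in the sample. The numbers below are hypothetical. Note that `dhyper()` takes the number of failures in the population (N − A) as its third argument rather than N itself.

```r
pop.N  <- 24  # population size N (hypothetical)
pop.A  <- 8   # successes in the population A (hypothetical)
samp.n <- 5   # sample size n (hypothetical)
succ.x <- 3   # successes in the sample x (hypothetical)

# Manual calculation from the hypergeometric formula
manual.hyper <- choose(pop.A, succ.x) *
  choose(pop.N - pop.A, samp.n - succ.x) / choose(pop.N, samp.n)

# Built-in R function: dhyper(x, m = successes, n = failures, k = sample size)
builtin.hyper <- dhyper(succ.x, m = pop.A, n = pop.N - pop.A, k = samp.n)

c(manual = manual.hyper, builtin = builtin.hyper)
```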