Set up workspace

Exercise 2.6

If you roll a pair of fair dice, what is the probability of:

(a) getting a sum of 1?

The minimum possible sum is 2, so the probability is 0.

(b) getting a sum of 5?

P(1, 4) = (1/6)^2
P(2, 3) = (1/6)^2
P(3, 2) = (1/6)^2
P(4, 1) = (1/6)^2

4*((1/6)^2)
## [1] 0.1111111

About 11% chance, or 4/36.

(c) getting a sum of 12?

P(6, 6) = (1/6)^2

(1/6)^2
## [1] 0.02777778

About 3% chance, or 1/36.
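As a cross-check on parts (a) through (c), the 36 equally likely outcomes can be enumerated directly. This is a minimal sketch; the names `rolls`, `sums`, and `p_sum` are illustrative and not part of the original solution.

```r
# Enumerate all 36 equally likely (die1, die2) outcomes
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Probability of a given sum = share of the 36 outcomes with that sum
p_sum <- function(s) mean(sums == s)

p_sum(1)   # no outcome sums to 1
p_sum(5)   # 4 of 36 outcomes
p_sum(12)  # 1 of 36 outcomes
```

Counting outcomes this way avoids having to list the qualifying pairs by hand.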

Exercise 2.7

The American Community Survey … estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.

(a) Are living below the poverty line and speaking a foreign language at home disjoint?

No. 4.2% of Americans fall into both categories, so the two events can occur together.

(b) Draw a Venn diagram summarizing the variables and their associated probabilities.
library(VennDiagram)  # provides draw.pairwise.venn(); attaches grid for grid.newpage()
grid.newpage()
draw.pairwise.venn(area1 = 14.6, area2 = 20.7, cross.area = 4.2, category = c("Poverty", 
    "Second Language"))

(c) What percent of Americans live below the poverty line and only speak English at home?

14.6% - 4.2% = 10.4%

(d) What percent of Americans live below the poverty line or speak a foreign language at home?
14.6 + 20.7 - 4.2
## [1] 31.1
(e) What percent of Americans live above the poverty line and only speak English at home?
100 - 31.1
## [1] 68.9
(f) Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?

P(poverty) * P(foreign language) = 0.146 * 0.207 = 0.030222

This does not equal the probability of the intersection (0.042), so the two events are not independent.
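Parts (c) through (f) all follow from the three percentages given in the exercise. The sketch below collects them in one place; the variable names are illustrative, and the percentages are written as proportions.

```r
# Proportions given in the exercise
p_pov  <- 0.146   # below the poverty line
p_lang <- 0.207   # foreign language at home
p_both <- 0.042   # both

p_pov_only <- p_pov - p_both            # (c) poverty and English only
p_either   <- p_pov + p_lang - p_both   # (d) general addition rule
p_neither  <- 1 - p_either              # (e) complement of (d)

# (f) independence would require P(A and B) to equal P(A) * P(B)
p_if_indep <- p_pov * p_lang

c(p_pov_only, p_either, p_neither, p_if_indep)
isTRUE(all.equal(p_both, p_if_indep))   # FALSE: 0.042 vs 0.030222
```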

Exercise 2.20

(a) What is the probability that a randomly chosen male respondent or his partner has blue eyes?
(114 + 108 - 78)/204
## [1] 0.7058824
(b) What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?
78/114
## [1] 0.6842105
(c) What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes?
19/54
## [1] 0.3518519
What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?
11/36
## [1] 0.3055556
(d) Does it appear that the eye colors of male respondents and their partners are independent? Explain your reasoning.

If the events were independent, the probability of both occurring would equal the product of their individual probabilities. Here the probability of the intersection is noticeably higher than that product:
P(male has blue eyes): 114/204.
P(partner has blue eyes): 108/204.
P(both have blue eyes): 78/204, or 0.3823529.
This is higher than (114/204)(108/204), or 0.2958478. So, at least for blue eyes, the eye colors do not appear to be independent.
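Another way to see the dependence is to compare the conditional probabilities from parts (b) and (c) with the overall share of blue-eyed partners. The sketch below uses only the counts that appear in the calculations above; the vector names are illustrative.

```r
# Male respondents by eye color, and how many in each group
# have a blue-eyed partner (counts from parts (a)-(c))
male_total   <- c(blue = 114, brown = 54, green = 36)
partner_blue <- c(blue = 78,  brown = 19, green = 11)
n <- sum(male_total)  # 204 couples

# P(partner blue | male eye color) for each group
p_cond <- partner_blue / male_total

# Marginal P(partner blue), ignoring the male's eye color
p_marginal <- sum(partner_blue) / n

round(p_cond, 3)
round(p_marginal, 3)
```

Under independence, every entry of `p_cond` would be close to `p_marginal`. Here the value for blue-eyed males sits well above it and the other two sit well below.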

Exercise 2.30

(a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.
(28/95)*(59/94)
## [1] 0.1849944
(b) Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.

There are two cases: the first book is either a fiction paperback or a fiction hardcover:

(59/95)*(28/94) + (13/95)*(27/94)
## [1] 0.2243001
(c) Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.
(72/95)*(28/95)
## [1] 0.2233795
(d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.

Only one book out of 95 is removed after the first draw, so the makeup of the bookcase, and with it the probability of drawing a hardcover second, barely changes. If more books were removed before the second draw, the dependence would have a larger effect.
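The comparison between parts (b) and (c) can be written out with named counts to show how small the gap is. This is a sketch using the book counts from the calculations above; the variable names are illustrative.

```r
# Book counts used in parts (b) and (c)
fic_hard  <- 13
fic_paper <- 59
hard      <- 28
total     <- 95

# (b) without replacement: split on whether the first (fiction) book
# is a paperback or a hardcover
p_without <- (fic_paper / total) * (hard / (total - 1)) +
             (fic_hard  / total) * ((hard - 1) / (total - 1))

# (c) with replacement: the two draws are independent
p_with <- ((fic_hard + fic_paper) / total) * (hard / total)

c(without = p_without, with = p_with, difference = p_without - p_with)
```

The difference is below 0.001 because the denominator only moves from 95 to 94 between draws.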

Exercise 2.38

An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.

(a) Build a probability model, compute the average revenue per passenger, and compute the corresponding standard deviation.

Expected values:

.54*(0) + .34*(25) + .12*(60)
## [1] 15.7

Expected value of $15.70 per passenger.

Variance:

((0 - 15.7)^2)*(.54) + ((25 - 15.7)^2)*(.34) + ((60 - 15.7)^2)*(.12)
## [1] 398.01

Standard deviation is the square root of variance:

round((398.01)^(1/2), digits = 2)
## [1] 19.95
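The same probability model can be expressed with two vectors, which keeps the fees and probabilities in one place. This is a minimal sketch; `fee`, `prob`, `ev`, `variance`, and `std_dev` are illustrative names.

```r
fee  <- c(0, 25, 60)         # total fee for 0, 1, or 2 checked bags
prob <- c(0.54, 0.34, 0.12)  # share of passengers in each group
stopifnot(isTRUE(all.equal(sum(prob), 1)))  # probabilities must sum to 1

ev       <- sum(fee * prob)           # expected revenue per passenger
variance <- sum((fee - ev)^2 * prob)  # probability-weighted squared deviations
std_dev  <- sqrt(variance)

c(mean = ev, variance = variance, sd = std_dev)
```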
(b) About how much revenue should the airline expect for a flight of 120 passengers? With what standard deviation? Note any assumptions you make and if you think they are justified.

Assuming the number of passengers doesn’t affect the expected percentages checking 0, 1, or 2 pieces of luggage, the expected revenue is:

15.7*120
## [1] 1884

About $1884

Standard deviation of revenue for all the passengers is: sqrt(120 * 398.01)

(120*398.01)^(1/2)
## [1] 218.5434

This formula assumes that the fees paid by different passengers are independent of one another and that the baggage percentages do not depend on how full the flight is. If more people tend to fly at Christmas (with a lot of extra things in their bags), this may not be a valid assumption.
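A quick simulation offers a sanity check on the flight-level mean and standard deviation, under the same independence assumption. This is only a sketch: the seed and the number of simulated flights are arbitrary choices, and the variable names are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n_flights    <- 10000
n_passengers <- 120

# One row per simulated flight, one column per passenger;
# each entry is that passenger's baggage fee, drawn independently
fees <- matrix(
  sample(c(0, 25, 60), n_flights * n_passengers, replace = TRUE,
         prob = c(0.54, 0.34, 0.12)),
  nrow = n_flights
)
revenue <- rowSums(fees)

mean(revenue)  # should land near 15.7 * 120
sd(revenue)    # should land near sqrt(120 * 398.01)
```

If passengers on the same flight tended to pack alike, the simulated draws would no longer be independent and the spread would differ from the formula.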

Exercise 2.44

(a) Describe the distribution of total personal income.

Bimodal distribution, with the main peak at $35,000 to $49,999 and a second, smaller peak at $100,000 or more.

(b) What is the probability that a randomly chosen US resident makes less than $50,000 per year?
2.2 + 4.7 + 15.8 + 18.3 + 21.2
## [1] 62.2

About 62%

(c) What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.

Assuming gender and income are independent (which they are not), the probability would be the following:

.62*.41
## [1] 0.2542

If females were more likely than males to earn below $50,000, a randomly chosen person would be somewhat more likely to have both of these attributes.

(d) The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.

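Under the independence assumption, P(<50K | F) = P(<50K & F)/P(F) = 0.2542/0.41 = 0.62, or about 62%, which is lower than the reported 71.8%. The assumption made in part (c) is therefore not valid.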
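The size of the gap can be made concrete by comparing the joint probability under independence with the one implied by the 71.8% figure. This is a sketch using the unrounded 62.2% from part (b) and the 0.41 share of females used in part (c); the variable names are illustrative.

```r
p_under50        <- 0.622  # P(income < $50,000), from part (b)
p_female         <- 0.41   # share of females used in part (c)
p_under50_female <- 0.718  # P(income < $50,000 | female), given in part (d)

joint_indep  <- p_under50 * p_female         # joint probability if independent
joint_actual <- p_under50_female * p_female  # general multiplication rule

c(independent = joint_indep, actual = joint_actual)
```

The joint probability implied by the data is roughly four percentage points higher than the value obtained by assuming independence.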