N Cooper 606 HW2

Problem 2.6

If you roll a pair of fair dice, what is the probability of

I play pen-and-paper role-playing games, so I’ll assume the text means six-sided dice (D6); a roll of 2D6 has 36 equally likely outcomes.

getting a sum of 1?

P(1) = 0, because the minimum roll is 2, one on each die.

getting a sum of 5?

P(5) = 4/36 = 1/9; there are two ways to get 1 and 4, and two ways to get 2 and 3. Any roll with a 5 or 6 will not result in a total of 5.

getting a sum of 12?

P(12) = 1/36; the only way to roll a 12 is two sixes.
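The three answers above can be checked by enumerating all 36 outcomes. This is only a minimal sketch; the names `rolls`, `totals`, and `p_sum` are chosen here for illustration and are not from the textbook.

```r
# Enumerate every outcome of two six-sided dice (36 equally likely rows).
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
totals <- rolls$die1 + rolls$die2

# Probability of a sum = share of outcomes that produce that sum.
p_sum <- function(s) mean(totals == s)

p_sum(1)   # no outcome sums to 1
p_sum(5)   # four outcomes: (1,4), (4,1), (2,3), (3,2)
p_sum(12)  # one outcome: (6,6)
```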

Problem 2.8

The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories

Are living below the poverty line and speaking a foreign language at home disjoint?

Disjoint is a term meaning mutually exclusive. No, being below the poverty line and speaking a foreign language at home are not disjoint, since there is a 4.2% chance of being both.

Draw a Venn diagram summarizing the variables and their associated probabilities.

I used https://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html to help with the formatting.

# install.packages("VennDiagram")
library(VennDiagram)

## Warning: package 'VennDiagram' was built under R version 3.4.1

## Loading required package: grid

## Loading required package: futile.logger

## Warning: package 'futile.logger' was built under R version 3.4.1

draw.pairwise.venn(area1 = 0.146, area2 = 0.207, cross.area = 0.042, category = c("Below Poverty Line", "Foreign Language Speaker"), fill = c("violet","orange"), alpha = rep(0.5, 2), cat.pos = c(0,0), cat.dist = rep(0.025 ,2))

## (polygon[GRID.polygon.1], polygon[GRID.polygon.2], polygon[GRID.polygon.3], polygon[GRID.polygon.4], text[GRID.text.5], text[GRID.text.6], text[GRID.text.7], text[GRID.text.8], text[GRID.text.9])

What percent of Americans live below the poverty line and only speak English at home?

Subtract those in both categories from those below the poverty line: 14.6% - 4.2% = 10.4%.

What percent of Americans live below the poverty line or speak a foreign language at home?

Use the addition rule: P(A or B) = P(A) + P(B) - P(A and B) = 0.207 + 0.146 - 0.042 = 0.311, or 31.1%.

What percent of Americans live above the poverty line and only speak English at home?

I created a probability table:

eng_pov <- matrix(c(0.104, 0.042, 0.146, 0.689, 0.165, 0.854, 0.793, 0.207, 1.00), nrow = 3, ncol = 3)
rownames(eng_pov) <- c("English", "Not English", "Marginal Prob")
colnames(eng_pov) <- c("Poverty", "Not Poverty", "Marginal Prob")
eng_pov <- data.frame(eng_pov)
eng_pov

##               Poverty Not.Poverty Marginal.Prob
## English         0.104       0.689         0.793
## Not English     0.042       0.165         0.207
## Marginal Prob   0.146       0.854         1.000

The marginal probabilities for Poverty and Not English are given, and their complements are 1 - P. The joint probability for Poverty and Not English was also given. Each column and row has to add to the marginal probability for that column or row, and the marginal probabilities have to add to 1. So according to the table, 68.9% are not in poverty and speak only English at home.
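Rather than typing every cell by hand, the table can be derived from the three numbers given in the problem. The following is a sketch under that assumption; the variable names (`p_pov`, `p_for`, `p_both`, `joint`) are illustrative only.

```r
# The three probabilities stated in the problem.
p_pov  <- 0.146   # below the poverty line
p_for  <- 0.207   # foreign language at home
p_both <- 0.042   # both

# Fill the 2x2 joint table column by column; every other cell follows
# from the requirement that rows and columns sum to their marginals.
joint <- matrix(
  c(p_pov - p_both,                    # English,     Poverty
    p_both,                            # Not English, Poverty
    (1 - p_for) - (p_pov - p_both),    # English,     Not Poverty
    p_for - p_both),                   # Not English, Not Poverty
  nrow = 2,
  dimnames = list(c("English", "Not English"),
                  c("Poverty", "Not Poverty"))
)

# addmargins() appends the row and column sums (the marginal probabilities).
addmargins(joint)
```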

Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?

If they are independent, then: \[ P(A \& B) = P(A) \times P(B) \] \[ P(A \& B) = 0.042\] \[P(A) \times P(B) = 0.146 \times 0.207 = 0.030222\]

Since 0.042 is not equal to 0.030222, the multiplication rule for independence does not hold, so living below the poverty line and speaking a foreign language at home are not independent events.
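The independence check can be sketched in a few lines of R. The variable names are illustrative; the three inputs are the values given in the problem statement.

```r
p_pov  <- 0.146   # P(below poverty line)
p_for  <- 0.207   # P(foreign language at home)
p_both <- 0.042   # P(both), as given

# Joint probability we would expect if the two events were independent.
p_if_indep <- p_pov * p_for

c(observed = p_both, if_independent = p_if_indep)

# Equivalent check: under independence P(foreign | poverty) would equal p_for.
p_for_given_pov <- p_both / p_pov
c(conditional = p_for_given_pov, marginal = p_for)
```

Both comparisons point the same way: the observed joint probability is larger than the product of the marginals.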

Problem 2.20

Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise.

The first thing I am going to do is adapt the code I wrote above to this problem, and let R do all the conversions from counts to probabilities by dividing by 204. Rows are the male respondent's eye color and columns are his female partner's eye color.

eye_color <- matrix(c(78/204, 19/204, 11/204, 108/204, 23/204, 23/204, 9/204, 55/204, 13/204,12/204,16/204,41/204,114/204,54/204,36/204,204/204), nrow = 4, ncol = 4)
rownames(eye_color) <- c("Blue", "Brown", "Green", "Total")
colnames(eye_color) <- c("Blue", "Brown", "Green", "Total")
eye_color <- data.frame(eye_color)
eye_color

##             Blue      Brown      Green     Total
## Blue  0.38235294 0.11274510 0.06372549 0.5588235
## Brown 0.09313725 0.11274510 0.05882353 0.2647059
## Green 0.05392157 0.04411765 0.07843137 0.1764706
## Total 0.52941176 0.26960784 0.20098039 1.0000000

What is the probability that a randomly chosen male respondent or his partner has blue eyes?

\[P(A \text{ or } B) = P(A) + P(B) - P(A \& B) \]

From the table:

\[P(A \text{ or } B) = 0.52941176 + 0.5588235 - 0.38235294 = 0.7058823\]

There is a 70.6% chance that either the male respondent or his female partner has blue eyes.
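The same addition-rule calculation can be done from the raw counts. This sketch assumes rows are the male respondent's eye color and columns are the partner's, which matches how the table is read elsewhere in this write-up; `counts` and the other names are illustrative.

```r
# Counts from the table: rows = male respondent, columns = female partner.
counts <- matrix(
  c(78, 19, 11,    # partner Blue
    23, 23,  9,    # partner Brown
    13, 12, 16),   # partner Green
  nrow = 3,
  dimnames = list(male    = c("Blue", "Brown", "Green"),
                  partner = c("Blue", "Brown", "Green"))
)
n <- sum(counts)

p_male_blue    <- sum(counts["Blue", ]) / n    # row total / n
p_partner_blue <- sum(counts[, "Blue"]) / n    # column total / n
p_both_blue    <- counts["Blue", "Blue"] / n

# General addition rule: subtract the overlap so it is not counted twice.
p_either_blue <- p_male_blue + p_partner_blue - p_both_blue
p_either_blue
```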

What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?

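Because the respondent is known to have blue eyes, this is a conditional probability: the joint probability from the table divided by the marginal probability that the male has blue eyes. That is 0.38235294/0.5588235 = 0.6842105, or a 68.4% chance that a blue-eyed male has a partner with blue eyes.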

What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes? What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?

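These are again conditional probabilities, so each joint probability is divided by the marginal probability of the male's eye color. For a male with brown eyes and a partner with blue eyes it is 0.09313725/0.2647059 = 0.3518519, or 35.2%; for a male with green eyes and a partner with blue eyes it is 0.05392157/0.1764706 = 0.3055556, or 30.6%.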
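Conditioning on the male's eye color amounts to dividing each row of the count table by its row total. A minimal sketch, assuming rows are male respondents and columns are partners (names are illustrative):

```r
# Counts from the table: rows = male respondent, columns = female partner.
counts <- matrix(
  c(78, 19, 11,    # partner Blue
    23, 23,  9,    # partner Brown
    13, 12, 16),   # partner Green
  nrow = 3,
  dimnames = list(male    = c("Blue", "Brown", "Green"),
                  partner = c("Blue", "Brown", "Green"))
)

# prop.table(..., margin = 1) divides every row by its own total,
# giving P(partner colour | male colour); each row sums to 1.
cond_on_male <- prop.table(counts, margin = 1)

# P(partner has blue eyes | male has blue / brown / green eyes)
cond_on_male[, "Blue"]
```

If eye colors were independent, the three values in that column would all be close to the overall share of blue-eyed partners, 108/204.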

Does it appear that the eye colors of male respondents and their partners are independent? Explain your reasoning.

As with the English speaker vs. poverty problem, we can use the multiplication rule for independent events:

\[P(A \& B) = P(A) \times P(B) \]

For Blue and Blue:

\[ P(A \& B) = 0.38235294\]

\[P(A) \times P(B) = 0.52941176 \times 0.5588235 = 0.2958477\]

Not Independent.

For Blue and Green:

\[ P(A \& B) = 0.05392157\]

\[P(A) \times P(B) = 0.52941176 \times 0.1764706 = 0.09342561\]

Not Independent.

For Blue and Brown:

\[ P(A \& B) = 0.09313725\]

\[P(A) \times P(B) = 0.52941176 \times 0.2647059 = 0.1401384\]

Not Independent.

For Brown and Green:

\[ P(A \& B) = 0.04411765\]

\[P(A) \times P(B) = 0.26960784 \times 0.1764706 = 0.04757786\]

Close, but Not Independent.

For Brown and Brown:

\[ P(A \& B) = 0.11274510\]

\[P(A) \times P(B) = 0.26960784 \times 0.2647059 = 0.07136679\]

Not Independent.

For Green and Green:

\[ P(A \& B) = 0.07843137\]

\[P(A) \times P(B) = 0.20098039 \times 0.1764706 = 0.03546713\]

Not Independent.

It appears that these selections are not independent of each other.
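All nine cell-by-cell comparisons can be done at once by building the table that independence would predict and subtracting it from the observed one. This is a sketch, again assuming rows are male respondents and columns are partners; `expected` and `excess` are names introduced here.

```r
# Counts from the table: rows = male respondent, columns = female partner.
counts <- matrix(
  c(78, 19, 11,    # partner Blue
    23, 23,  9,    # partner Brown
    13, 12, 16),   # partner Green
  nrow = 3,
  dimnames = list(male    = c("Blue", "Brown", "Green"),
                  partner = c("Blue", "Brown", "Green"))
)
joint <- counts / sum(counts)

# Under independence each cell equals (row marginal) * (column marginal);
# outer() forms all of those products in one step.
expected <- outer(rowSums(joint), colSums(joint))

# Positive entries: that pairing is more common than independence predicts.
excess <- joint - expected
round(excess, 4)
```

The diagonal entries (same eye color for both partners) are all positive, which is the pattern assortative mating would produce.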

Problem 2.30

The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.

Again I am going to load the table from the book into R using the code I wrote above:

book <- matrix(c(13/95, 15/95, 28/95, 59/95, 8/95, 67/95, 72/95, 23/95, 95/95), nrow = 3, ncol = 3)
rownames(book) <- c("Fiction", "Nonfiction", "Total")
colnames(book) <- c("Hardcover", "Paperback", "Total")
book <- data.frame(book)
book

##            Hardcover  Paperback     Total
## Fiction    0.1368421 0.62105263 0.7578947
## Nonfiction 0.1578947 0.08421053 0.2421053
## Total      0.2947368 0.70526316 1.0000000

Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

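Drawing without replacement means these events are not independent, so use the general multiplication rule: the probability of the first draw times the probability of the second draw given the first. The first book is a hardcover, so all 59 paperback fiction books are still among the 94 that remain.

\[P = (28/95)*(59/94) = 0.1849944\]

18.5%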
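One way to sanity-check this is to list every ordered pair of distinct books and count the pairs that match. The sketch below rebuilds the shelf from the four cell counts in the table; the data frame `books` and the other names are illustrative.

```r
# One row per book, built from the four cell counts in the table.
books <- data.frame(
  cover = rep(c("hard", "paper", "hard", "paper"), times = c(13, 59, 15, 8)),
  type  = rep(c("fiction", "fiction", "nonfiction", "nonfiction"),
              times = c(13, 59, 15, 8))
)

# All ordered (first, second) draws; dropping first == second
# enforces drawing without replacement.
draws <- expand.grid(first = seq_len(nrow(books)),
                     second = seq_len(nrow(books)))
draws <- draws[draws$first != draws$second, ]

# Hardcover first, then a paperback fiction book.
hit <- books$cover[draws$first] == "hard" &
  books$cover[draws$second] == "paper" &
  books$type[draws$second] == "fiction"

p_a <- mean(hit)
p_a
```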

Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.

Since a fiction book can also be a hardcover, the first draw may or may not remove a hardcover from the shelf, and this will affect the second draw. So we first need the probability of hardcover given fiction, which is about 18%.

\[P(H|F) = P(H \& F) / P(F) = 0.1368421/0.7578947 = 0.1805556 \]

In this scenario there is an 81.94444% chance that the fiction book drawn first is a paperback, which leaves all 28 hardcovers among the remaining 94 books:

\[P = (72/95)*(28/94) = 0.2257559 \]

And there is an 18.05556% chance that it is a hardcover, which leaves 27 hardcovers:

\[P = (72/95)*(27/94) = 0.2176932\]

Weighting the two cases by these chances, the overall probability is

\[P = 0.8194444*0.2257559 + 0.1805556*0.2176932 = 0.2243001\]

22.4%
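The same result follows from splitting the first draw into its two cases (hardcover fiction or paperback fiction) and adding the two paths. A minimal sketch with illustrative variable names:

```r
n            <- 95   # books on the shelf
hard         <- 28   # hardcovers in total
hard_fiction <- 13   # hardcover fiction
pap_fiction  <- 59   # paperback fiction

# Case 1: first book is hardcover fiction, so 27 hardcovers remain.
case_hard <- (hard_fiction / n) * ((hard - 1) / (n - 1))

# Case 2: first book is paperback fiction, so all 28 hardcovers remain.
case_paper <- (pap_fiction / n) * (hard / (n - 1))

# Law of total probability: the cases are mutually exclusive, so add them.
p_b <- case_hard + case_paper
p_b
```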

Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.

Since the first book is placed back on the shelf, it no longer affects the second draw, and all books are available to draw.

\[P = (72/95) \times (28/95) = 0.2233795\]

22.3%

The final answers to parts (b) and (c) are very similar. Explain why this is the case.

Dividing by 95 rather than 94 gives a smaller number, so on that account scenario (c) has a smaller probability than (b). However, in scenario (b) we also have to factor in the lower-probability event that the first book was both hardcover and fiction, which leaves only 27 hardcovers and pulls the probability of (b) back down. Both effects come from removing a single book out of 95, so each is small and the two answers end up close. If the bookcase held far fewer books, this might not be the case.
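To see how the gap between (b) and (c) depends on the size of the shelf, the sketch below scales every count in the table by a factor `k`. The larger shelves are purely hypothetical, and `gap` is a helper introduced here for illustration.

```r
# Difference between "without replacement" and "with replacement"
# for a shelf whose counts are the table's counts multiplied by k.
gap <- function(k) {
  n            <- 95 * k
  fiction      <- 72 * k
  hard         <- 28 * k
  hard_fiction <- 13 * k

  without_repl <- (hard_fiction / n) * ((hard - 1) / (n - 1)) +
    ((fiction - hard_fiction) / n) * (hard / (n - 1))
  with_repl <- (fiction / n) * (hard / n)

  without_repl - with_repl
}

# k = 1 is the actual bookcase; k = 10 and k = 100 are hypothetical.
gaps <- sapply(c(1, 10, 100), gap)
gaps
```

The gap is already under 0.001 for the real shelf and shrinks further as the shelf grows, since one removed book matters less and less.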

Problem 2.38

An airline charges the following baggage fees:$25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.

Build a probability model, compute the average revenue per passenger, and compute the corresponding standard deviation.

bags <- matrix(c(0.54,0.34,0.12), nrow = 1, ncol = 3)
rownames(bags) <- c("Probability")
colnames(bags) <- c("$0", "1bag=$25", "2bag=$60")
bags

##               $0 1bag=$25 2bag=$60
## Probability 0.54     0.34     0.12

A passenger with two bags pays $25 for the first and $35 for the second, so $60 in total.

\[E(X) = 0*0.54+25*0.34+60*0.12 = 15.7 \]

$15.70 per passenger.

\[Var(X) = (0-15.7)^2*0.54 + (25-15.7)^2*0.34+(60-15.7)^2*0.12 = 398.01\]

\[SD(X) = \sqrt{Var(X)} = \sqrt{398.01} = 19.95019\]

About how much revenue should the airline expect for a flight of 120 passengers? With what standard deviation? Note any assumptions you make and if you think they are justified.

\[120*E(X) = 120*15.7 = 1884\]

$1884 for a flight of 120 passengers on average.

For a sum of independent passengers the variances add, not the standard deviations:

\[\sqrt{120*Var(X)} = \sqrt{120*398.01} = 218.5434\]

A standard deviation of about $218.54 for 120 passengers. This assumes that the passengers' baggage choices are independent of one another, which may not be fully justified, since people travelling together (families, for example) tend to make similar choices.
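The model can also be evaluated directly in R. This sketch assumes a passenger with two bags pays $25 + $35 = $60 in total and that passengers check bags independently of one another; the variable names are illustrative.

```r
# Revenue per passenger: 0 bags, 1 bag, 2 bags (25 + 35 for the pair).
fee  <- c(0, 25, 25 + 35)
prob <- c(0.54, 0.34, 0.12)

ev     <- sum(fee * prob)              # expected revenue per passenger
v      <- sum((fee - ev)^2 * prob)     # variance per passenger
sd_one <- sqrt(v)                      # standard deviation per passenger

# For n independent passengers the means add and the variances add,
# so the standard deviation grows with sqrt(n), not n.
n <- 120
flight_mean <- n * ev
flight_sd   <- sqrt(n * v)

c(per_passenger_mean = ev, per_passenger_sd = sd_one,
  flight_mean = flight_mean, flight_sd = flight_sd)
```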

Problem 2.44

The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females.

Describe the distribution of total personal income.

income <- c(0.022,0.047,0.158,0.183,0.212,0.139,0.058,0.084,0.097)
barplot(income)

The distribution is bimodal with a peak in the $35,000-$49,999 bracket and a second peak in the >$100,000 bracket.

What is the probability that a randomly chosen US resident makes less than $50,000 per year?

\[P = 0.022+0.047+0.158+0.183+0.212 = 0.622\]

62.2%

What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.

I am going to make the unsafe simplifying assumption that income is independent of sex. In reality, a female is more likely than a male to make less than $50,000.

\[P = 0.622*0.41 = 0.25502\]

This is really a lower bound and the actual number is going to be higher.

The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.

# Same table layout as in Problem 2.8, under a name that matches this problem.
sex_income <- matrix(c(0.328, 0.294, 0.622, 0.262, 0.116, 0.378, 0.59, 0.41, 1.00), nrow = 3, ncol = 3)
rownames(sex_income) <- c("Male", "Female", "Marginal Prob")
colnames(sex_income) <- c(" less 50K", " more 50K", "Marginal Prob")
sex_income <- data.frame(sex_income)
sex_income

##               X.less.50K X.more.50K Marginal.Prob
## Male               0.328      0.262          0.59
## Female             0.294      0.116          0.41
## Marginal Prob      0.622      0.378          1.00

With the data we are given, we can see that if 71.8% of females make less than $50,000, then only 55.6% of males can make less than $50,000 for the overall percentage to be 62.2%. The actual joint probability of being female and making less than $50,000 is 0.294, not the 0.255 computed under independence, so as I stated above it was not a good assumption.

\[P = 0.328/0.59 = 0.556\]
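The comparison can be reproduced from the three figures given in the problem. A minimal sketch with illustrative names; the results differ from the table in the third decimal only because the table uses rounded entries.

```r
p_under50          <- 0.622   # P(income < $50,000)
p_female           <- 0.41    # P(female)
p_under50_given_f  <- 0.718   # P(income < $50,000 | female), from part (d)

# Joint probability under the independence assumption of part (c).
p_joint_indep <- p_under50 * p_female

# Joint probability implied by the conditional figure in part (d).
p_joint_actual <- p_under50_given_f * p_female

# Whatever is left of the under-$50,000 group must be male.
p_under50_given_m <- (p_under50 - p_joint_actual) / (1 - p_female)

c(independent = p_joint_indep, actual = p_joint_actual,
  males_under50 = p_under50_given_m)
```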


Nathan Cooper

September 10, 2017
