2.6 Dice rolls. If you roll a pair of fair dice, what is the probability of

(a) getting a sum of 1?

ANS: The minimum sum from rolling a pair of fair dice is 2. Since 1 is not a possible outcome, the probability is 0.

(b) getting a sum of 5?

ANS: Four of the 6 x 6 = 36 equally likely outcomes sum to 5: (die 1, die 2) = (1,4), (2,3), (3,2), (4,1). The probability is 4/36 = 1/9.

(c) getting a sum of 12?

ANS: Only (6,6) sums to 12, so the probability is 1/36.
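The three answers above can be checked by listing every outcome. The sketch below is only an illustration; the names `rolls` and `p_sum` are hypothetical helpers, not part of the original solution.

```r
# Enumerate all 36 equally likely outcomes of two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
rolls$sum <- rolls$die1 + rolls$die2

# Probability of a given sum = share of outcomes with that sum
p_sum <- function(s) mean(rolls$sum == s)

p_sum(1)   # no outcome sums to 1
p_sum(5)   # (1,4), (2,3), (3,2), (4,1)
p_sum(12)  # only (6,6)
```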

2.8 Poverty and language.

The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.

(a) Are living below the poverty line and speaking a foreign language at home disjoint?

ANS: No. A person can live below the poverty line and also speak a foreign language at home. The survey estimates that 4.2% of Americans fall into both categories, so the two events overlap and are not disjoint.

(b) Draw a Venn diagram summarizing the variables and their associated probabilities.

library(VennDiagram)

## Loading required package: grid

## Loading required package: futile.logger

pov <- 14.6
fl <- 20.7
both <- 4.2

venn.plot <- draw.pairwise.venn(pov, 
                   fl,
                   cross.area=both, 
                   c("Poverty", "Foreign Language"), 
                   fill=c("red", "blue"),
                   cat.dist=-0.08,
                   ind=FALSE)
grid.draw(venn.plot)

(c) What percent of Americans live below the poverty line and only speak English at home?

povOnly <- pov - both
povOnly  # % of Americans who live below the poverty line and speak only English at home

## [1] 10.4

(d) What percent of Americans live below the poverty line or speak a foreign language at home?

#P(A or B) = P(A) + P(B) - P(A and B)
 fl + pov - both

## [1] 31.1

(e) What percent of Americans live above the poverty line and only speak English at home?

 povOnly <- pov - both
 100 - fl - povOnly  # % of Americans who live above the poverty line and speak only English at home

## [1] 68.9
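Parts (c) through (e) are all cells or sums of cells of one 2 x 2 table. As a rough cross-check, the sketch below rebuilds that table from the three survey percentages; the matrix name `cells` and its row and column labels are made up for illustration.

```r
pov  <- 14.6   # % below the poverty line
fl   <- 20.7   # % speaking a foreign language at home
both <- 4.2    # % in both categories

# Rows: language spoken at home; columns: position relative to the poverty line
cells <- matrix(
  c(both,       fl - both,
    pov - both, 100 - (pov + fl - both)),
  nrow = 2, byrow = TRUE,
  dimnames = list(language = c("foreign", "english_only"),
                  poverty  = c("below", "above"))
)

cells            # the four disjoint groups
rowSums(cells)   # recovers 20.7 (foreign) and 79.3 (English only)
colSums(cells)   # recovers 14.6 (below) and 85.4 (above)
sum(cells)       # the four groups cover everyone
```

The "english_only"/"below" cell is the answer to (c), the "english_only"/"above" cell is the answer to (e), and 100 minus that cell is the answer to (d).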

(f) Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?

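# If the events were independent, P(A and B) would equal P(A) x P(B)
(pov/100) * (fl/100)

## [1] 0.030222

ANS: No. Under independence, P(below the poverty line and foreign language at home) would be about 0.030, but the survey estimate is 0.042. Since the two values differ, the events are not independent.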

2.20 Assortative mating.

(a) What is the probability that a randomly chosen male respondent or his partner has blue eyes?

  # P(Male Blue or Partner Blue) = P(Male Blue) + P(Partner Blue) - P(Male Blue and Partner Blue)
  ((114+108)/204) - (78/204)

## [1] 0.7058824

(b) What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?

#P(Partner Blue given Male Blue) 
(78/114)

## [1] 0.6842105

(c) What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes? What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?

P(Partner Blue given Male Brown) = (19/54) = 0.352

P(Partner Blue given Male Green) = (11/36) = 0.306

(d) Does it appear that the eye colors of male respondents and their partners are independent? Explain your reasoning.

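ANS: No. If eye colors were independent, the probability that the partner has blue eyes would be the same for every male eye color. Instead it is 0.68 for blue-eyed males, 0.35 for brown-eyed males, and 0.31 for green-eyed males. In each row of the proportion table below, the largest value is for a partner with the same eye color, so the eye colors of male respondents and their partners do not appear to be independent.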

fBlue <- c(78,19,11)
fBrown <- c(23,23,9)
fGreen <- c(13,12,16)
df <- data.frame(fBlue, fBrown, fGreen)

df

##   fBlue fBrown fGreen
## 1    78     23     13
## 2    19     23     12
## 3    11      9     16

row.names(df) <- c("mBlue", "mBrown", "mGreen")
df$sum <- c(sum(df["mBlue",]), sum(df["mBrown",]), sum(df["mGreen",]))

dfProp <- df / df$sum
dfProp

##            fBlue    fBrown    fGreen sum
## mBlue  0.6842105 0.2017544 0.1140351   1
## mBrown 0.3518519 0.4259259 0.2222222   1
## mGreen 0.3055556 0.2500000 0.4444444   1
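Independence would mean that each row of the proportion table matches the overall distribution of partner eye color. A minimal sketch of that comparison, using the same counts but stored in a matrix (the names `eyes`, `cond`, and `marg` are illustrative only):

```r
# Counts from the table above: rows = male respondent, columns = female partner
eyes <- matrix(c(78, 23, 13,
                 19, 23, 12,
                 11,  9, 16),
               nrow = 3, byrow = TRUE,
               dimnames = list(male    = c("blue", "brown", "green"),
                               partner = c("blue", "brown", "green")))

# P(partner's color | male's color): each row divided by its row total
cond <- prop.table(eyes, margin = 1)

# P(partner's color), ignoring the male's color
marg <- colSums(eyes) / sum(eyes)

round(cond, 3)
round(marg, 3)
```

If the rows of `cond` differ noticeably from `marg`, as they do here, the data are not consistent with independence. This is a descriptive comparison only, not a formal test.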

2.30 Books on a bookshelf

The table below shows the distribution of books on a bookcase:

Type/Format   Hardcover   Paperback   Total
Fiction              13          59      72
Nonfiction           15           8      23
Total                28          67      95

a. Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

P(Hardcover first) = (28/95)

P(Paperback Fiction second given Hardcover first) = (59/94)

P(Hardcover first and Paperback Fiction second) = ((28/95)*(59/94))

 ((28/95)*(59/94))

## [1] 0.1849944

b. Determine the probability of drawing a fiction book first and then a hardcover book second when drawing without replacement.

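The number of hardcovers left for the second draw depends on whether the first (fiction) book was itself a hardcover, so the two cases are handled separately:

P(Fiction Hardcover first, then Hardcover) = ((13/95)*(27/94))

P(Fiction Paperback first, then Hardcover) = ((59/95)*(28/94))

P(Fiction first and Hardcover second) = sum of the two cases

 ((13/95)*(27/94)) + ((59/95)*(28/94))

## [1] 0.2243001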
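A without-replacement probability like this one can be checked by brute force. The sketch below lists every ordered pair of distinct books and counts the pairs with a fiction book first and a hardcover second. The data frame `books` and the helper names are assumptions made for illustration.

```r
# One row per book, using the counts from the table above
books <- data.frame(
  type   = rep(c("fiction", "fiction", "nonfiction", "nonfiction"), c(13, 59, 15, 8)),
  format = rep(c("hardcover", "paperback", "hardcover", "paperback"), c(13, 59, 15, 8))
)

# All ordered (first, second) draws of two different books
pairs <- expand.grid(first = seq_len(nrow(books)), second = seq_len(nrow(books)))
pairs <- pairs[pairs$first != pairs$second, ]

# Share of pairs with a fiction book first and a hardcover book second
hit <- books$type[pairs$first] == "fiction" &
       books$format[pairs$second] == "hardcover"
mean(hit)

# Same value from conditioning on the format of the first book
(13/95) * (27/94) + (59/95) * (28/94)
```

The enumeration finds 2003 qualifying pairs out of 95 x 94 = 8930, which agrees with conditioning on whether the first fiction book was a hardcover.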

c. Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.

P(Fiction) = (72/95)

P(Hardcover) = (28/95)

P(Fiction and Hardcover) = ((72/95)*(28/95))

 ((72/95)*(28/95))

## [1] 0.2233795

d. The final answers to parts (b) and (c) are very similar. Explain why this is the case.

ANS: Removing one book from a bookcase of 95 changes the counts and the denominator for the second draw by at most 1 (94 vs. 95 books), so the second-draw probabilities are nearly the same with or without replacement.

2.38 Baggage fees

An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked baggage, 34% have one piece of checked luggage, and 12% have two pieces. We suppose a negligible portion of people check more than two bags.

a. Build a probability model, compute the average revenue per passenger, and compute the corresponding standard deviation.

prob <- c(0.54, 0.34, 0.12)
bag <- c(0, 1, 2)
fee <- c(0, 25, 25 + 35)
df_rev <- data.frame(prob, bag, fee)
df_rev$pf <- df_rev$prob * df_rev$fee
df_rev

##   prob bag fee  pf
## 1 0.54   0   0 0.0
## 2 0.34   1  25 8.5
## 3 0.12   2  60 7.2

# Calculate the average revenue per passenger
avgRevPerPax <- sum(df_rev$pf)
avgRevPerPax

## [1] 15.7

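# Calculate Variance: squared deviation of each fee from the mean, weighted by its probability
df_rev$DiffMean <- df_rev$fee - avgRevPerPax
df_rev$DMSqr <- df_rev$DiffMean ^ 2
df_rev$DMSP  <- df_rev$DMSqr * df_rev$prob
df_rev

##   prob bag fee  pf DiffMean   DMSqr     DMSP
## 1 0.54   0   0 0.0    -15.7  246.49 133.1046
## 2 0.34   1  25 8.5      9.3   86.49  29.4066
## 3 0.12   2  60 7.2     44.3 1962.49 235.4988

# Calculate standard deviation
varRev <- sum(df_rev$DMSP)
sdRev <- sqrt(varRev)
sdRev

## [1] 19.95019

b. About how much revenue should the airline expect for a flight of 120 passengers? With what standard deviation? Note any assumptions you make and if you think they are justified.

# Revenue for 120 passengers
pax <- 120
avgFltRev <- avgRevPerPax * pax
avgFltRev

## [1] 1884

# Standard Deviation: variances of independent passengers add, so Var(total) = pax * varRev
var1 <- pax * varRev
sd1 <- sqrt(var1)
sd1

## [1] 218.5434

ANS: The airline should expect about $1,884 in baggage revenue per flight, with a standard deviation of about $219. This assumes that passengers' baggage choices are independent of one another, which is questionable because people traveling together (families, for example) tend to check bags in similar ways.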
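For reference, the mean and standard deviation of a discrete fee distribution, and their totals over a flight, can be computed directly from the definitions. This is a compact sketch under the assumption that passengers check bags independently; the variable names are illustrative.

```r
fee  <- c(0, 25, 60)          # fee for 0, 1, or 2 checked bags
prob <- c(0.54, 0.34, 0.12)   # share of passengers in each group

# Per-passenger mean, variance, and standard deviation
mu     <- sum(fee * prob)
sigma2 <- sum((fee - mu)^2 * prob)
sigma  <- sqrt(sigma2)

# Totals for n passengers, ASSUMING their bag counts are independent:
# means add, and variances (not standard deviations) add
n <- 120
c(mean_total = n * mu, sd_total = sqrt(n * sigma2))
```

Note that the deviations are taken from the fee values themselves, and that the flight-level standard deviation grows with the square root of the number of passengers rather than linearly.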

2.44 Income and gender

income <- c("$1 - $9,999 or loss", 
            "$10,000 to $14,999", 
            "$15,000 to $24,999",
            "$25,000 to $34,999",
            "$35,000 to $49,999",
            "$50,000 to $64,999",
            "$65,000 to $74,999",
            "$75,000 to $99,999",
            "$100,000 or more")
bounds <- c(1, 10000, 15000, 25000, 35000, 50000, 65000, 75000, 100000)
size <- c(9999, 4999, 9999, 9999, 14999, 14999, 9999, 24999, 99999)
center <- bounds + (size / 2)
total <- c(0.022, 0.047, 0.158, 0.183, 0.212, 0.139, 0.058, 0.084, 0.097)

df44 <- data.frame(income, center, total)
df44

##                income   center total
## 1 $1 - $9,999 or loss   5000.5 0.022
## 2  $10,000 to $14,999  12499.5 0.047
## 3  $15,000 to $24,999  19999.5 0.158
## 4  $25,000 to $34,999  29999.5 0.183
## 5  $35,000 to $49,999  42499.5 0.212
## 6  $50,000 to $64,999  57499.5 0.139
## 7  $65,000 to $74,999  69999.5 0.058
## 8  $75,000 to $99,999  87499.5 0.084
## 9    $100,000 or more 149999.5 0.097

(a) Describe the distribution of total personal income.

require(ggplot2)
myTheme <- theme(axis.ticks=element_blank(),  
                  panel.border = element_rect(color="yellow", fill=NA), 
                  panel.background=element_rect(fill="blue"), 
                  panel.grid.major.y=element_line(color="white", size=0.5), 
                  panel.grid.major.x=element_line(color="white", size=0.5))


g44a <- ggplot(data=df44) + 
  geom_bar(aes(x=center, y=total, width=size), stat='identity', position="identity") + 
  labs(x="Income (US$)", 
       y="Proportion of Survey Sample", 
       title="Distribution of Total Personal Income") + 
  myTheme + 
  theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1))

## Warning: Ignoring unknown aesthetics: width

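g44a

ANS: The distribution is right-skewed. About 69% of the sample falls in the brackets between $15,000 and $64,999, with a long tail toward high incomes. The median lies in the $35,000 to $49,999 bracket, because the cumulative proportion first passes 0.5 there (0.410 through $34,999, 0.622 through $49,999).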

(b) What is the probability that a randomly chosen US resident makes less than $50,000 per year?

prb <- sum(df44[1:5,]$total)
prb

## [1] 0.622

(c) What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.

Assumption: income and gender are independent, so P(A and B) = P(A) x P(B). 41% of the sample is female.

prf <- 0.41 * prb
prf

## [1] 0.25502

(d) The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.

F_50K <- 0.718 * .41
F_50K

## [1] 0.29438

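ANS: If income and gender were independent, P(less than $50,000 given female) would equal P(less than $50,000) = 0.622. The data source gives 0.718 instead, so the joint probability is 0.294 rather than the 0.255 computed in part (c). The independence assumption made in part (c) is therefore not valid.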
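The comparison behind part (d) can be written out in a few lines. The sketch below uses only the three probabilities quoted in the exercise; the variable names are hypothetical.

```r
p_lt50k        <- 0.622  # P(income < $50,000), from part (b)
p_female       <- 0.41   # share of the sample that is female
p_lt50k_female <- 0.718  # P(income < $50,000 | female), from part (d)

# Joint probability under the independence assumption vs. from the conditional
joint_indep <- p_lt50k * p_female
joint_cond  <- p_lt50k_female * p_female

c(joint_indep = joint_indep, joint_cond = joint_cond)

# Independence would require the conditional and marginal probabilities to match
isTRUE(all.equal(p_lt50k_female, p_lt50k))
```

The last line evaluates to FALSE: the conditional probability for females is higher than the overall probability, which is what rules out independence.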

Raghu_606_HW2

Raghu Ramnath

2/10/2017