2.6. Dice rolls. If you roll a pair of fair dice, what is the probability of

(a) getting a sum of 1?

  • A sum of 1 would need one die to show 1 and the other to show 0, which is impossible: (1/6) * (0/6) = 0
  • Answer: 0

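(b) getting a sum of 5? Add the probabilities of every outcome that sums to 5.

  • Die 1 can show 1, 2, 3, or 4 (probability 4/6); die 2 must then show the one value that completes the sum (probability 1/6).
  • (4/6) * (1/6) = 4/36 = 1/9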

(c) getting a sum of 12?

  • Only the outcome (6, 6) works: (1/6) * (1/6) = 1/36
library(reshape)
# Table of all 36 sums: rows are die 1, columns are die 2
dice_1 <- 1:6
dice_2 <- 1:6
dice_game <- outer(dice_1, dice_2, "+")
dice_game
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    3    4    5    6    7
## [2,]    3    4    5    6    7    8
## [3,]    4    5    6    7    8    9
## [4,]    5    6    7    8    9   10
## [5,]    6    7    8    9   10   11
## [6,]    7    8    9   10   11   12
sum_5 <- sum(dice_game == 5)  # counts the 4 cells that equal 5, so P(sum of 5) = 4/36
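
As a check on parts (a) to (c), the probabilities can be read off the table of sums by counting matching cells. This is only a minimal sketch; `sums` and `p_sum` are illustrative names that do not appear elsewhere in this write-up.

```r
# All 36 equally likely outcomes for a pair of fair dice
sums <- outer(1:6, 1:6, "+")

# Probability of a given total = share of the 36 cells that equal it
p_sum <- function(total) mean(sums == total)

p_sum(1)   # no cell equals 1
p_sum(5)   # cells (1,4), (2,3), (3,2), (4,1)
p_sum(12)  # only cell (6,6)
```

Counting cells this way gives 0, 4/36, and 1/36, which agrees with the hand calculations.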

2.8

name                      variable_name
live_below_poverty        P(bel_pov)
live_above_poverty        P(above_pov)
speak_foreign_language    P(2nd_lang)
speak_english_only        P(eng_only)

A. Are living below the poverty line and speaking a foreign language at home disjoint?

B. Draw a Venn diagram summarizing the variables and their associated probabilities.

[Figure: Venn diagram of living below the poverty line and speaking a foreign language at home]

C. What percent of Americans live below the poverty line and only speak English at home?

Answer: 10.4%

variable                                  percent
live_below_poverty                        14.6%
speak_another_language.below_poverty      4.2%
speak_english_only_.below_poverty_line    10.4%

D. What percent of Americans live below the poverty line or speak a foreign language at home?

Answer: 31.1%

E. What percent of Americans live above the poverty line and only speak English at home?

Answer: P(above_pov and eng_only) = 68.9%

F. Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?
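
Parts A and F are left without a worked answer above, so the sketch below checks them with the same percentages used in parts C to E. One value is an assumption: the 20.7% share speaking a foreign language at home is not stated in this write-up and is inferred here from the 31.1% answer to part D (14.6 + x - 4.2 = 31.1). The variable names are illustrative.

```r
p_below_pov <- 0.146  # live below the poverty line
p_both      <- 0.042  # below the poverty line AND foreign language at home
# Assumption: inferred from the part D answer, not given explicitly above
p_2nd_lang  <- 0.207

# Disjoint events cannot happen together, so their overlap would be 0
disjoint <- p_both == 0

p_below_and_eng_only <- p_below_pov - p_both               # part C
p_below_or_2nd_lang  <- p_below_pov + p_2nd_lang - p_both  # part D
p_above_and_eng_only <- 1 - p_below_or_2nd_lang            # part E

# Independence would require P(A and B) = P(A) * P(B)
independent <- isTRUE(all.equal(p_both, p_below_pov * p_2nd_lang))
```

Under these numbers the overlap is 4.2%, so the events are not disjoint, and 0.146 * 0.207 is roughly 0.030 rather than 0.042, so they do not look independent either.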

2.20

A. What is the probability that a randomly chosen male respondent or his partner has blue eyes?

  • P(male blue) = 114/204
  • P(female blue) = 108/204
  • P(male blue and female blue) = 78/204
  • P(male blue or female blue) = (114 + 108 - 78)/204
total_probability= (114+108-78)/204
print(paste("Answer:",total_probability*100,"%"))
## [1] "Answer: 70.5882352941177 %"

Answer: 70.59%

B. What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?

  • P(F_blue | M_blue) = P(F_blue and M_blue) / P(M_blue)
    • (78/204) / (114/204) = 0.6842

Answer: 68.42%

C.

  • 1. What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes?
    • P(F_blue | M_brown) = P(F_blue and M_brown) / P(M_brown)
    • (19/204) / (54/204) = 0.3519

Answer: 35.2%

 19/204 / (54/204)
## [1] 0.3518519
  • 2. What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?
    • P(F_blue | M_green) = P(F_blue and M_green) / P(M_green)
    • (11/204) / (36/204) = 0.3056

Answer: 30.6%

11/204 / (36/204)
## [1] 0.3055556

D. Test for independence. If partners' eye colors are independent, then P(mate_a_col | mate_b_col) = P(mate_a_col).

  • Is P(F_blue | M_blue) = P(F_blue)?
  • 0.6842 != 0.5294
P_female_blue <- 108/204

Answer

  • Partners' eye colors do not appear to be independent: a blue-eyed male is more likely to have a blue-eyed partner (68.4%) than a female partner is to have blue eyes overall (52.9%). Because this is a sample, some of that gap could be random variability. Provided the gap is too large to attribute to chance, the eye color pairings are dependent.
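
The conditional probabilities in parts B to D can be computed in one pass from the counts already used above. This is a sketch for comparison only; the vector names are illustrative, and it does not test whether the differences are statistically significant.

```r
n_total <- 204

# Male respondents by eye color, and how many of each have a blue-eyed partner
male_counts         <- c(blue = 114, brown = 54, green = 36)
female_blue_by_male <- c(blue = 78,  brown = 19, green = 11)

# P(female blue | male color) for each male eye color
p_female_blue_given_male <- female_blue_by_male / male_counts

# Unconditional P(female blue), the benchmark under independence
p_female_blue <- sum(female_blue_by_male) / n_total

round(p_female_blue_given_male, 4)
round(p_female_blue, 4)
```

All three conditional probabilities (0.6842, 0.3519, 0.3056) sit well away from the unconditional 0.5294, which is the pattern the independence check in part D relies on.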

2.30

A. Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

hard_book_1st <- 28/95
paper_back_fiction <- 59/94
total_prob <- hard_book_1st* paper_back_fiction
#paste("the Answer is ",total_prob *100, "%" )
  • The Answer is 18.4994400895857 %

B. Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.

  • Account for how the first book being fiction can affect the probability of the second book being hardcover: weight each case by the chance that the fiction book drawn first was itself a hardcover.
fiction_1st <- 72/95
# Of the 72 fiction books, 13 are hardcover and 59 are paperback
chance_hard_cover <- 13/72
chance_not_hard_cover <- 59/72
# 27 hardcovers remain if the first book was a hardcover, 28 otherwise
expected_value_hard_cover_was_fiction <- (27/94 * chance_hard_cover) + (28/94 * chance_not_hard_cover)
# Joint probability: fiction first AND hardcover second
drawing_hard_cover_given_fiction <- fiction_1st * expected_value_hard_cover_was_fiction

#paste("the Answer is ",drawing_hard_cover_given_fiction*100,"%" )
  • The Answer is about 22.43 %

C. Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.

first_fiction <- 72/95
second_hard <- 28/95
#paste("the Answer is ",first_fiction*second_hard*100,"%" )
  • The Answer is 22.3379501385042 %

D. The final answers to parts (b) and (c) are very similar. Explain why this is the case.

  • They are very similar because removing a single book from a shelf of 95 barely changes the share of hardcovers among the remaining books (28/95 with replacement versus 27/94 or 28/94 without).

# small difference
28/95-28/94
## [1] -0.003135498
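
To compare parts (b) and (c) side by side, the sketch below computes both probabilities from the book counts used above (95 books, 28 hardcover, 72 fiction of which 13 are hardcover and 59 paperback). The variable names are illustrative.

```r
n_books       <- 95
hard_total    <- 28
fiction_hard  <- 13
fiction_paper <- 59

# Without replacement: split on whether the fiction book drawn first was a hardcover
p_without <- (fiction_hard / n_books)  * ((hard_total - 1) / (n_books - 1)) +
             (fiction_paper / n_books) * (hard_total / (n_books - 1))

# With replacement: the two draws are independent
p_with <- ((fiction_hard + fiction_paper) / n_books) * (hard_total / n_books)

round(c(without = p_without, with = p_with), 4)
```

The two results differ by less than one tenth of a percentage point, because one book out of 95 is a small fraction of the shelf.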

2.38

A. Build a probability model, compute the average revenue per passenger, and compute the corresponding standard deviation.

bag_costs <- c(0,25,60)
freq <- c(.54,.34,.12)
average_revenue <-  sum(bag_costs*freq)


# Variance: probability-weighted squared deviations from the mean
variance <- (bag_costs-average_revenue)^2*freq
total_variance <- sum(variance)
std_dev_bag <- sqrt(total_variance)
cbind(bag_costs,freq,average_revenue,variance,std_dev_bag )
##      bag_costs freq average_revenue variance std_dev_bag
## [1,]         0 0.54            15.7 133.1046    19.95019
## [2,]        25 0.34            15.7  29.4066    19.95019
## [3,]        60 0.12            15.7 235.4988    19.95019
#paste("expected revenue per customer is",average_revenue, "$")
#paste("standard deviation is",std_dev_bag, "$")

Answer

  • Expected revenue per customer is $15.70

  • Standard deviation is $19.95

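B. About how much revenue should the airline expect for a flight of 120 passengers? With what standard deviation? Note any assumptions you make and if you think they are justified.

expected_revenue <- 120*average_revenue
# Assuming passengers check bags independently, the variances of the 120 passengers add
freq_given_120 <- (120*freq)
variance <- (bag_costs-average_revenue)^2*freq_given_120
total_variance <- sum(variance)
std_dev_bag_revenue <- sqrt(total_variance)

# Range within one standard deviation of the expected total
range_of_outcomes <- c(expected_revenue-std_dev_bag_revenue,expected_revenue+std_dev_bag_revenue)

#paste("expected total revenue is",expected_revenue, "$")
#paste("revenue within one standard deviation of the mean is (",round(range_of_outcomes[1],2),"$","-",round(range_of_outcomes[2],2),"$ )")

Answer

  • Expected total revenue is $1884

  • Standard deviation of total revenue is $218.54, assuming passengers' bag choices are independent of one another. That assumption is only roughly justified, since people travelling together often make similar baggage choices.

  • Total revenue falls within one standard deviation of the mean, ($1665.46, $2102.54), about 68% of the time if the total is approximately normal.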
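
The scaling from one passenger to 120 can be written more directly: for independent passengers the mean scales by n and the standard deviation by sqrt(n). The sketch below restates that under the independence assumption; `n_passengers`, `mean_one`, `var_one`, `mean_total`, and `sd_total` are illustrative names.

```r
bag_costs    <- c(0, 25, 60)
freq         <- c(0.54, 0.34, 0.12)
n_passengers <- 120

# Mean and variance of revenue from a single passenger
mean_one <- sum(bag_costs * freq)
var_one  <- sum((bag_costs - mean_one)^2 * freq)

# Assumption: passengers are independent, so means and variances both add
mean_total <- n_passengers * mean_one
sd_total   <- sqrt(n_passengers * var_one)

round(c(mean_total = mean_total, sd_total = sd_total), 2)
```

This gives the same $1884 expected total and a standard deviation of about $218.54, that is, sqrt(120) times the per-passenger standard deviation rather than 120 times it.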

2.44

A. Describe the distribution of total personal income.

library(dplyr)
library(ggplot2)
income_ranges <- c("$1 to $9,999","$10,000 to $14,999","$15,000 to $24,999","$25,000 to $34,999","$35,000 to $49,999","$50,000 to $64,999","$65,000 to $74,999","$75,000 to $99,999","More than 100,000")
Total_hist <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
table_5 <- data.frame(income_ranges, Total_hist)
# Fix the factor levels so the bars keep the table's order instead of sorting alphabetically
table_5$income_ranges <- factor(table_5$income_ranges, levels = income_ranges)
ggplot(table_5, aes(income_ranges,Total_hist))+
geom_bar(stat = "identity")+
theme(axis.text.x=element_text(angle=60, hjust=1))

  • The income ranges have unequal widths (from $5,000 to $25,000 wide, plus an open-ended top range), so bar heights are not directly comparable and the chart can mislead
  • About 41% of the population makes under $35k
  • The distribution is likely right-skewed
  • The mean is likely noticeably higher than the median because of the right skew and the open-ended "More than 100,000" range
  • The median income falls in the $35k to $50k range

B. What is the probability that a randomly chosen US resident makes less than $50,000 per year?

  • The Answer is 62.2%

chance <- 21.2+18.3+15.8+4.7+2.2
chance
## [1] 62.2

C. What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.

  • Assume income and gender are independent, so the share of women making less than $50,000 equals the overall share (62.2%). The 0.41 below is the share of females used in the calculation.
paste("the probability is",chance *.41)
## [1] "the probability is 25.502"

Answer: the probability is about 25.5%

D. The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.

  • Clearly my assumption was not valid. I assumed that 62.2% of women made less than $50k, but the actual figure is 71.8%, so income and gender are not independent
    • Gender discrimination and differing fields of employment likely contribute to this difference
71.8*.41
## [1] 29.438

Given the new data, the answer to part C would be 29.438% as opposed to 25.5%.
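
The figures used in parts A to D can be reproduced from the income percentages with a cumulative sum. This is a sketch under the same inputs as above (the nine percentages and a 0.41 share of females); `pct`, `cum_pct`, and the other names are illustrative.

```r
# Percent of residents in each income range, lowest range first
pct <- c(2.2, 4.7, 15.8, 18.3, 21.2, 13.9, 5.8, 8.4, 9.7)
cum_pct <- cumsum(pct)

# Part B: the first five ranges cover incomes under $50,000
p_under_50k <- cum_pct[5]

# Part A: the median lies in the first range whose cumulative share reaches 50%
median_bin <- which(cum_pct >= 50)[1]

p_female <- 0.41
joint_if_independent <- p_under_50k * p_female  # part C, in percent
joint_from_data      <- 71.8 * p_female         # part D, in percent
```

The cumulative share first passes 50% in the fifth range ($35,000 to $49,999), and the gap between the two joint figures (about 25.5% versus 29.4%) is what shows the independence assumption in part C does not hold.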