See https://data606.net/assignments/homework/ for more information.

Chapter 2 - Probability
Practice: 2.5, 2.7, 2.19, 2.29, 2.43
Graded: 2.6, 2.8, 2.20, 2.30, 2.38, 2.44
If you roll a pair of fair dice, what is the probability of getting a sum of 1? A sum of 5? A sum of 12?

P(sum = 1) = 0, since a sum of 1 would require one die to show 0, which is impossible.

P(sum = 5) = P(1,4) + P(2,3) + P(3,2) + P(4,1) = 4*(1/6)*(1/6) = 4/36 = 1/9
P(sum = 12) = P(6,6) = (1/6)*(1/6) = 1/36
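As a quick check (not part of the hand calculation above), the three probabilities can be verified in R by enumerating all 36 equally likely outcomes of a pair of fair dice:

```r
# Enumerate all 36 equally likely outcomes of two fair dice
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- outcomes$die1 + outcomes$die2

# Proportion of outcomes giving each target sum
mean(sums == 1)   # 0
mean(sums == 5)   # 1/9  ~ 0.1111
mean(sums == 12)  # 1/36 ~ 0.0278
```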
The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.

P(below poverty line) = P(<pl) = 14.6/100 = 0.146
P(speak foreign language) = P(sfl) = 20.7/100 = 0.207
P(below poverty line and speak foreign language) = P(<pl and sfl) = 4.2/100 = 0.042
Living below the poverty line and speaking a foreign language at home are not disjoint events: 4.2% of Americans fall into both categories, so P(<pl and sfl) = 0.042 != 0.

```r
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger

# Venn diagram of the two overlapping events, areas in percent
Venn <- draw.pairwise.venn(14.6,
                           20.7,
                           cross.area = 4.2,
                           c("Below Poverty Line", "Speak Foreign Language"),
                           fill = c("red", "blue"),
                           cat.dist = -0.10,
                           ind = TRUE)
grid.draw(Venn)
```

P(below poverty line and speak only English at home) = P(<pl and !sfl) = P(<pl) - P(<pl and sfl) = 0.146 - 0.042 = 0.104
Percent(below poverty line and speak only English at home) = 0.104*100 = 10.4%

P(below poverty line or speak foreign language) = P(<pl or sfl) = P(<pl) + P(sfl) - P(<pl and sfl) = 0.146 + 0.207 - 0.042 = 0.311
Percent(below poverty line or speak foreign language) = 0.311*100 = 31.1%

P(above poverty line and speak only English at home) = 1 - P(<pl or sfl) = 1 - (0.146 + 0.207 - 0.042) = 0.689
Percent(above poverty line and speak only English at home) = 0.689*100 = 68.9%

Living below the poverty line and speaking a foreign language at home are neither disjoint nor independent. They are not disjoint, since 4.2% of Americans fall into both categories. They are not independent, since P(<pl)*P(sfl) = 0.146*0.207 = 0.0302, which does not equal the observed joint probability P(<pl and sfl) = 0.042; the two events are dependent.
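The calculations above can be double-checked with a few lines of R (a sketch; the variable names are just shorthand for the events defined earlier):

```r
# Probabilities from the 2010 American Community Survey problem
p_bpl  <- 0.146   # below poverty line
p_sfl  <- 0.207   # speak a foreign language at home
p_both <- 0.042   # both

p_bpl - p_both                 # below poverty line and English only: 0.104
p_bpl + p_sfl - p_both         # below poverty line or foreign language: 0.311
1 - (p_bpl + p_sfl - p_both)   # above poverty line and English only: 0.689
p_bpl * p_sfl                  # 0.0302 != 0.042, so the events are not independent
```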
Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise.

|Self (male) / Partner (female) | Blue | Brown | Green | Total |
|-------------------------------|------|-------|-------|-------|
|Blue                           | 78   | 23    | 13    | 114   |
|Brown                          | 19   | 23    | 12    | 54    |
|Green                          | 11   | 9     | 16    | 36    |
|Total                          | 108  | 55    | 41    | 204   |
P(Male Blue_eyes) = 114/204
P(Female Blue_eyes) = 108/204
P(Male Blue_eyes and Female Blue_eyes) = 78/204
P(Male Blue_eyes or Female Blue_eyes) = P(Male Blue_eyes) + P(Female Blue_eyes) - P(Male Blue_eyes and Female Blue_eyes) = 114/204 + 108/204 - 78/204 = 144/204 = 0.706

P(Female Blue_eyes | Male Blue_eyes) = P(Male Blue_eyes and Female Blue_eyes)/P(Male Blue_eyes) = 78/114 = 0.684

What about the probability of a randomly chosen male respondent with brown eyes having a partner with blue eyes? Or a randomly chosen male respondent with green eyes having a partner with blue eyes?
P(Female Blue_eyes | Male Brown_eyes) = P(Male Brown_eyes and Female Blue_eyes)/P(Male Brown_eyes) = 19/54 = 0.352
P(Female Blue_eyes | Male Green_eyes) = P(Male Green_eyes and Female Blue_eyes)/P(Male Green_eyes) = 11/36 = 0.306
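These probabilities can also be read off the table programmatically. Below is a small R sketch; the matrix is just the eye-color table re-entered by hand:

```r
# Eye-color table: rows = male respondent, columns = female partner
eye <- matrix(c(78, 23, 13,
                19, 23, 12,
                11,  9, 16),
              nrow = 3, byrow = TRUE,
              dimnames = list(male = c("Blue", "Brown", "Green"),
                              partner = c("Blue", "Brown", "Green")))

# P(male blue or partner blue)
(sum(eye["Blue", ]) + sum(eye[, "Blue"]) - eye["Blue", "Blue"]) / sum(eye)  # 0.706

# P(partner blue | male blue), P(partner blue | male brown), P(partner blue | male green)
eye["Blue",  "Blue"] / sum(eye["Blue",  ])   # 0.684
eye["Brown", "Blue"] / sum(eye["Brown", ])   # 0.352
eye["Green", "Blue"] / sum(eye["Green", ])   # 0.306
```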
Does it appear that the eye colors of male respondents and their partners are independent? Explain your reasoning.

Using the last calculation, the eye colors of male respondents and their partners would be independent if
P(Female Blue_eyes | Male Green_eyes) = P(Female Blue_eyes)

However, that equality does not hold:

P(Female Blue_eyes | Male Green_eyes) != P(Female Blue_eyes), since 11/36 != 108/204 (0.306 != 0.529).
Hence, the eye colors of male respondents and their partners appear to be dependent.

The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.

|Type       | Hardcover | Paperback | Total |
|-----------|-----------|-----------|-------|
|Fiction    | 13        | 59        | 72    |
|Nonfiction | 15        | 8         | 23    |
|Total      | 28        | 67        | 95    |
(a) The probability of drawing a hardcover book first, then a paperback fiction book second, drawing without replacement:

P(Hardcover first) = 28/95
P(Paperback Fiction second | Hardcover first) = 59/94
P(Hardcover first and Paperback Fiction second) = (28/95)*(59/94) = 0.185

(b) The probability of drawing a fiction book first, then a hardcover book second, drawing without replacement:

P(Hardcover) = (13+15)/95 = 28/95
P(Hardcover Fiction) = 13/95
P(Paperback Fiction) = 59/95

P(Fiction first and Hardcover second)
= P(Paperback Fiction first)*P(Hardcover second | Paperback Fiction first) + P(Hardcover Fiction first)*P(Hardcover second | Hardcover Fiction first)
= (59/95)*(28/94) + (13/95)*(27/94)
= 0.2243

(c) The same scenario as (b), but with the first book placed back on the bookcase before the second draw, so that the two draws are independent:

P(Fiction and Hardcover) = P(Fiction)*P(Hardcover) = (72/95)*(28/95) = 0.2234
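A quick numerical check of the three book-drawing calculations, simply redoing the arithmetic above in R (the object names are arbitrary):

```r
# Books problem: 95 books total, 28 hardcover, 72 fiction,
# 13 hardcover fiction, 59 paperback fiction
p_hc_then_pb_fiction <- (28/95) * (59/94)                  # hardcover first, paperback fiction second
p_fiction_then_hc    <- (59/95)*(28/94) + (13/95)*(27/94)  # fiction first, hardcover second, no replacement
p_fiction_and_hc_ind <- (72/95) * (28/95)                  # same, but treating the draws as independent

c(p_hc_then_pb_fiction, p_fiction_then_hc, p_fiction_and_hc_ind)
# approximately 0.185, 0.2243, 0.2234
```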
(d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.

Since dividing by 95 gives a slightly smaller result than dividing by 94, scenario (c) would be expected to have a slightly smaller probability than scenario (b). However, in scenario (b) we also have to account for the small chance that the first book drawn is both hardcover and fiction, which leaves one fewer hardcover book for the second draw. Taking this into account reduces the overall probability of (b) and makes it nearly equal to (c): removing a single book from a bookcase of 95 barely changes the composition of the remaining books, so the with- and without-replacement calculations give almost the same answer.

An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.
```r
# Possible fees per passenger: $0, $25 (one bag), $25 + $35 = $60 (two bags)
baggage_fees <- c(0, 25, 60)
prob_of_ppl_checked_in_bag <- c(0.54, 0.34, 0.12)
no_of_checked_in_bag <- c(0, 1, 2)

airline_fees <- data.frame(baggage_fees, prob_of_ppl_checked_in_bag, no_of_checked_in_bag)

# Expected revenue per passenger: sum of fee * probability
airline_fees$weighted <- airline_fees$baggage_fees * airline_fees$prob_of_ppl_checked_in_bag
revenue_per_passenger <- sum(airline_fees$weighted)
revenue_per_passenger
## [1] 15.7

# Standard deviation of revenue per passenger
airline_fees$variance <- (airline_fees$baggage_fees - revenue_per_passenger)^2 * airline_fees$prob_of_ppl_checked_in_bag
std_dev_per_passenger <- sqrt(sum(airline_fees$variance))
std_dev_per_passenger
## [1] 19.95019
```
```r
# Expected revenue and standard deviation for a flight of 120 passengers
Revenue <- 120 * revenue_per_passenger
Revenue
## [1] 1884

Standard_Deviation <- sqrt(120 * std_dev_per_passenger^2)
Standard_Deviation
## [1] 218.5434
```
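As an optional sanity check that is not part of the original answer, a short simulation can be used to confirm the expected revenue and standard deviation for a flight of 120 passengers:

```r
# Simulate many flights of 120 passengers and compare the simulated mean and SD
# of total revenue with the values computed analytically above.
set.seed(606)
sim_revenue <- replicate(10000, {
  fees <- sample(c(0, 25, 60), size = 120, replace = TRUE, prob = c(0.54, 0.34, 0.12))
  sum(fees)
})
mean(sim_revenue)  # should be close to 1884
sd(sim_revenue)    # should be close to 218.5
```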
The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females.

|Income               | Total |
|----------------------|-------|
|$1 to $9,999 or loss  | 2.2%  |
|$10,000 to $14,999    | 4.7%  |
|$15,000 to $24,999    | 15.8% |
|$25,000 to $34,999    | 18.3% |
|$35,000 to $49,999    | 21.2% |
|$50,000 to $64,999    | 13.9% |
|$65,000 to $74,999    | 5.8%  |
|$75,000 to $99,999    | 8.4%  |
|$100,000 or more      | 9.7%  |
```r
library(ggplot2)

income_levels <- c("1-9999", "10000-14999", "15000-24999", "25000-34999",
                   "35000-49999", "50000-64999", "65000-74999", "75000-99999",
                   "100000+")
personal_income <- data.frame(Income = income_levels,
                              Total = c(0.022, 0.047, 0.158, 0.183, 0.212,
                                        0.139, 0.058, 0.084, 0.097))

# Keep the income brackets in their natural order on the plot
personal_income$Income <- factor(personal_income$Income, levels = income_levels)
```
```r
personal_income
##        Income Total
## 1      1-9999 0.022
## 2 10000-14999 0.047
## 3 15000-24999 0.158
## 4 25000-34999 0.183
## 5 35000-49999 0.212
## 6 50000-64999 0.139
## 7 65000-74999 0.058
## 8 75000-99999 0.084
## 9     100000+ 0.097
```
```r
ggplot(personal_income, aes(y = Total, x = Income)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90))
```

The distribution has its highest frequency in the $35,000 to $49,999 range and is right skewed, with most of the mass below $50,000 and a long tail toward higher incomes. It is also somewhat bimodal, with a second, smaller peak in the $100,000+ range.

The probability that a randomly chosen US resident makes less than $50,000 per year is the sum of the first five income categories:

```r
sum(personal_income$Total[1:5])
## [1] 0.622
```
What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.
P(<$50,000 | Female) = P(<$50,000 & Female) / P(Female)
P(<$50,000 & Female) = P(<$50,000 | Female) * P(Female)

Assuming that income and gender are independent, i.e. that females have the same proportion earning less than $50,000 as the overall sample, we can take:

P(<$50,000 | Female) = P(<$50,000 | Male) = P(<$50,000) = 0.622
P(<$50,000 & Female) = 0.622*0.41 = 0.255

The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.
P(<$50,000 | Female) = P(<$50,000 & Female) / P(Female)
P(<$50,000 & Female) = P(<$50,000 | Female) * P(Female)
P(<$50,000 & Female) = 0.718*0.41 = 0.294

Since the reported proportion of females earning less than $50,000 (71.8%) differs from the overall proportion (62.2%), and 0.294 != 0.255, income and gender are not independent, so the assumption made in part (c) is not valid.
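Finally, a minimal R sketch of the comparison in parts (c) and (d), using only the numbers already given above:

```r
p_less_50k <- 0.622               # P(income < $50,000) from the table
p_female   <- 0.41                # proportion of females in the sample
p_less_50k_given_female <- 0.718  # reported proportion of females earning < $50,000

# Part (c): joint probability under the independence assumption
p_less_50k * p_female               # 0.255

# Part (d): joint probability using the reported conditional probability
p_less_50k_given_female * p_female  # 0.294 != 0.255, so the assumption is not valid
```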