Libraries used
#install.packages('VennDiagram')
#install.packages('pander')
library(grid)
library(futile.logger)
library(VennDiagram)
library(knitr)
Practice Questions:
Q 1: (2.5) Coin flips. If you flip a fair coin 10 times, what is the probability of
A: A fair coin lands heads or tails with probability 50% each. The two outcomes are mutually exclusive: each flip lands either heads or tails.
Since the flips are independent and P(T) = 1/2 on each flip, the probability of ten tails in a row is (1/2)^10 = 1/1024 = 0.0009765625. So the probability of getting all tails is about 0.001 (rounded to the 3rd decimal).
Since the flips are independent and P(H) = 1/2 on each flip, the probability of ten heads in a row is (1/2)^10 = 1/1024 = 0.0009765625. So the probability of getting all heads is about 0.001 (rounded to the 3rd decimal).
Getting at least one tail is the complement of getting all heads. Its probability is therefore 1 - (1/2)^10 = 1023/1024 = 0.9990234375. So the probability of getting at least one tail is about 0.999 (rounded to the 3rd decimal).
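The coin-flip answers can be checked numerically. The following is a minimal sketch in base R; the variable names are illustrative and not part of the original solution.

```r
# Ten independent flips of a fair coin
n_flips <- 10
p_tail  <- 0.5

p_all_tails <- p_tail^n_flips              # every flip lands tails
p_all_heads <- (1 - p_tail)^n_flips        # every flip lands heads
p_at_least_one_tail <- 1 - p_all_heads     # complement of "all heads"

round(c(all_tails = p_all_tails,
        all_heads = p_all_heads,
        at_least_one_tail = p_at_least_one_tail), 3)
```

Using the complement avoids summing the probabilities of 1, 2, ..., 10 tails separately.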
Q 2: (2.7) Swing voters. A 2012 Pew Research survey asked 2,373 randomly sampled registered voters their political affiliation (Republican, Democrat, or Independent) and whether or not they identify as swing voters. 35% of respondents identified as Independent, 23% identified as swing voters, and 11% identified as both.
# 11% of 2373 are swing and independent
# round((2373*.23) - (2373*.11)) are swing and non-independent
# round((2373*.35) - (2373*.11)) are non-swing and independent
# the remaining respondents are non-swing and non-independent
voter.data <- matrix(c(round(2373*.11), round((2373*.23) - 2373*.11), round((2373*.35) - (2373*.11)), (2373 - (round(2373*.11) + round((2373*.23) - 2373*.11) + round((2373*.35) - 2373*.11)))),ncol=2,byrow=TRUE)
# calculate proportions for swing and independent, swing and non-independent, non-swing and independent, non-swing and non-independent
voter.data.percentages <- round((voter.data/2373),2)
# Attach column and row names
colnames(voter.data)<- c("Independent:Yes","Independent:No")
rownames(voter.data)<- c("Swing:Yes","Swing:No")
# Attach column and row names for the proportions as well
colnames(voter.data.percentages)<- c("Independent:Yes","Independent:No")
rownames(voter.data.percentages)<- c("Swing:Yes","Swing:No")
# Calculate row and column sums for the counts
voter.data <- cbind(voter.data, Total = rowSums(voter.data))
voter.data <- rbind(voter.data, Total = colSums(voter.data))
# Calculate row and column sums for the proportions
voter.data.percentages <- cbind(voter.data.percentages, Total = rowSums(voter.data.percentages))
voter.data.percentages <- rbind(voter.data.percentages, Total = colSums(voter.data.percentages))
voter.data
## Independent:Yes Independent:No Total
## Swing:Yes 261 285 546
## Swing:No 570 1257 1827
## Total 831 1542 2373
voter.data.percentages
## Independent:Yes Independent:No Total
## Swing:Yes 0.11 0.12 0.23
## Swing:No 0.24 0.53 0.77
## Total 0.35 0.65 1.00
A: (a) No. 11% of respondents are both Independent and swing voters, so P(swing and Independent) = 0.11 > 0 and the two events are not disjoint. For a voter picked at random, P(swing) = 0.23 and P(Independent) = 0.35. The events are not independent either: P(swing) * P(Independent) = 0.23 * 0.35 = 0.0805, which does not equal 0.11.
(b), (c), (d), (e)
# Proportion of voters who are Independent (no party affiliation) and have decided whom to vote for (non-swing): 0.24
# Proportion of voters who are Independent (no party affiliation) and have not yet decided whom to vote for (swing): 0.11
# Proportion of voters who are not Independent (have a party affiliation) and have not yet decided whom to vote for (swing): 0.12
# Proportion of voters who are not Independent (have a party affiliation) and have decided whom to vote for (non-swing): 0.53
grid.newpage()
# Venn diagram for Swing and Independent
draw.pairwise.venn(area1 = voter.data.percentages[1,3], area2 = voter.data.percentages[3,1], cross.area = voter.data.percentages[1,1], category = c("Swing","Independent"))
## (polygon[GRID.polygon.1], polygon[GRID.polygon.2], polygon[GRID.polygon.3], polygon[GRID.polygon.4], text[GRID.text.5], text[GRID.text.6], text[GRID.text.7], text[GRID.text.8], text[GRID.text.9])
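The swing-voter figures can also be derived directly from the three proportions given in the question, without building the count table. This is a minimal sketch; the variable names are illustrative and not part of the original solution.

```r
# Proportions stated in the question
p_swing <- 0.23
p_indep <- 0.35
p_both  <- 0.11

# Disjoint events would have no overlap at all
disjoint <- isTRUE(all.equal(p_both, 0))

# Addition rule and its complement
p_indep_not_swing <- p_indep - p_both            # Independent but not swing
p_either          <- p_swing + p_indep - p_both  # Independent or swing
p_neither         <- 1 - p_either                # neither Independent nor swing

# Independent events would satisfy P(A and B) = P(A) * P(B)
independent <- isTRUE(all.equal(p_swing * p_indep, p_both))

c(indep_not_swing = p_indep_not_swing, either = p_either, neither = p_neither)
```

Both `disjoint` and `independent` evaluate to `FALSE` here: the overlap is nonzero, and it differs from the product 0.23 * 0.35 = 0.0805.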
Q 3: (2.19) Burger preferences. A 2010 Survey USA poll asked 500 Los Angeles residents, “What is the best hamburger place in Southern California? Five Guys Burgers? In-N-Out Burger? Fat Burger? Tommy’s Hamburgers? Umami Burger? Or somewhere else?” The distribution of responses by gender is shown below
A: (a) No. Being female and liking Five Guys Burgers would be mutually exclusive only if no female respondent chose Five Guys, and the table includes female respondents who did.
(b) Of 248 total males, 162 like In-N-Out. The probability is 162*100/248 = 65.32%.
(c) Of 252 total females, 181 like In-N-Out. The probability is 181*100/252 = 71.83%.
(d) The probability of a man liking In-N-Out is 0.6532 and of a woman liking In-N-Out is 0.7183. Assuming the two partners' preferences are independent, the probability that both the man and the woman in a dating couple like In-N-Out is 0.6532*0.7183 = 0.4692, i.e. a 46.92 percent chance. The independence assumption may not hold, since people who date each other may share tastes.
(e) Of 500 people, 6 like Umami, so the probability of liking Umami is 6/500 = 0.012. The probability of being female is 252/500 = 0.504. One respondent is both female and likes Umami, with probability 1/500 = 0.002. Since that person is counted in both the female and the Umami groups, she must be subtracted once. The probability of liking Umami or being female is 0.012 + 0.504 - 0.002 = 0.514.
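The arithmetic in parts (b) to (e) can be reproduced from the counts quoted in the answer. The sketch below uses only those counts; the variable names are illustrative, and the product in part (d) assumes the two partners' preferences are independent.

```r
# Counts quoted in the answer above
n_total      <- 500
n_male       <- 248
n_female     <- 252
male_inout   <- 162
female_inout <- 181
umami_total  <- 6
female_umami <- 1

p_inout_given_male   <- male_inout / n_male       # part (b)
p_inout_given_female <- female_inout / n_female   # part (c)

# Part (d): ASSUMES the partners' preferences are independent
p_couple_both <- p_inout_given_male * p_inout_given_female

# Part (e): addition rule, subtracting the overlap once
p_umami_or_female <- (umami_total + n_female - female_umami) / n_total

round(c(b = p_inout_given_male, c = p_inout_given_female,
        d = p_couple_both, e = p_umami_or_female), 4)
```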
Q 4: (2.29) Chips in a bag. Imagine you have a bag containing 5 red, 3 blue, and 2 orange chips.
A: (a) Total chips = 5 + 3 + 2 = 10. Probability of drawing blue on the first draw = 3/10. Since the chip is not replaced, the probability that the second chip is also blue, given that the first was blue, is 2/9. Probability of getting blue both times = (3/10)*(2/9) = 1/15 = 6.67%.
(b) After an orange chip is removed, the bag holds 5 + 3 + 1 = 9 chips on the second draw. Probability of drawing blue = 3/9 = 1/3 = 33.33%.
(c) (3/10)*(2/9) = 1/15 = 6.67%.
(d) Draws without replacement are not independent: the outcome of the first draw changes the probabilities for the second draw. The draws are dependent events.
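The dependence between draws can be made concrete by comparing conditional and unconditional probabilities. This is a minimal sketch with illustrative variable names, using only the chip counts from the question.

```r
chips <- c(red = 5, blue = 3, orange = 2)
n <- sum(chips)

# Second draw is blue, given what was removed on the first draw
p_blue_after_blue   <- (chips[["blue"]] - 1) / (n - 1)   # 2/9
p_blue_after_orange <- chips[["blue"]] / (n - 1)         # 3/9

# Two blue chips in a row without replacement (multiplication rule)
p_two_blue <- (chips[["blue"]] / n) * p_blue_after_blue  # 1/15

# With independent draws these two values would be equal
p_blue_first <- chips[["blue"]] / n                      # 3/10
draws_independent <- isTRUE(all.equal(p_blue_first, p_blue_after_blue))
```

`draws_independent` is `FALSE` because 3/10 differs from 2/9, which is what drawing without replacement implies.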
Q 5: (2.43) Cat weights. The histogram shown below represents the weights (in kg) of 47 female and 97 male cats.
A: (a) (30 + 33)/Total cats = 63/144 (b) (20 + 27)/Total cats = 47/144 (c) (15 + 12)/Total cats = 27/144
Graded Questions
Q 1: (2.6) Dice rolls. If you roll a pair of fair dice, what is the probability of
A: (a) Zero. Each fair die shows at least 1, so the smallest possible sum of two dice is 2 and a sum of 1 cannot occur.
(b) A sum of 5 can happen in four ways: 1+4, 2+3, 3+2, 4+1. Hence the probability of getting 5 is 4/36.
(c) A sum of 12 can happen in only one way: 6+6. Hence the probability of getting 12 is 1/36.
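The counting argument can be verified by enumerating all 36 equally likely outcomes. This is a minimal sketch in base R with illustrative variable names.

```r
# 6 x 6 matrix holding the sum for every (die 1, die 2) pair
sums <- outer(1:6, 1:6, "+")

p_sum_1  <- mean(sums == 1)    # no pair sums to 1
p_sum_5  <- mean(sums == 5)    # 4 of 36 pairs
p_sum_12 <- mean(sums == 12)   # 1 of 36 pairs

# Full distribution of the sum, for reference
table(sums) / length(sums)
```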
Q 2: (2.8) Poverty and language. The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.
# 14.6% of the population live below the poverty line (proportion 0.146)
# 20.7% speak a language other than English at home (proportion 0.207)
# 4.2% fall into both categories: speak no English at home and live below the poverty line (proportion 0.042)
# Proportion speaking no English and living above the poverty line = (0.207 - 0.042)
# Proportion living above the poverty line = (1 - 0.146)
# Proportion speaking English = (1 - 0.207)
# Proportion speaking English and living above the poverty line = proportion living above the poverty line - proportion speaking no English and living above the poverty line
population.data <- matrix(c(round((0.146-0.042),3), round((1-0.207-(0.146-0.042)),3), round(0.042,3), (round((0.207-0.042),3))),ncol=2,byrow=TRUE)
# Attach column and row names
colnames(population.data)<- c("Poverty:Yes","Poverty:No")
rownames(population.data)<- c("English:Yes","English:No")
# Calculate row and column sums
population.data <- cbind(population.data, Total = rowSums(population.data))
population.data <- rbind(population.data, Total = colSums(population.data))
population.data
## Poverty:Yes Poverty:No Total
## English:Yes 0.104 0.689 0.793
## English:No 0.042 0.165 0.207
## Total 0.146 0.854 1.000
(a) No. 4.2% of Americans both live below the poverty line and speak a foreign language at home, so the probability of both is 0.042 > 0 and the two events are not disjoint. For a person picked at random, the probability of living below the poverty line is 0.146 and the probability of speaking a foreign language at home is 0.207. The events are not independent either: 0.146 * 0.207 = 0.030, which does not equal 0.042.
(b), (c), (d), (e)
# Proportion of the population living above the poverty line and speaking English at home: 0.689
# Proportion of the population living below the poverty line and speaking English at home: 0.104
# Proportion of the population living below the poverty line and speaking a foreign language at home: 0.042
# Proportion of the population living above the poverty line and speaking a foreign language at home: 0.165
grid.newpage()
# Venn diagram for English:Yes and Poverty:Yes
draw.pairwise.venn(area1 = population.data[1,3], area2 = population.data[3,1], cross.area = population.data[1,1], category = c("English:Yes","Poverty:Yes"))
## (polygon[GRID.polygon.10], polygon[GRID.polygon.11], polygon[GRID.polygon.12], polygon[GRID.polygon.13], text[GRID.text.14], text[GRID.text.15], text[GRID.text.16], text[GRID.text.17], text[GRID.text.18])
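The same disjointness and independence checks recur across these two-event questions, so they can be wrapped in a small function. `two_event_summary` is a hypothetical helper written for illustration, not part of the original solution or of any package; it is applied here to the poverty and language proportions from the question.

```r
# Hypothetical helper: summarises two events A and B from
# P(A), P(B) and P(A and B)
two_event_summary <- function(p_a, p_b, p_ab) {
  p_either <- p_a + p_b - p_ab                    # addition rule
  list(
    disjoint    = isTRUE(all.equal(p_ab, 0)),     # no overlap?
    independent = isTRUE(all.equal(p_a * p_b, p_ab)),
    a_only      = p_a - p_ab,
    b_only      = p_b - p_ab,
    either      = p_either,
    neither     = 1 - p_either
  )
}

# A = below the poverty line, B = foreign language at home
poverty <- two_event_summary(p_a = 0.146, p_b = 0.207, p_ab = 0.042)
str(poverty)
```

The `neither` entry (above the poverty line and English only) agrees with the 0.689 cell in the table above.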
Q 3: (2.20) Assortative mating. Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise.
A: (a) Probability that the male has blue eyes = 114 / 204 = 0.5588. Probability that the female partner has blue eyes = 108 / 204 = 0.5294. Probability that both have blue eyes = 78 / 204 = 0.3824. By the addition rule, the probability that a randomly chosen male respondent or his partner has blue eyes = P(m) + P(f) - P(m and f) = (114 + 108 - 78) / 204 = 144 / 204 = 0.7059.
(b) Partner has blue eyes given that a random male has blue eyes: 78 / 114 = 0.6842.
(c) Partner has blue eyes given that a random male has brown eyes: 19 / 54 = 0.3519. Partner has blue eyes given that a random male has green eyes: 11 / 36 = 0.3056.
(d) If the eye colors of male respondents and their partners were independent, the probability that the partner has blue eyes would be the same, 108 / 204 = 0.5294, whatever the male's eye color. Instead it is 0.6842 for blue-eyed males, 0.3519 for brown-eyed males and 0.3056 for green-eyed males. Hence the eye colors are not independent.
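The conditional probabilities can be compared against the marginal probability in a few lines of R. This sketch uses the counts quoted in the answer and takes the number of couples where both partners have blue eyes to be 78, the figure used in part (b); the variable names are illustrative.

```r
n_couples   <- 204
male_blue   <- 114
female_blue <- 108
both_blue   <- 78    # couples where both partners have blue eyes

# Part (a): addition rule on counts
p_either_blue <- (male_blue + female_blue - both_blue) / n_couples

# Parts (b) and (c): partner has blue eyes, given the male's eye color
p_blue_given_male_blue  <- both_blue / male_blue
p_blue_given_male_brown <- 19 / 54
p_blue_given_male_green <- 11 / 36

# Part (d): under independence each conditional would equal this marginal
p_partner_blue <- female_blue / n_couples

round(c(either = p_either_blue, given_blue = p_blue_given_male_blue,
        given_brown = p_blue_given_male_brown,
        given_green = p_blue_given_male_green,
        marginal = p_partner_blue), 4)
```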
Q 4: (2.30) Books on a bookshelf. The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
Book data
book.data <- matrix(c(13, 59, 15, 8),ncol=2,byrow=TRUE)
# Attach column and row names
colnames(book.data)<- c("Format:Hardcover","Format:Paperback")
rownames(book.data)<- c("Type:Fiction","Type:Nonfiction")
# Calculate row and column sums
book.data <- cbind(book.data, Total = rowSums(book.data))
book.data <- rbind(book.data, Total = colSums(book.data))
book.data
## Format:Hardcover Format:Paperback Total
## Type:Fiction 13 59 72
## Type:Nonfiction 15 8 23
## Total 28 67 95
Q 5: (2.38) Baggage fees. An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.
A: (a) Let's assume there are n passengers. No fee is collected if zero bags are checked; one bag costs $25 and two bags cost $60 ($25 + $35).
Number of passengers with zero bags checked = n*0.54
Number of passengers with one bag checked = n*0.34
Number of passengers with two bags checked = n*0.12
Fees collected from passengers with zero bags checked = n*0.54*0
Fees collected from passengers with one bag checked = n*0.34*25
Fees collected from passengers with two bags checked = n*0.12*60
Expected revenue from fees is (n*0.54*0) + (n*0.34*25) + (n*0.12*60)
Average revenue per passenger is m = ((n*0.54*0) + (n*0.34*25) + (n*0.12*60)) / n
Standard deviation:
1. Calculate the difference between each fee (0, 25, 60) and the mean: (fee - m).
2. Square each difference from step 1: (fee - m)^2.
3. Multiply each squared difference from step 2 by the share of passengers paying that fee (0.54, 0.34, 0.12): (fee - m)^2 * [percentage of bag checkers].
4. Sum the values from step 3 to get the variance, then take the square root to get the standard deviation.
# Total passengers = 120
passengers.data <- matrix(c(round(120 * 0.54), 0.54, 0, 0, round(120 * 0.34), 0.34, 1, 25, round(120 * 0.12), 0.12, 2, 25+35),ncol=4,byrow=TRUE)
# Attach column names
colnames(passengers.data)<- c("Passengers","Percentage","NumberOfBags","Fees")
# Total fees collected per class (head count * fee)
passengers.data <- cbind(passengers.data, TotalFeesCollected = passengers.data[,1] * passengers.data[,4])
# Create a table
passenger.table <- data.frame(passengers.data)
# Average fee per passenger
totalRevenuePerPassanger <- sum(passenger.table$TotalFeesCollected) / 120
# Calculate the difference between each fee and the average
passenger.table$DiffInFee <- passenger.table$Fees - totalRevenuePerPassanger
# Square the difference
passenger.table$SqDiff <- passenger.table$DiffInFee^2
# Weight each squared difference by the share of passengers
passenger.table$WtDiff <- passenger.table$SqDiff * passenger.table$Percentage
# Sum of the weighted squared differences, also known as the variance
sumWtDiff <- sum(passenger.table$WtDiff)
# Standard deviation
sdValue <- sqrt(sumWtDiff)
Total revenues : $1865
Average revenue per passenger : $15.54
Standard deviation : $20
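The figures above come from head counts rounded to whole passengers (65, 41 and 14 out of 120). As a cross-check, the expected value and standard deviation can also be computed directly from the probability model, without rounding. The sketch below is illustrative rather than the original method; the variable names are not from the original code, and the standard deviation of the flight total assumes passengers check bags independently of one another.

```r
# Probability model for the fee paid by one passenger
fees  <- c(0, 25, 25 + 35)
probs <- c(0.54, 0.34, 0.12)

mean_fee <- sum(fees * probs)                 # expected fee per passenger
var_fee  <- sum((fees - mean_fee)^2 * probs)  # weighted squared deviations
sd_fee   <- sqrt(var_fee)

# Flight of 120 passengers; sd_total ASSUMES independent passengers
n_passengers   <- 120
expected_total <- n_passengers * mean_fee
sd_total       <- sqrt(n_passengers) * sd_fee

round(c(mean_fee = mean_fee, sd_fee = sd_fee,
        expected_total = expected_total, sd_total = sd_total), 2)
```

The unrounded model gives an average of $15.70 per passenger and $1,884 for the flight, slightly above the $15.54 and $1,865 obtained from rounded head counts. The per-passenger standard deviation is about $19.95 either way.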
Q 6: (2.44) Income and gender. The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females.
A: (a) The distribution of total personal income looks roughly like a bell curve, with a peak in the $35,000 to $49,999 income range.
(b) The data show that the percentage of people making less than $50,000 is 2.2 + 4.7 + 15.8 + 18.3 + 21.2 = 62.2%. So the probability is 0.622.
(c) The sample is 41% female, so the probability of picking a female is 0.41. From part (b), the probability of picking a person making less than $50,000 is 0.622. Assuming income and gender are independent, the probability of picking a female making less than $50,000 is 0.622*0.41 = 0.255.
(d) If income and gender were independent, the share of females making less than $50,000 would equal the overall share, 62.2%. Since 71.8% of females make less than $50,000, the independence assumption made in part (c) does not hold. Using the given value, the probability of picking a female making less than $50,000 is 0.718*0.41 = 0.294.
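The effect of the independence assumption can be seen by computing the joint probability both ways. This is a minimal sketch using the 41% female share and the 71.8% figure stated in the question; the variable names are illustrative.

```r
# Share of people earning less than $50,000 (sum of the first five brackets)
p_under_50k <- sum(c(2.2, 4.7, 15.8, 18.3, 21.2)) / 100

p_female <- 0.41                   # sample composition given in the question
p_under_50k_given_female <- 0.718  # value given in the question

# Joint probability if income and gender were independent
p_joint_if_independent <- p_under_50k * p_female

# Joint probability from the general multiplication rule
p_joint_actual <- p_under_50k_given_female * p_female

# Independence would require P(under 50k | female) = P(under 50k)
independence_plausible <- isTRUE(all.equal(p_under_50k_given_female, p_under_50k))

round(c(if_independent = p_joint_if_independent, actual = p_joint_actual), 3)
```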