Hw2

2.6 Dice Rolls. If you roll a pair of fair dice, what is the probability of:

There are 6*6 = 36 possible combinations. Note: I had this same problem during the winter bridge and reused my code to demonstrate the combinations.

dice_combo1=matrix(c(1,1,1,1,1,1,1,2,3,4,5,6), ncol=2)
dice_combo2 <- dice_combo1
dice_combo2[,1] <- dice_combo1[,1] + 1
dice_combo3 <- dice_combo2
dice_combo3[,1] <- dice_combo2[,1] + 1
dice_combo4 <- dice_combo3
dice_combo4[,1] <- dice_combo3[,1] + 1
dice_combo5 <- dice_combo4
dice_combo5[,1] <- dice_combo4[,1] + 1
dice_combo6 <- dice_combo5
dice_combo6[,1] <- dice_combo5[,1] + 1
total_combinations=rbind(dice_combo1,dice_combo2,dice_combo3,dice_combo4,dice_combo5,dice_combo6)

## convert to a list to check for a value and count
dice_sums <- as.list(rowSums(total_combinations))

x <- function(dice_sum, sum)
{
  total_sum_found = 0
  for(counter in 1:36)
  {
    if(dice_sum[[counter]] == sum)
      total_sum_found <- total_sum_found + 1
  }
  return(total_sum_found)
}
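
The same 36 combinations can also be enumerated more compactly. The following is a minimal sketch, not the code used for the answers below; the names `rolls`, `roll_sums`, `sum_counts` and `sum_probs` are introduced here only for illustration.

```r
# Enumerate all 36 ordered outcomes of two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
roll_sums <- rowSums(rolls)

# Tabulate every total from 1 to 12 (totals that never occur get a count of 0)
sum_counts <- table(factor(roll_sums, levels = 1:12))

# Each outcome is equally likely, so probability = count / 36
sum_probs <- sum_counts / nrow(rolls)

round(sum_probs[c("1", "5", "12")], 4)
```

Using a factor with levels 1 through 12 keeps impossible totals such as 1 in the table with a count of zero, so no special case is needed.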

Getting a sum of 1?

#(a) This is 0 because if you roll a pair of dice, the lowest you can get is 2
## Let's find the number of sums for 1

cat("Total sum values for 1 are: ", x(dice_sums, 1 ))

## Total sum values for 1 are:  0

cat("Probability of getting a sum of 1: ", round(x(dice_sums, 1)/36))

## Probability of getting a sum of 1:  0

Getting a sum of 5?

#(b) Four of the 36 combinations sum to 5: (1,4), (2,3), (3,2), (4,1)
## Let's find the number of sums for 5

cat("Total sum values for 5 are: ", x(dice_sums, 5 ))

## Total sum values for 5 are:  4

cat("Probability of getting a sum of 5: ", round(x(dice_sums, 5)/36, 4))

## Probability of getting a sum of 5:  0.1111

Getting a sum of 12?

## Let's find the number of sums for 12

cat("Total sum values for 12 are: ", x(dice_sums, 12 ))

## Total sum values for 12 are:  1

cat("Probability of getting a sum of 12: ", round(x(dice_sums, 12)/36, 4))

## Probability of getting a sum of 12:  0.0278

Problem 2.8 Poverty and language

The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories. (a) Are living below the poverty line and speaking a foreign language at home disjoint?

#No. 4.2% of the population falls into both categories, so the two events can occur together

Draw a Venn diagram summarizing the variables and their associated probabilities.

# in this section I will use a table to summarize
prob_table <- matrix(c(0.042, 0.165, 0.207, 0.104, 0.689, 0.793, 0.146, 0.854, 1), byrow=TRUE, nrow=3)

row.names(prob_table) <- c("Speak foreign language at home: YES", "Speak foreign language at home: NO", "Total")
colnames(prob_table) <- c("Live in Poverty", "Do Not Live in Poverty", "Total")
library(gridExtra)
grid.table(prob_table)
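
The cells of the table (and the regions of a Venn diagram) all follow from the three percentages given in the problem. A short sketch of that arithmetic, with variable names chosen here for illustration:

```r
# Given in the problem statement
p_poverty <- 0.146   # P(below the poverty line)
p_foreign <- 0.207   # P(foreign language at home)
p_both    <- 0.042   # P(both)

# Remaining regions follow by subtraction
p_foreign_only <- p_foreign - p_both              # foreign language, not in poverty
p_poverty_only <- p_poverty - p_both              # in poverty, English only
p_either       <- p_poverty + p_foreign - p_both  # general addition rule
p_neither      <- 1 - p_either                    # neither event

round(c(foreign_only = p_foreign_only, poverty_only = p_poverty_only,
        either = p_either, neither = p_neither), 3)
```

The three regions `p_both`, `p_foreign_only` and `p_poverty_only` are the pieces a Venn diagram would show; `p_neither` is the area outside both circles.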

What % of Americans live below the poverty line and only speak English at home?

# Looking at the table, the total who do not speak a foreign language and live in poverty is 0.104
prob_table[2,1]

## [1] 0.104

What % of Americans live below the poverty line OR speak a foreign language at home?

# The total who live below the poverty line is 0.146
# The total who speak a foreign language at home is 0.207
# The events are not disjoint, so the 0.042 who fall into both must be subtracted once:
# P(poverty or foreign language) = P(poverty) + P(foreign language) - P(both)
prob_table[3,1] + prob_table[1,3] - prob_table[1,1]

## [1] 0.311

What % of Americans live above the poverty line and only speak English at home?

# Looking at the table, the total who do not speak a foreign language and do not live in poverty is 0.689
prob_table[2,2]

## [1] 0.689

Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?

# The two events are not independent. If we check with the multiplication rule, we will see that
# P(poverty) * P(speak foreign language) != P(speak foreign language and below poverty)
prob_table[1,3]*prob_table[3,1] == prob_table[1,1]

## [1] FALSE
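
Comparing floating-point products with `==` can report FALSE even when two values agree up to rounding, so a tolerance-based comparison is a safer way to run the same check. The sketch below restates the three given probabilities under names chosen for illustration and also shows the equivalent conditional-probability view.

```r
p_poverty <- 0.146
p_foreign <- 0.207
p_both    <- 0.042

# Under independence the joint probability would equal the product of the marginals
p_if_independent <- p_poverty * p_foreign

# Compare with a tolerance instead of `==`, since these are floating-point values
is_independent <- isTRUE(all.equal(p_if_independent, p_both))

# Equivalent view: the conditional probability differs from the marginal
p_poverty_given_foreign <- p_both / p_foreign

round(c(product = p_if_independent, joint = p_both,
        conditional = p_poverty_given_foreign, marginal = p_poverty), 4)
is_independent
```

Here the gap between the product (about 0.03) and the joint probability (0.042) is far larger than any rounding error, so the conclusion is the same either way.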

Problem 2.20 Assortative Mating

eye_table <- matrix(c(78, 23, 13, 114, 19, 23, 12, 54, 11, 9, 16, 36, 108, 55, 41, 204), byrow=TRUE, nrow=4)
colnames(eye_table) <- c("Blue.F", "Brown.F", "Green.F", "Total")
rownames(eye_table) <- c("Blue.M", "Brown.M", "Green.M", "Total")
library(gridExtra)
grid.table(eye_table)

P(male respondent or his partner has blue eyes)

# P(Blue Male) + P(Blue Female) - P(blue male and blue female)
(eye_table[1,4] + eye_table[4,1]- eye_table[1,1])/eye_table[4,4]

## [1] 0.7058824

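P(random male with blue eyes has partner with blue eyes)?

# Conditional on the male having blue eyes, so the denominator is the 114 blue-eyed males:
# P(female blue eyes | male blue eyes) = 78/114
eye_table[1,1]/eye_table[1,4]

## [1] 0.6842105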

P(random male with brown eyes has a partner with blue eyes)? Of the 54 brown-eyed males, 19 have blue-eyed partners.

P(random male with green eyes has a partner with blue eyes)? Of the 36 green-eyed males, 11 have blue-eyed partners.

# P(female blue eyes | male brown eyes)
eye_table[2,1]/eye_table[2,4]

## [1] 0.3518519

# P(female blue eyes | male green eyes)
eye_table[3,1]/eye_table[3,4]

## [1] 0.3055556

Does it appear that the eye colors of male respondents and their partners are independent? Test with P(A|B) = P(A). P(brown-eyed female | blue-eyed male) = 23/114 = 0.2018, while P(brown-eyed female) = 55/204 = 0.2696. P(blue-eyed female | brown-eyed male) = 19/54 = 0.3519, while P(blue-eyed female) = 108/204 = 0.5294. The conditional and marginal probabilities differ, so the two eye colors are not independent.
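
All of the conditional probabilities used in this comparison can be read off at once by converting each row of counts to proportions. A minimal sketch, rebuilding the counts without the totals under the illustrative name `eye_counts`:

```r
# Counts of couples: rows are the male's eye color, columns the female partner's
eye_counts <- matrix(c(78, 23, 13,
                       19, 23, 12,
                       11,  9, 16),
                     byrow = TRUE, nrow = 3,
                     dimnames = list(male = c("Blue", "Brown", "Green"),
                                     female = c("Blue", "Brown", "Green")))

# P(female color | male color): each row sums to 1
cond_on_male <- prop.table(eye_counts, margin = 1)

# P(female color) ignoring the male's eye color
female_marginal <- colSums(eye_counts) / sum(eye_counts)

round(cond_on_male, 4)
round(female_marginal, 4)
```

If the eye colors were independent, every row of `cond_on_male` would match `female_marginal`; the largest entry in each row instead sits on the diagonal, where both partners share a color.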

Problem 2.30

books=matrix(c(13,59,72,15,8,23,28,67,95),nrow=3,byrow=TRUE)
colnames(books)=c("hard","paper", "Total")
rownames(books)=c("fiction","nonfiction", "Total")
## Totals Column
library(gridExtra)
grid.table(books)

(a) Find P(hardcover first, then paperback fiction second), without replacement

#P(hardcover) = 28/95
28/95

## [1] 0.2947368

#P(paperback fiction after a hardcover is taken out) = 59/94
59/94

## [1] 0.6276596

#P(hardcover) * P(paperback fiction | hardcover removed) = (28/95)*(59/94)
(28/95)*(59/94)

## [1] 0.1849944

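(b) Determine P(fiction first, then hardcover second), without replacement

# The second draw depends on whether the first fiction book was a hardcover:
# fiction hardcover first (13/95), then one of the 27 remaining hardcovers out of 94
# fiction paperback first (59/95), then one of the 28 hardcovers out of 94
round((13/95)*(27/94) + (59/95)*(28/94), 4)

## [1] 0.2243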

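(c) Determine P(fiction first, then hardcover second), with replacement. P(fiction book) = 72/95 = 0.76, P(hardcover) = 28/95 = 0.29

# use the exact fractions; multiplying the rounded values 0.76*0.29 gives 0.2204
round((72/95)*(28/95), 4)

## [1] 0.2234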

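(d) The answers to (b) and (c) are similar. Why? Removing one book out of 95 barely changes the proportions of the remaining books, so drawing without replacement is nearly the same as drawing with replacement.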
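
Both the with-replacement and without-replacement probabilities of "fiction first, then hardcover second" can be cross-checked by brute force, counting ordered pairs of books directly. This is only a sketch; the vectors `is_fiction` and `is_hard` are illustrative names built from the four cell counts in the table.

```r
# One entry per book, built from the four cell counts in the table, in the order:
# hardcover fiction, hardcover nonfiction, paperback fiction, paperback nonfiction
counts <- c(13, 15, 59, 8)
is_fiction <- rep(c(TRUE, FALSE, TRUE, FALSE), times = counts)
is_hard    <- rep(c(TRUE, TRUE, FALSE, FALSE), times = counts)
n <- length(is_fiction)

# hit[i, j] is TRUE when book i (first draw) is fiction and book j (second draw) is hardcover
hit <- outer(is_fiction, is_hard, FUN = "&")

# With replacement: all n^2 ordered pairs are equally likely
p_with <- sum(hit) / n^2

# Without replacement: drop the pairs that use the same book twice
diag(hit) <- FALSE
p_without <- sum(hit) / (n * (n - 1))

round(c(with_replacement = p_with, without_replacement = p_without), 4)
```

The only pairs removed in the without-replacement case are the 13 hardcover fiction books paired with themselves, which is why the two probabilities end up so close.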

Problem 2.38 Baggage Fees

An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece, and 12% have two pieces. (a) Build a probability model, then compute the average revenue per passenger and the standard deviation.

## Prob Model
# A passenger with 2 bags pays for both bags: 25 + 35 = 60
baggage.prob.model <- matrix(c(.54,.34,.12, 1.0, 0,25,60,0), byrow = TRUE, nrow = 2)
colnames(baggage.prob.model) <- c("0 Bags", "1 Bag", "2 Bags", "Total")
rownames(baggage.prob.model) <- c("Probability of passengers","Fee")
grid.table(baggage.prob.model)

## Avg Rev per passenger
#E(x) = P(no bags) * 0 + P(1 bag) * 25 fee + P(2 bags) * 60 fee
avg_fee <- (.54*0) + (25*.34)+(60*.12)
avg_fee

## [1] 15.7

## Standard Deviation
# squared deviation of each fee from the mean (the Total column is left at 0)
var <- c((baggage.prob.model[2,1] - avg_fee)^2, (baggage.prob.model[2,2] - avg_fee)^2, (baggage.prob.model[2,3] - avg_fee)^2, 0)
var

## [1]  246.49   86.49 1962.49    0.00

baggage.prob.model <-  rbind(baggage.prob.model, var)
var.mult.prob <- baggage.prob.model[1,] * baggage.prob.model[3,]
baggage.prob.model <- rbind(baggage.prob.model, var.mult.prob)
baggage.prob.model

##                             0 Bags   1 Bag    2 Bags Total
## Probability of passengers   0.5400  0.3400    0.1200     1
## Fee                         0.0000 25.0000   60.0000     0
## var                       246.4900 86.4900 1962.4900     0
## var.mult.prob             133.1046 29.4066  235.4988     0

## the Variance is the sum of var.mult.prob
sum.var <- sum(baggage.prob.model[4,])
paste0("The standard deviation is: $", round(sqrt(sum.var),4))

## [1] "The standard deviation is: $19.9502"

How much should the airline expect for 120 passengers? What about the standard deviation?

# The expected revenue should be 120 * the expected fee per passenger
# avg_fee * 120
paste0("The expected revenue is 120 * 15.7: $", round(avg_fee * 120, 2))

## [1] "The expected revenue is 120 * 15.7: $1884"

## Assuming passengers check bags independently, the variances add,
## so the variance of the total is 120 * var (not 120^2 * var)
paste0("The variance is 120 * var: ", round(sum.var*120, 2))

## [1] "The variance is 120 * var: 47761.2"

paste0("The standard deviation is: ", round(sqrt(sum.var*120),2))

## [1] "The standard deviation is: 218.54"
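
The mean and standard deviation of a discrete probability model can be wrapped in a small reusable helper. The sketch below makes two assumptions: that a two-bag passenger pays for both bags (25 + 35 = 60), and that the 120 passengers check bags independently, so that means and variances add. The function name `discrete_summary` is hypothetical.

```r
# Revenue per passenger, assuming a two-bag passenger pays 25 + 35 = 60
fee  <- c(0, 25, 25 + 35)
prob <- c(0.54, 0.34, 0.12)

# Mean, variance and standard deviation of a discrete random variable
discrete_summary <- function(x, p) {
  stopifnot(length(x) == length(p), abs(sum(p) - 1) < 1e-9)
  mu <- sum(x * p)
  sigma2 <- sum((x - mu)^2 * p)
  c(mean = mu, var = sigma2, sd = sqrt(sigma2))
}

one <- discrete_summary(fee, prob)

# For n independent passengers, means add and variances add (assumption: independence)
n <- 120
flight <- c(mean = n * one[["mean"]],
            var  = n * one[["var"]],
            sd   = sqrt(n * one[["var"]]))

round(one, 4)
round(flight, 2)
```

Because variances rather than standard deviations add, the standard deviation of the total grows with the square root of the number of passengers, not in proportion to it.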

Problem 2.44 Income and Gender

Describe the distribution

income.chart <- matrix(c("$1 to $9,999 or less", 2.2, "$10,000 - $14,999", 4.7, "$15,000 - $24,999", 15.8, "$25,000 - $34,999", 18.3,"$35,000 - $49,999", 21.2, "$50,000 - $64,999", 13.9, "$65,000 - $74,999", 5.8, "$75,000 - $99,000", 8.4, "$100,000 or more", 9.7), byrow = TRUE, nrow = 9)
income_percentages <- as.numeric(income.chart[,2])
names(income_percentages) <- income.chart[,1]
income_percentages

## $1 to $9,999 or less    $10,000 - $14,999    $15,000 - $24,999 
##                  2.2                  4.7                 15.8 
##    $25,000 - $34,999    $35,000 - $49,999    $50,000 - $64,999 
##                 18.3                 21.2                 13.9 
##    $65,000 - $74,999    $75,000 - $99,000     $100,000 or more 
##                  5.8                  8.4                  9.7

barplot(income_percentages)

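### The distribution peaks in the $35,000 - $49,999 bracket. There is a smaller second rise from $75,000 upward, which gives it a bimodal look, but those top brackets are wider than the ones below them, so part of that rise comes from the bin widths.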

What is the P(randomly chosen resident makes < $50,000)?

income_percentages[1:5]

## $1 to $9,999 or less    $10,000 - $14,999    $15,000 - $24,999 
##                  2.2                  4.7                 15.8 
##    $25,000 - $34,999    $35,000 - $49,999 
##                 18.3                 21.2

paste0("The probability of a resident making less than $50,000 is: ", sum(income_percentages[1:5]), "%")

## [1] "The probability of a resident making less than $50,000 is: 62.2%"
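
A running total of the percentages gives the answer for every cutoff at once and also locates the bracket that contains the median. In this sketch the shortened bracket labels are shorthand introduced here, not the labels from the table.

```r
# Shortened labels (illustrative shorthand for the income brackets)
bracket <- c("under 10k", "10k-15k", "15k-25k", "25k-35k", "35k-50k",
             "50k-65k", "65k-75k", "75k-100k", "100k+")
pct <- c(2.2, 4.7, 15.8, 18.3, 21.2, 13.9, 5.8, 8.4, 9.7)

# Cumulative percentage of residents at or below each bracket
cum_pct <- cumsum(pct)
names(cum_pct) <- bracket
cum_pct

# First bracket at which at least half of the residents are accounted for
median_bracket <- bracket[which(cum_pct >= 50)[1]]
median_bracket
```

The cumulative value for the fifth bracket is the same sum of the first five percentages used for the under-$50,000 question.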

What is the P(a resident is female and makes less than $50,000)? Because we do not have the data split by gender, we have to assume that income and gender are independent, so that we can multiply .622 * .41 (the proportion of females)

#assume income and gender are independent since there is no data by gender
paste0("Assuming income and gender are independent, the probability of a female who makes less than 50K: ", round(.622*.41, 4))

## [1] "Assuming income and gender are independent, the probability of a female who makes less than 50K: 0.255"

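Taking 71.8% as the proportion of females who make less than $50,000: if income and gender were independent, this proportion would equal the overall .622. It does not, so the independence assumption in the previous part does not hold. Using the 71.8% figure, P(female and makes less than $50,000) = .718 * .41, or roughly 29%, rather than the 25.5% computed above. We wrongly assumed that females are as likely as residents overall to make more than $50k.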
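
The comparison can be written out numerically. This sketch assumes the 71.8% figure is P(makes less than $50,000 | female), which is a reading of the problem and not something stated in the table; the variable names are illustrative.

```r
p_under_50k <- 0.622               # from the income table
p_female    <- 0.41                # proportion of females in the sample
p_under_50k_given_female <- 0.718  # assumed meaning of the 71.8% figure

# Joint probability if income and gender were independent
joint_if_independent <- p_under_50k * p_female

# Joint probability using the conditional figure instead
joint_from_conditional <- p_under_50k_given_female * p_female

round(c(independent = joint_if_independent,
        from_conditional = joint_from_conditional), 4)
```

The two joint probabilities differ by about four percentage points, which is the size of the error introduced by assuming independence.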

Hw3

David Apolinar