Assignment #2

Raj Kumar

# clean up the environment to start fresh
rm(list=ls())
Exercise: 2.6

Question a: 0

Question b: 4/36

Question c: 1/36
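These answers can be checked by enumerating all 36 equally likely outcomes. This is a minimal sketch. It assumes, as the fractions above suggest, that the three questions ask for the probability that the sum of two fair six-sided dice equals 1, 5 and 12. `prob_sum` is a hypothetical helper name.

```r
# All 36 equally likely outcomes of rolling two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Hypothetical helper: probability that the sum of the two dice equals s
prob_sum <- function(s) mean(sums == s)

prob_sum(1)   # 0: the smallest possible sum is 2
prob_sum(5)   # 4/36, from (1,4), (2,3), (3,2), (4,1)
prob_sum(12)  # 1/36, only (6,6)
```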

Exercise: 2.8 - Poverty and Language

Question a: No. These two events are not disjoint: 4.2% of Americans both live below the poverty line and speak a foreign language at home.

Question b: Please see the Venn diagram below.

library(VennDiagram)
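grid.newpage()
# Pairwise Venn diagram of the percentages: 14.6% below the poverty line,
# 20.7% speak a foreign language at home, 4.2% in both groups
draw.pairwise.venn(area1 = 14.6, area2 = 20.7, cross.area = 4.2,
                   category = c("Below_Poverty", "Foreign_Language"),
                   lty = rep("blank", 2),
                   fill = c("lightblue", "pink"),
                   alpha = rep(0.5, 2))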


Question c:

Americans who live BELOW the poverty line and only speak English at home:
14.6 (below poverty) - 4.2 (below poverty AND foreign language) = 10.4%

Question d:

P(below poverty OR foreign language) = 20.7 (foreign language) + 14.6 (below poverty) - 4.2 (both) = 31.1%

Question e:

Americans who live ABOVE the poverty line and only speak English at home (the complement of part d):
P(above poverty AND only English) = 100 - (14.6 + 20.7 - 4.2) = 68.9%
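The arithmetic for parts c to e can be written out in R using the same percentages as the Venn diagram. This is a small sketch; the variable names are illustrative only.

```r
# Percentages from the exercise (same values as in the Venn diagram)
below_poverty <- 14.6   # % below the poverty line
foreign_lang  <- 20.7   # % speaking a foreign language at home
both          <- 4.2    # % in both groups

# (c) below poverty AND only English at home
only_poverty <- below_poverty - both
# (d) below poverty OR foreign language: general addition rule
either <- below_poverty + foreign_lang - both
# (e) above poverty AND only English: complement of (d)
neither <- 100 - either

c(only_poverty = only_poverty, either = either, neither = neither)
```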

Question f:

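These events are not independent. If they were, P(below poverty AND foreign language) would equal 0.146 * 0.207 ≈ 0.030, but the observed value is 0.042.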
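The independence check uses the multiplication rule for independent events. This is a minimal sketch using the proportions from the exercise; the variable names are illustrative.

```r
p_poverty <- 0.146   # P(below poverty line)
p_foreign <- 0.207   # P(foreign language at home)
p_both    <- 0.042   # P(both), from the exercise

# Under independence, P(A and B) would equal P(A) * P(B)
p_if_independent <- p_poverty * p_foreign    # about 0.0302
isTRUE(all.equal(p_both, p_if_independent))  # FALSE, so the events are dependent
```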
Exercise: 2.20 - Assortative Mating

Question a:

P(male_blue_eyes OR partner_blue_eyes) = P(male_blue_eyes) + P(partner_blue_eyes) - P(both_blue_eyes)
= 114/204 + 108/204 - 78/204 = 144/204
≈ .7059
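A short sketch of the general addition rule using only the counts quoted above: 204 male respondents, 114 blue-eyed males, 108 blue-eyed partners and 78 couples where both have blue eyes.

```r
n_total      <- 204   # male respondents in the study
male_blue    <- 114   # males with blue eyes
partner_blue <- 108   # partners with blue eyes
both_blue    <- 78    # couples where both have blue eyes

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either_blue <- (male_blue + partner_blue - both_blue) / n_total
round(p_either_blue, 4)
```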

Question b:

Given that the randomly chosen male respondent has blue eyes, his partner needs to have blue eyes:
P(partner_blue_eyes | male_blue_eyes) = 78/114 =
.6842

Question c:

Male has brown eyes and partner has blue eyes:
P(partner_blue_eyes | male_brown_eyes) = 19/54 =
.3519
Male has green eyes and partner has blue eyes:
P(partner_blue_eyes | male_green_eyes) = 11/36 =
.3056
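Each answer in parts b and c is a conditional probability: the count of blue-eyed partners in a row divided by that row's total. This is a minimal sketch. The green-eyed counts are inferred rather than quoted: 36 = 204 - 114 - 54 green-eyed males, and 11 is the count that reproduces the green-eyed probability of about 0.3056.

```r
# Conditional probability P(partner blue | male eye color):
# couples with a blue-eyed partner / all males with that eye color
blue_partner <- c(blue = 78, brown = 19, green = 11)   # green count inferred
male_total   <- c(blue = 114, brown = 54, green = 36)  # green total = 204 - 114 - 54

p_blue_partner <- blue_partner / male_total
round(p_blue_partner, 4)
```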

Question d:

Are male eye color and partner eye color independent? No, the eye colors are not independent.
For these to be independent, P(A and B) needs to equal P(A) * P(B).
P(male_blue_eyes and partner_blue_eyes) = 78/204 = .3824
P(male_blue_eyes) * P(partner_blue_eyes) = 114/204 * 108/204 = .2958
As these are not equal, the eye colors are not independent.
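The same comparison in R, as a small sketch that reuses the counts quoted above.

```r
p_male_blue    <- 114 / 204   # P(male has blue eyes)
p_partner_blue <- 108 / 204   # P(partner has blue eyes)
p_both_blue    <- 78 / 204    # P(both have blue eyes)

# Independence would require P(A and B) == P(A) * P(B)
round(c(joint = p_both_blue, product = p_male_blue * p_partner_blue), 4)
```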
Exercise: 2.30

Question a:

P(HardcoverBook) = 28/95 = .2947 and then
P(PaperbackFictionBook) = 59/94 = .6277
Combined .2947 * .6277 = .1850
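A sketch of the sequential draw without replacement for part a, using the counts above: 95 books, 28 hardcovers and 59 paperback fiction books. A hardcover drawn first is never a paperback fiction book, so all 59 paperback fiction books are still among the 94 that remain.

```r
n_books           <- 95   # books on the shelf
hardcover         <- 28   # hardcover books
paperback_fiction <- 59   # paperback fiction books

# Multiplication rule for sequential draws without replacement
p_a <- (hardcover / n_books) * (paperback_fiction / (n_books - 1))
round(p_a, 4)
```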

Question b:

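P(FictionBook) = 72/95 = .75789 and then
P(HardcoverBook) = 28/94 = .29787
Combined .75789 * .29787 = .22576

This treats the second draw as 28/94, which is exact only if the fiction book drawn first was a paperback. If it was a hardcover, only 27 hardcovers remain out of 94. Splitting the two cases (72 - 59 = 13 hardcover fiction books and 59 paperback fiction books) gives the exact answer:
13/95 * 27/94 + 59/95 * 28/94 = 2003/8930 = .2243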
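The case split for part b can be checked by conditioning on the type of fiction book drawn first. This is a minimal sketch. It assumes the 59 from part a counts paperback fiction books, so 72 - 59 = 13 fiction books are hardcovers.

```r
n_books           <- 95
fiction           <- 72
hardcover         <- 28
paperback_fiction <- 59
hardcover_fiction <- fiction - paperback_fiction   # 13

# Case 1: hardcover fiction first, so 27 hardcovers remain out of 94
# Case 2: paperback fiction first, so all 28 hardcovers remain out of 94
p_exact  <- (hardcover_fiction / n_books) * ((hardcover - 1) / (n_books - 1)) +
            (paperback_fiction / n_books) * (hardcover / (n_books - 1))
# Approximation that ignores case 1
p_approx <- (fiction / n_books) * (hardcover / (n_books - 1))

round(c(exact = p_exact, approx = p_approx), 4)
```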

Question c:

P(FictionBook) = 72/95 = .75789 and then
P(HardcoverBook) = 28/95 = .29474
Combined .75789 * .29474 = .2234

Question d:

The answers to (b) and (c) are very similar because removing one book from a shelf of 95 barely changes the probability for the second draw (28/94 ≈ .2979 without replacement versus 28/95 ≈ .2947 with replacement). When that small difference is multiplied by the probability of the first draw, it shrinks even further.
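A quick numeric illustration of this argument, as a sketch based on the approximate form of part b (72/95 for the first draw, 28 hardcovers for the second).

```r
p_first        <- 72 / 95   # fiction book on the first draw
second_without <- 28 / 94   # hardcover second, without replacement
second_with    <- 28 / 95   # hardcover second, with replacement

gap_second <- second_without - second_with   # gap in the second-draw probability
gap_joint  <- p_first * gap_second           # smaller gap after multiplying
round(c(gap_second = gap_second, gap_joint = gap_joint), 4)
```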
Exercise: 2.38 - Baggage Fees

Question a:

setwd("C:\\CUNY\\606Statistics\\Assignments")
Baggage Fee Probability Table

Question b:

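The airline should expect revenue of

$15.70 (average revenue per passenger) * 120 = $1,884

The standard deviation is $19.95 per passenger. Assuming the passengers' baggage choices are independent, the standard deviation for 120 passengers is sqrt(120) * 19.95 ≈ $218.54.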
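The expected value and standard deviation of a discrete random variable can be computed directly from a probability table. The fee table is not reproduced in the text above, so this sketch ASSUMES the following: $0, $25 or $60 in fees for 0, 1 or 2 checked bags, with probabilities 0.54, 0.34 and 0.12. These values were chosen because they reproduce the $15.70 and $19.95 figures. The flight totals also assume independent passengers.

```r
# ASSUMED fee table (not shown above): total fees for 0, 1 or 2 checked bags
fees  <- c(0, 25, 25 + 35)     # $0, $25, $60
probs <- c(0.54, 0.34, 0.12)   # assumed share of passengers with 0/1/2 bags

mu    <- sum(fees * probs)                  # expected revenue per passenger
sigma <- sqrt(sum((fees - mu)^2 * probs))   # SD per passenger

# For n independent passengers: mean scales by n, variance by n (SD by sqrt(n))
n <- 120
c(mean_per_passenger = mu, sd_per_passenger = sigma,
  mean_flight = n * mu, sd_flight = sqrt(n) * sigma)
```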
Exercise: 2.44

Income <- c("$1 to $9,999 or loss",
            "$10,000 to $14,999",
            "$15,000 to $24,999",
            "$25,000 to $34,999",
            "$35,000 to $49,999",
            "$50,000 to $64,999",
            "$65,000 to $74,999",
            "$75,000 to $99,999",
            "$100,000 or more"
            )

Total <- c(2.2, 
           4.7,
           15.8,
           18.3,
           21.2,
           13.9,
           5.8,
           8.4,
           9.7
           )

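# Combine income brackets and their percentages into one data frame
personalIncomeDF <- data.frame(income=Income, total=Total)

# Bar plot of the percentage of people in each income bracket
barplot(personalIncomeDF$total, names.arg=personalIncomeDF$income, col="green")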

Question a:

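(a): Income is a continuous variable, shown here in brackets. The distribution is unimodal, peaks in the $35,000 to $49,999 bracket (21.2%), and is skewed to the right.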

Question b:

(b): 62.2% of people make less than $50,000, so that is the probability that a randomly chosen person makes less than $50,000.
sum(personalIncomeDF$total[1:5])
## [1] 62.2

Question c:

(c): P(less_than_50k AND female)
P(less_than_50k) = 62.2%
P(female) = 41% (given in the problem statement)

We assume that women and men are equally spread across the income brackets, which means that income and gender are independent. Then
P(less_than_50k) * P(female) = .622 * .41 = 25.50%
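A sketch of part c, reusing the percentages from the income table. The multiplication step is only valid under the independence assumption stated above.

```r
# Percentages from the income table above
total <- c(2.2, 4.7, 15.8, 18.3, 21.2, 13.9, 5.8, 8.4, 9.7)
p_under_50k <- sum(total[1:5]) / 100   # first five brackets are below $50,000
p_female    <- 0.41                    # share of females, from the problem statement

# Multiplication rule, valid only if income bracket and gender are independent
p_joint_indep <- p_under_50k * p_female
round(p_joint_indep, 4)
```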

Question d:

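(d):

Since 71.8% of females make less than $50k a year, P(less_than_50k | female) = 71.8%.

Under independence, P(less_than_50k | female) would equal P(less_than_50k) = 62.2%, but 71.8% IS NOT equal to 62.2%.

There are proportionally fewer women in the higher income brackets, so income and gender are dependent and the assumption made in (c) is not valid.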
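The check in part d compares a conditional probability with the corresponding marginal one. This is a minimal sketch using the figures quoted in the exercise. It also shows the joint probability implied by the 71.8% figure next to the value assumed in part c.

```r
p_under_50k        <- 0.622   # P(income < $50k), from part (b)
p_female           <- 0.41    # P(female), from the problem statement
p_under_50k_female <- 0.718   # P(income < $50k | female), from the problem statement

# Under independence the conditional probability would equal the marginal one
isTRUE(all.equal(p_under_50k_female, p_under_50k))   # FALSE

# Joint probability implied by the data vs. the value assumed in part (c)
p_joint_actual  <- p_under_50k_female * p_female   # about 0.2944
p_joint_assumed <- p_under_50k * p_female          # about 0.2550
round(c(actual = p_joint_actual, assumed = p_joint_assumed), 4)
```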
