Habib Khan

Data 606 - Homework 6

Problem 6.6

False. A confidence interval estimates the population proportion, not the sample proportion, which is already known to be 46%.
True. We are 95% confident that the proportion of Americans who supported the decision is between 43% and 49% (46% +/- 3% margin of error).
False. The confidence interval gives plausible values for the population proportion only; it does not describe the sample proportions of other samples.
False. Lowering the confidence level narrows the interval, so the margin of error gets smaller as well.
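
The effect of the confidence level on the margin of error can be checked numerically. The sketch below assumes a 90% level purely for comparison and rescales the 3% margin reported at 95%; the names `me_95`, `z_95`, `z_90` and `me_90` are introduced here for illustration.

```r
# Margin of error reported at the 95% confidence level
me_95 <- 0.03

# Critical values of the standard normal distribution
z_95 <- qnorm(0.975)  # about 1.96
z_90 <- qnorm(0.95)   # 90% level, assumed here for illustration

# The standard error does not change, so the margin scales with the critical value
me_90 <- me_95 * z_90 / z_95
me_90 < me_95
```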

Problem 6.12 - Legalization of Marijuana - Part I

It is a sample statistic, as it was computed from a sample of US residents.

n <- 1259
p <- 0.48
SE <- sqrt((p * (1-p))/n)
me <- 1.96 * SE
me

## [1] 0.02759723

uppertail <- p + me
lowertail <- p - me
c(uppertail,lowertail)

## [1] 0.5075972 0.4524028

We are 95 percent confident that the proportion of US residents who think marijuana should be legalized is between 0.452 and 0.508.

We do not know whether the sample was taken randomly, so we cannot verify that the observations are independent. However, the sample size of 1259 respondents is far below 10 percent of the entire population, and both groups (those who think marijuana should be legalized and those who do not) have well over 10 respondents each. Given these points, the normal approximation looks reasonable for further analysis, provided the sample was random.
No, it is not enough. The confidence interval ranges from about 45.2% to 50.8%, and most of that range is below 50%, so we cannot conclude that a majority supports legalization.
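
The success-failure condition and the majority question can both be checked directly. This is a minimal sketch using the same n and p as above; the variable names `n_yes`, `n_no` and `ci` are introduced here for illustration.

```r
n <- 1259
p <- 0.48

# Success-failure condition: both expected counts should be at least 10
n_yes <- n * p        # respondents in favor of legalization
n_no  <- n * (1 - p)  # respondents not in favor
c(n_yes, n_no) >= 10

# 95% confidence interval (lower bound, then upper bound)
ci <- p + c(-1, 1) * 1.96 * sqrt(p * (1 - p) / n)

# A majority claim would need the whole interval to lie above 0.5
ci[1] > 0.5
```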

Problem 6.20

p <- 0.48
me <- 0.02

# ME <- 1.96 x SE
SE <- me / 1.96

# SE <- sqrt((p*(1-p))/n)
# SE^2 <- p*(1-p)/n

# n <- p*(1-p)/SE^2
n <- (p*(1-p)/SE^2)
n

## [1] 2397.158

We need at least 2398 respondents to get a margin of error of 2 percent.
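
The same calculation can be wrapped in a small helper that rounds up to a whole number of respondents. `required_n` is a hypothetical helper name, not a function from any package, and it assumes the normal approximation with z = 1.96 for a 95% confidence level.

```r
# Smallest sample size whose margin of error is at most `me`
# (hypothetical helper; z = 1.96 corresponds to a 95% confidence level)
required_n <- function(p, me, z = 1.96) {
  ceiling(p * (1 - p) / (me / z)^2)
}

required_n(p = 0.48, me = 0.02)
```

When no estimate of the proportion is available, p = 0.5 gives the most conservative (largest) sample size.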

Problem 6.28

p_cal <- 0.08
p_org <- 0.088
p <- p_cal - p_org
p

## [1] -0.008

n_cal <- 11545
n_org <- 4691
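# Variance contribution of each sample proportion
var_cal <- (p_cal * (1-p_cal)) / n_cal
var_org <- (p_org * (1-p_org)) / n_org
# Standard error of the difference in proportions
se <- sqrt(var_cal + var_org)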
me_28 <- 1.96 * se
me_28

## [1] 0.009498128

uppertail2 <- p + me_28
lowertail2 <- p - me_28
c(uppertail2, lowertail2)

## [1]  0.001498128 -0.017498128

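We are 95% confident that the difference in the proportions of sleep-deprived residents (California minus Oregon) is between -1.75% and 0.15%. Since the confidence interval contains 0, we fail to reject H0: the data do not show a difference in sleep deprivation between California and Oregon residents.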
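
The interval calculation can be collected into one function. `diff_prop_ci` is a hypothetical helper introduced here; it assumes two independent samples that are large enough for the normal approximation.

```r
# 95% confidence interval for a difference of two proportions (p1 - p2)
diff_prop_ci <- function(p1, n1, p2, n2, z = 1.96) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  (p1 - p2) + c(-1, 1) * z * se  # lower bound, then upper bound
}

ci_28 <- diff_prop_ci(0.08, 11545, 0.088, 4691)
ci_28

# TRUE when the interval contains 0, i.e. no significant difference
ci_28[1] < 0 && ci_28[2] > 0
```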

Problem 6.44

H0: Barking deer do not prefer certain habitats for foraging.
Ha: Barking deer prefer certain habitats for foraging.
Test: chi-square goodness-of-fit test.
Independence: this is not stated in the problem, so let's assume the cases are all independent.
Sample size: each expected count should be at least 5.

# Expected counts under H0: 426 sites times each habitat's share of the land.
# The share for w is 0.048 (4.8%), so the four shares sum to 1.
w <- round(426*0.048,1)
c <- round(426*0.147,1)
d <- round(426*0.396,1)
o <- round(426*0.409,1)
proportions=c(w,c,d,o)
proportions

## [1]  20.4  62.6 168.7 174.2

Every expected count is greater than 5, so the conditions for the chi-square test are met.

x = c(4,16,67,345)   # observed counts
k = 4                # number of categories
dF = k - 1
chi = 0
for(biome in 1:k){
  chi = chi + ((x[biome]-proportions[biome])^2/proportions[biome])
}
p = pchisq(chi,dF,lower.tail = FALSE)
p

## [1] 1.160372e-59

The p-value is essentially 0, so we reject H0: the data provide strong evidence that barking deer prefer certain habitats for foraging.
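
As a cross-check, base R's `chisq.test` runs a goodness-of-fit test when hypothesized proportions are passed through its `p` argument. The sketch assumes the habitat shares are 0.048, 0.147, 0.396 and 0.409 (in the order w, c, d, o), which sum to 1. It works with unrounded expected counts, so the statistic can differ slightly from a hand calculation with rounded counts. The names `observed`, `shares` and `fit` are introduced here.

```r
observed <- c(4, 16, 67, 345)              # observed foraging sites per habitat
shares   <- c(0.048, 0.147, 0.396, 0.409)  # assumed habitat shares, sum to 1

fit <- chisq.test(x = observed, p = shares)
fit$expected   # expected counts under H0
fit$statistic  # chi-square statistic
fit$parameter  # degrees of freedom
fit$p.value
```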

Problem 6.48

Chi-square test for two-way tables
H0: Depression and caffeinated coffee consumption are not associated.
Ha: Depression and caffeinated coffee consumption are associated.

dep_women <- 2607 / 50739
nodep_women <- 48132 / 50739
c(dep_women,nodep_women)*100

## [1]  5.138059 94.861941

About 5.14 percent of the women are depressed, while 94.86 percent are not.

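# Expected count for the cell: proportion depressed times the group total (6617)
expected <- dep_women*6617
# Contribution of the cell to the chi-square statistic (observed count is 373)
cell <- (373 - expected)^2/expected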
cell

## [1] 3.205914

# Degrees of freedom: (rows - 1) * (columns - 1) for a 2 x 5 table
df <- (2-1)*(5-1)
pval <- pchisq(20.93,df, lower.tail=FALSE)
pval

## [1] 0.0003269507

The p-value is very small, so we reject the null hypothesis: depression and caffeinated coffee consumption are associated.
I agree with the author. This is an observational study, so the association does not necessarily mean that caffeinated coffee consumption affects depression. Proper experimental research, rather than just observation, would be needed to establish that.
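
The expected count and the contribution to the test statistic follow the same pattern for every cell of a two-way table. The helper below is a sketch: `cell_contribution` and its argument names are hypothetical, and the 2 x 5 table dimensions are assumed from the two depression outcomes and the five coffee-consumption groups.

```r
# Contribution of one cell to the chi-square statistic, where
# expected = row total * column total / grand total
cell_contribution <- function(observed, row_total, col_total, grand_total) {
  expected <- row_total * col_total / grand_total
  (observed - expected)^2 / expected
}

cell_contribution(observed = 373, row_total = 2607,
                  col_total = 6617, grand_total = 50739)

# Degrees of freedom for a table with 2 rows and 5 columns (assumed layout)
df_table <- (2 - 1) * (5 - 1)
pchisq(20.93, df_table, lower.tail = FALSE)
```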