HW 6

6.6

a)

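False, we already know that exactly 46% of the sample agree. A confidence interval estimates the population proportion, not the sample proportion.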

b)

True, at the 95% confidence level with a 3% margin of error the interval is 46% ± 3%, or 43% to 49%, and it refers to the population proportion.

c)

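False, the confidence interval is a statement about the population proportion, not about where 95% of sample proportions (or individuals) will fall. It means that about 95% of intervals built this way from repeated samples would contain the true proportion.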

d)

False, with a lower confidence level the critical value is smaller, so the margin of error also decreases.
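
The effect of the confidence level on the margin of error can be sketched numerically. This is only an illustration: it backs out the standard error implied by the stated 3% margin at 95% confidence and assumes the same standard error applies at 90%. The names `me_95`, `se_implied`, and `me_90` are made up for this sketch.

```r
# Margin of error reported at the 95% confidence level
me_95 <- 0.03

# Standard error implied by that margin (assumption: ME = z * SE)
se_implied <- me_95 / qnorm(0.975)

# Margin of error for the same data at the 90% confidence level
me_90 <- qnorm(0.95) * se_implied
me_90

# The 90% margin is smaller than the 95% margin
me_90 < me_95
```

The 90% margin comes out at roughly 2.5%, below the original 3%.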

6.12

a)

48% is a sample statistic, since it describes the 1,259 US residents who were surveyed rather than the whole population.

b)

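n <- 1259                  # sample size
p <- .48                   # sample proportion
z <- 1.96                  # critical value for 95% confidence
SE <- sqrt((p*(1-p))/n)    # standard error of the sample proportion
lower <- p - (z * SE)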
lower

## [1] 0.4524028

upper <- p + (z * SE)
upper

## [1] 0.5075972

The 95% confidence interval is 45.24% to 50.76%.

c)

This is true. Since both (1259 x .48) > 10 and (1259 x (1-.48)) > 10, the sampling distribution of the sample proportion is approximately normal and the CI is accurate.
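
The success-failure condition used here can be checked directly. The sketch below is illustrative only; `p_hat`, `successes`, `failures`, and `condition_met` are names introduced for this example, and the threshold of 10 is the one used in the answer.

```r
n <- 1259       # sample size from the problem
p_hat <- 0.48   # sample proportion from the problem

# Counts of "successes" and "failures" implied by the sample proportion
successes <- n * p_hat
failures <- n * (1 - p_hat)
c(successes = successes, failures = failures)

# Success-failure condition: both counts should be at least 10
condition_met <- successes >= 10 && failures >= 10
condition_met
```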

d)

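Since the confidence interval runs from about 45% to 51%, it includes values below 50%. We therefore cannot conclude that a majority (over 50%) of Americans think marijuana should be legal, so that claim is not justified by these data.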

6.20

p <- 0.48
ME <- 0.02
z <- qnorm(0.975)

SE <- ME/z

n <- (p * (1-p)) / SE^2
round(n,1)

## [1] 2397.1

Rounding up to the next whole person, we need to survey at least 2,398 Americans.
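
The same calculation can be wrapped in a small helper that rounds up automatically. This is a sketch, not part of the original solution: `sample_size_for_me` is a hypothetical function name, and it assumes the usual normal-approximation formula n = p(1 - p)(z / ME)^2.

```r
# Hypothetical helper: smallest n whose margin of error is at most `me`
# for a proportion near `p`, at the given confidence level.
sample_size_for_me <- function(p, me, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)     # two-sided critical value
  ceiling(p * (1 - p) * (z / me)^2)  # always round up to a whole person
}

n_needed <- sample_size_for_me(p = 0.48, me = 0.02)
n_needed
```

Using `ceiling()` rather than `round()` guarantees the margin of error does not exceed the target.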

6.28

ncali <- 11545
noregon <- 4691

pcali <- 0.08
poregon <- 0.088
pDiff <- poregon - pcali


SE <- sqrt( ((pcali * (1 - pcali)) / ncali) +  ((poregon * (1 - poregon)) / noregon))
me <- qnorm(0.975) * SE

lower <- pDiff - me
lower

## [1] -0.001497954

upper <- pDiff + me
upper

## [1] 0.01749795

The CI is -0.0015 to 0.0175. Since 0 is included in this interval, at the 95% confidence level the data do not show a statistically significant difference between the proportions in California and Oregon.
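
For reuse, the interval calculation can be packaged as a function. The sketch below is one possible arrangement; `diff_prop_ci` is a hypothetical helper name, and it assumes the two samples are independent and large enough for the normal approximation.

```r
# Hypothetical helper: confidence interval for p2 - p1 using the
# unpooled standard error of two independent sample proportions.
diff_prop_ci <- function(p1, n1, p2, n2, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  diff <- p2 - p1
  c(lower = diff - z * se, upper = diff + z * se)
}

# California (p1, n1) versus Oregon (p2, n2), values from the problem
ci <- diff_prop_ci(p1 = 0.08, n1 = 11545, p2 = 0.088, n2 = 4691)
ci

# TRUE when 0 lies inside the interval (no significant difference)
contains_zero <- ci[["lower"]] < 0 && ci[["upper"]] > 0
contains_zero
```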

6.44

a)

\[ H_0: \text{The sites where barking deer forage are distributed according to the proportions of each habitat} \]

Expected counts under the null hypothesis (habitat proportion in parentheses):

Woods: 20.45 (4.8%)
Cultivated grassplot: 62.62 (14.7%)
Deciduous forests: 168.7 (39.6%)
Other: 174.23 (40.9%)

\[ H_A: \text{The sites where barking deer forage are not distributed according to the proportions of each habitat} \]

b)

Chi-square goodness of fit test.

c)

For independence, we can assume the observations are independent. For sample size, each category has an expected count of at least 5 (the smallest is 20.45, for woods).
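
The sample-size condition refers to the expected counts, which can be verified in one line. This is a minimal sketch using the expected counts listed in part (a); the vector names and the `size_ok` flag are introduced only for illustration.

```r
# Expected counts under the null hypothesis, from part (a)
expected <- c(woods = 20.45, grassplot = 62.62,
              deciduous = 168.70, other = 174.23)

# Smallest expected count and the "at least 5" check
min_expected <- min(expected)
size_ok <- all(expected >= 5)
c(min_expected = min_expected, size_ok = size_ok)
```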

d)

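obs <- c(4, 16, 67, 345)                       # observed counts per habitat
expected <- c(20.45, 62.62, 168.70, 174.23)    # expected counts under H0

chi <- sum((obs - expected) ^ 2 / expected)    # chi-square test statistic

p <- 1 - pchisq(chi, 3)                        # df = 4 categories - 1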
p

## [1] 0

The p-value is effectively 0, so we reject the null hypothesis and conclude that there is strong evidence that barking deer forage in some habitats more than others.
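
The printed value of exactly 0 is a floating-point artifact: the cumulative probability is so close to 1 that subtracting it from 1 loses all precision. A sketch of an alternative is to ask `pchisq` for the upper tail directly with `lower.tail = FALSE`. The variable names below are illustrative only.

```r
obs <- c(4, 16, 67, 345)                      # observed counts
expected <- c(20.45, 62.62, 168.70, 174.23)   # expected counts under H0

# Chi-square goodness of fit statistic
chi <- sum((obs - expected)^2 / expected)

# Upper-tail probability computed directly, avoiding 1 - (value near 1)
p_upper <- pchisq(chi, df = length(obs) - 1, lower.tail = FALSE)
p_upper
```

The result is a tiny but strictly positive number, which is a more faithful report than 0.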

6.48

a)

Chi-square test for independence in a two-way table.

b)

\[ H_0: \text{There is no association between coffee and depression} \]
\[ H_A: \text{There is an association between coffee and depression} \]

c)

depress <- 2607/50739
depress

## [1] 0.05138059

not_depress <-  48132/50739
not_depress

## [1] 0.9486194

The proportion of women who suffer from depression is .05 and the proportion of women who do not suffer from depression is .95.

d)

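obsv <- 373                 # observed count in the cell
expect <- depress *  6617   # expected count for the cell

contribution <- ((obsv - expect)^2) / expect   # cell's contribution to the test statistic
contribution

## [1] 3.205914

The expected count is about 340 (.0514 x 6617), and this cell contributes 3.21 to the test statistic.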

e)

chisq <- 20.93
df <-  (5-1)*(2-1)
  
p <- 1-pchisq(chisq, df)
p

## [1] 0.0003269507

The p-value is .00033.

f)

The p-value is less than .05, so we reject the null hypothesis that there is no association between coffee and depression.
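
The same decision can be reached by comparing the test statistic with a critical value instead of using the p-value. The sketch below assumes a significance level of 0.05, as in the answer; `alpha`, `crit`, and `reject` are names introduced for illustration.

```r
chisq <- 20.93             # test statistic given in the problem
df <- (5 - 1) * (2 - 1)    # degrees of freedom, as computed in part (e)
alpha <- 0.05              # assumed significance level

# Critical value: reject H0 when the statistic exceeds it
crit <- qchisq(1 - alpha, df)
crit

reject <- chisq > crit
reject
```

Both approaches agree: the statistic is well above the critical value of about 9.49, so the null hypothesis is rejected.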

g)

I agree with this statement. While this one test found an association between coffee and depression, an association alone does not show that coffee causes the difference in depression, and there could be other effects of having a lot of coffee. More studies would have to be done.

Data 606 HW 6

David Quarshie

October 20, 2017