DATA606_Assignment6

a. False: The sample proportion is known exactly: 46% of this sample approve. The 95% confidence interval of 43% to 49% is a statement about the approval rate of the US population, not of the sample.

b. True: This aligns with the definition of a confidence interval. The sample is less than 10% of the population.

c. False: 95% of the confidence intervals constructed from repeated samples will contain the population proportion.

d. False: The margin of error at a 90% confidence level would be smaller, since we are lowering our confidence.
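
To illustrate part d, the sketch below rescales the 3% margin of error implied by the 43% to 49% interval to a 90% confidence level. It relies only on the ratio of the two critical values, so no sample size is needed; the variable names are illustrative and not part of the original answer.

```r
# Margin of error implied by the 95% interval (43% to 49% around 46%)
me.95 <- 0.03

# Critical values for two-sided 95% and 90% confidence levels
z.95 <- qnorm(0.975)
z.90 <- qnorm(0.95)

# The standard error is unchanged, so the margin of error scales with z
me.90 <- me.95 * z.90 / z.95
me.90
```

The result is below 3%, which agrees with the answer that a lower confidence level gives a narrower interval.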

a. 48% is a sample statistic: it was derived from a sample of 1259 US residents, not from the entire US population.
b.

# 95% Confidence Interval -> alpha of 0.05, z = 1.96
z <- 1.96
p <- .48 #   proportion is 0.48
n <- 1259 
me <- z * sqrt(p*(1-p)/n)
ci.lower <- (p - me) * 100
ci.upper <- (p + me) * 100
ci.lower

## [1] 45.24028

ci.upper

## [1] 50.75972

The 95% confidence interval for the proportion of US residents who think marijuana should be made legal is from 45.24% to 50.76%.
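
The same calculation can be wrapped in a small reusable function. This is a minimal sketch: `prop.ci` is a hypothetical helper name, and it uses `qnorm` for the critical value instead of the rounded 1.96, so the last digits may differ slightly from the output above.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion
prop.ci <- function(p, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  me <- z * sqrt(p * (1 - p) / n)  # margin of error
  c(lower = p - me, upper = p + me)
}

ci <- prop.ci(0.48, 1259)
round(ci * 100, 2)
```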

c. This holds if the observations are independent (the sample is less than 10% of the population) and the sample size is sufficient (at least 10 expected successes and 10 expected failures).
d. False. The 95% confidence interval runs from 45.24% to 50.76%. Most of the plausible values are below 50%, so a majority cannot be claimed.
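
Parts c and d can be checked numerically. The sketch below assumes the usual success-failure threshold of 10 and treats a "majority" claim as supported only when the whole interval lies above 50%; the variable names are illustrative.

```r
p <- 0.48
n <- 1259

# Success-failure condition: at least 10 expected successes and failures
successes <- n * p
failures <- n * (1 - p)
conditions.ok <- successes >= 10 && failures >= 10

# A majority claim needs the lower bound of the 95% interval to exceed 50%
me <- 1.96 * sqrt(p * (1 - p) / n)
majority.supported <- (p - me) > 0.5

conditions.ok
majority.supported
```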

# Margin of error = 2% at 95% confidence (z = 1.96)
mj.p <- .48
se <- .02/1.96 # SE = margin of error / z
# Standard error = sqrt(p * (1-p) / n), so n = p * (1-p) / SE^2
mj.n <- (mj.p * (1-mj.p))/(se^2)
mj.n

## [1] 2397.158

Rounding up, a sample of at least 2398 US residents is needed to limit the margin of error to 2%.
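
To see how the required sample size responds to the target margin of error, the calculation can be generalized. This is a sketch under the same assumptions as above (p = 0.48, z = 1.96); `sample.size` is a hypothetical helper, and `ceiling` is used because a sample size must be a whole number.

```r
# Hypothetical helper: smallest n giving a margin of error of at most `me`
sample.size <- function(p, me, z = 1.96) {
  ceiling(p * (1 - p) * (z / me)^2)
}

n.2pct <- sample.size(0.48, 0.02)  # the 2% target used above
n.1pct <- sample.size(0.48, 0.01)  # halving the margin roughly quadruples n
n.2pct
n.1pct
```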

ca.n <- 11545
ca.p <- 0.08
or.n <- 4691
or.p <- 0.088

z <- 1.96 # For 95% Confidence Interval
se.ca <- sqrt((ca.p)*(1-ca.p)/ca.n) 
me.ca <- z * se.ca # Margin of Error at 95% confidence interval
se.or <- sqrt((or.p)*(1-or.p)/or.n)
me.or <- z * se.or # Margin of Error at 95% confidence interval
round(me.ca * 100, 2)

## [1] 0.49

# Now to calculate the confidence interval
ca.lower <- ca.p - me.ca
ca.upper <- ca.p + me.ca
or.lower <- or.p - me.or
or.upper <- or.p + me.or

ca.lower

## [1] 0.07505122

ca.upper

## [1] 0.08494878

or.lower

## [1] 0.07989296

or.upper

## [1] 0.09610704

se <- sqrt((ca.p)*(1-ca.p)/ca.n + (or.p)*(1-or.p)/or.n) # Calculating a new SE for the differences
state.me <- z * se
state.me

## [1] 0.009498128

#95% CI
diff <- or.p - ca.p
diff.lower <- diff - state.me
diff.upper <- diff + state.me
diff.lower

## [1] -0.001498128

diff.upper

## [1] 0.01749813
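
One way to read this interval is to check whether it contains zero. The sketch below recomputes the interval from the same inputs and tests that condition; `se.diff`, `diff.ci`, and `contains.zero` are illustrative names. Since zero falls inside the interval, these data do not show a statistically significant difference between the two states at the 5% level.

```r
ca.n <- 11545; ca.p <- 0.08
or.n <- 4691;  or.p <- 0.088

# Standard error and 95% interval for the difference (Oregon minus California)
se.diff <- sqrt(ca.p * (1 - ca.p) / ca.n + or.p * (1 - or.p) / or.n)
diff.ci <- (or.p - ca.p) + c(-1, 1) * 1.96 * se.diff

# TRUE when zero is a plausible value for the difference
contains.zero <- diff.ci[1] < 0 && diff.ci[2] > 0
diff.ci
contains.zero
```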

a. Hypotheses
H0: Barking deer have no preference for certain habitats; all habitats are equally preferred.
Ha: Barking deer prefer some habitats over others.
b. A chi-square goodness-of-fit test can be used here.
c. Check conditions for inference
1. Independence: the deer do not influence each other, so the observations are independent.
2. Sample size and distribution: the smallest expected count is .048 * 426 = 20.448, which is at least 5.
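
The sample-size condition applies to every habitat, not only the smallest one. A short sketch of the full check, using the same counts and proportions as the test that follows (the name `expected` is illustrative):

```r
deer <- c(4, 16, 61, 345)
percnt <- c(0.048, 0.147, 0.396, 0.409)

# Expected count in each habitat under H0
expected <- sum(deer) * percnt
expected

# Every expected count should be at least 5
all(expected >= 5)
```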

deer <- c(4, 16, 61, 345)
percnt <- c(0.048, 0.147, 0.396, 0.409)
chisq.test(x = deer, p = percnt)

## 
##  Chi-squared test for given probabilities
## 
## data:  deer
## X-squared = 284.06, df = 3, p-value < 2.2e-16

Since the p-value is less than 0.05, we reject H0 and conclude that barking deer do prefer some habitats over others.
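
For reference, the statistic reported by `chisq.test` can be reproduced by hand. This is a sketch of the standard goodness-of-fit calculation, with illustrative variable names:

```r
deer <- c(4, 16, 61, 345)
percnt <- c(0.048, 0.147, 0.396, 0.409)
expected <- sum(deer) * percnt

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi.sq <- sum((deer - expected)^2 / expected)
df <- length(deer) - 1

# Upper-tail p-value from the chi-square distribution
p.value <- pchisq(chi.sq, df = df, lower.tail = FALSE)
chi.sq
p.value
```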

library(visualize)
visualize.chisq(stat= 284.06, df = 3, section = "upper")

a.Chi-squared test
b.Hypothesis
H0: Coffee consumption and depression are not related.
Ha: Coffee consumption and depression are related.
c.

deprs <- 2607
not.deprs <- 48132
total <- deprs + not.deprs
deprs/total * 100

## [1] 5.138059

not.deprs/total * 100

## [1] 94.86194

d.Ans:

6617 * 2607 / 50739

## [1] 339.9854

(373 - 339.9854)^2 / 339.9854

## [1] 3.205914
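
The same arithmetic extends to every cell of the table. The sketch below uses the counts from the data frame in part e and assumes its rows are the five coffee-consumption groups, with the highlighted cell in row 2 of the Yes column; `observed`, `expected`, and `contrib` are illustrative names.

```r
# Observed counts (rows assumed to be the five coffee-consumption groups)
observed <- cbind(Yes = c(670, 373, 905, 564, 95),
                  No  = c(11545, 6244, 16329, 11726, 2288))

# Expected count = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Each cell's contribution to the chi-square statistic
contrib <- (observed - expected)^2 / expected

expected[2, "Yes"]  # expected count for the highlighted cell
contrib[2, "Yes"]   # its contribution to the test statistic
sum(contrib)        # the full chi-square statistic
```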

e. Ans: p-value = 0.0003267

test.stat <- data.frame(Yes = c(670, 373, 905, 564, 95),
                        No = c(11545, 6244, 16329, 11726, 2288))
chisq.test(test.stat)

## 
##  Pearson's Chi-squared test
## 
## data:  test.stat
## X-squared = 20.932, df = 4, p-value = 0.0003267
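
The reported p-value can be recovered from the test statistic and the degrees of freedom alone. A minimal sketch, assuming the 5 x 2 table used above:

```r
chi.sq <- 20.932          # test statistic reported above
df <- (5 - 1) * (2 - 1)   # (rows - 1) * (columns - 1)

# Upper-tail area under the chi-square distribution
p.value <- pchisq(chi.sq, df = df, lower.tail = FALSE)
p.value
```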

library(visualize)
visualize.chisq(stat=20.932, df = 4, section = "upper")

f. We reject the null hypothesis (H0), since the p-value of 0.0003267 is well below 0.05.
g. I agree that it is too early to make this recommendation. The chi-square test only shows that there is a relationship in the study, not what that relationship is. Correlation does not necessarily imply causation.

Niteen Kumar

April 15, 2018