5.6

n <- 25
SampleMean <- (65 + 77) / 2            # midpoint of the 90% CI
ME <- (77 - 65) / 2                    # margin of error = half the interval width
tdf <- round(qt(0.95, df = n - 1), 3)  # t* for a 90% CI with df = 24
SE <- round(ME / tdf, 3)               # SE = ME / t*
sd <- SE * sqrt(n)                     # s = SE * sqrt(n)

The sample mean is 71, the margin of error is 6, and the sample standard deviation is approximately 17.535.
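
As a quick check (a sketch using the values computed above), rebuilding the 90% interval from the recovered mean and standard deviation should return approximately the original endpoints:

# Rebuild the 90% CI from the recovered mean and sd; returns ~65 and ~77
SampleMean + c(-1, 1) * qt(0.95, df = 24) * sd / sqrt(25)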

5.12

  1. Ho: mu = 35, police officers have not been exposed to a higher concentration of lead than the general population. Ha: mu > 35, police officers have been exposed to a higher concentration of lead.

  2. Independence: the sample size is 52, but the population size is unknown; the observations can be treated as independent as long as the sample is less than 10% of the population, i.e. as long as there are more than 520 police officers. Normality: most observations fall within about 2 standard deviations of the mean, so there is no sign of extreme skew.

n <- 52
pMean <- 35      # mean lead concentration in the comparison population
sMean <- 124.32  # sample mean for the police officers
sSD <- 37.74     # sample standard deviation
# Note: rnorm() only simulates a sample with this mean and SD, so the printed
# estimates below approximate, but do not exactly match, the reported values.
t.test(rnorm(n = 52, sMean, sSD), mu = pMean)
## 
##  One Sample t-test
## 
## data:  rnorm(n = 52, sMean, sSD)
## t = 13.32, df = 51, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 35
## 95 percent confidence interval:
##  105.3753 130.3539
## sample estimates:
## mean of x 
##  117.8646

According to the result of the t-test, there is significant evidence that police officers have been exposed to higher concentrations of lead.
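
Equivalently, and without the simulation step, the test statistic can be computed directly from the reported summary statistics (a sketch using the variables defined above):

t_stat <- (sMean - pMean) / (sSD / sqrt(n))  # one-sample t statistic
t_stat                                       # about 17.07, with df = 51
pt(t_stat, df = n - 1, lower.tail = FALSE)   # one-sided p-value, essentially 0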

  3. No; a larger standard deviation would only widen the confidence interval, so the conclusion would not change.

5.18

  1. The data may be paired if Southwest relies on Intel's services.

  2. The data may be paired if the two stores are near each other, since they would then compete on price.

  3. The data may not be paired.

5.24 When the data are paired, you should generally use a paired t-test, but not in every case; the word "always" is what makes the statement false.
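
A minimal sketch of the paired version on simulated before/after data (hypothetical values, not from the exercise):

set.seed(42)
before <- rnorm(15, mean = 100, sd = 10)       # simulated baseline measurements
after <- before + rnorm(15, mean = 2, sd = 3)  # paired follow-up measurements
t.test(after, before, paired = TRUE)           # tests the mean of the per-subject differences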

5.30

Z = 1.96        # critical value for a 95% confidence interval
m_99 <- 44.51   # mean of the first sample
SD_99 <- 13.32  # standard deviation of the first sample
n_99 <- 23
m_1 <- 56.81    # mean of the second sample
SD_1 <- 16.13   # standard deviation of the second sample
n_1 <- 23
CI_99 <- c(m_99 - (Z * SD_99 / sqrt(n_99)), m_99 + (Z * SD_99 / sqrt(n_99)))  # mean +/- Z * SE
CI_1 <- c(m_1 - (Z * SD_1 / sqrt(n_1)), m_1 + (Z * SD_1 / sqrt(n_1)))
CI_99
## [1] 39.06627 49.95373
CI_1
## [1] 50.21786 63.40214

The two 95% confidence intervals do not overlap, which suggests a real difference between the two group means.
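
With only 23 observations per group, a t critical value is arguably more appropriate than Z = 1.96; a sketch of the first interval using the variables above:

tstar <- qt(0.975, df = n_99 - 1)  # t* with df = 22, about 2.07
c(m_99 - tstar * SD_99 / sqrt(n_99), m_99 + tstar * SD_99 / sqrt(n_99))  # slightly wider than CI_99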

5.36

n <- 22
sMean <- 4.9   # sample mean
sSD <- 1.8     # sample standard deviation
pMean <- 6.1   # hypothesized population mean

z = (sMean - pMean)/(sSD/sqrt(n))       # test statistic
alpha = .05
z.half.alpha = qnorm(1 - (alpha / 2))   # two-sided critical values
c(-z.half.alpha, z.half.alpha)
## [1] -1.959964  1.959964
z
## [1] -3.126944
pval = 2 * pnorm(z)  # two-sided p-value (z is negative, so pnorm gives the lower tail)
pval
## [1] 0.001766337

Since the p-value is less than .05, we reject the null hypothesis.
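
Since the population standard deviation is unknown and n = 22 is small, a t reference distribution is arguably more appropriate; a sketch using the variables above:

2 * pt(z, df = n - 1)  # two-sided p-value under t with 21 df; still well below .05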

5.42 An ANOVA is appropriate here, since it tests whether the means differ across several groups.
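
A minimal sketch of how that would look in R, on simulated data with hypothetical names (score, group):

set.seed(1)
simData <- data.frame(score = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 7)),
                      group = factor(rep(c("A", "B", "C"), each = 10)))
summary(aov(score ~ group, data = simData))  # F test for equality of the group means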

5.48

mu <- c(38.67, 39.6, 41.39, 42.55, 40.85)  # group means (given)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)  # group standard deviations (given)
n <- c(121, 546, 97, 253, 155)             # group sizes
k <- 5                                     # number of groups
MSG <- 501.54   # mean square between groups (given)
SSE <- 267382   # error sum of squares (given)
N <- sum(n)     # total sample size (keep separate from n so k is not subtracted twice)
N
## [1] 1172
p <- 0.0682  # p-value reported in the exercise
# Find df
dfG <- k - 1  # between-groups df
dfE <- N - k  # residual df
dfT <- dfG + dfE  # total df = N - 1
df <- c(dfG, dfE, dfT)
df
## [1]    4 1167 1171
# Find Mean Sq
MSE <- SSE / dfE
MS <- c(MSG, MSE, NA)
MSE
## [1] 229.1191
MS
## [1] 501.5400 229.1191       NA
# Find F-value
Fv <- MSG / MSE
Fv
## [1] 2.188992
SS <- c(MSG * dfG, SSE, MSG * dfG + SSE)  # sums of squares: group, error, total
myTable.dt <- data.frame(df, SS, MS, c(Fv, NA, NA), c(p, NA, NA))
colnames(myTable.dt) <- c("Df", "Sum Sq", "Mean Sq", "F Value", "Pr(>F)")
rownames(myTable.dt) <- c("degree", "Residuals", "Total")
myTable.dt
##             Df    Sum Sq  Mean Sq  F Value Pr(>F)
## degree       4   2006.16 501.5400 2.188992 0.0682
## Residuals 1167 267382.00 229.1191       NA     NA
## Total     1171 269388.16       NA       NA     NA
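
As a sanity check, the reported p-value can be recovered from the F statistic and its degrees of freedom:

pf(Fv, dfG, dfE, lower.tail = FALSE)  # upper tail of F(4, 1167), about 0.0682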

Since the p-value (0.0682) is greater than .05, we fail to reject the null hypothesis: there is not a significant difference between the group means.

6.8

ME = 1.96 * sqrt((.66 * .34) / 1018)  # 1.96 * sqrt(p * (1 - p) / n)
ME
## [1] 0.02910004

The computed margin of error (about 2.9%) confirms the reported one was correct.

pbar = .66  # sample proportion
p0 = .70    # hypothesized value (Ho: p = 0.70, Ha: p > 0.70)
n = 1018
z = (pbar - p0)/sqrt(p0*(1 - p0)/n)  # test statistic
z
## [1] -2.784994
alpha = .05
z.alpha = qnorm(1 - alpha)  # one-sided critical value
z.alpha
## [1] 1.644854
pval = pnorm(z, lower.tail = FALSE)  # upper-tail p-value for Ha: p > 0.70
pval
## [1] 0.9973236

Since the test statistic does not exceed 1.64 and the p-value is greater than .05, we do not reject the null hypothesis; the data do not provide evidence that more than 70% hold this view.

6.16

  1. Ho: p >= 0.50, at least half of American adults who decided not to go to college did so because they could not afford it. Ha: p < 0.50, fewer than half of American adults who decided not to go to college did so because they could not afford it.
pbar1 = .48  # sample proportion
p01 = .50    # hypothesized proportion
n = 331
z1 = (pbar1 - p01)/sqrt(p01*(1 - p01)/n)  # test statistic
z1
## [1] -0.7277362

The test statistic -0.73 is not less than -1.64, so we do not reject the null hypothesis; there is no strong evidence to support the newspaper's statement.

SE = sqrt((.48*(1-.48))/331)  # standard error of the sample proportion
SE
## [1] 0.02746049

With a standard error of about 2.7% and a sample proportion of 48%, a 95% confidence interval can be formed. We are 95% confident that the population proportion is within about 2 standard errors of the sample proportion, so the interval, roughly (.426, .534), contains .5.
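
Computed explicitly, using the SE from the block above, the interval described is:

c(.48 - 1.96 * SE, .48 + 1.96 * SE)  # roughly (0.426, 0.534), which contains .5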

6.24

mymatrix = matrix(c(4, 30, 24, 45), nrow = 2)  # survival counts from the study
colnames(mymatrix) <- c("Control", "Treatment")
rownames(mymatrix) <- c("Alive", "Dead")
mymatrix
##       Control Treatment
## Alive       4        24
## Dead       30        45

We cannot rely on a normal-approximation confidence interval because of the small counts (only 4 survivors in the control group), but we can use the chi-square distribution.

chisq.test(mymatrix)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mymatrix
## X-squared = 4.9891, df = 1, p-value = 0.02551

Since chi-square = 4.99 and the p-value = 0.026 is less than .05, we reject Ho.
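
Given the small counts, Fisher's exact test is a reasonable cross-check (not asked for in the exercise), since it avoids the large-sample approximation entirely:

fisher.test(mymatrix)  # exact test of independence for the 2x2 table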

6.32

mymatrix2 = matrix(c(264, 38, 16, 299, 55, 15), nrow = 3)  # survey counts
colnames(mymatrix2) <- c("Republican", "Democrat")
rownames(mymatrix2) <- c("Should", "Should Not", "Don't Know/No Answer")
mymatrix2
##                      Republican Democrat
## Should                      264      299
## Should Not                   38       55
## Don't Know/No Answer         16       15

We exclude the Independents and run the chi-square test on Republicans and Democrats. Ho: there is no difference in the proportions of Republicans and Democrats who think full-body scans should be applied in airports. Ha: there is a difference in these proportions.

chisq.test(mymatrix2)
## 
##  Pearson's Chi-squared test
## 
## data:  mymatrix2
## X-squared = 1.5381, df = 2, p-value = 0.4635

Chi-square = 1.54, p-value = 0.4635. Since p > 0.05, we do not reject the null hypothesis.
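
The chi-square approximation is reasonable here because all expected counts are well above 5, which can be verified directly:

chisq.test(mymatrix2)$expected  # expected counts under independence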

6.40

  1. True

pchisq(10, 5, lower.tail = FALSE)  # P(X > 10) for a chi-square with 5 df
## [1] 0.07523525

The p-value 0.075 is greater than .05, so the statement is true.

  2. False. The chi-square test is always one-sided: because deviations are squared, only large values of the statistic provide evidence against Ho.
  3. True

6.48

  1. We can use a chi-square test of independence to evaluate whether the two variables are associated.

  2. Ho: Depression in women is independent of coffee consumption. Ha: Depression in women is NOT independent of coffee consumption.

  3. The proportion of women who do not suffer from depression is 48,132/50,739; therefore 2,607/50,739 suffer from depression.

Exp = (2607*6617)/50739  # expected count: (row total * column total) / grand total
Exp
## [1] 339.9854

The expected count is 339.99, so we can compute this cell's contribution to the chi-square statistic.

ChiSq = (373 - 339.98)^2/339.98  # (observed - expected)^2 / expected for this cell
ChiSq
## [1] 3.207013

This cell contributes Chi-Sq = 3.207 to the overall statistic.

ChiSq = 20.93  # overall chi-square statistic (given)
r = 2          # rows
c = 5          # columns
df = (r - 1)*(c - 1)
round(1-pchisq(ChiSq,df),5)
## [1] 0.00033

P-value = .0003. With a p-value less than .05, we reject the null hypothesis; therefore depression and coffee consumption do not appear to be independent of each other.

6.56

  1. Ho: Yawning is independent of seeing someone else yawn. Ha: Yawning is not independent of seeing someone else yawn.

  2. Control yawning rate = 4/16 = .25; treatment yawning rate = 10/34 = .29; observed difference = .04.

mymatrix4 = matrix(c(10, 24, 4, 12), nrow = 2)  # yawning counts by group
colnames(mymatrix4) <- c("Treatment", "Control")
rownames(mymatrix4) <- c("Yawn", "NotYawn")
mymatrix4
##         Treatment Control
## Yawn           10       4
## NotYawn        24      12
chisq.test(mymatrix4)
## Warning in chisq.test(mymatrix4): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mymatrix4
## X-squared = 0, df = 1, p-value = 1

The p-value of 1 is greater than .05, so we fail to reject the null hypothesis; we do not have enough evidence to conclude that yawning is contagious.
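
Since R warns that the chi-square approximation may be incorrect for these small counts, a simulation-based p-value is a reasonable cross-check (a sketch; the result is random but should also be large):

chisq.test(mymatrix4, simulate.p.value = TRUE, B = 10000)  # Monte Carlo p-value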