606_final

3.

The time taken to complete a statistics final by all students is normally distributed with a mean of 120 minutes and a standard deviation of 10 minutes.

Find the probability that a randomly selected student will take more than 150 minutes to complete the test.

1 - pnorm(q = 150, mean = 120, sd = 10)

## [1] 0.001349898

Answer: the probability that a randomly selected student will take more than 150 minutes to complete the test is 0.13%

Find the probability that the mean time taken to complete the test by a random sample of 16 students would be between 122 and 126 minutes.

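x1 <- 122
x2 <- 126
mu <- 120                # population mean (renamed to avoid masking mean())
sigma <- 10              # population standard deviation
SE <- sigma / sqrt(16)   # standard error of the mean for n = 16
Z1 <- (x1 - mu) / SE
Z2 <- (x2 - mu) / SE
# P(122 < sample mean < 126) = P(Z < Z2) - P(Z < Z1)
pnorm(Z2) - pnorm(Z1)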

## [1] 0.2036579

Answer: the probability that the mean time taken to complete the test by a random sample of 16 students would be between 122 and 126 minutes is 20.4%
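The same probability can be obtained without standardizing by hand, because pnorm() accepts the mean and standard deviation of the sampling distribution directly. A minimal sketch; the names mu, sigma, se and p_between are illustrative and not part of the original solution:

```r
# Sampling distribution of the mean for n = 16: Normal(mu, sigma / sqrt(n))
mu <- 120
sigma <- 10
n <- 16
se <- sigma / sqrt(n)

# P(122 < sample mean < 126)
p_between <- pnorm(126, mean = mu, sd = se) - pnorm(122, mean = mu, sd = se)
p_between
```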

4.

Rh-negative blood appears in 15% of the United States population.

Find the probability that out of 7 randomly selected U.S. residents at least 3 of them have Rh-negative blood.

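# "At least 3" is P(X >= 3) = 1 - P(X <= 2);
# dbinom(3, 7, 0.15) alone would give only P(X = 3)
1 - pbinom(2, 7, 0.15)

## [1] 0.07376516

Answer: the probability that out of 7 randomly selected U.S. residents at least 3 of them have Rh-negative blood is 7.4%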
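As a cross-check, "at least 3" is the upper tail P(X ≥ 3), which can be computed in two equivalent ways. This is a sketch only; the variable names are illustrative:

```r
n <- 7
p <- 0.15

# Sum the individual probabilities P(X = 3), ..., P(X = 7)
p_sum <- sum(dbinom(3:n, size = n, prob = p))

# Equivalent upper-tail call: P(X > 2) is the same event as P(X >= 3)
p_tail <- pbinom(2, size = n, prob = p, lower.tail = FALSE)

c(p_sum, p_tail)
```

Both values are about 0.074, whereas dbinom(3, 7, 0.15) on its own is only P(X = 3).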

Use the normal approximation to find the probability that in a group of 100 randomly selected people fewer than 17.5% will have Rh-negative blood.

https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/binomial-theorem/normal-approximation-to-the-binomial/

Conditions for the normal approximation: np = 15 > 5 (true) and nq = 85 > 5 (true).

p = 0.15
q = 1-0.15 
n = 100
x = 17.5
mean = n*p
sd = sqrt(mean*q)
Z <- (x - mean)/sd
pnorm(Z)

## [1] 0.7580801

Answer: the probability that in a group of 100 randomly selected people fewer than 17.5% will have Rh-negative blood is 75.8%
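The approximation can be compared with the exact binomial probability. In a group of 100, fewer than 17.5% means at most 17 people, so the exact value is P(X ≤ 17). A hedged sketch with illustrative variable names:

```r
n <- 100
p <- 0.15

# Normal approximation with mean np and standard deviation sqrt(npq)
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
p_approx <- pnorm(17.5, mean = mu, sd = sigma)

# Exact binomial probability of at most 17 successes out of 100
p_exact <- pbinom(17, size = n, prob = p)

c(approx = p_approx, exact = p_exact)
```

The two numbers should be close, since both conditions np > 5 and nq > 5 hold.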

5.

The U.S. Travel Industry estimated that Americans planned to spend an average of 4.8 nights away on vacations in 1995 (U.S. News & World Report, June 12, 1995). Suppose that this mean was based on a random sample of 100 Americans and the population standard deviation was 1.5 nights. Construct a 90% confidence interval for the mean length of vacations Americans planned in 1995.

sample_mean = 4.8
n = 100
sd = 1.5
se <- sd / sqrt(n)
lower <- sample_mean - 1.645 * se
upper <- sample_mean + 1.645 * se
c(lower, upper)

## [1] 4.55325 5.04675

Answer: the 90% confidence interval is (4.55325, 5.04675) nights
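The critical value 1.645 is a rounded table value. It can instead be taken from qnorm(), which also makes the confidence level easy to change. A minimal sketch; conf_level, z_star and ci are illustrative names:

```r
sample_mean <- 4.8
n <- 100
sigma <- 1.5        # population standard deviation, given in the question
conf_level <- 0.90

# Critical value z* leaving (1 - conf_level) / 2 in each tail
z_star <- qnorm(1 - (1 - conf_level) / 2)
se <- sigma / sqrt(n)

# Interval: sample mean plus or minus z* times the standard error
ci <- sample_mean + c(-1, 1) * z_star * se
ci
```

The result agrees with the interval above to about four decimal places; the small difference comes from rounding z* to 1.645.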

A poll of 1226 adults revealed that 49% believe that the devil may sometimes possess earthlings. Find a 95% confidence interval for the population proportion of the adults who hold this opinion. (Source: “Demons Begone,” Asheville Citizen-Times, April 5, 1991).

n = 1226
p = 0.49
Z = 1.96
lower <- p - Z * sqrt((p*(1-p))/n)
upper <- p + Z * sqrt((p*(1-p))/n)
c(lower, upper)

## [1] 0.462017 0.517983

Answer: the 95% confidence interval is (0.462017, 0.517983)

6.

Grocery stores, drugstores, and large supermarkets all use scanners to calculate a customer’s bill. Scanners should be as accurate as possible. A state agency regularly monitors stores by randomly selecting items and comparing the shelf price with the checkout scanner price. During one check by the agency, 16 items were found to be incorrectly scanned. The amounts of overcharge (in cents) were

200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40

A negative sign indicates an undercharge: the scanner price was below the shelf price.

Make a stemplot of the data and interpret it.

data <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40)
stem(sort(data))

## 
##   The decimal point is 2 digit(s) to the right of the |
## 
##   -1 | 20
##   -0 | 765
##    0 | 2334455
##    1 | 00
##    2 | 0
##    3 | 0

Compute the mean and the range.

mean(data)

## [1] 35.0625

range(data)

## [1] -120  300

Give the five-number summary of the data.

summary(data)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -120.00  -52.50   35.00   35.06   62.50  300.00

Construct a boxplot and interpret.

data <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40)
boxplot(data)

The boxplot shows that the median is 35.

The upper and lower quartiles are approximately 60 and -50, respectively. This means that about 75% of the data are less than 60 and about 25% of the data are less than -50.

The box spans the interquartile range (upper quartile minus lower quartile).

In a boxplot the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range of the box. Any point beyond the whiskers is plotted as a suspected outlier. These data have one such point: 300.

Use the 1.5xIQR criterion to spot suspected outliers.

# Q3 and Q1 were taken from the result of summary() function above
Q3<-62.50 
Q1<--52.50
IQR<-(Q3-Q1)
below<- Q1-1.5*IQR
above<- Q3+1.5*IQR
below

## [1] -225

above

## [1] 235

Answer: all values below -225 or above 235 should be considered suspected outliers. In our case there is 1 outlier: 300
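The fences can also be computed straight from the data rather than from quartiles copied out of summary(). A sketch under the assumption that R's default quantile type is wanted, which is what summary() uses; the names q, iqr, lower_fence, upper_fence and outliers are illustrative:

```r
data <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40)

# First and third quartiles with the default quantile type
q <- quantile(data, probs = c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# 1.5 x IQR fences
lower_fence <- unname(q[1] - 1.5 * iqr)
upper_fence <- unname(q[2] + 1.5 * iqr)

# Observations outside the fences are suspected outliers
outliers <- data[data < lower_fence | data > upper_fence]
c(lower_fence, upper_fence)
outliers
```

Using separate names such as iqr also avoids masking the built-in IQR() function.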

f. For these data the sample standard deviation is 108.3. Test the hypothesis that the mean overcharge is more than 0 at the 0.05 significance level.

H0: the mean overcharge is less than or equal to 0

H1: the mean overcharge is more than 0

t_test<-t.test(data, alternative = "greater")
t_test

## 
##  One Sample t-test
## 
## data:  data
## t = 1.295, df = 15, p-value = 0.1074
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  -12.40101       Inf
## sample estimates:
## mean of x 
##   35.0625

Answer: the p-value (0.1074) is greater than 0.05, so we do not have enough evidence to reject the H0 hypothesis in favor of H1.
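The t statistic and p-value reported by t.test() can be reproduced by hand from the sample mean, the sample standard deviation and n. A minimal sketch; x_bar, s, t_stat and p_value are illustrative names:

```r
data <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40)

n <- length(data)
x_bar <- mean(data)
s <- sd(data)   # about 108.3, the value given in the question

# Test statistic for H0: mu = 0
t_stat <- (x_bar - 0) / (s / sqrt(n))

# Upper-tail p-value for H1: mu > 0, with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

c(t = t_stat, p = p_value)
```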

Sorted data:

-120, -99, -70, -60, -50, 20, 30, 30, 40, 40, 50, 50, 100, 100, 200, 300

7.

Do cars traveling in the right lane of I-94 travel slower than those in the left lane? The following sample information was obtained. Use the 0.01 significance level to provide an answer to this question.

code source:

https://stats.stackexchange.com/questions/30394/how-to-perform-two-sample-t-tests-in-r-by-inputting-sample-statistics-rather-tha

H0: cars in the right lane travel faster than or at the same speed as cars in the left lane (>=)

H1: cars in the right lane travel slower than cars in the left lane (<)

t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
    if( equal.variance==FALSE ) 
    {
        se <- sqrt( (s1^2/n1) + (s2^2/n2) )
        # welch-satterthwaite df
        df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
    } else
    {
        # pooled standard deviation, scaled by the sample sizes
        se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) ) 
        df <- n1+n2-2
    }      
    t <- (m1-m2-m0)/se 
    dat <- c(m1-m2, se, t, 2*pt(-abs(t),df))    
    names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
    return(dat) 
}

n1 = 5
m1 = 65
s1 = 4.12
n2 = 6
m2 = 69
s2 = 3.22
t.test2(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)

## Difference of means           Std Error                   t 
##          -4.0000000           2.2633927          -1.7672585 
##             p-value 
##           0.1174305

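Answer: the function reports a two-sided p-value (0.117). For the one-sided H1 the p-value is half of that, about 0.059, because t is negative. Either way the p-value is more than 0.01, so we do not have enough evidence to reject H0 in favor of H1.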
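Because H1 is one-sided (right lane slower), the matching p-value is the lower tail of the t distribution rather than the two-sided value returned by t.test2(). A sketch, assuming sample 1 is the right lane and sample 2 the left lane, as the ordering of the inputs suggests; t_stat and p_one_sided are illustrative names:

```r
n1 <- 5; m1 <- 65; s1 <- 4.12   # assumed: right lane
n2 <- 6; m2 <- 69; s2 <- 3.22   # assumed: left lane

# Welch standard error and Welch-Satterthwaite degrees of freedom
se <- sqrt(s1^2 / n1 + s2^2 / n2)
df <- (s1^2 / n1 + s2^2 / n2)^2 /
  ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

t_stat <- (m1 - m2) / se

# Lower-tail p-value for H1: mean of sample 1 < mean of sample 2
p_one_sided <- pt(t_stat, df)

c(t = t_stat, df = df, p = p_one_sided)
```

The one-sided p-value is still above 0.01, so the conclusion does not change.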

8.

A noted medical researcher has suggested that a heart attack is less likely to occur among adults who actively participate in athletics. A random sample of 300 adults is obtained. Of that total, 100 are found to be athletically active. Within this group, 10 suffered heart attacks; among the 200 athletically in active adults, 25 had suffered heart attacks.

Test the hypothesis that the proportion of adults who are active and suffered heart attacks is different than the proportion of adults who are not active and suffered heart attacks. Use the 0.05 significance level.

H0: proportion of adults who are active and suffered heart attacks = the proportion of adults who are not active and suffered heart attacks

H1: proportion of adults who are active and suffered heart attacks ≠ the proportion of adults who are not active and suffered heart attacks

http://www.sthda.com/english/wiki/two-proportions-z-test-in-r

result1<- prop.test(x = c(10, 25), n = c(100, 200),alternative = c("two.sided"))
result1

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(10, 25) out of c(100, 200)
## X-squared = 0.19811, df = 1, p-value = 0.6562
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.10705274  0.05705274
## sample estimates:
## prop 1 prop 2 
##  0.100  0.125

Answer: the p-value (0.6562) is greater than 0.05, so we do not have enough evidence to reject H0 in favor of H1.
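For comparison, the textbook two-proportion z-test with a pooled standard error can be written out by hand. Note that this version has no continuity correction, so its p-value differs from the prop.test() output above; it corresponds to prop.test(..., correct = FALSE), whose X-squared statistic equals z squared. A hedged sketch with illustrative names:

```r
x <- c(10, 25)     # heart attacks: active, inactive
n <- c(100, 200)   # group sizes

p_hat <- x / n
p_pool <- sum(x) / sum(n)

# Pooled standard error under H0: p1 = p2
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z <- (p_hat[1] - p_hat[2]) / se_pool
p_value <- 2 * pnorm(-abs(z))

# Same test via prop.test without the continuity correction
check <- prop.test(x = x, n = n, correct = FALSE)

c(z = z, z_squared = z^2, p_value = p_value)
```

With or without the correction the p-value is well above 0.05.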

Construct a 99% confidence interval for the difference between the proportions of all active and inactive adults who suffered heart attacks.

result2<- prop.test(x = c(10, 25), n = c(100, 200),alternative = c("two.sided"),conf.level = 0.99)
result2

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(10, 25) out of c(100, 200)
## X-squared = 0.19811, df = 1, p-value = 0.6562
## alternative hypothesis: two.sided
## 99 percent confidence interval:
##  -0.13047891  0.08047891
## sample estimates:
## prop 1 prop 2 
##  0.100  0.125

The 99% confidence interval for the difference in the proportions of all active and inactive adults who suffered heart attacks ranges from -0.1305 to 0.0805 (-0.13047891, 0.08047891). The interval contains 0.
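The interval printed by prop.test() can be reproduced by hand as the difference in sample proportions plus or minus z* times the unpooled standard error, widened by a continuity correction of 0.5 × (1/n1 + 1/n2). A sketch only; the correction term is an assumption checked against the output shown above rather than a statement of the general formula:

```r
x <- c(10, 25)
n <- c(100, 200)
p_hat <- x / n
diff_hat <- p_hat[1] - p_hat[2]

z_star <- qnorm(0.995)                     # two-sided 99% critical value
se <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled standard error
cc <- 0.5 * sum(1 / n)                     # continuity correction (assumed form)

ci <- diff_hat + c(-1, 1) * (z_star * se + cc)
ci
```

Dropping the cc term gives the slightly narrower uncorrected Wald interval.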

9.

Perform a test to determine whether the data substantiate an association between the stability of a marriage and the period of acquaintanceship prior to marriage. Use α = 0.05.

H0: there is no association between the stability of a marriage and the period of acquaintanceship prior to marriage

H1: there is an association between the stability of a marriage and the period of acquaintanceship prior to marriage

chi = (11 - 10.3)^2/10.3 + (8-8.7)^2/8.7+(28-28.1)^2/28.1+(24-23.9)^2/23.9+(21-21.6)^2/21.6+(19-18.4)^2/18.4
chi

## [1] 0.1409008

num_col  = 2
num_row = 3
df = (num_col-1)*(num_row-1)
1-pchisq(chi, df)

## [1] 0.931974

The chi-square statistic is 0.1409008 and the p-value is 0.931974. The result is not significant at the 0.05 level: there is not enough evidence to reject H0 in favor of H1.
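The same test can be run with chisq.test() on the table of observed counts. The counts below are read off the six terms of the hand calculation above; their arrangement into a 3 × 2 matrix is an assumption, since the original table is not reproduced here:

```r
# Observed counts (assumed layout: 3 rows, 2 columns)
observed <- matrix(c(11,  8,
                     28, 24,
                     21, 19),
                   nrow = 3, byrow = TRUE)

result <- chisq.test(observed)

result$expected   # unrounded expected counts
result
```

Because chisq.test() uses unrounded expected counts, its statistic is slightly larger than the hand-computed value, which relies on expected counts rounded to one decimal. The degrees of freedom and the conclusion are the same.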

606_final_exam

Olga Shiligin

17/12/2018
