The time taken to complete a statistics final by all students is normally distributed with a mean of 120 minutes and a standard deviation of 10 minutes.
1 - pnorm(q = 150, mean = 120, sd = 10)
## [1] 0.001349898
Answer: the probability that a randomly selected student will take more than 150 minutes to complete the test is about 0.13%.
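The same upper-tail probability can be obtained without subtracting from 1. A minimal sketch using the `lower.tail` argument of `pnorm` (the variable name `p_over_150` is chosen here only for illustration):

```r
# Upper-tail probability P(X > 150) for X ~ N(mean = 120, sd = 10)
p_over_150 <- pnorm(150, mean = 120, sd = 10, lower.tail = FALSE)
p_over_150
```

Asking for the upper tail directly avoids the small loss of precision that `1 - pnorm(...)` can suffer far out in the tail.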
x1 = 122
x2 = 126
mean = 120
sd<-10
SE<-sd/sqrt(16)
Z1 <- (x1 - mean)/SE
Z2 <- (x2 - mean)/SE
p1 <- 1 - pnorm(Z1)
p2<- 1 - pnorm(Z2)
p1-p2
## [1] 0.2036579
Answer: the probability that the mean time taken to complete the test by a random sample of 16 students would be between 122 and 126 minutes is 20.4%
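As a cross-check, the same probability can be computed without standardizing by hand, by passing the standard error of the mean to `pnorm` as its `sd`. This is only a sketch; the names `se_mean` and `p_between` are illustrative:

```r
# Sampling distribution of the mean of n = 16 students: N(120, 10 / sqrt(16))
se_mean <- 10 / sqrt(16)
# P(122 < sample mean < 126) as a difference of two cumulative probabilities
p_between <- pnorm(126, mean = 120, sd = se_mean) -
  pnorm(122, mean = 120, sd = se_mean)
p_between
```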
Rh-negative blood appears in 15% of the United States population.
dbinom(3, 7, 0.15)
## [1] 0.06166199
Answer: the probability that out of 7 randomly selected U.S. residents exactly 3 have Rh-negative blood is 6.2% (dbinom gives the probability of exactly 3, not of at least 3).
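Note that `dbinom` returns the probability of one exact count, while `pbinom` returns a cumulative probability. The sketch below contrasts "exactly 3" with "at least 3" for the same setting; the variable names are illustrative, and which of the two the exercise intends is not settled here:

```r
n_people <- 7
p_rh_neg <- 0.15
# P(X = 3): exactly 3 of the 7 have Rh-negative blood
p_exactly_3 <- dbinom(3, size = n_people, prob = p_rh_neg)
# P(X >= 3) = 1 - P(X <= 2): at least 3 of the 7
p_at_least_3 <- pbinom(2, size = n_people, prob = p_rh_neg, lower.tail = FALSE)
c(exactly = p_exactly_3, at_least = p_at_least_3)
```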
Conditions for the normal approximation: np = 15 > 5 (true) and nq = 85 > 5 (true).
p = 0.15
q = 1-0.15
n = 100
x = 17.5
mean = n*p
sd = sqrt(mean*q)
Z <- (x - mean)/sd
pnorm(Z)
## [1] 0.7580801
Answer: the probability that in a group of 100 randomly selected people fewer than 17.5% will have Rh-negative blood is 75.8%.
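The normal approximation can be compared with the exact binomial probability. The sketch below assumes that 17.5 acts as a continuity-corrected cutoff, i.e. that the event of interest is "at most 17 of the 100 people"; that reading, and the variable names, are assumptions:

```r
n_people <- 100
p_rh_neg <- 0.15
# Normal approximation evaluated at 17.5, as in the calculation above
approx_p <- pnorm(17.5, mean = n_people * p_rh_neg,
                  sd = sqrt(n_people * p_rh_neg * (1 - p_rh_neg)))
# Exact binomial probability P(X <= 17)
exact_p <- pbinom(17, size = n_people, prob = p_rh_neg)
c(approx = approx_p, exact = exact_p)
```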
sample_mean = 4.8
n = 100
sd = 1.5
se <- sd / sqrt(n)
lower <- sample_mean - 1.645 * se
upper <- sample_mean + 1.645 * se
c(lower, upper)
## [1] 4.55325 5.04675
Answer: the 90% confidence interval is (4.55325, 5.04675).
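The critical value 1.645 is a rounded normal quantile. A sketch that derives it from the confidence level with `qnorm` instead of hard-coding it (the names `conf_level`, `z_crit` and `ci_mean` are illustrative):

```r
sample_mean <- 4.8
n_obs <- 100
sd_known <- 1.5
conf_level <- 0.90
# Two-sided critical value: qnorm(0.95), approximately 1.645
z_crit <- qnorm(1 - (1 - conf_level) / 2)
ci_mean <- sample_mean + c(-1, 1) * z_crit * sd_known / sqrt(n_obs)
ci_mean
```

Because the quantile is not rounded, the endpoints can differ from the values above in the fifth decimal place.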
n = 1226
p = 0.49
Z = 1.96
lower <- p - Z * sqrt((p*(1-p))/n)
upper <- p + Z * sqrt((p*(1-p))/n)
c(lower, upper)
## [1] 0.462017 0.517983
Answer: the 95% confidence interval is (0.462017, 0.517983).
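The interval above is the normal-approximation (Wald) interval for a proportion. It can be wrapped in a small function so that the confidence level is a parameter; `prop_ci` is a hypothetical helper written for this sketch, not a base R function:

```r
# Hypothetical helper: Wald confidence interval for a single proportion
prop_ci <- function(p_hat, n, conf_level = 0.95) {
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  margin <- z_crit * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - margin, upper = p_hat + margin)
}

ci_prop <- prop_ci(0.49, 1226)
ci_prop
```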
Grocery stores, drugstores, and large supermarkets all use scanners to calculate a customer’s bill. Scanners should be as accurate as possible. A state agency regularly monitors stores by randomly selecting items and comparing the shelf price with the checkout scanner price. During one check by the agency, 16 items were found to be incorrectly scanned. The amounts of overcharge (in cents) were:
200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40
A negative sign indicates an undercharge: the scanner price was below the shelf price.
data <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40)
stem(sort(data))
##
## The decimal point is 2 digit(s) to the right of the |
##
## -1 | 20
## -0 | 765
## 0 | 2334455
## 1 | 00
## 2 | 0
## 3 | 0
mean(data)
## [1] 35.0625
range(data)
## [1] -120 300
summary(data)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -120.00 -52.50 35.00 35.06 62.50 300.00
data <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300, -120, 100, 50, 30, -70, 40)
boxplot(data)
The boxplot shows that the median is 35.
The upper and lower quartiles are approximately 60 and -50, respectively. This means that about 75% of the data lie below 60 and about 25% lie below -50.
The entire box represents the interquartile range (upper quartile - lower quartile).
In a boxplot, each whisker extends to the most extreme data point that lies within 1.5 times the interquartile range of the box. Any point beyond the whiskers is considered an outlier. The data have one outlier: 300.
# Q3 and Q1 were taken from the result of summary() function above
Q3<-62.50
Q1<--52.50
IQR<-(Q3-Q1)
below<- Q1-1.5*IQR
above<- Q3+1.5*IQR
below
## [1] -225
above
## [1] 235
Answer: any value below -225 or above 235 should be considered an outlier. In our case there is one outlier: 300.
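The fences can also be computed directly from the data rather than copied from the `summary()` output. The following is a sketch; `overcharge`, `fences` and `flagged` are illustrative names:

```r
overcharge <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300,
                -120, 100, 50, 30, -70, 40)
# Quartiles as reported by summary() (R's default quantile type)
q <- quantile(overcharge, c(0.25, 0.75))
fences <- c(lower = q[[1]] - 1.5 * IQR(overcharge),
            upper = q[[2]] + 1.5 * IQR(overcharge))
fences
# Values outside the fences
flagged <- overcharge[overcharge < fences[["lower"]] |
                        overcharge > fences[["upper"]]]
flagged
# Points that boxplot() itself draws as outliers; it works from hinges,
# which can differ slightly from the quartiles used above
boxplot.stats(overcharge)$out
```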
f. For this data, the sample standard deviation is 108.3. Test the hypothesis that the mean overcharge is more than 0 at the 0.05 significance level.
H0: the mean overcharge is less than or equal to 0
H1: the mean overcharge is more than 0
t_test<-t.test(data, alternative = "greater")
t_test
##
## One Sample t-test
##
## data: data
## t = 1.295, df = 15, p-value = 0.1074
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## -12.40101 Inf
## sample estimates:
## mean of x
## 35.0625
Answer: the p-value (0.1074) is greater than 0.05, so we do not have enough evidence to reject H0 in favor of H1.
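To see where the t statistic and p-value come from, the one-sample test can be reproduced by hand from the sample mean and standard deviation. This is a sketch with illustrative variable names:

```r
overcharge <- c(200, -99, 100, -50, 40, -60, 20, 30, 50, 300,
                -120, 100, 50, 30, -70, 40)
n_items <- length(overcharge)
# t = (sample mean - hypothesized mean) / (s / sqrt(n)), with hypothesized mean 0
t_stat <- (mean(overcharge) - 0) / (sd(overcharge) / sqrt(n_items))
# Upper-tail p-value, because H1 states that the mean is greater than 0
p_value <- pt(t_stat, df = n_items - 1, lower.tail = FALSE)
c(t = t_stat, p_value = p_value)
```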
Sorted data:
-120, -99, -70, -60, -50, 20, 30, 30, 40, 40, 50, 50, 100, 100, 200, 300
Do cars traveling in the right lane of I-94 travel slower than those in the left lane? The following sample information was obtained. Use the 0.01 significance level to provide an answer to this question.
code source:
H0: cars in the right lane travel as fast as or faster than cars in the left lane (>=)
H1: cars in the right lane travel slower than cars in the left lane (<)
t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = FALSE)
{
  if (equal.variance == FALSE)
  {
    se <- sqrt((s1^2 / n1) + (s2^2 / n2))
    # welch-satterthwaite df
    df <- ((s1^2 / n1 + s2^2 / n2)^2) / ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  } else
  {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt((1 / n1 + 1 / n2) * ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
    df <- n1 + n2 - 2
  }
  t <- (m1 - m2 - m0) / se
  # note: the p-value returned here is two-sided
  dat <- c(m1 - m2, se, t, 2 * pt(-abs(t), df))
  names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
  return(dat)
}
n1 = 5
m1 = 65
s1 = 4.12
n2 = 6
m2 = 69
s2 = 3.22
t.test2(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
## Difference of means Std Error t
## -4.0000000 2.2633927 -1.7672585
## p-value
## 0.1174305
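Answer: t.test2 reports a two-sided p-value (0.117). The one-sided p-value for H1 cannot be smaller than half of that (about 0.059), so it also exceeds 0.01. We do not have enough evidence to reject H0 in favor of H1.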
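Since H1 is one-sided while `t.test2` returns a two-sided p-value, the lower-tail p-value can be computed directly from the summary statistics. The sketch below assumes that sample 1 (mean 65) is the right lane; the sample table is not reproduced here, so that labeling is an assumption:

```r
# Summary statistics from above; sample 1 is ASSUMED to be the right lane
n1 <- 5; m1 <- 65; s1 <- 4.12
n2 <- 6; m2 <- 69; s2 <- 3.22
v1 <- s1^2 / n1
v2 <- s2^2 / n2
t_stat <- (m1 - m2) / sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
# Lower-tail p-value for H1: mean of sample 1 < mean of sample 2
p_one_sided <- pt(t_stat, df_welch)
c(t = t_stat, df = df_welch, p_one_sided = p_one_sided)
```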
A noted medical researcher has suggested that a heart attack is less likely to occur among adults who actively participate in athletics. A random sample of 300 adults is obtained. Of that total, 100 are found to be athletically active. Within this group, 10 had suffered heart attacks; among the 200 athletically inactive adults, 25 had suffered heart attacks.
H0: proportion of adults who are active and suffered heart attacks = the proportion of adults who are not active and suffered heart attacks
H1: proportion of adults who are active and suffered heart attacks ≠ the proportion of adults who are not active and suffered heart attacks
http://www.sthda.com/english/wiki/two-proportions-z-test-in-r
result1<- prop.test(x = c(10, 25), n = c(100, 200),alternative = c("two.sided"))
result1
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(10, 25) out of c(100, 200)
## X-squared = 0.19811, df = 1, p-value = 0.6562
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.10705274 0.05705274
## sample estimates:
## prop 1 prop 2
## 0.100 0.125
Answer: the p-value (0.6562) is greater than 0.05, so we do not have enough evidence to reject H0 in favor of H1.
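The test behind `prop.test` can be related to the textbook two-proportion z-test. The sketch below computes the pooled z statistic by hand and compares its square with the `X-squared` statistic that `prop.test` reports when the continuity correction is switched off with `correct = FALSE`; variable names are illustrative:

```r
x <- c(10, 25)    # heart attacks: active, inactive
n <- c(100, 200)  # group sizes
p_hat <- x / n
# Pooled proportion under H0 (equal proportions)
p_pool <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z_stat <- (p_hat[1] - p_hat[2]) / se_pool
p_two_sided <- 2 * pnorm(-abs(z_stat))
c(z = z_stat, p_value = p_two_sided)
# z^2 equals the X-squared statistic of prop.test without continuity correction
prop.test(x, n, correct = FALSE)$statistic
```

Without the correction the p-value is smaller than the 0.6562 shown above, since the continuity correction shrinks the statistic.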
result2<- prop.test(x = c(10, 25), n = c(100, 200),alternative = c("two.sided"),conf.level = 0.99)
result2
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(10, 25) out of c(100, 200)
## X-squared = 0.19811, df = 1, p-value = 0.6562
## alternative hypothesis: two.sided
## 99 percent confidence interval:
## -0.13047891 0.08047891
## sample estimates:
## prop 1 prop 2
## 0.100 0.125
The 99% confidence interval for the difference between the proportions of active and inactive adults who suffered heart attacks ranges from -0.130 to 0.080 (-0.13047891, 0.08047891). Because the interval contains 0, it is consistent with no difference between the two groups.
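The 99% interval can be reproduced by hand: it is the difference of the sample proportions plus or minus a normal quantile times the unpooled standard error, widened by a continuity-correction term. The sketch below uses the correction `0.5 * (1/n1 + 1/n2)`, which reproduces the `prop.test` output for these data; variable names are illustrative:

```r
x <- c(10, 25)    # heart attacks: active, inactive
n <- c(100, 200)  # group sizes
p_hat <- x / n
diff_hat <- p_hat[1] - p_hat[2]
# Unpooled standard error of the difference in proportions
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))
z_crit <- qnorm(0.995)              # two-sided 99% critical value
cont_corr <- 0.5 * sum(1 / n)       # continuity correction term
ci_diff <- diff_hat + c(-1, 1) * (z_crit * se_unpooled + cont_corr)
ci_diff
```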
Perform a test to determine whether the data substantiate an association between the stability of a marriage and the period of acquaintanceship prior to marriage. Use α = 0.05.
H0: there is no association between the stability of a marriage and the period of acquaintanceship prior to marriage
H1: there is an association between the stability of a marriage and the period of acquaintanceship prior to marriage
chi = (11 - 10.3)^2/10.3 + (8-8.7)^2/8.7+(28-28.1)^2/28.1+(24-23.9)^2/23.9+(21-21.6)^2/21.6+(19-18.4)^2/18.4
chi
## [1] 0.1409008
num_col = 2
num_row = 3
df = (num_col-1)*(num_row-1)
1-pchisq(chi, df)
## [1] 0.931974
The chi-square statistic is 0.1409008 with 2 degrees of freedom, and the p-value is 0.931974. The result is not significant at the 0.05 level: there is not enough evidence to reject H0 in favor of H1.
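The same test can be run with `chisq.test`, which computes the expected counts itself. The sketch below takes the observed counts from the terms of the statistic above and assumes they form a 3 x 2 table in the order shown; that layout is an assumption, since the original table is not reproduced here:

```r
# Observed counts read off the statistic above; the 3 x 2 layout is an ASSUMPTION
observed <- matrix(c(11,  8,
                     28, 24,
                     21, 19),
                   nrow = 3, byrow = TRUE)
chi_res <- chisq.test(observed)
chi_res$expected   # expected counts under independence (unrounded)
chi_res$statistic
chi_res$parameter  # degrees of freedom
chi_res$p.value
```

Because `chisq.test` works with unrounded expected counts, its statistic can differ slightly from the hand calculation, which uses expected counts rounded to one decimal place.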