Instructions

  1. Make sure you include author information (your name).
  2. For two-sample independent t-tests, use the unequal-variance (Welch) test: either include var.equal = FALSE or omit the var.equal option from the code entirely.



(Question 1)

16 students in a Statistics course are randomly divided into two groups of 8, each assigned to only one of two test formats – Project-based or Concept-based. The test scores for each group are as follows:

Project-based: 38, 54, 63, 45, 40, 49, 58, 57
Concept-based: 55, 84, 79, 65, 78, 74, 58, 69



A. Under regular normality assumptions, test whether the mean scores of the two tests are equal at \(\alpha = 0.05\). Clearly specify the name of the test you’re using, define the hypotheses, identify the test statistic, and justify your conclusion. Additionally, calculate the \(95\%\) confidence interval for the difference between the means in this test. (25 points)

\(H_0\): \(\mu_1 - \mu_2 = 0\); \(H_A\): \(\mu_1 - \mu_2 \neq 0\)

testpc= c(38, 54, 63, 45, 40, 49, 58, 57,
          55, 84, 79, 65, 78, 74, 58, 69)
testpcs = c(rep("1.Project",8), rep("2.Concept",8))
t.test(testpc~testpcs, conf.level=1-0.05, 
       mu=0, alternative = "two.sided",var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  testpc by testpcs
## t = -4.0718, df = 13.728, p-value = 0.001188
## alternative hypothesis: true difference in means between group 1.Project and group 2.Concept is not equal to 0
## 95 percent confidence interval:
##  -30.172544  -9.327456
## sample estimates:
## mean in group 1.Project mean in group 2.Concept 
##                   50.50                   70.25

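Welch two-sample t-test (unequal variances) with its 95% confidence interval. The test statistic is t = -4.0718 on 13.728 degrees of freedom, and the p-value 0.001188 < \(\alpha = 0.05\), so we reject \(H_0\): the mean scores of the two test formats differ. The 95% confidence interval for \(\mu_1 - \mu_2\) is (-30.17, -9.33), which excludes 0.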
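As a cross-check on the output above, the Welch statistic, its Welch-Satterthwaite degrees of freedom, and the confidence interval can be reproduced by hand. The following is a minimal sketch, not part of the original answer; the variable names (`project`, `concept`, `se2_p`, `se2_c`, and so on) are illustrative assumptions.

```r
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Variance of each sample mean: s^2 / n
se2_p <- var(project) / length(project)
se2_c <- var(concept) / length(concept)

# Welch t statistic: difference in means over its standard error
diff_means <- mean(project) - mean(concept)
t_welch <- diff_means / sqrt(se2_p + se2_c)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_p + se2_c)^2 /
  (se2_p^2 / (length(project) - 1) + se2_c^2 / (length(concept) - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_welch), df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_p + se2_c)

print(c(t = t_welch, df = df_welch, p = p_value))
print(ci)
```

The printed values should agree with the `t.test` output above up to rounding.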



B. Without assuming any parametric distribution, test whether the median scores of the two test formats are equal at \(\alpha = 0.05\). Clearly specify the name of the test you’re using, define the hypotheses, specify the test statistic, and justify your conclusion. (Hint: Use a nonparametric test.) (25 points)

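\(H_0\): \(M_1 = M_2\); \(H_A\): \(M_1 \neq M_2\), where \(M_1\) and \(M_2\) are the population median scores for the Project-based and Concept-based formats.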

median(testpc)
## [1] 58

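The value 58 is the median of all 16 pooled scores; on its own it does not test whether the two formats have equal medians. A nonparametric test suited to two independent groups is the Wilcoxon rank-sum (Mann-Whitney) test, whose test statistic W is computed from the ranks of the pooled scores.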
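One way to run a nonparametric comparison of the two independent groups is the Wilcoxon rank-sum test via `wilcox.test`. The sketch below is illustrative rather than the original answer: the vector names are assumptions, and `exact = FALSE` is chosen because the tied score of 58 rules out an exact p-value. Reading the result as a statement about medians assumes the two score distributions differ only by a shift.

```r
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Wilcoxon rank-sum (Mann-Whitney) test, two-sided, normal approximation
res <- wilcox.test(project, concept, alternative = "two.sided", exact = FALSE)

# The same W statistic by hand: rank sum of the first group
# minus its smallest possible value n1 * (n1 + 1) / 2
n1 <- length(project)
pooled_ranks <- rank(c(project, concept))   # ties get average ranks
w_manual <- sum(pooled_ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

print(res)
print(w_manual)
```

If the reported p-value is below \(\alpha = 0.05\), \(H_0\) is rejected.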



(Question 2)

All 8 students from a Statistics class take both a project-based test and a concept-based test. The following are their scores for each test:

               Student 1  Student 2  Student 3  Student 4  Student 5  Student 6  Student 7  Student 8
Project-based         38         54         63         45         40         49         58         57
Concept-based         55         84         79         65         78         74         58         69



A. Assuming normality, test if the mean population score for the Project-based test is smaller than that of the Concept-based test. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. Use a significance level of \(\alpha = 0.05\). Write the associated confidence interval also. (25 points)

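\(H_0\): \(\mu_1 - \mu_2 = 0\); \(H_A\): \(\mu_1 - \mu_2 < 0\), where \(\mu_1\) and \(\mu_2\) are the population mean scores for the Project-based and Concept-based tests.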

testpc= c(38, 54, 63, 45, 40, 49, 58, 57,
          55, 84, 79, 65, 78, 74, 58, 69)
testpcs = c(rep("1.Project",8), rep("2.Concept",8))
t.test(testpc~testpcs, conf.level=1-0.05, 
       mu=0, alternative = "two.sided",var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  testpc by testpcs
## t = -4.0718, df = 13.728, p-value = 0.001188
## alternative hypothesis: true difference in means between group 1.Project and group 2.Concept is not equal to 0
## 95 percent confidence interval:
##  -30.172544  -9.327456
## sample estimates:
## mean in group 1.Project mean in group 2.Concept 
##                   50.50                   70.25

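The output above is a Welch two-sample t-test with a two-sided alternative, which treats the two sets of scores as independent groups. Here the same 8 students take both tests, so the scores are paired, and the question is one-sided; the matching test is a paired t-test on the differences (Project minus Concept) with \(H_A\): mean difference < 0.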
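Since each student contributes one score per format, a paired, one-sided version of the test can be sketched as follows. This is an illustrative sketch and not the original answer; the vector names `project` and `concept` are assumptions.

```r
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Paired t-test on the differences (project - concept);
# alternative = "less" encodes H_A: mean difference < 0
res_paired <- t.test(project, concept, paired = TRUE,
                     alternative = "less", conf.level = 0.95)
print(res_paired)
```

With a one-sided alternative, `t.test` reports a one-sided confidence interval whose lower limit is `-Inf`; \(H_0\) is rejected when the upper limit lies below 0, or equivalently when the p-value is below \(\alpha = 0.05\).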



B. (EXTRA CREDIT) Can you calculate the test statistic for the above from scratch (i.e., without using any inbuilt t.test or similar command)? (10 points)

#Your code here
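A possible from-scratch computation, assuming "the above" refers to a paired t-test on the per-student differences, is sketched below. The names `d`, `n`, `t_stat`, and `p_less` are illustrative.

```r
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Per-student differences
d <- project - concept
n <- length(d)

# Paired t statistic: mean difference over its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# One-sided p-value for H_A: mean difference < 0, with n - 1 degrees of freedom
p_less <- pt(t_stat, df = n - 1)

print(c(t = t_stat, df = n - 1, p = p_less))
```

Only `mean`, `sd`, and the t distribution function `pt` are used, so no built-in testing command is involved.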



C. Without any parametric assumption, test if the median population score for the Project-based test is smaller than that of the Concept-based test. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. Use a significance level of \(\alpha = 0.05\). (25 points)

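\(H_0\): \(M_1 = M_2\); \(H_A\): \(M_1 < M_2\), where \(M_1\) and \(M_2\) are the population median scores for the Project-based and Concept-based tests.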

Project = c(38, 54, 63, 45, 40, 49, 58, 57)
Concept =c(55, 84, 79, 65, 78, 74, 58, 69)
median(Project)
## [1] 51.5
median(Concept)
## [1] 71.5

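The sample median for “Project” (51.5) is smaller than that for “Concept” (71.5), but comparing sample medians is descriptive and is not a hypothesis test. For paired scores, a nonparametric test such as the sign test or the Wilcoxon signed-rank test on the per-student differences is needed to support the conclusion.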
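A sign test on the paired differences is one way to carry this out. The sketch below is an illustration under that choice of test, with assumed names (`d`, `n_pos`, `n_neg`, `sign_res`); differences equal to zero are dropped, as is usual for the sign test.

```r
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Per-student differences; zero differences carry no sign information
d <- project - concept
n_pos <- sum(d > 0)
n_neg <- sum(d < 0)

# Sign test: under H_0 the number of positive differences is
# Binomial(n_pos + n_neg, 0.5); few positives support H_A: M_1 < M_2
sign_res <- binom.test(n_pos, n_pos + n_neg, p = 0.5, alternative = "less")
print(sign_res)
```

The test statistic is the number of positive differences. A Wilcoxon signed-rank test, `wilcox.test(project, concept, paired = TRUE, alternative = "less")`, is an alternative that also uses the magnitudes of the differences.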



(Question 3)

Practice hypothesis testing with datasets. The following code downloads the Diabetes Data¹ in \(\texttt{R}\) into a dataframe named \(\texttt{diabetes_data}\). You need to be connected to the internet for this.

diabetes_data = read.csv(url("https://hbiostat.org/data/repo/diabetes.csv"))



A. Do a nonparametric test that the median age (column name = age) is less than 40.5. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. Take \(\alpha = 0.10\). (25 points)

\(H_0\): \(M = 40.5\); \(H_A\): \(M < 40.5\), where \(M\) is the population median age.

# Hint: For the positive and negative count of the test, use:
diabetes_data = read.csv(url("https://hbiostat.org/data/repo/diabetes.csv"))
hypothesized_median = 40.5
Ages = diabetes_data$age
# Count the ages above and below the hypothesized median
positives_count = sum(Ages > hypothesized_median)
negatives_count = sum(Ages < hypothesized_median)
binom.test(positives_count, positives_count + negatives_count,
           alternative = "two.sided", conf.level = 1-0.10)
## 
##  Exact binomial test
## 
## data:  positives_count and positives_count + negatives_count
## number of successes = 243, number of trials = 403, p-value = 4.164e-05
## alternative hypothesis: true probability of success is not equal to 0.5
## 90 percent confidence interval:
##  0.5611779 0.6436461
## sample estimates:
## probability of success 
##              0.6029777

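Sign test, carried out as an exact binomial test on the counts of ages above and below 40.5. The test statistic is the number of ages above 40.5, which is 243 out of 403. The sample proportion above 40.5 is 0.603, which is greater than 0.5, so the data point away from \(H_A\) (median < 40.5): the one-sided p-value for that alternative must exceed 0.5, and we fail to reject \(H_0\) at \(\alpha = 0.10\). The two-sided output above (p-value = 4.164e-05) indicates that the median differs from 40.5, but in the direction of being larger.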
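The one-sided version of the sign test can be sketched directly from the counts reported in the output above (243 ages above 40.5 out of 403), so no download is needed. The names `positives_count`, `trials`, and `res_less` are illustrative, and the counts are assumed to be exactly those printed above.

```r
# Counts taken from the binom.test output above
positives_count <- 243   # ages above 40.5
trials <- 403            # ages above or below 40.5

# H_A: median < 40.5 is equivalent to P(age > 40.5) < 0.5,
# so the alternative for the count of positives is "less"
res_less <- binom.test(positives_count, trials, p = 0.5,
                       alternative = "less", conf.level = 0.90)
print(res_less$p.value)
```

Because more than half of the ages lie above 40.5, this p-value is greater than 0.5 and \(H_0\) is not rejected at \(\alpha = 0.10\).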



B. Assuming normality, test whether there is a significant difference in the mean Stabilized Glucose levels (column name = stab.glu) between males and females (column name = gender). Use a significance level of \(\alpha = 0.10\). Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. (25 points)

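\(H_0\): \(\mu_F - \mu_M = 0\); \(H_A\): \(\mu_F - \mu_M \neq 0\), where \(\mu_F\) and \(\mu_M\) are the population mean stabilized glucose levels for females and males.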

diabetes_data = read.csv(url("https://hbiostat.org/data/repo/diabetes.csv"))
stabgluc= diabetes_data$stab.glu
genders= diabetes_data$gender
t.test(stabgluc ~ genders, mu=0, alternative = "two.sided", conf.level = 0.90)
## 
##  Welch Two Sample t-test
## 
## data:  stabgluc by genders
## t = -1.6937, df = 278.38, p-value = 0.09145
## alternative hypothesis: true difference in means between group female and group male is not equal to 0
## 90 percent confidence interval:
##  -18.940720  -0.245342
## sample estimates:
## mean in group female   mean in group male 
##             102.6496             112.2426

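Welch two-sample t-test (unequal variances). The test statistic is t = -1.6937 on 278.38 degrees of freedom, and the p-value 0.09145 < \(\alpha = 0.10\), so we reject \(H_0\): mean stabilized glucose differs between females (102.65) and males (112.24) at the 10% level. The 90% confidence interval for the difference (female minus male), (-18.94, -0.25), excludes 0.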
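The link between the reported statistic, the p-value, and the confidence interval can be checked without re-downloading the data. This is a rough sketch that starts from the rounded numbers printed above, so the results match the output only approximately; all variable names are illustrative.

```r
# Rounded values copied from the t.test output above
t_stat <- -1.6937
df <- 278.38
mean_female <- 102.6496
mean_male <- 112.2426

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Standard error recovered from t = difference / SE, then the 90% CI
diff_means <- mean_female - mean_male
se <- diff_means / t_stat
ci <- diff_means + c(-1, 1) * qt(0.95, df) * se

print(p_value)
print(ci)
```

The decision rule is the same either way: reject \(H_0\) at \(\alpha = 0.10\) when the p-value is below 0.10, which coincides with the 90% interval excluding 0.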


  1. https://hbiostat.org/data, courtesy of the Vanderbilt University Department of Biostatistics