Homework 2

Instructions

Make sure you include author information (your name).
For the two-sample independent t-tests, use only the unequal-variance (Welch) test: either set var.equal = FALSE or omit the var.equal argument entirely, since FALSE is the default.

(Question 1)

16 students in a Statistics course are randomly divided into two groups of 8, each assigned to only one of two test formats – Project-based or Concept-based. The test scores for each group are as follows:

Project-based: 38, 54, 63, 45, 40, 49, 58, 57
Concept-based: 55, 84, 79, 65, 78, 74, 58, 69

A. Under regular normality assumptions, test whether the mean scores of the two tests are equal at $\alpha = 0.05$. Clearly specify the name of the test you’re using, define the hypotheses, identify the test statistic, and justify your conclusion. Additionally, calculate the $95\%$ confidence interval for the difference between the means in this test. (25 points)

$H_0$: $\mu_{1} - \mu_{2} = 0$; $H_A$: $\mu_{1} - \mu_{2} \neq 0$

testpc= c(38, 54, 63, 45, 40, 49, 58, 57,
          55, 84, 79, 65, 78, 74, 58, 69)
testpcs = c(rep("1.Project",8), rep("2.Concept",8))
t.test(testpc~testpcs, conf.level=1-0.05, 
       mu=0, alternative = "two.sided",var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  testpc by testpcs
## t = -4.0718, df = 13.728, p-value = 0.001188
## alternative hypothesis: true difference in means between group 1.Project and group 2.Concept is not equal to 0
## 95 percent confidence interval:
##  -30.172544  -9.327456
## sample estimates:
## mean in group 1.Project mean in group 2.Concept 
##                   50.50                   70.25

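Test: Welch two-sample t-test (unequal variances). Test statistic: $t = -4.0718$ with $df = 13.728$. Since the p-value $0.001188 < \alpha = 0.05$, reject $H_{0}$: the mean scores of the two test formats differ. The $95\%$ confidence interval for the difference in means (Project-based minus Concept-based) is $(-30.17, -9.33)$, which excludes 0 and so agrees with the test.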
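As a check on the printed output, the Welch statistic, its degrees of freedom, and the confidence interval can be recomputed by hand. This is only a sketch: the variable names (project, concept, v1, v2, and so on) are illustrative choices and do not come from the assignment.

```r
# Scores copied from the question.
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

n1 <- length(project)
n2 <- length(concept)

# Each group's contribution to the squared standard error: s^2 / n.
v1 <- var(project) / n1
v2 <- var(concept) / n2

# Welch test statistic for H0: mu1 - mu2 = 0.
diff_means <- mean(project) - mean(concept)
se <- sqrt(v1 + v2)
t_welch <- diff_means / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval.
alpha <- 0.05
p_value <- 2 * pt(-abs(t_welch), df_welch)
ci <- diff_means + c(-1, 1) * qt(1 - alpha / 2, df_welch) * se

print(c(t = t_welch, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the t.test output above up to rounding.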

B. Without assuming any parametric distribution, test whether the median scores of the two test formats are equal at $\alpha = 0.05$. Clearly specify the name of the test you’re using, define the hypotheses, specify the test statistic, and justify your conclusion. (Hint: Use a nonparametric test.) (25 points)

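$H_0$: $M_{1} = M_{2}$; $H_A$: $M_{1} \neq M_{2}$, where $M_{1}$ and $M_{2}$ are the population median scores for the Project-based and Concept-based formats.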

median(testpc)

## [1] 58

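The median of the pooled sample (both groups combined) is 58. A sample median on its own is not a test: comparing the two formats requires a nonparametric two-sample test, such as the Wilcoxon rank-sum (Mann-Whitney) test, together with its test statistic and p-value.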
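One way to carry out such a test is the Wilcoxon rank-sum test via wilcox.test. This is a sketch, not the assignment's prescribed method: the choice of test is an assumption, and reading it as a test about medians additionally assumes the two distributions differ only by a shift. Because the pooled data contain a tie (58 appears in both groups), exact = FALSE is set so that R uses the normal approximation without issuing a warning.

```r
# Scores copied from the question.
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Two-sided Wilcoxon rank-sum (Mann-Whitney) test for two independent samples.
# exact = FALSE: the tied value 58 rules out an exact p-value.
res_ranksum <- wilcox.test(project, concept,
                           alternative = "two.sided", exact = FALSE)
print(res_ranksum)

# The reported statistic W is the rank sum of the first sample
# minus n1 * (n1 + 1) / 2.
w_manual <- sum(rank(c(project, concept))[seq_along(project)]) -
  length(project) * (length(project) + 1) / 2
print(w_manual)
```

Reject $H_0$ at $\alpha = 0.05$ if the printed p-value is below 0.05.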

(Question 2)

All 8 students in a Statistics class take both a project-based test and a concept-based test. Their scores on each test are as follows:

	Student 1	Student 2	Student 3	Student 4	Student 5	Student 6	Student 7	Student 8
Project-based	38	54	63	45	40	49	58	57
Concept-based	55	84	79	65	78	74	58	69

A. Assuming normality, test if the mean population score for the Project-based test is smaller than that of the Concept-based test. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. Use a significance level of $\alpha = 0.05$. Write the associated confidence interval also. (25 points)

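$H_0$: $\mu_{d} = 0$; $H_A$: $\mu_{d} < 0$, where $\mu_{d}$ is the population mean of the paired differences (Project-based score minus Concept-based score).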

testpc= c(38, 54, 63, 45, 40, 49, 58, 57,
          55, 84, 79, 65, 78, 74, 58, 69)
testpcs = c(rep("1.Project",8), rep("2.Concept",8))
t.test(testpc~testpcs, conf.level=1-0.05, 
       mu=0, alternative = "two.sided",var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  testpc by testpcs
## t = -4.0718, df = 13.728, p-value = 0.001188
## alternative hypothesis: true difference in means between group 1.Project and group 2.Concept is not equal to 0
## 95 percent confidence interval:
##  -30.172544  -9.327456
## sample estimates:
## mean in group 1.Project mean in group 2.Concept 
##                   50.50                   70.25

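Because the same 8 students take both tests, the scores are paired, so the appropriate test is a paired t-test with the one-sided alternative that the Project-based mean is smaller. The output above comes from a Welch two-sample t-test with a two-sided alternative, which treats the two sets of scores as independent; its test statistic and confidence interval therefore do not answer this question.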
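A paired, one-sided version of the test might look like the sketch below. The vector names project and concept are illustrative; paired = TRUE and alternative = "less" are standard t.test arguments. With alternative = "less", the reported confidence interval is one-sided, running from -Inf to an upper bound.

```r
# Scores copied from the table; position i in each vector is Student i.
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Paired t-test of H0: mu_d = 0 against HA: mu_d < 0,
# where d = project - concept.
res_paired <- t.test(project, concept,
                     paired = TRUE, alternative = "less",
                     conf.level = 0.95)
print(res_paired)
```

Reject $H_0$ at $\alpha = 0.05$ if the printed p-value is below 0.05, or equivalently if the upper confidence bound is below 0.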

B. (EXTRA CREDIT) Can you calculate the test statistic for the above from scratch (i.e., without using any inbuilt t.test or similar command)? (10 points)

#Your code here
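One possible from-scratch computation is sketched here, assuming that "the above" refers to the paired t-test appropriate for Part A. The statistic is $t = \bar{d} / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom, where $d$ is the vector of paired differences. All variable names are illustrative.

```r
# Scores copied from the table; position i in each vector is Student i.
project <- c(38, 54, 63, 45, 40, 49, 58, 57)
concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Paired differences (Project-based minus Concept-based).
d <- project - concept
n <- length(d)

# Sample mean and sample standard deviation of the differences,
# computed without mean(), sd(), or t.test().
d_bar <- sum(d) / n
s_d <- sqrt(sum((d - d_bar)^2) / (n - 1))

# Paired t statistic and its degrees of freedom.
t_stat <- d_bar / (s_d / sqrt(n))
df <- n - 1

# One-sided p-value for HA: mu_d < 0.
p_less <- pt(t_stat, df)

print(c(d_bar = d_bar, s_d = s_d, t = t_stat, df = df, p = p_less))
```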

C. Without any parametric assumption, test if the median population score for the Project-based test is smaller than that of the Concept-based test. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. Use a significance level of $\alpha = 0.05$. (25 points)

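$H_0$: $M_{P} = M_{C}$; $H_A$: $M_{P} < M_{C}$, where $M_{P}$ and $M_{C}$ are the population median scores for the Project-based and Concept-based tests.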

Project = c(38, 54, 63, 45, 40, 49, 58, 57)
Concept =c(55, 84, 79, 65, 78, 74, 58, 69)
median(Project)

## [1] 51.5

median(Concept)

## [1] 71.5

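The sample median for Project (51.5) is smaller than the sample median for Concept (71.5). Comparing sample medians alone is not a hypothesis test, however: to justify the conclusion at $\alpha = 0.05$, a paired nonparametric test such as the sign test or the Wilcoxon signed-rank test is needed, with its test statistic and p-value reported.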
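A sign test on the paired differences is one way to do this. The sketch below assumes the sign test is acceptable here (the assignment does not name the test); strictly, it tests whether the median of the differences is below 0. It counts positive and negative differences, drops zero differences, and passes the counts to binom.test.

```r
Project <- c(38, 54, 63, 45, 40, 49, 58, 57)
Concept <- c(55, 84, 79, 65, 78, 74, 58, 69)

# Paired differences; Student 7 scored 58 on both tests, giving one zero.
d <- Project - Concept

# Sign test: under H0, positive and negative differences are equally likely.
positives_count <- sum(d > 0)
negatives_count <- sum(d < 0)

# HA (Project smaller) corresponds to P(d > 0) < 0.5, hence alternative = "less".
res_sign <- binom.test(positives_count, positives_count + negatives_count,
                       p = 0.5, alternative = "less")
print(res_sign)
```

With no positive differences among the 7 non-zero ones, the p-value is $0.5^7 \approx 0.0078 < 0.05$, so this test rejects $H_0$.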

(Question 3)

Practice hypothesis testing with datasets. The following code downloads the Diabetes Data¹ into an $\texttt{R}$ data frame named $\texttt{diabetes_data}$. An internet connection is required.

diabetes_data = read.csv(url("https://hbiostat.org/data/repo/diabetes.csv"))

A. Do a nonparametric test that the median age (column name = age) is less than 40.5. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. Take $\alpha = 0.10$. (25 points)

$H_0$: $M = 40.5$; $H_A$: $M < 40.5$, where $M$ is the population median age.

# Hint: For the positive and negative count of the test, use:
diabetes_data = read.csv(url("https://hbiostat.org/data/repo/diabetes.csv"))
hypothesized_median = 40.5
Ages = diabetes_data$age
positives_count = sum(Ages > hypothesized_median)
negatives_count = sum(Ages < hypothesized_median)
binom.test(positives_count, positives_count + negatives_count,
           alternative = "two.sided", conf.level = 1-0.10)

## 
##  Exact binomial test
## 
## data:  positives_count and positives_count + negatives_count
## number of successes = 243, number of trials = 403, p-value = 4.164e-05
## alternative hypothesis: true probability of success is not equal to 0.5
## 90 percent confidence interval:
##  0.5611779 0.6436461
## sample estimates:
## probability of success 
##              0.6029777

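Test: sign test, carried out as an exact binomial test on the number of ages above 40.5. Test statistic: 243 of the 403 ages are above 40.5 (sample proportion 0.603). The output above is for a two-sided alternative, whereas the alternative in this question is one-sided (median less than 40.5). Because more than half of the observations lie above 40.5, the sample points away from $H_A$: the one-sided p-value exceeds 0.5, which is greater than $\alpha = 0.10$, so $H_0$ is not rejected.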
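The one-sided version can be sketched directly from the counts in the printed output (243 successes in 403 trials), so no download is needed. Here "success" means an age above 40.5, and $H_A$: median < 40.5 corresponds to a success probability below 0.5, hence alternative = "less". The name res_less is illustrative.

```r
# Counts taken from the printed binom.test output above.
positives_count <- 243   # ages above 40.5
n_trials <- 403          # ages above or below 40.5

# One-sided sign test of H0: median = 40.5 against HA: median < 40.5.
res_less <- binom.test(positives_count, n_trials,
                       p = 0.5, alternative = "less",
                       conf.level = 1 - 0.10)
print(res_less)
```

Compare the printed p-value with $\alpha = 0.10$ to reach the conclusion.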

B. Assuming normality, test whether there is a significant difference in the mean Stabilized Glucose levels (column name = stab.glu) between males and females (column name = gender). Use a significance level of $\alpha = 0.10$. Clearly write the name of the test, define the hypotheses, specify the test statistic, and provide justification for your conclusion. (25 points)

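$H_0$: $\mu_{\text{female}} - \mu_{\text{male}} = 0$; $H_A$: $\mu_{\text{female}} - \mu_{\text{male}} \neq 0$, where $\mu_{\text{female}}$ and $\mu_{\text{male}}$ are the population mean Stabilized Glucose levels for females and males.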

diabetes_data = read.csv(url("https://hbiostat.org/data/repo/diabetes.csv"))
stabgluc= diabetes_data$stab.glu
genders= diabetes_data$gender
t.test(stabgluc ~ genders, mu=0, alternative = "two.sided", conf.level = 0.90)

## 
##  Welch Two Sample t-test
## 
## data:  stabgluc by genders
## t = -1.6937, df = 278.38, p-value = 0.09145
## alternative hypothesis: true difference in means between group female and group male is not equal to 0
## 90 percent confidence interval:
##  -18.940720  -0.245342
## sample estimates:
## mean in group female   mean in group male 
##             102.6496             112.2426

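Test: Welch two-sample t-test (unequal variances). Test statistic: $t = -1.6937$ with $df = 278.38$. Since the p-value $0.09145 < \alpha = 0.10$, reject $H_0$: the mean Stabilized Glucose level differs between females and males. The $90\%$ confidence interval for the difference in means (female minus male) is $(-18.94, -0.25)$, which excludes 0.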

¹ https://hbiostat.org/data, courtesy of the Vanderbilt University Department of Biostatistics

Due Date: 09-15-2024

Azzi Parries. Please don’t try editing the date field below.

2024-09-15
