The purpose of this homework is to refresh concepts learned in
previous statistics courses. In addition to submitting an Rmarkdown
generated pdf document, you must also submit an R file named
ts2test.R. This .R file must contain
only the function that you have written for the assignment
along with a commented section at the top that contains your name, date,
and assignment number. It must not contain any R analysis. The due date
is Monday, Jan 26th by 5pm as a Canvas upload (of both
files). Only one .pdf (or .html) and one
.R file will be accepted. No two students will program
exactly alike. So it is very easy to catch cheaters.
Use complete sentences and show a “reasonable” amount of algebraic work. (Remember I grade the entire solution not just the answers.) Final answers should include the correct units. Remember to use complete sentences/equations/etc, and context should be clear (answer the word problem). Your work should be rather professional. Do the problems in order. You will be docked points for the overall assignment if the homework isn’t properly organized. Any plots you make should be large enough so that the important aspects can be clearly seen. Every plot should include the appropriate x- and y- labels, main titles, and any other important labels such as the scale or group identifier.
If asked about the probability of a particular event, you should write, e.g., \[ \mathbb{P}(\mbox{event name or label}) = \mbox{appropriate amount of work} = \mbox{answer}. \]
You are encouraged to work with other students in the course. You are free to discuss these problems with each other, however this is not permission to copy each others work. By participating in this class, you must be aware of the course policies, CMDA Integrity Statement, and the university policies with regarding the VT Honor Code.
In particular, you acknowledge and support the statement: “I have neither given nor received unauthorized assistance on this assignment. The work I am presenting is ultimately my own.”
ts2test()Write a function in R called ts2test(), where “ts2”
(read as \((\mathrm{ts})^2\)) stands
for two-sample and two-sided, that implements functionality similar to
parts of the t.test() function in R. A function prototype
with default arguments is suggested below.
ts2test <- function(x, y, conf.level=0.95, type="welch", N=0)
{
if ( type == "welch"){
ybar <- mean(y)
ny <- length(y)
xbar <- mean(x)
nx <- length(x)
yvar <- var(y)
xvar <- var(x)
ypart <- yvar/ny
xpart <- xvar/nx
tsat <- (ybar - xbar) / sqrt((ypart + xpart))
num <- (ypart + xpart)^2
den <- ((ypart)^2/(ny-1))+((xpart)^2/(nx-1))
dof <- num/den
pval <- 2 * pt(-abs(tsat), dof)
ret <- list(tval = tsat, pval = pval, dof = dof)
}
else if (type == "paired"){
d <- x - y
dbar <- mean(d)
sd2 <- var(d)
nd <- length(d)
dof = nd - 1
tsat <- (dbar - 0)/sqrt(sd2/nd)
pval <- pt(-abs(tsat), nd - 1) + pt(abs(tsat), nd - 1, lower.tail = FALSE)
if (N > 0) {
sigma2 <- (nd-1)*sd2/ rchisq(N, nd-1)
dbars <- rnorm (N, 0, sqrt(sigma2/nd))
print(dbars)
pval <- (mean(dbars <- -abs(dbar)) + mean(dbars > abs(dbar)))
dof = NULL
}
ret <- list(estimate = dbar, tval = tsat, pval = pval, df = dof)
}
else if (type == "pooled"){
ybar <- mean(y)
ny <- length(y)
xbar <- mean(x)
nx <- length(x)
yvar <- var(y)
xvar <- var(x)
dof <- nx + ny - 2
sp <- (((ny-1)/(dof))* yvar + ((nx-1)/(dof))* xvar)
tsat <- (ybar - xbar)/ sqrt (sp *((1/ny) + (1/nx)))
pval <- 2 * pt(-abs(tsat), dof)
ret <- list(tval = tsat, pval = pval, df = dof)
}
return(ret)
}
Arguments will be discussed in detail momentarily. We must be able to call your function simply by typing the following (or similar, depending on options and variable names), or through defaults:
ts2test(x=vector1, y=vector2, conf.level=0.9, type="pooled", N=1000)
ts2test(vector1, vector2)
Like t.test() the function should provide test results,
but confidence intervals are optional. Unlike t.test(),
there is no alternative argument. We will only do two-sided
tests. The conf.level argument is the same as
t.test(). The options for type are
"welch", "pooled" (which is like
var.equal in t.test(), and,
"paired" only, for Welch’s, Pooled, and Paired \(t\)-tests, respectively. Positive integer
arguments N indicate a number of Monte Carlo samples for a
numerical test; the default of N=0 indicates an exact \(t\)-test.
Your function should return list whose appropriately
named entries (possibly including sub-lists) include all
the important results of the hypothesis test (and confidence
intervals):
type and degrees
of freedom unless N > 0N > 0
(including a reminder of N in this case)conf.level)While you are writing your function, and when in doubt, look at the
output from the t.test() function as a minimal guide and to
check your answers. To help you and assess whether your function works,
you will apply your function towards solving the following problems
below. You must show that you used your function to get the
correct answer and you must provide both N=0 and
N=1000000 solutions, showing that they are
similar.
A new post-surgical treatment is being compared with a standard treatment. Seven subjects receive the new treatment, while seven others (the controls) receive the standard treatment. The recovery times, in days, are given below.
treatment <- c(12, 13, 15, 19, 20, 21, 24)
control <- c(18, 23, 24, 30, 32, 35, 39)
data.frame(treatment, control)
## treatment control
## 1 12 18
## 2 13 23
## 3 15 24
## 4 19 30
## 5 20 32
## 6 21 35
## 7 24 39
Can you conclude that mean recovery time for treated patience different (hopefully less) than that of the control group? When ansewring, condider …
Which method should be used for conducting hypothesis tests and generating confidence intervals for this data? Why?
I used the Paired T-test method for this data because the data from the control group is paired with the group with the treated patients
What assumptions need to be checked in order to justify the use of the statistical method? Do this and include any appropriate plots and output from software.
I need to assure that the paired differences follow a normal distribution
paired_diff <- treatment - control
hist(paired_diff)
t.test(treatment, control, paired = TRUE)
##
## Paired t-test
##
## data: treatment and control
## t = -9.5263, df = 6, p-value = 7.633e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -13.82545 -8.17455
## sample estimates:
## mean difference
## -11
ts2test(treatment, control, type = "paired")
## $estimate
## [1] -11
##
## $tval
## [1] -9.526279
##
## $pval
## [1] 7.633413e-05
##
## $df
## [1] 6
ts2test(treatment, control,conf.level = 0.9, type = "paired", N = 10)
## [1] -0.2797741 0.7693689 -0.3422884 0.4965776 0.4309616 4.3627372
## [7] -0.4062469 0.6219027 0.5740814 -2.3240585
## $estimate
## [1] -11
##
## $tval
## [1] -9.526279
##
## $pval
## [1] -11
##
## $df
## NULL
In an experiment to compare two diets for fattening beef steers, nine pairs of animals were chosen from the heard; members of each pair were matched as closely as possible with respect to hereditary factors. The members of each pair were randomly allocated, one to teach diet. The following table shows the weight gains (lb) of the animals over a 140-day test period on Diet 1 \((X)\) and on Diet 2 \((Y)\).
one <- c(596, 422, 524, 454, 538, 522, 478, 654, 556)
two <- c(498, 460, 468, 458, 530, 482, 528, 598, 456)
data.frame(one, two)
## one two
## 1 596 498
## 2 422 460
## 3 524 468
## 4 454 458
## 5 538 530
## 6 522 482
## 7 478 528
## 8 654 598
## 9 556 456
What assumptions need to be checked in order to justify the use of the statistical method? Do this and include any appropriate plots and output from software.
In order to use Welch’s test, it is assumed that the variances are not equal. And by looking at the box plots below it appears that the variances are not equal at all.
boxplot(one, two)
t.test(treatment, control, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: treatment and control
## t = -3.3724, df = 9.863, p-value = 0.007231
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.281488 -3.718512
## sample estimates:
## mean of x mean of y
## 17.71429 28.71429
ts2test(control, treatment, type = "welch")
## $tval
## [1] -3.372353
##
## $pval
## [1] 0.007230533
##
## $dof
## [1] 9.863025
Because the P-value of 0.007231 is less than 0.05 we can reject the null hypothesis which implies that wear diets are not as equally effective. Based on the negative t-value we can conclude that the control group is better at reducing wear.
In a study of the relationship of the shape of a tablet to its dissolution time, 6 disk-shaped ibuprofen tablets and 8 oval-shaped ibuprofen tablets were dissolved in water. The dissolve times, in seconds, were as follows:
disk <- c(269.0, 249.3, 255.2, 252.7, 247.0, 261.6)
oval <- c(268.8, 260.0, 273.5, 253.9, 278.5, 289.4, 261.6, 280.2)
boxplot(disk, oval)
Can you conclude that the mean dissolve time is different for disk shaped tablets than for mean dissolve time for oval shaped tablets? Assume that the two samples come from normal distributions and \(\sigma_{\mathrm{disk}}^2 = \sigma_{\mathrm{oval}}^2\). Carry out the appropriate test/intervALS at the \(\alpha=0.05\) (95%) level.
t.test(disk, oval, var.equal = TRUE)
##
## Two Sample t-test
##
## data: disk and oval
## t = -2.6279, df = 12, p-value = 0.02206
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -27.322157 -2.552843
## sample estimates:
## mean of x mean of y
## 255.8000 270.7375
ts2test(oval, disk, type = "pooled")
## $tval
## [1] -2.62793
##
## $pval
## [1] 0.02205977
##
## $df
## [1] 12
based on the results of the pooled t-test we can conclude that based on the p-value of 0.02206 being lower than 0.05, we can reject the null hypothesis. Because of the negative t-value, it shows that disks are more likely to dissolve faster than ovals.