library(tidyverse); library(htmltools)  # ggplot2, dplyr, stringr, %>%; tagList() and tags
bcs <- haven::read_sav("~/Northeastern/Fall 2018/CAEP7712/Week 2/Breast cancer survival.sav")

Research Questions: For the dataset “Breast cancer survival 2.sav”, formulate 3 research questions that would correspond to single-sample, two-sample, and repeated-sample t-tests. Be flexible with the variables in the dataset: some of them might not be dichotomous, but can be dichotomized through recoding (for example, age can be categorized as < 50 years old and > or = 50 years old).
Hypothesis Testing: For each of the questions that you formulate, carry out hypothesis testing and provide a brief summary of your analysis. Make sure to include only relevant output and comment on the statistical results (e.g., report the p-value and the t-statistic) and practical implications, explaining what the results mean for the particular problem at hand. Report effect sizes for each test (using the G*Power software). Make sure to test the assumptions of each test. Think of each problem as a small research project. Your write-up is a reflection of a logical progression of data analyses, decision making, and conclusions. Any graphical summaries and statistical results should serve as supporting elements in your story and be included in the write-up. Make sure to answer your original research questions both at the level of statistical conclusions and in plain English summaries that would be understandable by an intelligent person without any statistical background. In SPSS, all t-tests are under ANALYZE – COMPARE MEANS. In R, the function is t.test, e.g., t.test(x, mu = …, alt = "two.sided") or t.test(x, y, alt = "two.sided", var.equal = T/F).

Research Questions

What is the relationship between size and histological grade / estrogen status / progesterone status?
Is the estrogen and progesterone receptor status frequency of this sample statistically different than that of the population as specified on cancer.org ¹: 2/3 or .66?
Is the rate of growth of a tumor over time associated with histological grade / estrogen status / progesterone status?

Hypothesis Testing

Q1 (two sample)

What is the relationship between size and histological grade / estrogen status / progesterone status?

Hypothesis

\(H_0:\) There is no significant difference in pathological size between the various levels of histological grade / estrogen status / progesterone status.

\(H_a:\) There is a significant difference in pathological size between the various levels of histological grade / estrogen status / progesterone status.

\(\alpha: .05\)
Test type: two-sided
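
For reference, t.test() is called in this analysis without var.equal, so R falls back on its default, the Welch form of the two-sample test. Writing \(\bar{x}_j\), \(s_j^2\) and \(n_j\) for the mean, variance and size of group \(j\), the statistic and its approximate degrees of freedom are:

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\]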

Assumption Testing

We know from the previous assignment that pathsize is right-skewed and not normally distributed, as the results below confirm, so a t-test is not strictly appropriate for these data. However, because the assignment tests t-test comprehension, t-tests will be performed anyway. For statistical rigor, a non-parametric Mann-Whitney test will be performed for each comparison as well.

ggplot(data = bcs, mapping = aes(x = pathsize)) + geom_histogram()

shapiro.test(bcs$pathsize)

## 
##  Shapiro-Wilk normality test
## 
## data:  bcs$pathsize
## W = 0.90833, p-value < 2.2e-16
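
As a rough illustration of how the Shapiro-Wilk statistic responds to right skew, the sketch below runs the test on simulated log-normal values and on their logarithms. The data, the seed, and the names sim_size, raw_test and log_test are assumptions made for illustration; they are not the bcs data, and the log transform is not part of the analysis above.

```r
set.seed(7712)
# Hypothetical right-skewed sizes; NOT the bcs data
sim_size <- rlnorm(300, meanlog = 0.3, sdlog = 0.6)

raw_test <- shapiro.test(sim_size)       # skewed values
log_test <- shapiro.test(log(sim_size))  # log scale, normal by construction

# W closer to 1 means closer to a normal distribution
c(raw_W = unname(raw_test$statistic), log_W = unname(log_test$statistic))
```

A W well below 1 on the raw scale, as seen for pathsize, is the pattern expected from a long right tail.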

Analysis

# ----------------------- Wed Oct 03 15:09:03 2018 ------------------------#
# Coding the t-tests
var <- c("histgrad", "pr", "er")
t.out <- sapply(var, FUN = function(var) {
    fac <- bcs[[var]] %>% unique %>% sort
    # Subset pathsize according to the factor levels and remove NA
    
    n_pairs <- length(fac) - 1
    tt <- vector(mode = "list", length = n_pairs)
    es <- vector(mode = "list", length = n_pairs)
    mw <- vector(mode = "list", length = n_pairs)
    # Compare pathsize between each pair of adjacent factor levels of the
    # variable (e.g. 1 vs 2, then 2 vs 3)
    for (i in seq_len(n_pairs)) {
        x <- bcs[["pathsize"]][which(bcs[[var]] == fac[i])] %>% .[!is.na(.)]
        y <- bcs[["pathsize"]][which(bcs[[var]] == fac[i + 1])] %>% .[!is.na(.)]
        # t-tests
        tt[[i]] <- t.test(x = x, y = y, alternative = "two.sided")
        tt[[i]]$data.name <- paste("t-test of pathsize where", var, "is equal to", 
            fac[i], "compared to pathsize where", var, "is equal to", fac[i + 1])
        # Mann-whitney
        mw[[i]] <- wilcox.test(x = x, y = y, alternative = "two.sided", conf.int = T)
        mw[[i]]$data.name <- paste("Mann-whitney of pathsize where", var, "is equal to", 
            fac[i], "compared to pathsize where", var, "is equal to", fac[i + 1])
        # Cohen's d
        es[[i]] <- effsize::cohen.d(d = x, f = y)
        es[[i]]$method <- paste("cohen's d where", var, "is equal to", fac[i], "compared to pathsize where", 
            var, "is equal to", fac[i + 1])
    }
    # Add in effect sizes
    out <- list(list(t.tests = tt, `Mann-whitney` = mw, effect.sizes = es))
    return(out)
})
# p-value, t-stat, CI, Effect Size,practical implications.
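
To make the three result lists easier to follow, here is a minimal sketch of a single adjacent-level comparison on simulated data. The values and the names x, y and pooled_sd are hypothetical, and Cohen's d is computed by hand with the pooled standard deviation, which is assumed to correspond to the default of effsize::cohen.d rather than taken from its source.

```r
set.seed(1)
# Hypothetical tumor sizes for two adjacent grade levels; NOT the bcs data
x <- rlnorm(120, meanlog = 0.2, sdlog = 0.5)
y <- rlnorm(150, meanlog = 0.5, sdlog = 0.5)

tt <- t.test(x, y, alternative = "two.sided")       # Welch test by default
mw <- wilcox.test(x, y, alternative = "two.sided")  # Mann-Whitney test

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                  (length(x) + length(y) - 2))
d <- (mean(x) - mean(y)) / pooled_sd

c(t = unname(tt$statistic), p_t = tt$p.value, p_mw = mw$p.value, d = d)
```

The sign of d follows the sign of the t-statistic, since both are built from mean(x) - mean(y).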

Discussion

disc <- lapply(var, t.out = t.out, function(var, t.out) {
    out <- tagList(tags$h5(paste("The comparison of pathsize and", var, "required", 
        length(t.out[[var]][["t.tests"]]), "test(s) to compare the various factor levels")), 
        tags$strong("Test results:"), lapply(sapply(seq_along(t.out[[var]][["t.tests"]]), 
            t.out = t.out, function(t, t.out) {
                paste("For the", t.out[[var]][["t.tests"]][[t]][["data.name"]], "the t-statistic (", 
                  t.out[[var]][["t.tests"]][[t]][["statistic"]][["t"]] %>% format(scientific = T), 
                  ") had a p-value of", t.out[[var]][["t.tests"]][[t]][["p.value"]] %>% 
                    format(scientific = T), "with a confidence interval of (", paste(t.out[[var]][["t.tests"]][[t]][["conf.int"]] %>% 
                    format(scientific = T), collapse = ","), ") indicating that we", 
                  ifelse(t.out[[var]][["t.tests"]][[t]][["p.value"]] < 0.05, "reject", 
                    "fail to reject"), "the null hypothesis at the 95% confidence level. Thus the", 
                  t.out[[var]][["t.tests"]][[t]][["data.name"]], "provides evidence that there ", 
                  ifelse(t.out[[var]][["t.tests"]][[t]][["p.value"]] < 0.05, "is", 
                    "is not"), " a significant difference between pathological tumor size when", 
                  var, "is equal to", str_extract_all(t.out[[var]][["t.tests"]][[t]][["data.name"]], 
                    "\\d+")[[1]][[1]], "vs when", var, "is equal to", str_extract_all(t.out[[var]][["t.tests"]][[t]][["data.name"]], 
                    "\\d+")[[1]][[2]], ". The Mann-Whitney test for the same comparison gave a p-value of", 
                  t.out[[var]][["Mann-whitney"]][[t]][["p.value"]] %>% format(scientific = T), 
                  "at the same confidence level with a confidence interval of (", 
                  paste0(t.out[[var]][["Mann-whitney"]][[t]][["conf.int"]] %>% format(scientific = T), 
                    # the two tests agree when both p-values fall on the same side of 0.05
                    collapse = ","), ") which", ifelse((t.out[[var]][["t.tests"]][[t]][["p.value"]] < 
                    0.05) == (t.out[[var]][["Mann-whitney"]][[t]][["p.value"]] < 0.05), 
                    "confirms", "contradicts"), "the results of the t-test. The effect size indicated by", 
                  t.out[[var]][["effect.sizes"]][[t]][["method"]], "is", t.out[[var]][["effect.sizes"]][[t]][["estimate"]] %>% 
                    format(scientific = T), "with confidence interval (", paste0(t.out[[var]][["effect.sizes"]][[t]][["conf.int"]] %>% 
                    format(scientific = T), collapse = ","), ") showing that the effect is considered", 
                  t.out[[var]][["effect.sizes"]][[t]][["magnitude"]], ". ")
            }), FUN = tags$p))
    
    
    return(out)
})
tagList(disc)

The comparison of pathsize and histgrad required 2 test(s) to compare the various factor levels

Test results:

For the t-test of pathsize where histgrad is equal to 1 compared to pathsize where histgrad is equal to 2 the t-statistic ( -4.29e+00 ) had a p-value of 3.95e-05 with a confidence interval of ( -6.41e-01,-2.36e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize where histgrad is equal to 1 compared to pathsize where histgrad is equal to 2 provides evidence that there is a significant difference between pathological tumor size when histgrad is equal to 1 vs when histgrad is equal to 2 . The Mann-Whitney test for the same comparison gave a p-value of 1.85e-06 at the same confidence level with a confidence interval of ( -6e-01,-2e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d where histgrad is equal to 1 compared to pathsize where histgrad is equal to 2 is -5e-01 with confidence interval ( -7.42e-01,-2.57e-01 ) showing that the effect is considered small .

For the t-test of pathsize where histgrad is equal to 2 compared to pathsize where histgrad is equal to 3 the t-statistic ( -5.63e+00 ) had a p-value of 2.88e-08 with a confidence interval of ( -5.66e-01,-2.73e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize where histgrad is equal to 2 compared to pathsize where histgrad is equal to 3 provides evidence that there is a significant difference between pathological tumor size when histgrad is equal to 2 vs when histgrad is equal to 3 . The Mann-Whitney test for the same comparison gave a p-value of 2.13e-08 at the same confidence level with a confidence interval of ( -5e-01,-2e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d where histgrad is equal to 2 compared to pathsize where histgrad is equal to 3 is -4.28e-01 with confidence interval ( -5.72e-01,-2.84e-01 ) showing that the effect is considered small .

The comparison of pathsize and pr required 1 test(s) to compare the various factor levels

Test results:

For the t-test of pathsize where pr is equal to 0 compared to pathsize where pr is equal to 1 the t-statistic ( 3.23e+00 ) had a p-value of 1.27e-03 with a confidence interval of ( 8.74e-02,3.57e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize where pr is equal to 0 compared to pathsize where pr is equal to 1 provides evidence that there is a significant difference between pathological tumor size when pr is equal to 0 vs when pr is equal to 1 . The Mann-Whitney test for the same comparison gave a p-value of 1.61e-03 at the same confidence level with a confidence interval of ( 4.65e-05,3.00e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d where pr is equal to 0 compared to pathsize where pr is equal to 1 is 2.3e-01 with confidence interval ( 9.17e-02,3.68e-01 ) showing that the effect is considered small .

The comparison of pathsize and er required 1 test(s) to compare the various factor levels

Test results:

For the t-test of pathsize where er is equal to 0 compared to pathsize where er is equal to 1 the t-statistic ( 4.37e+00 ) had a p-value of 1.47e-05 with a confidence interval of ( 1.72e-01,4.53e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize where er is equal to 0 compared to pathsize where er is equal to 1 provides evidence that there is a significant difference between pathological tumor size when er is equal to 0 vs when er is equal to 1 . The Mann-Whitney test for the same comparison gave a p-value of 4.28e-05 at the same confidence level with a confidence interval of ( 1e-01,4e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d where er is equal to 0 compared to pathsize where er is equal to 1 is 3.25e-01 with confidence interval ( 1.86e-01,4.65e-01 ) showing that the effect is considered small .

The practical significance of the relationship between histological grade and pathological size is that tumor size gives a rough indication of histological grade (severity), or perhaps that histological grade is determined in part by size: higher-grade tumors are larger on average. Furthermore, the presence or absence of receptors in a tumor is also significantly associated with size, with receptor-negative tumors being larger on average than receptor-positive ones. All four effect sizes are small, so size alone separates the groups only weakly.
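
Because four tests were run on the same outcome, a reader may want to check that the conclusions survive a multiple-comparison correction. The sketch below applies Holm's adjustment to the four t-test p-values reported above; choosing Holm is an assumption made here and was not part of the original analysis, and the vector names are illustrative.

```r
# p-values reported above for the four t-tests
p_t <- c(histgrad_1v2 = 3.95e-05, histgrad_2v3 = 2.88e-08,
         pr_0v1 = 1.27e-03, er_0v1 = 1.47e-05)

# Holm's step-down adjustment controls the family-wise error rate
p_holm <- p.adjust(p_t, method = "holm")
p_holm

# TRUE if every comparison remains significant at alpha = .05
all(p_holm < 0.05)
```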

Q2 (one sample)

Is the estrogen and progesterone receptor status frequency of this sample statistically different than that of the population as specified on cancer.org ¹: 2/3 or .66?

Hypothesis

\(H_0: \mu = .66\)

\(H_a: \mu \neq .66\)

Assumption Testing

We want to examine whether the sample frequency matches the population frequency. To obtain a mean and standard deviation for a one-sample test, we split the data into ten groups of roughly 100 observations each and compute the receptor frequency within each group.
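
With this grouping, the one-sample statistic is built from the \(k\) group frequencies (\(k = 10\) in the code that follows, from letters[1:10]). Their sample standard deviation \(s_f\) is supplied to the test as if it were the known population value:

\[
z = \frac{\bar{f} - \mu_0}{s_f / \sqrt{k}}, \qquad \mu_0 = .66
\]

where \(\bar{f}\) is the mean of the group frequencies.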

Analysis

# ----------------------- Wed Oct 03 18:45:40 2018 ------------------------#
# Add a factor 'group' to split the data into ten groups. Add the two columns
# with receptor types into the data frame and create a column 'HasRcptr' with
# TRUE values where a receptor is present. Group by the 'group' column and
# summarise each group by the frequency of receptors. The frequency is the
# group mean of HasRcptr. Dividing the group count by nrow(.) instead uses the
# row count of the whole data frame, which is consistent with the mean near
# 0.05 in the z-test output shown below.
(df <- data.frame(group = rep_len(letters[1:10], length(bcs$pr)), pr = bcs$pr, er = bcs$er) %>% 
    mutate(HasRcptr = pr == 1 | er == 1) %>% group_by(group) %>% summarise(freq = mean(HasRcptr, 
    na.rm = TRUE)))

(z.out <- BSDA::z.test(x = df$freq, sigma.x = sd(df$freq), mu = 0.66))

## 
##  One-sample z-Test
## 
## data:  df$freq
## z = -535.21, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0.66
## 95 percent confidence interval:
##  0.04672673 0.05120202
## sample estimates:
##  mean of x 
## 0.04896437
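
Because receptor status is a yes/no outcome, the sample proportion can also be compared with .66 directly, without first splitting the data into groups. The sketch below uses simulated indicators and an exact binomial test. The data and the names has_rcptr, n_known and n_pos are hypothetical, and the denominator counts only patients whose status is known, which is an assumption about how missing values should be handled.

```r
set.seed(42)
# Hypothetical receptor indicators (1 = positive, 0 = negative, NA = unknown);
# NOT the bcs data
pr <- sample(c(0, 1, NA), 500, replace = TRUE, prob = c(0.35, 0.55, 0.10))
er <- sample(c(0, 1, NA), 500, replace = TRUE, prob = c(0.30, 0.60, 0.10))

has_rcptr <- pr == 1 | er == 1        # NA when neither is positive and one is unknown
n_known   <- sum(!is.na(has_rcptr))   # patients with a determinable status
n_pos     <- sum(has_rcptr, na.rm = TRUE)

# Exact binomial test of the observed proportion against the reference value
bt <- binom.test(n_pos, n_known, p = 0.66, alternative = "two.sided")
c(estimate = unname(bt$estimate), p_value = bt$p.value)
```

Here the estimate is simply n_pos / n_known, so the denominator is explicit.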

Discussion

tags$p(paste("The z-statistic of", z.out[["statistic"]], " CI(", paste0(z.out[["conf.int"]], 
    collapse = ","), ") corresponds to a p-value of", z.out[["p.value"]], "and lends evidence to the claim that the mean frequency of individuals with either a progesterone or estrogen receptor in a tumor in the sample differs significantly from the population value of .66. This discrepancy calls into question whether the sample is representative of the population or whether it is biased."))

The z-statistic of -535.209760949229 CI( 0.0467267325207219,0.0512020164436526 ) corresponds to a p-value of 0 and lends evidence to the claim that the mean frequency of individuals with either a progesterone or estrogen receptor in a tumor in the sample differs significantly from the population value of .66. This discrepancy calls into question whether the sample is representative of the population or whether it is biased.

Q3 (repeated sample)

Is the rate of growth of a tumor over time associated with histological grade / estrogen status / progesterone status?

bcs2 <- haven::read_sav("~/Northeastern/Fall 2018/CAEP7712/Week 4/Breast cancer survival 2.sav") %>% 
    .[!is.na(.$pathsize), ]

Hypothesis

\(H_0:\) There is no significant difference in pathological sizes across time relative to the various levels of histological grade / estrogen status / progesterone status.

\(H_a:\) There is a significant difference in pathological size across time relative to the various levels of histological grade / estrogen status / progesterone status.

\(\alpha: .05\)
Test type: two-sided, paired (repeated-sample)

Assumption Testing

ggplot(data = bcs2, mapping = aes(x = pathsizet2)) + geom_histogram()

shapiro.test(bcs2$pathsizet2)

## 
##  Shapiro-Wilk normality test
## 
## data:  bcs2$pathsizet2
## W = 0.91743, p-value < 2.2e-16

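The pathological tumor size at time 2 is also right-skewed and non-normal, as is to be expected given that it is a repeat measurement of the same patients. Because the data do not fit a theoretical normal distribution, non-parametric tests will be used in addition to the t-tests. With paired = T, wilcox.test performs the Wilcoxon signed-rank test, the paired counterpart of the Mann-Whitney test, although the results keep the Mann-Whitney label.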
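
For a paired test, the normality assumption concerns the within-patient differences rather than either measurement on its own. A minimal sketch with simulated data shows that the two can behave very differently. The values and the names t1, t2 and d are hypothetical stand-ins for pathsize, pathsizet2 and their difference, and the roughly constant growth is an assumption of the simulation.

```r
set.seed(2018)
# Hypothetical paired sizes at time 1 and time 2; NOT the bcs2 data
t1 <- rlnorm(200, meanlog = 0.3, sdlog = 0.6)
t2 <- t1 + rnorm(200, mean = 0.5, sd = 0.1)  # roughly constant growth plus noise

d <- t2 - t1  # within-patient change, the quantity a paired t-test works on

# W closer to 1 means closer to a normal distribution
c(W_t1 = unname(shapiro.test(t1)$statistic),
  W_diff = unname(shapiro.test(d)$statistic))
```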

Analysis

# ----------------------- Thu Oct 04 08:42:28 2018 ------------------------#
# T-tests and Mann-Whitney tests for repeated sample data

var <- c("histgrad", "pr", "er")
t.outQ3 <- sapply(var, bcs = bcs2, FUN = function(var, bcs) {
    fac <- bcs[[var]] %>% unique %>% sort
    # Subset pathsize according to the factor levels and remove NA
    
    n_levels <- length(fac)
    tt <- vector(mode = "list", length = n_levels)
    es <- vector(mode = "list", length = n_levels)
    mw <- vector(mode = "list", length = n_levels)
    # For each factor level of the variable, compare pathsize at time 1 with
    # pathsize at time 2
    for (i in seq_along(fac)) {
        # Keep only rows with both measurements so the pairs stay aligned
        rows <- which(bcs[[var]] == fac[i] & !is.na(bcs[["pathsize"]]) & !is.na(bcs[["pathsizet2"]]))
        x <- bcs[["pathsize"]][rows]
        y <- bcs[["pathsizet2"]][rows]
        # t-tests
        tt[[i]] <- t.test(x = x, y = y, alternative = "two.sided", paired = T)
        tt[[i]]$data.name <- paste("t-test of pathsize across time where", var, "is equal to", 
            fac[i])
        # Mann-whitney
        mw[[i]] <- wilcox.test(x = x, y = y, alternative = "two.sided", conf.int = T, 
            paired = T)
        mw[[i]]$data.name <- paste("Mann-whitney of pathsize across time where", 
            var, "is equal to", fac[i])
        # Cohen's d
        es[[i]] <- effsize::cohen.d(d = x, f = y, paired = T)
        es[[i]]$method <- paste("cohen's d of pathsize across time where", var, "is equal to", 
            fac[i])
    }
    # Add in effect sizes
    out <- list(list(t.tests = tt, `Mann-whitney` = mw, effect.sizes = es))
    return(out)
})
# p-value, t-stat, CI, Effect Size,practical implications.
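
A paired test needs the two measurements to line up patient by patient, so missing values have to be dropped as whole pairs. The sketch below shows one way to do that with complete.cases() on simulated data. The values and the names size_t1, size_t2 and d_z are hypothetical, and d_z (mean difference over the standard deviation of the differences) is one common paired effect size that may not match what effsize::cohen.d reports.

```r
set.seed(99)
# Hypothetical paired measurements with some missing follow-ups; NOT bcs2
size_t1 <- rlnorm(100, meanlog = 0.3, sdlog = 0.6)
size_t2 <- size_t1 + rnorm(100, mean = 0.5, sd = 0.1)
size_t2[sample(100, 8)] <- NA

# Drop a patient entirely if either measurement is missing
keep <- complete.cases(size_t1, size_t2)
x <- size_t1[keep]
y <- size_t2[keep]

pt <- t.test(x, y, paired = TRUE)
sr <- wilcox.test(x, y, paired = TRUE)  # Wilcoxon signed-rank test

# Standardized mean of the paired differences
d_z <- mean(x - y) / sd(x - y)
c(n_pairs = length(x), t = unname(pt$statistic), d_z = d_z)
```

The paired t-statistic equals d_z multiplied by the square root of the number of pairs, which helps explain how very large t values arise when the differences are consistent.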

Discussion

discQ3 <- lapply(var, t.out = t.outQ3, function(var, t.out) {
    out <- tagList(tags$h5(paste("The comparison of pathsize across time for", var, 
        "required", length(t.out[[var]][["t.tests"]]), "test(s) to compare the various factor levels")), 
        tags$strong(paste("Test results:")), lapply(sapply(seq_along(t.out[[var]][["t.tests"]]), 
            t.out = t.out, function(t, t.out) {
                paste("For the", t.out[[var]][["t.tests"]][[t]][["data.name"]], "the t-statistic (", 
                  t.out[[var]][["t.tests"]][[t]][["statistic"]][["t"]] %>% format(scientific = T), 
                  ") had a p-value of", t.out[[var]][["t.tests"]][[t]][["p.value"]] %>% 
                    format(scientific = T), "with a confidence interval of (", paste(t.out[[var]][["t.tests"]][[t]][["conf.int"]] %>% 
                    format(scientific = T), collapse = ","), ") indicating that we", 
                  ifelse(t.out[[var]][["t.tests"]][[t]][["p.value"]] < 0.05, "reject", 
                    "fail to reject"), "the null hypothesis at the 95% confidence level. Thus the", 
                  t.out[[var]][["t.tests"]][[t]][["data.name"]], "provides evidence that there ", 
                  ifelse(t.out[[var]][["t.tests"]][[t]][["p.value"]] < 0.05, "is", 
                    "is not"), " a significant difference between pathological tumor size across time when", 
                  var, "is equal to", str_extract(t.out[[var]][["t.tests"]][[t]][["data.name"]], 
                    "\\d+"), ". The Mann-Whitney test for the same comparison gave a p-value of", 
                  t.out[[var]][["Mann-whitney"]][[t]][["p.value"]] %>% format(scientific = T), 
                  "at the same confidence level with a confidence interval of (", 
                  paste0(t.out[[var]][["Mann-whitney"]][[t]][["conf.int"]] %>% format(scientific = T), 
                    # the two tests agree when both p-values fall on the same side of 0.05
                    collapse = ","), ") which", ifelse((t.out[[var]][["t.tests"]][[t]][["p.value"]] < 
                    0.05) == (t.out[[var]][["Mann-whitney"]][[t]][["p.value"]] < 0.05), 
                    "confirms", "contradicts"), "the results of the t-test. The effect size indicated by", 
                  t.out[[var]][["effect.sizes"]][[t]][["method"]], "is", t.out[[var]][["effect.sizes"]][[t]][["estimate"]] %>% 
                    format(scientific = T), "with confidence interval (", paste0(t.out[[var]][["effect.sizes"]][[t]][["conf.int"]] %>% 
                    format(scientific = T), collapse = ","), ") showing that the effect is considered", 
                  t.out[[var]][["effect.sizes"]][[t]][["magnitude"]], ". ")
            }), FUN = tags$p))
    
    
    return(out)
})
tagList(discQ3)

The comparison of pathsize across time for histgrad required 3 test(s) to compare the various factor levels

Test results:

For the t-test of pathsize across time where histgrad is equal to 1 the t-statistic ( -4.29e+01 ) had a p-value of 5.26e-55 with a confidence interval of ( -5.33e-01,-4.85e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where histgrad is equal to 1 provides evidence that there is a significant difference between pathological tumor size across time when histgrad is equal to 1 . The Mann-Whitney test for the same comparison gave a p-value of 2.51e-14 at the same confidence level with a confidence interval of ( -5.37e-01,-4.89e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where histgrad is equal to 1 is -4.89e+00 with confidence interval ( -5.52e+00,-4.25e+00 ) showing that the effect is considered large .

For the t-test of pathsize across time where histgrad is equal to 2 the t-statistic ( -1.07e+02 ) had a p-value of 0e+00 with a confidence interval of ( -5.03e-01,-4.85e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where histgrad is equal to 2 provides evidence that there is a significant difference between pathological tumor size across time when histgrad is equal to 2 . The Mann-Whitney test for the same comparison gave a p-value of 5.2e-81 at the same confidence level with a confidence interval of ( -5.04e-01,-4.86e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where histgrad is equal to 2 is -4.87e+00 with confidence interval ( -5.12e+00,-4.62e+00 ) showing that the effect is considered large .

For the t-test of pathsize across time where histgrad is equal to 3 the t-statistic ( -8.59e+01 ) had a p-value of 3.43e-219 with a confidence interval of ( -5.14e-01,-4.91e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where histgrad is equal to 3 provides evidence that there is a significant difference between pathological tumor size across time when histgrad is equal to 3 . The Mann-Whitney test for the same comparison gave a p-value of 4.57e-53 at the same confidence level with a confidence interval of ( -5.14e-01,-4.90e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where histgrad is equal to 3 is -4.86e+00 with confidence interval ( -5.17e+00,-4.54e+00 ) showing that the effect is considered large .

The comparison of pathsize across time for pr required 3 test(s) to compare the various factor levels

Test results:

For the t-test of pathsize across time where pr is equal to 0 the t-statistic ( -1e+02 ) had a p-value of 1.48e-272 with a confidence interval of ( -5.12e-01,-4.93e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where pr is equal to 0 provides evidence that there is a significant difference between pathological tumor size across time when pr is equal to 0 . The Mann-Whitney test for the same comparison gave a p-value of 3.34e-63 at the same confidence level with a confidence interval of ( -5.13e-01,-4.93e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where pr is equal to 0 is -5.18e+00 with confidence interval ( -5.48e+00,-4.88e+00 ) showing that the effect is considered large .

For the t-test of pathsize across time where pr is equal to 1 the t-statistic ( -9.65e+01 ) had a p-value of 1.24e-299 with a confidence interval of ( -5.0e-01,-4.8e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where pr is equal to 1 provides evidence that there is a significant difference between pathological tumor size across time when pr is equal to 1 . The Mann-Whitney test for the same comparison gave a p-value of 1.78e-74 at the same confidence level with a confidence interval of ( -5.01e-01,-4.80e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where pr is equal to 1 is -4.58e+00 with confidence interval ( -4.83e+00,-4.33e+00 ) showing that the effect is considered large .

For the t-test of pathsize across time where pr is equal to 999 the t-statistic ( -8.83e+01 ) had a p-value of 2.72e-217 with a confidence interval of ( -4.99e-01,-4.77e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where pr is equal to 999 provides evidence that there is a significant difference between pathological tumor size across time when pr is equal to 999 . The Mann-Whitney test for the same comparison gave a p-value of 2.88e-51 at the same confidence level with a confidence interval of ( -5.00e-01,-4.77e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where pr is equal to 999 is -5.08e+00 with confidence interval ( -5.41e+00,-4.75e+00 ) showing that the effect is considered large .

The comparison of pathsize across time for er required 3 test(s) to compare the various factor levels

Test results:

For the t-test of pathsize across time where er is equal to 0 the t-statistic ( -9.5e+01 ) had a p-value of 2.69e-241 with a confidence interval of ( -5.17e-01,-4.96e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where er is equal to 0 provides evidence that there is a significant difference between pathological tumor size across time when er is equal to 0 . The Mann-Whitney test for the same comparison gave a p-value of 7.58e-56 at the same confidence level with a confidence interval of ( -5.18e-01,-4.97e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where er is equal to 0 is -5.23e+00 with confidence interval ( -5.55e+00,-4.91e+00 ) showing that the effect is considered large .

For the t-test of pathsize across time where er is equal to 1 the t-statistic ( -1.04e+02 ) had a p-value of 0e+00 with a confidence interval of ( -4.97e-01,-4.79e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where er is equal to 1 provides evidence that there is a significant difference between pathological tumor size across time when er is equal to 1 . The Mann-Whitney test for the same comparison gave a p-value of 1.33e-84 at the same confidence level with a confidence interval of ( -4.98e-01,-4.79e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where er is equal to 1 is -4.62e+00 with confidence interval ( -4.85e+00,-4.38e+00 ) showing that the effect is considered large .

For the t-test of pathsize across time where er is equal to 999 the t-statistic ( -8.63e+01 ) had a p-value of 8.29e-206 with a confidence interval of ( -4.99e-01,-4.77e-01 ) indicating that we reject the null hypothesis at the 95% confidence level. Thus the t-test of pathsize across time where er is equal to 999 provides evidence that there is a significant difference between pathological tumor size across time when er is equal to 999 . The Mann-Whitney test for the same comparison gave a p-value of 1.74e-48 at the same confidence level with a confidence interval of ( -5.01e-01,-4.77e-01 ) which confirms the results of the t-test. The effect size indicated by cohen's d of pathsize across time where er is equal to 999 is -5.11e+00 with confidence interval ( -5.45e+00,-4.77e+00 ) showing that the effect is considered large .

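The practical significance of these findings is that tumors grow significantly in size across time for all histological grades and receptor classes. The estimated mean increase is about 0.5 in every group, so these paired tests show growth within each level but do not by themselves establish that the rate of growth differs between levels. It’s important to seek treatment immediately upon tumor discovery to prevent growth and the potential for metastasis.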
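
The research question asks whether growth is associated with grade or receptor status, which calls for a comparison between levels as well as within them. One possible approach, sketched here on simulated data, is to compute each patient's change in size and compare that change between groups. The data frame sim, its values, and the growth column are hypothetical, and only the two-level er case is shown.

```r
set.seed(4)
# Hypothetical data frame standing in for bcs2; all values are simulated
sim <- data.frame(
  er       = rep(c(0, 1), each = 80),
  pathsize = rlnorm(160, meanlog = 0.3, sdlog = 0.5)
)
sim$pathsizet2 <- sim$pathsize + rnorm(160, mean = 0.5, sd = 0.1)

# Growth per patient, then a two-sample (Welch) comparison between ER groups
sim$growth <- sim$pathsizet2 - sim$pathsize
growth_test <- t.test(growth ~ er, data = sim)

growth_test$estimate  # mean growth in each ER group
growth_test$p.value
```

For a variable with three levels such as histgrad, the same growth column could be compared pairwise between adjacent levels, as was done for pathsize in Q1.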

CAEP7712: Week 4

Stephen Synchronicity

2018-10-04
