Problem 1

Data from a case-control study of 200 esophageal cancer cases and 775 community-based controls are shown below. Detailed dietary data were obtained by interview. This study addresses the relation between alcohol consumption (dichotomized an 80 grams per day) and esophageal cancer.

p1 <- read.xls("/Users/jenniferpolson/Documents/School/2018-W/BE M228/Final/Data Entry.xlsx", 1); p1
row.names(p1) <- p1$X
p1 <- subset(p1, select = -X )
  1. Compute the odds ratio and test the null hypothesis that the odds ratio is equal to 1.
OR1 <- (p1[1,1]*p1[2,2])/(p1[2,1]*p1[1,2])

The odds ratio is 5.6400847.

  1. Calculate the 95% confidence interval for the odds ratio.
UpperOR1 <- exp(log(OR1)+(1.96*sqrt((1/p1[1,1]))+(1/p1[2,2])+(1/p1[2,1])+(1/p1[1,2]))); UpperOR1
## [1] 7.030318
LowerOR1 <- exp(log(OR1)-(1.96*sqrt((1/p1[1,1]))+(1/p1[2,2])+(1/p1[2,1])+(1/p1[1,2]))); LowerOR1
## [1] 4.524768
#Since the odds ratio is not equal to one and the 95% percent confidence interval does not overlap with 1, we can be confident in saying that we can disregard the null hypothesis

The 95% confidence interval is 7.030318,4.5247676. Since the confidence interval doesn’t contain 1 (which would cause us to accept the null hypothesis that OR = 1), we can conclude that the null hypothesis can be rejected.

Problem 2

In an experiment to compare two diets for fattening beef steers, nine pairs of animals were chosen from the herd; members of each pair were matched as closely as possible with respect to hereditary factors. The members of each pair were randomly allocated, one to each diet. The following table below shows the weight gains (lb) of the animals over a 140-day test period on diet 1 and on diet 2.

p2 <- read.xls("/Users/jenniferpolson/Documents/School/2018-W/BE M228/Final/Data Entry.xlsx", 2); p2
  1. Calculate the summary statistics of the difference
summary(p2$Difference)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -50.00  -34.00    8.00   22.89   70.00  100.00
  1. Test the difference between the diets is zero at the 5% level of significance using a parametric approach. Use a non-directional (i.e. two-sided) alternative.
test2b <- t.test(p2$Diet.1, p2$Diet.2, paired = T); test2b
## 
##  Paired t-test
## 
## data:  p2$Diet.1 and p2$Diet.2
## t = 1.1585, df = 8, p-value = 0.2801
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -22.67122  68.44900
## sample estimates:
## mean of the differences 
##                22.88889

Based on a p-value of 0.2800675, we are unable to reject the null hypothesis that there is a difference between the two diets.

  1. Test the difference between the diets is zero at the 5% level of significance using a non-parametric approach. Use a non-directional (i.e. two-sided) alternative.
test2c <- wilcox.test(p2$Diet.1, p2$Diet.2, paired = T); test2c
## 
##  Wilcoxon signed rank test
## 
## data:  p2$Diet.1 and p2$Diet.2
## V = 32, p-value = 0.3008
## alternative hypothesis: true location shift is not equal to 0

Based on a p-value of 0.3007812, we are unable to reject the null hypothesis that there is a difference between the two diets.

  1. Compare the result of (b) to (c) and discuss the result.

The results of both the parametric and non-parametric tests are statistically insignificant (0.2800675 and 0.3007812, respectively), however, the parametric test assumes a normal distribution for the data, which is not the case Therefore, the Wilcoxon rank-sum test may be more indicative since the data do not follow a normal distribution.

Problem 3

A psychologist conducted a study to examine the nature of the relation, if any, between an employee’s emotional stability (X) and the employee’s ability to perform in a task group (Y). Emotional stability (X) and the employee’s ability to perfume in a task group (Y). Emotional stability was measured by a written test for which the higher the score, the greater is the emotional stability. Ability to perform in a task group (Y=1 if able, Y=0 if unable) was evaluated by the supervisor. The results for 27 employees were:

p3 <- read.xls("/Users/jenniferpolson/Documents/School/2018-W/BE M228/Final/Data Entry.xlsx", 3); p3
  1. Find the maximum likelihood estimates of β0 and β1 from the simple logistic regression, where β0 and β1 are the coefficient of constant and stability.
#install.packages('ROCR')
#library(ROCR)
set.seed(520)

n <- nrow(p3); n
## [1] 27
p <- 0.6 
nrows = seq(1, n, 1)
trainingrows <- sample(nrows, floor(n * p))
training <- p3[trainingrows, ]
testing <- p3[-trainingrows, ]

model3a = glm(Task.Ability ~ Stability, data = training, family = "binomial"); summary(model3a)
## 
## Call:
## glm(formula = Task.Ability ~ Stability, family = "binomial", 
##     data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7874  -0.7089   0.4338   0.7195   1.5228  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -10.84353    5.71420  -1.898   0.0577 .
## Stability     0.02092    0.01058   1.977   0.0480 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.170  on 15  degrees of freedom
## Residual deviance: 15.779  on 14  degrees of freedom
## AIC: 19.779
## 
## Number of Fisher Scoring iterations: 4

The model used is binomial, given that there is a discrete outcome i.e., if the task ability is accomplished or not. The β0, or intercept of the simple logistic regression model is -10.8435312, while β1, which corresponds to the stability coefficient, is 0.0209157.

  1. Obtain exp(estimate of β1) and interpret this number
em3 <- exp(model3a$coefficients[2])

For every unit increase in stability, the odds that the task ability is 1 is multiplied by exp(β1), which is 1.0211359. This is an estimated odds ratio for the stability test measurement.

  1. What is the estimated probability that employees with an emotional stability test score of 550 will be able to perform in a task group?
case3c = data.frame(Task.Ability = 1, Stability = 550)
prob3c = predict(model3a, case3c, type="response")

The probability of task ability given the aforementioned test score is 0.6592798.

  1. Estimate the emotional stability test score for which 70 percent of the employees with this test score are expected to be able to perform in a task group.
score3d = (.70 - model3a$coefficients[1])/model3a$coefficients[2]; score3d
## (Intercept) 
##    551.9083

If 70 percent of the employees are able to perform a task, it is estimated that they will have a score of 551.90832.

  1. What is the area under curve using ROC analysis? What will be the optimal approximate sensitivity and specificity from the model in your opinion? Explain the reason. What decision can you make?
library(pROC)
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
library(ROCR)
## Loading required package: gplots
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
testing$pred <- predict(model3a, newdata = testing, type = "response")
pred_result <- prediction(testing$pred, testing$Task.Ability)
pred_performance <- performance(pred_result,"tpr","fpr")
plot(pred_performance, main = "Model Performance", xlab = "1-Specificity", ylab = "Sensitivity")

roc3 <- pROC::roc(testing$Task.Ability, testing$pred)
auc3d <- auc(roc3); auc3d
## Area under the curve: 0.7857
opts = coords(roc3, x = "best", best.method = "closest.topleft");

The ideal sensitivity and specificty are 0.7142857 and 0.75 respectively, based on the criteria about being closest to the top left of the graph; that is, optimizing for both measures of performance; of note, this is at a probability threshold of 0.6860224. The model performance is 0.7857143, which is not optimal, it is not prudent to make a conclusion about the stability score and predictive capacity for task ability, especially if this is about employees and their livelihood.

Problem 4

An investigator wants to examine data from a study that investigated the effects of a treatment on patients with chronic obstructive pulmonary disease. Patients involved in the study were recruited from three different medical centers - Johns Hopkins University Medical Center, Rancho Los Amigos Medical Center, and the St. Louis University Medical Center. Before combining the subjects for analysis into one group, one should first examine the baseline characteristics to ensure that the patients from the different centers are comparable. Extent of disease is one variable that is most important and is listed below by center:

p4 <- read.xls("/Users/jenniferpolson/Documents/School/2018-W/BE M228/Final/Data Entry.xlsx", 4); p4

Test the null hypothesis at the 5% level, that at the time when the study was conducted, whether the means of extent of disease at all three centers were identical.

#install.packages("MASS")
library(MASS)

group1 <- na.omit(p4$Hopkins)
group2 <- na.omit(p4$Rancho)
group3 <- na.omit(p4$St..Louis..Extent.of.Disease.)
response <- c(group1, group2, group3)
group <- c(rep("A", length(group1)), rep("B", length(group2)), rep("C", length(group3)))
eg1 <- data.frame(response, group)

model4 = lm(eg1$response ~ eg1$group)
summary(model4)
## 
## Call:
## lm(formula = eg1$response ~ eg1$group)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.000  -6.400   1.600   5.462  25.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   39.400      3.168  12.435 1.36e-14 ***
## eg1$groupB     8.138      4.650   1.750   0.0886 .  
## eg1$groupC    -7.400      4.871  -1.519   0.1375    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.27 on 36 degrees of freedom
## Multiple R-squared:  0.2108, Adjusted R-squared:  0.167 
## F-statistic: 4.809 on 2 and 36 DF,  p-value: 0.0141
aov(model4)
## Call:
##    aov(formula = model4)
## 
## Terms:
##                 eg1$group Residuals
## Sum of Squares   1448.144  5420.831
## Deg. of Freedom         2        36
## 
## Residual standard error: 12.27105
## Estimated effects may be unbalanced
  1. State the hypotheses and assumptions of the test used.

Null hypothesis: The mean disease extent is the same among all three centers. Alternative hypothesis: The means of disease extend at all three centers are not identical.

  1. If the null hypothesis is rejected, use a multiple comparisons procedure to see which means are different. State your findings. What conclusion can you make?
p_js = t.test(p4$Hopkins, p4$St..Louis..Extent.of.Disease.)
p_jr = t.test(p4$Hopkins, p4$Rancho)
p_rs = t.test(p4$Rancho, p4$St..Louis..Extent.of.Disease.)

p_raw = c(p_js$p.value, p_jr$p.value, p_rs$p.value)
names = c("Hopkins-STL", "Hopkins-Rancho", "Rancho-STL")
p_adj <- data.frame(names, p_raw)
p_adj$p_bonf = p.adjust(p_adj$p_raw, method = "bonferroni")
p_adj$p_holm = p.adjust(p_adj$p_raw, method = "holm")
p_adj$p_hoch = p.adjust(p_adj$p_raw, method = "hochberg")
p_adj

According to the table above, the means for the samples at Johns Hopkins and St. Louis are the same, no matter what form of correction. Bonferroni is typically regarded s the strictest form of correction; under this adjustment, none of the means are statistically different, and the null hypothesis cannot be rejected. The Holm method, while not as strict, indicates the same thing. However, performing Benjamini-Hochberg correction shows that the Rancho Los Amigos Medical Center is different from the other two centers (p = 0.043)

Problem 5

The medical imaging division and pulmonary medicine are involved in a study emphysematous lungs. A prediction equation needs to be developed predicting Y grams of emphysema from various imaging texture measures x1( m) x2(mm) x3( m2) x4(mm).

p5 <- read.xls("/Users/jenniferpolson/Documents/School/2018-W/BE M228/Final/Data Entry.xlsx", 5)
model5a <- lm(Y..g. ~ x1 + x2 + x3 + x4, data = p5)
s5 <- summary(model5a)
AIC(model5a)
## [1] 77.28271
s5
## 
## Call:
## lm(formula = Y..g. ~ x1 + x2 + x3 + x4, data = p5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1388 -1.3761 -0.7588  2.1733  4.0201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -30.1369    37.5282  -0.803  0.44264   
## x1            2.0699     0.4562   4.537  0.00141 **
## x2            2.5816     0.7397   3.490  0.00683 **
## x3            0.6360     0.4603   1.382  0.20039   
## x4            1.1060     0.7648   1.446  0.18208   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.107 on 9 degrees of freedom
## Multiple R-squared:  0.9757, Adjusted R-squared:  0.9648 
## F-statistic: 90.18 on 4 and 9 DF,  p-value: 2.954e-07
  1. What are the significant measures to include in a model? Explain the rational of your final model.

Running the linear model of this indicates that only the p-values for x1 and x2 are significant to the model (rs5$coefficients[4,2] and 1.381729), respectively.

  1. What is the multiple R for this equation?
model5b <- lm(Y..g. ~ x1 + x2, data = p5)
s5b <- summary(model5b)
AIC(model5b)
## [1] 76.32074

The multiple R-quared is 0.9756564, adjusted: 0.9648371 for the model that includes all factors. For the model that only inlcudes X1 and X2, the values are 0.9697568, adjusted: 0.9642581. This means that the model including all of the variables is a better model for the proposed study.

  1. Write the equation that you select and predict the grams Y of emphysema given these values for x1=5.2 x2=21.3 x3=19.7 x4=12.2.
case5c = data.frame(x1=5.2,  x2=21.3,  x3=19.7,  x4=12.2)
prob5c <- predict(model5a, case5c, type="response")

The grams of emphesema for the predicted for those parameters is 61.6369059.

Problem 6

In the beginning of lecture, we have discussed this figure below to illustrate the process of statistical study and design. Using the idea of figure and two articles of New England Journal Article with the tile of “Adjuvant Sunitinib in High-Risk Renal-Cell Carcinoma after Nephrectomy” (http://www.nejm.org/doi/pdf/10.1056/NEJMoa1611406), answer the questions below

  1. Write the main hypotheses of the article.

The null hypothesis to be tested is that sunitinib does not significantly affect the incidence of disease recurrence, a second cancer, or death among patients with locoregional renal-cell carcinoma. The alternative hypothesis is that there is an effect of sunitinib on the incidence of events listed above, in that the incidence is lower. This is a one-sided hypothesis.

  1. What was the primary outcome of the study?

The patients with high risk of recurrence have a higher survival in a disease-free state if they take sunitinib. However, there is toxicity as a result,

  1. Describe the study design briefly in two sentences

The study conductors recruited 615 patients with a high-risk form of renal cell carcinoma; these patients either received a placebo or the drug (sunitinib) at the following schedule: 4-weeks-on, 2-weeks-off for one year or disease recurrence. They assesed for disease-free survival, on a double blind, phase III trial.

  1. Describe the statistical method(s) to address the hypotheses. Do you agree with the statistical model?

The methods they used were determining the statistical power required for the target hazard ratio/effect size. They performed a two-sided log rank test, despite the fact that their hypothesis was one-sided. They implemented repreated measure analysis for a few parameters using covariates; multiple testing needs to be taken care when looking at covariates; the mean of the two groups was used to compare overall long-term outcomes.

  1. Using Figure 2 from the article, interpret the hazard ratio of in the study in your words.

The Kaplan-Meier curve for survival indicates a hazard ratio of 0.76, with a 95% CI of 0.59–0.98. This means that, at a given time, around 3/4 of the treatment subjects will experience an event of disease recurrence, a second cancer, or death as compared to the group receiving the placebo.

  1. What medical decision can be inferred from the studies?

The decision here is that there is a slight diminishment of risk of the adverse incidents among patients who take sunitinib, but there are also toxic effects. For patients witht he specific type of renal carcinoma, this can be recommended to reduce recurrence or mortality.

Bonus Questions: (a) Describe the type1 and type2 errors. (b) Describe the p-value. (c) Describe the study power. The authors

Problem 7

Part I

  1. What is P(A=1, B=0)?

P(A) x P(B) = 0.3*0.7 = 0.21

  1. What is P(A=0, E=1)?

P(A = 0)x[P(E|B=0) + P(E|B=1)] = 0.30 x [0.10 + 0.90] = 0.30

Part II

Two pathologists are counting the number of cancer cells in a sample. Imagine that they are simultaneously examining the same tissue sample using their own microscope to make measurements M1 and M2 of the number of cancer cells N in the sample. Each microscope can be out of focus (events F1 and F2). In such a case, the pathologist will undercount by three or more cells or, if N is less than three, fail to detect any cancer cells at all.

Which of them correctly, but not necessarily efficiently, represent the above information?

The fourth network represents this information; the number of cells N isn’t dependent on any of the nodes stated in the problem, and thus the fourth option is the only one with that belief. Moreover, M1 and M2 are dependent on both the number of cells N as well as the focus events of each microscope.

Which network topology is preferred and why?

For Bayesian networks, directed, acyclic graphs provide the best topographical structure. This is because direct and indirect conditional dependencies can be quantified and discerned from the graphical structure. This allows for conditional probability tables to be constructed for a system.