Please submit your answers by 5.59 pm Monday February 4, 2019. Remember to show your work. In other words, always use echo=TRUE for the R code chunks that you provide. NOTE - All plots must show proper title, axis lables, and any legends used. Points will be deducted otherwise. Submissions must be made in HTML. Problem 6 has an additional submission to be made in DOCX or PDF format.
# Use this space to call any libraries here
library(plyr)
library(dplyr)
library(lubridate)
library(png)
library(grid)
library(ggplot2)
Q1) In Lecture 3, we worked with two datasets: “case_patients.csv” and “control_patients.csv” (under Files -> lectures -> lecture_3). We will use those files again in this question.
Ans.
# Insert code below
case_patient <- read.csv("~/Desktop/case_patients.csv")
control_patient <- read.csv("~/Desktop/control_patients.csv")
case_patient$outcome <- 'case'
control_patient$outcome <- 'control'
case_patient <- case_patient %>%
mutate(outcome = rep('0', nrow(case_patient)))
control_patient <- control_patient %>%
mutate(outcome = rep('1', nrow(control_patient)))
combine_patients <- rbind(case_patient, control_patient)
boxplot(los~outcome, data = combine_patients,
xlab = "Outcome",
ylab="LOS", ylim=c(0,100))
##Observations
##There is an outlier for one male patient who stayed 400 days (LOS=400)
##When I zoom in with ylim = c, we notice that male mean is around 18 and the 3rd quartile is around 22 and the 2nd quartile is around 15
##When I zoom age with ylim =c, we notice that female mean is around 10 and the 3rd quartile is around 11 and the 2nd quartile is around 9
##Outliers start at 50 for male and at 20 for female
Ans.
# Insert code below
#case_patient <- read.csv("~/Desktop/case_patients.csv")
#control_patient <- read.csv("~/Desktop/control_patients.csv")
# case_patient <- case_patient %>%
# mutate(outcome = rep('0', nrow(case_patient)))
# control_patient <- control_patient %>%
# mutate(outcome = rep('1', nrow(control_patient)))
# combine_patients <- rbind(case_patient, control_patient)
boxplot(los~gender, data = combine_patients,
xlab = "gender",
ylab="LOS", ylim=c(0,100))
##Observations
##There is an outlier for one male patient who stayed 400 days (LOS=400)
## When I zoom in with ylim = c, we notice that male mean is around 10 and the 3rd quartile is around 16 and the 2nd quartile is around 14
##When I zoom age with ylim =c, we notice that female mean is around 10 and the 3rd quartile is around 11 and the 2nd quartile is around 9
##Outliers start at 50 for male and at 20 for female
Q2) In a case-cohort study of 2000 patients studying renal disease caused by smoking, a group found the following numbers.
1. A total of 300 patients suffered renal disease
2. A total of 1000 patients were smokers
3. Two-thirds of the patients who suffered renal disease smoked.
Ans. Primary exposure - smoking Outcome of interest - renal disease
Ans.
| Table | Outcome | No Outcome | Total |
|---|---|---|---|
| Exposure | 200 | 800 | 1000 |
| No exposure | 100 | 900 | 1000 |
| Total | 300 | 1700 | 2000 |
Ans. Odds ratio = odds of exposure among cases/odds of exposure among control = A/C//B/D = 200/100//800/900 = 2.25
Q3) Answer the following:
Ans. Asking for odds = P/1-P We are given P = 18% = 0.18 P/1-P = 0.18/1-0.18 = 21.9%
Ans. Odds are given, asking for probability P/1-P = 0.4 Rearranging, P = 0.28 = 29%
Q4) A hospital H noticed that approximately 10% of patients who underwent a particularly surgery (surgery S) also developed infection during their post-surgery recovery period. For surgery S, the post-surgery recovery period is 2 weeks. It is suspected a majority of these patients (about 60%) may have taken a particular antibiotic (ABX) during a 3-month period prior to the surgery. ABX could kill the “good” bacteria in the patients body causing the patient to be prone to infection after the surgery.
Ans. Outcome of interest = Patients who developed infection during post-surgery S recovery period. Exposure of interest = Taking antibiotic (ABX) 3 months prior to surgery
Ans. Case-control study: We know the outcome so begin with the case and build control 1. Consider all patients who have developed infection post-surgery 2. Build a control set based on age and gender (since this is a matched study) 3. We measure exposure i.e., patients that took antibiotic ABX 3 months prior to surgery
Inclusion criteria: Similar variables (in this case, age and gender), antibiotic ABX taken 3 months prior to surgery
Exclusion criteria: Patient did not undergo surgery S, Patient did not take solely take antibiotic ABX in the specified timeframe (in this case 3 months), Patient is immuno-compromised
Ans. Retrospective: Build cohort using rates of exposure 1. Begin with the entire set of patients admitted to hospital H recently (for ex. last 5 years) 2. Measure how many patients, out of this set, took antibiotic ABX 3 months prior to surgery 3. Determine the outcome i.e., patients who developed infection post-surgery
Inclusion criteria: Patient must be admitted to the hospital in the specified timeframe (in this case, 5 years), Patient has to have taken antibiotic ABX in the specified timeframe (3 months prior to surgery)
Exclusion criteria: Patients who have been in the hospital greater or less than 5 years ago,Patient did not take solely take antibiotic ABX in the specified timeframe (in this case 3 months), Patient is immuno-compromised
Ans. Prospective: Involves enrolling patients into the study for a certain time frame 1. Enroll patients into the study 2. Determine if these patients are on antibiotic ABX 3. Observe to see if these patients develop an infection
Inclusion criteria: Healthy and consenting patients
Exclusion criteria: Patient on different antibiotics of interfering medications, Patient is immuno-compromised
Q5) Creating a Table 1: Under Assignments -> Assignment_II, read in the file called “smokedata.csv”. The codebook is as follows:
. ID: Unique numeric identifier
. quit: 0= did not quit smoking in previous 12 months, 1= quit smoking in previous 12 months
. advice 0= reported ‘did not receive advice from healthcare provider regarding quitting smoking.’, 1= reported ‘did receive advice…’
. ill 0= no major illness diagnosed in previous 12 months, 1= major illness diagnosed in previous 12 months
. sex 0=male, 1=female
. educ 0= less than 12 years, 1= 12 years or more
. provtype 0= Physician, 1= Nurse Practitioner
. birth_yr Four digit year of patient’s birth
. app_yr Four digit year of patient’s appointment
. age_at_app* Patient’s age at the appointment.
*For simplicity, we will assume all patients have already had their birthday when they attend their appointment. Therefore, age should equal app_yr – birth_yr.
The objective of the study behind the dataset was to determine the association between of providing smoking cessation advice to patients and whether they quit smoking during the same time period.
Complete the attached Table 1 (Problem6_Table1.docx) and upload it into Canvas (you can upload either .docx or .pdf). Use the space below to show your work.
# Insert code below
#First read in the dataset csv file
patient_smokedata <- read.csv("~/Desktop/smokedata.csv")
#summary(patient_smokedata)
#Use as.factor() to convert the continous variables to binary
patient_smokedata$quit <- as.factor(patient_smokedata$quit)
patient_smokedata$advice <- as.factor(patient_smokedata$advice)
patient_smokedata$ill <- as.factor(patient_smokedata$ill)
patient_smokedata$sex <- as.factor(patient_smokedata$sex)
patient_smokedata$educ <- as.factor(patient_smokedata$educ)
patient_smokedata$provtype <- as.factor(patient_smokedata$provtype)
#summary(patient_smokedata)
#patient_smokedata <- patient_smokedata[!(patient_smokedata$quit!="1" & patient_smokedata$quit!="0"),]
#patient_smokedata <- patient_smokedata[!(patient_smokedata$birth_yr<1911 | patient_smokedata$birth_yr>2005),]
#patient_smokedata <- patient_smokedata[!(patient_smokedata$age_at_appt<18 | patient_smokedata$age_at_appt>100),]
#patient_smokedata <- patient_smokedata[!(patient_smokedata$appt_yr!="2011"),]
#summary(patient_smokedata)
## Age, means, std dev
patient_smokedata.quit <- filter(patient_smokedata, quit == 1)
patient_smokedata.not.quit <- filter(patient_smokedata, quit == 0)
#summary(patient_smokedata.q)
cat("Mean age for quitters", mean(patient_smokedata.quit$age_at_appt), "\n")
## Mean age for quitters 47.29167
cat("Std dev for quitters", sd(patient_smokedata.quit$age_at_appt), "\n")
## Std dev for quitters 20.75281
cat("Mean age for not quitters", mean(patient_smokedata.not.quit$age_at_appt), "\n")
## Mean age for not quitters 51
cat("Std dev for not quitter", sd(patient_smokedata.not.quit$age_at_appt), "\n")
## Std dev for not quitter 17.27129
# for p-value
age.test <- t.test(patient_smokedata.quit$age, patient_smokedata.not.quit$age)
age.test
##
## Welch Two Sample t-test
##
## data: patient_smokedata.quit$age and patient_smokedata.not.quit$age
## t = -1.1194, df = 69.219, p-value = 0.2668
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -10.316764 2.900098
## sample estimates:
## mean of x mean of y
## 47.29167 51.00000
##Sex (females 1 and males 0)
sex.table <- with(patient_smokedata, table(quit, sex))
sex.table
## sex
## quit 0 1
## 0 68 81
## 1 25 23
chisq.test(sex.table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: sex.table
## X-squared = 0.37423, df = 1, p-value = 0.5407
##Major Illness
majorillness.table <- with(patient_smokedata, table(quit, ill))
majorillness.table
## ill
## quit 0 1
## 0 131 18
## 1 34 14
chisq.test(majorillness.table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: majorillness.table
## X-squared = 6.5849, df = 1, p-value = 0.01028
## Education
educ.table <- with(patient_smokedata, table(quit, educ))
educ.table
## educ
## quit 0 1
## 0 69 80
## 1 26 22
chisq.test(educ.table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: educ.table
## X-squared = 0.61068, df = 1, p-value = 0.4345
## Smoking advice
smokingadvice.table <- with(patient_smokedata, table(quit, advice))
smokingadvice.table
## advice
## quit 0 1
## 0 127 22
## 1 22 26
chisq.test(smokingadvice.table)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: smokingadvice.table
## X-squared = 28.483, df = 1, p-value = 9.451e-08
##Provider type
providertype.table <- with(patient_smokedata, table(quit, provtype))
providertype.table
## provtype
## quit 0 1
## 0 135 14
## 1 44 4
chisq.test(providertype.table)
## Warning in chisq.test(providertype.table): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: providertype.table
## X-squared = 7.7403e-30, df = 1, p-value = 1