(1) Whether or not the data derived from an RCT

Yes, the description of the data says” Randomized, Controlled Trial of Licorice Gargle before Intubation for Elective Thoracic Surgery”, there is also a column called “treat” to indicate whether the unit is under the treatment or not.

(2) The number of observations (rows) in the data set

# install.packages("medicaldata")
library(medicaldata)

# data(package = "medicaldata")
data("licorice_gargle")
data <- licorice_gargle
str(data)
## 'data.frame':    235 obs. of  19 variables:
##  $ preOp_gender          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ preOp_asa             : num  3 2 2 2 1 2 3 2 3 3 ...
##  $ preOp_calcBMI         : num  33 23.7 26.8 28.4 30.4 ...
##  $ preOp_age             : num  67 76 58 59 73 61 66 61 83 69 ...
##  $ preOp_mallampati      : num  2 2 2 2 1 3 1 2 1 2 ...
##  $ preOp_smoking         : num  1 2 1 1 2 1 1 1 1 3 ...
##  $ preOp_pain            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ treat                 : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ intraOp_surgerySize   : num  2 1 2 3 2 3 3 1 1 2 ...
##  $ extubation_cough      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pacu30min_cough       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pacu30min_throatPain  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pacu30min_swallowPain : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pacu90min_cough       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pacu90min_throatPain  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ postOp4hour_cough     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ postOp4hour_throatPain: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pod1am_cough          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pod1am_throatPain     : num  0 0 0 0 0 0 0 0 0 0 ...

By using the str() function, we can see there are 235 observation in the dataset.

(3) The number and names of the columns

By using the str() function above, we can see there are 19 variables as columns in the dataset and using the names() function, we are able to see the names of the columns.

 names(data)
##  [1] "preOp_gender"           "preOp_asa"              "preOp_calcBMI"         
##  [4] "preOp_age"              "preOp_mallampati"       "preOp_smoking"         
##  [7] "preOp_pain"             "treat"                  "intraOp_surgerySize"   
## [10] "extubation_cough"       "pacu30min_cough"        "pacu30min_throatPain"  
## [13] "pacu30min_swallowPain"  "pacu90min_cough"        "pacu90min_throatPain"  
## [16] "postOp4hour_cough"      "postOp4hour_throatPain" "pod1am_cough"          
## [19] "pod1am_throatPain"

(4) The types of genders, and the numbers of people who were each gender in the data.

?licorice_gargle

Using the “?” function, we can see the description of this dataset, indicates that in preOp_gender , 0 = Male; 1 = Female.

#calulation the number of specific observation
nm <- nrow(data[data$preOp_gender == 1,])
nf <- nrow(data[data$preOp_gender == 0,])
cat("\nThere are",nm, "males in the data set.\n")
## 
## There are 93 males in the data set.
cat("\nThere are",nf, "females in the data set.\n")
## 
## There are 142 females in the data set.

(5) The number of females younger than 32 years old in the data set.

nf_32 <- nrow(data[data$preOp_gender == 0 & data$preOp_age <=32,])
cat("\nThere are",nf_32, "females under 32 in the data set.\n")
## 
## There are 19 females under 32 in the data set.

There are 19 females under 32 in the data set.

(6) How might a positive test result be defined in this data set, and why?

Use the “which” command to identify which rows (i.e., which individuals) were ages 0-32, and were smokers, and which had positive test results (by your definition of positive results).

In the context of this dataset, we hypotheze that gargling with licorice would improve the sore throat in surgery. Hence, the positive test result would be possibly defined as there’s a significant reduction of the sore throat in patients who used the licorice.(treatment group). Here, we can investigate the sore throat pain score in different timestamp as below.

# define the function"slope" to see the trend of throat Pain
slope <- function(row_number){
  time_points <- c(1,2,3,4)
  pain_level <- data[row_number,c("pacu30min_throatPain", "pacu90min_throatPain","postOp4hour_throatPain", "pod1am_throatPain")]
  pain_level <- as.numeric(pain_level)
  model <- lm(pain_level ~ time_points)
  slope1 <- coef(model)[2]
  slope1 <- as.numeric( coef(model)[2])
  return(slope1)
}

# calculate the throat Pain and ignore the NA 
data$slope <- 0
for(i in 1:nrow(data)){
  try (data$slope[i] <- slope(i))
}
## Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
##   0 (non-NA) cases
## Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
##   0 (non-NA) cases
# identify the positive cases 
positive_test <- which(data$preOp_age <=32, data$preOp_smoking ==1, data$slope <= 0)
cat("\nWe have",nrow(data[positive_test,])  ,"The positive test results." )
## 
## We have 24 The positive test results.

(7) Use the quantile function to identify the median, 10% quantile, and 90% quantile values of ages for males.

male_ages <- data$preOp_age[data$preOp_gender==1]
quntile <- quantile(male_ages, probs = c(0.1,0.5,0.9))

cat("\n10% quantile =", quntile[1],"\n50% quantile(median) =", quntile[2],"\n90% quantile =", quntile[3])
## 
## 10% quantile = 39.4 
## 50% quantile(median) = 60 
## 90% quantile = 73

(8) Write a short paragraph (probably 5-10 sentences) that could serve as the abstract or executive summary of a memo designed to advise the principal investigators of the most important aspects of this study. What if anything do you conclude about treatment effects? Whatever analysis you perform, please describe it briefly (in a paragraph).

treatment_trend <- data$slope[data$treat==1]
control_trend <-data$slope[data$treat==0]

t_test_result <- t.test(treatment_trend,control_trend)
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  treatment_trend and control_trend
## t = 2.5087, df = 204.34, p-value = 0.01289
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.02939711 0.24523711
## sample estimates:
##   mean of x   mean of y 
##  0.03389831 -0.10341880

We can determine the treatment effect by comparing if there is a trend difference between the treatment group and the control group. I applied a T-test, and the result indicates that there’s a significant difference between the two groups since the p-value is 0.01289. However, it also indicates that the treatment group has a positive slope of 0.03, meaning the trend is going up. In comparison, the control group is going down with an average slope of -0.10, which leads to the conclusion that the throat pain in the treatment group becomes more severe than in the control group. I’m having trouble understanding this dataset and why it doesn’t match my understanding. There might be some miscalculation or confounding variables that should be included in the analysis.