Yes, the description of the data says” Randomized, Controlled Trial of Licorice Gargle before Intubation for Elective Thoracic Surgery”, there is also a column called “treat” to indicate whether the unit is under the treatment or not.
# install.packages("medicaldata")
library(medicaldata)
# data(package = "medicaldata")
data("licorice_gargle")
data <- licorice_gargle
str(data)
## 'data.frame': 235 obs. of 19 variables:
## $ preOp_gender : num 0 0 0 0 0 0 0 0 0 0 ...
## $ preOp_asa : num 3 2 2 2 1 2 3 2 3 3 ...
## $ preOp_calcBMI : num 33 23.7 26.8 28.4 30.4 ...
## $ preOp_age : num 67 76 58 59 73 61 66 61 83 69 ...
## $ preOp_mallampati : num 2 2 2 2 1 3 1 2 1 2 ...
## $ preOp_smoking : num 1 2 1 1 2 1 1 1 1 3 ...
## $ preOp_pain : num 0 0 0 0 0 0 0 0 0 0 ...
## $ treat : num 1 1 1 1 1 1 1 1 1 1 ...
## $ intraOp_surgerySize : num 2 1 2 3 2 3 3 1 1 2 ...
## $ extubation_cough : num 0 0 0 0 0 0 0 0 0 0 ...
## $ pacu30min_cough : num 0 0 0 0 0 0 0 0 0 0 ...
## $ pacu30min_throatPain : num 0 0 0 0 0 0 0 0 0 0 ...
## $ pacu30min_swallowPain : num 0 0 0 0 0 0 0 0 0 0 ...
## $ pacu90min_cough : num 0 0 0 0 0 0 0 0 0 0 ...
## $ pacu90min_throatPain : num 0 0 0 0 0 0 0 0 0 0 ...
## $ postOp4hour_cough : num 0 0 0 0 0 0 0 0 0 0 ...
## $ postOp4hour_throatPain: num 0 0 0 0 0 0 0 0 0 0 ...
## $ pod1am_cough : num 0 0 0 0 0 0 0 0 0 0 ...
## $ pod1am_throatPain : num 0 0 0 0 0 0 0 0 0 0 ...
By using the str() function, we can see there are 235 observation in the dataset.
By using the str() function above, we can see there are 19 variables as columns in the dataset and using the names() function, we are able to see the names of the columns.
names(data)
## [1] "preOp_gender" "preOp_asa" "preOp_calcBMI"
## [4] "preOp_age" "preOp_mallampati" "preOp_smoking"
## [7] "preOp_pain" "treat" "intraOp_surgerySize"
## [10] "extubation_cough" "pacu30min_cough" "pacu30min_throatPain"
## [13] "pacu30min_swallowPain" "pacu90min_cough" "pacu90min_throatPain"
## [16] "postOp4hour_cough" "postOp4hour_throatPain" "pod1am_cough"
## [19] "pod1am_throatPain"
?licorice_gargle
Using the “?” function, we can see the description of this dataset,
indicates that in preOp_gender
, 0 = Male; 1 = Female.
#calulation the number of specific observation
nm <- nrow(data[data$preOp_gender == 1,])
nf <- nrow(data[data$preOp_gender == 0,])
cat("\nThere are",nm, "males in the data set.\n")
##
## There are 93 males in the data set.
cat("\nThere are",nf, "females in the data set.\n")
##
## There are 142 females in the data set.
nf_32 <- nrow(data[data$preOp_gender == 0 & data$preOp_age <=32,])
cat("\nThere are",nf_32, "females under 32 in the data set.\n")
##
## There are 19 females under 32 in the data set.
There are 19 females under 32 in the data set.
Use the “which” command to identify which rows (i.e., which individuals) were ages 0-32, and were smokers, and which had positive test results (by your definition of positive results).
In the context of this dataset, we hypotheze that gargling with licorice would improve the sore throat in surgery. Hence, the positive test result would be possibly defined as there’s a significant reduction of the sore throat in patients who used the licorice.(treatment group). Here, we can investigate the sore throat pain score in different timestamp as below.
# define the function"slope" to see the trend of throat Pain
slope <- function(row_number){
time_points <- c(1,2,3,4)
pain_level <- data[row_number,c("pacu30min_throatPain", "pacu90min_throatPain","postOp4hour_throatPain", "pod1am_throatPain")]
pain_level <- as.numeric(pain_level)
model <- lm(pain_level ~ time_points)
slope1 <- coef(model)[2]
slope1 <- as.numeric( coef(model)[2])
return(slope1)
}
# calculate the throat Pain and ignore the NA
data$slope <- 0
for(i in 1:nrow(data)){
try (data$slope[i] <- slope(i))
}
## Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
## 0 (non-NA) cases
## Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
## 0 (non-NA) cases
barplot(data$slope)
# identify the positive cases
positive_test <- which(data$preOp_age <=32, data$preOp_smoking ==1, data$slope <= 0)
cat("\nWe have",nrow(data[positive_test,]) ,"The positive test results." )
##
## We have 24 The positive test results.
male_ages <- data$preOp_age[data$preOp_gender==1]
quntile <- quantile(male_ages, probs = c(0.1,0.5,0.9))
cat("\n10% quantile =", quntile[1],"\n50% quantile(median) =", quntile[2],"\n90% quantile =", quntile[3])
##
## 10% quantile = 39.4
## 50% quantile(median) = 60
## 90% quantile = 73
treatment_trend <- data$slope[data$treat==1]
control_trend <-data$slope[data$treat==0]
t_test_result <- t.test(treatment_trend,control_trend)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: treatment_trend and control_trend
## t = 2.5087, df = 204.34, p-value = 0.01289
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.02939711 0.24523711
## sample estimates:
## mean of x mean of y
## 0.03389831 -0.10341880
We can determine the treatment effect by comparing if there is a trend difference between the treatment group and the control group. I applied a T-test, and the result indicates that there’s a significant difference between the two groups since the p-value is 0.01289. However, it also indicates that the treatment group has a positive slope of 0.03, meaning the trend is going up. In comparison, the control group is going down with an average slope of -0.10, which leads to the conclusion that the throat pain in the treatment group becomes more severe than in the control group. I’m having trouble why it doesn’t match my intuition. There might be some miscalculation or confounding variables that should be included in the analysis.