Case Study 1
About the Data
This data set was simulated in R. Cancer treatment research often involves evaluating how potential drugs affect key cellular processes such as proliferation, apoptosis, and gene expression. The simulated experiment assesses the effect of a drug treatment on cancer cells by comparing biological responses in a control group and an experimental group before and after treatment. The objective is to compare proliferation, apoptosis, and gene expression between the control and experimental groups.
To download the data used in this analysis, use this link: case_study1.csv
A preview of the data appears in the glimpse() output in the Data Import section.
The variables used in this study are as follows: CP stands for Cell Proliferation, AR refers to Apoptosis Rate, OX represents Oncogene X Expression, and TSY denotes Tumor Suppressor Y Expression.
Loading R Packages
To use the following R packages in your code, you’ll need to install them first. Installation requires an active internet connection. Once installed, the packages can be loaded in later sessions without an internet connection.
# data
library(here)       # file locations for import/export
library(tidyverse)  # data management
# graphs
library(ggplot2)    # graphs
library(patchwork)  # combine graphs
# analysis
library(moments)    # skewness and kurtosis
# publication-ready table
library(gtsummary)  # table generation
library(flextable)  # save the table
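The installation step can be scripted so that only packages not yet present are installed. This is a minimal sketch rather than part of the original workflow: the package names come from the library() calls in this case study, while the names `required` and `missing_pkgs` and the `interactive()` guard are illustrative choices.

```r
# Packages loaded in this case study (taken from the library() calls)
required <- c("here", "tidyverse", "patchwork", "moments", "gtsummary", "flextable")

# requireNamespace() returns FALSE when a package is not installed
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

# Install only what is missing. The interactive() guard is an assumption of
# this sketch: it avoids surprise downloads when the script runs unattended.
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

ggplot2 is not listed separately because it is installed as part of tidyverse.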
Data Import
To import the data file named ‘case_study1.csv’, which is stored in the ‘data’ folder of your project directory, use the following R code. Let us read the CSV file into R and provide an overview of the data set, including missing values and summary statistics.
D <- read.csv(here("data", "case_study1.csv"))
glimpse(D)

Rows: 20
Columns: 6
$ ID    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
$ Group <chr> "Control", "Control", "Control", "Control", "Control", "Control"…
$ CP    <dbl> 1.84, 1.67, 1.68, 1.62, 1.49, 1.96, 1.70, 1.21, 1.74, 1.51, 0.99…
$ AR    <dbl> 6.25, 5.97, 5.96, 7.37, 5.77, 7.52, 4.45, 6.58, 6.12, 6.22, 14.5…
$ OX    <dbl> 3.10, 2.87, 2.72, 2.29, 3.28, 2.30, 3.69, 3.37, 2.48, 2.09, 1.27…
$ TSY   <dbl> 1.53, 1.24, 1.11, 0.97, 0.48, 1.44, 0.66, 1.32, 1.67, 0.67, 2.16…

sum(is.na(D)) # number of missing values

[1] 0

summary(D)

       ID           Group                 CP              AR
 Min.   : 1.00   Length:20          Min.   :0.820   Min.   : 4.450
 1st Qu.: 5.75   Class :character   1st Qu.:1.005   1st Qu.: 6.195
 Median :10.50   Mode  :character   Median :1.145   Median :10.105
 Mean   :10.50                      Mean   :1.304   Mean   :10.430
 3rd Qu.:15.25                      3rd Qu.:1.673   3rd Qu.:14.562
 Max.   :20.00                      Max.   :1.960   Max.   :16.030
       OX             TSY
 Min.   :0.850   Min.   :0.480
 1st Qu.:1.427   1st Qu.:1.208
 Median :1.915   Median :1.735
 Mean   :2.095   Mean   :1.568
 3rd Qu.:2.757   3rd Qu.:2.002
 Max.   :3.690   Max.   :2.200
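sum(is.na(D)) reports only the total number of missing values. When that total is not zero, it helps to know which columns and rows are affected and how many observations each group has. The sketch below uses base R on a toy stand-in for D: the values are the first six entries of each column from the glimpse() output, with one NA inserted on purpose (hypothetical) so the checks have something to report.

```r
# Toy stand-in for D: first six values per column from the glimpse() output,
# with one NA added deliberately (hypothetical) to show what the checks return
toy <- data.frame(
  ID    = 1:6,
  Group = rep("Control", 6),
  CP    = c(1.84, 1.67, 1.68, NA, 1.49, 1.96),
  AR    = c(6.25, 5.97, 5.96, 7.37, 5.77, 7.52)
)

na_per_column <- colSums(is.na(toy))          # missing values in each column
rows_with_na  <- which(!complete.cases(toy))  # row numbers containing any NA
group_sizes   <- table(toy$Group)             # observations per group

print(na_per_column)
print(rows_with_na)
print(group_sizes)
```

On the real data the same three lines can be run with D in place of `toy`.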
Descriptive Statistics and Assessment of Normality
In this analysis, we calculate key descriptive statistics for multiple variables across different groups in the data set. These statistics include the mean, standard deviation (SD), median, skewness, and kurtosis, which provide insights into the central tendency, spread, and shape of the data distribution. Additionally, the Shapiro-Wilk test is applied to assess the normality of each variable within each group. The p-value from the Shapiro-Wilk test indicates whether the data deviate significantly from a normal distribution.
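To make the shape statistics concrete, the moment-based definitions of skewness and kurtosis can be sketched in base R. The sketch assumes the convention used by the moments package: central moments with divisor n, and Pearson's kurtosis, on which a normal distribution scores about 3 rather than 0. The input is the first ten CP values shown in the glimpse() output, whose mean (1.642) matches the Control group. The function names are illustrative.

```r
# First ten CP values from the glimpse() output (their mean, 1.642, matches Control)
cp_control <- c(1.84, 1.67, 1.68, 1.62, 1.49, 1.96, 1.70, 1.21, 1.74, 1.51)

# k-th central moment with divisor n
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by the second moment to the power 3/2
skew_by_hand <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Pearson's kurtosis: fourth central moment scaled by the squared second moment
kurt_by_hand <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

print(round(c(mean = mean(cp_control),
              sd   = sd(cp_control),
              skew = skew_by_hand(cp_control),
              kurt = kurt_by_hand(cp_control)), 3))
```

The rounded values agree with the Control row reported for CP in this section (mean 1.64, SD 0.206, skewness -0.604, kurtosis 3.24). On this scale a kurtosis near 3, not near 0, is the normal-like value.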
tb_CP <- D |> group_by(Group) |>
  summarise(mean = mean(CP),
            sd = sd(CP),
            median = median(CP),
            skew = skewness(CP),
            kurt = kurtosis(CP),
            pvalue = shapiro.test(CP)$p.value)
tb_AR <- D |> group_by(Group) |>
  summarise(mean = mean(AR),
            sd = sd(AR),
            median = median(AR),
            skew = skewness(AR),
            kurt = kurtosis(AR),
            pvalue = shapiro.test(AR)$p.value)
tb_OX <- D |> group_by(Group) |>
  summarise(mean = mean(OX),
            sd = sd(OX),
            median = median(OX),
            skew = skewness(OX),
            kurt = kurtosis(OX),
            pvalue = shapiro.test(OX)$p.value)
tb_TSY <- D |> group_by(Group) |>
  summarise(mean = mean(TSY),
            sd = sd(TSY),
            median = median(TSY),
            skew = skewness(TSY),
            kurt = kurtosis(TSY),
            pvalue = shapiro.test(TSY)$p.value)
The output of the above code chunk is a tibble containing the descriptive statistics for each variable, including the mean, standard deviation, median, skewness, kurtosis, and the p-value from the Shapiro-Wilk test for normality.
print(tb_CP)

# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control      1.64  0.206   1.67 -0.604  3.24 0.729
2 Experimental 0.965 0.106   1    -0.288  1.38 0.0634

print(tb_AR)

# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control       6.22 0.858   6.17 -0.358  3.31  0.306
2 Experimental 14.6  0.997  14.6  -0.314  2.79  0.679

print(tb_OX)

# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control       2.82 0.534   2.80  0.180  1.78  0.747
2 Experimental  1.37 0.294   1.38 -0.292  1.97  0.667

print(tb_TSY)

# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control       1.11 0.404   1.18 -0.226  1.75  0.658
2 Experimental  2.03 0.130   2.02 -0.227  1.91  0.514
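The four summarise() blocks differ only in the variable name, so the same statistics can be produced by one helper applied to every column. The sketch below is a base-R alternative, not the code used above. It covers only the Control rows, reconstructed from the first ten values of each column in the glimpse() output (their means match the Control rows in the tables). `describe` and `control` are hypothetical names, and skewness and kurtosis are left out to keep the example free of package dependencies.

```r
# Control rows only: first ten values of each column from the glimpse() output
control <- data.frame(
  CP  = c(1.84, 1.67, 1.68, 1.62, 1.49, 1.96, 1.70, 1.21, 1.74, 1.51),
  AR  = c(6.25, 5.97, 5.96, 7.37, 5.77, 7.52, 4.45, 6.58, 6.12, 6.22),
  OX  = c(3.10, 2.87, 2.72, 2.29, 3.28, 2.30, 3.69, 3.37, 2.48, 2.09),
  TSY = c(1.53, 1.24, 1.11, 0.97, 0.48, 1.44, 0.66, 1.32, 1.67, 0.67)
)

# One helper returns the statistics for a single numeric vector
describe <- function(x) {
  c(mean   = mean(x),
    sd     = sd(x),
    median = median(x),
    pvalue = shapiro.test(x)$p.value)
}

# Apply the helper to every column; transpose to get one row per variable
control_stats <- t(sapply(control, describe))
print(round(control_stats, 3))

# Variables whose Shapiro-Wilk p-value falls below the conventional 0.05 cutoff
non_normal <- rownames(control_stats)[control_stats[, "pvalue"] < 0.05]
print(non_normal)
```

In the tables above, all eight Shapiro-Wilk p-values exceed 0.05 (the smallest is 0.0634, for CP in the Experimental group), so no variable in either group shows a significant departure from normality at that cutoff.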
Visualization
ggplot(D, aes(x = Group, y = CP, col = Group)) +
  geom_boxplot() +
  labs(title = "Boxplot of Cell Proliferation by Group",
       x = "Group",
       y = "Cell Proliferation") +
  theme_minimal()

ggplot(D, aes(x = Group, y = AR, col = Group)) +
  geom_boxplot() +
  labs(title = "Boxplot of Apoptosis Rate by Group",
       x = "Group",
       y = "Apoptosis Rate") +
  theme_minimal()

ggplot(D, aes(x = Group, y = OX, col = Group)) +
  geom_boxplot() +
  labs(title = "Boxplot of Oncogene X Expression by Group",
       x = "Group",
       y = "Oncogene X Expression") +
  theme_minimal()

ggplot(D, aes(x = Group, y = TSY, col = Group)) +
  geom_boxplot() +
  labs(title = "Boxplot of Tumor Suppressor Y Expression by Group",
       x = "Group",
       y = "Tumor Suppressor Y Expression") +
  theme_minimal()

To generate all four graphs in a single file, you can use the following code. This task requires the patchwork package, which allows for easy composition of multiple plots into one layout.
g1 <- ggplot(D, aes(x = Group, y = CP, col = Group)) +
  geom_boxplot() +
  labs(title = "Cell Proliferation",
       x = "Group",
       y = "Cell Proliferation") +
  theme_minimal()
g2 <- ggplot(D, aes(x = Group, y = AR, col = Group)) +
  geom_boxplot() +
  labs(title = "Apoptosis Rate",
       x = "Group",
       y = "Apoptosis Rate") +
  theme_minimal()
g3 <- ggplot(D, aes(x = Group, y = OX, col = Group)) +
  geom_boxplot() +
  labs(title = "Oncogene X Expression",
       x = "Group",
       y = "Oncogene X Expression") +
  theme_minimal()
g4 <- ggplot(D, aes(x = Group, y = TSY, col = Group)) +
  geom_boxplot() +
  labs(title = "Tumor Suppressor Y Expression",
       x = "Group",
       y = "Tumor Suppressor Y Expression") +
  theme_minimal()
g1 + g2 + g3 + g4

To have one legend and overall graph title

g1 + g2 + g3 + g4 +
  plot_layout(guides = "collect") +
  plot_annotation(title = "Distribution of Cellular Parameters")

Two Group Comparison Using t test
The following code performs independent-samples t-tests to compare the mean of each variable between the two groups. The t-test assesses whether the difference between the two group means is statistically significant. By default, t.test() performs Welch’s t-test, which does not assume equal variances (var.equal = FALSE); setting var.equal = TRUE requests the classic Student’s t-test with a pooled variance instead. The calls below use the default, so each output is labelled “Welch Two Sample t-test”.
t.test(CP~Group, data = D)
Welch Two Sample t-test
data: CP by Group
t = 9.2241, df = 13.461, p-value = 3.451e-07
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
0.518991 0.835009
sample estimates:
mean in group Control mean in group Experimental
1.642 0.965
t.test(AR~Group, data = D)
Welch Two Sample t-test
data: AR by Group
t = -20.24, df = 17.606, p-value = 1.226e-13
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
-9.292083 -7.541917
sample estimates:
mean in group Control mean in group Experimental
6.221 14.638
t.test(OX~Group, data = D)
Welch Two Sample t-test
data: OX by Group
t = 7.5245, df = 13.993, p-value = 2.782e-06
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
1.03596 1.86204
sample estimates:
mean in group Control mean in group Experimental
2.819 1.370
t.test(TSY~Group, data = D)
Welch Two Sample t-test
data: TSY by Group
t = -6.8337, df = 10.834, p-value = 3.05e-05
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
-1.2142207 -0.6217793
sample estimates:
mean in group Control mean in group Experimental
1.109 2.027
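To see where the numbers in these outputs come from, the test for CP can be rebuilt in base R from the group summaries alone. This is a sketch under stated assumptions: the means and standard deviations are the rounded values reported earlier, and n = 10 per group is inferred from the 20 rows splitting evenly between the groups. The Student (pooled-variance) version is included for comparison with var.equal = TRUE. All variable names are illustrative.

```r
# Summary statistics for CP as reported earlier (rounded);
# n = 10 per group is an assumption inferred from 20 rows split evenly
m1 <- 1.642; s1 <- 0.206; n1 <- 10   # Control
m2 <- 0.965; s2 <- 0.106; n2 <- 10   # Experimental

# Welch: separate variances and Welch-Satterthwaite degrees of freedom
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_welch <- sqrt(v1 + v2)
t_welch  <- (m1 - m2) / se_welch
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)   # two-sided p-value
ci_welch <- (m1 - m2) + c(-1, 1) * qt(0.975, df_welch) * se_welch

# Student (var.equal = TRUE): pooled variance, df = n1 + n2 - 2
sp2        <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
se_student <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_student  <- (m1 - m2) / se_student
df_student <- n1 + n2 - 2

print(round(c(t = t_welch, df = df_welch,
              lower = ci_welch[1], upper = ci_welch[2]), 3))
print(signif(p_welch, 3))
print(round(c(t = t_student, df = df_student), 3))
```

Because the inputs are rounded, the results match the t.test() output for CP (t = 9.2241, df = 13.461, 95% CI 0.519 to 0.835) only approximately. With equal group sizes the Student and Welch t statistics coincide; only the degrees of freedom (18 versus about 13.5), and therefore the p-value and confidence interval, differ.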
Publication Ready Table
To perform descriptive statistics and calculate the mean difference between groups for the selected variables (CP, AR, OX, TSY), the following code is used. It summarizes the data by group, providing the mean and standard deviation for each variable, and then calculates the mean difference between the groups.
D1 <- D |> select(Group, CP, AR, OX, TSY)
D1 |> tbl_summary(by = Group,
                  statistic = ~"{mean} ({sd})") |>
  add_difference()

| Characteristic | Control¹ | Experimental¹ | Difference² | 95% CI²,³ | p-value² |
|---|---|---|---|---|---|
| CP | 1.64 (0.21) | 0.97 (0.11) | 0.68 | 0.52, 0.84 | <0.001 |
| AR | 6.2 (0.9) | 14.6 (1.0) | -8.4 | -9.3, -7.5 | <0.001 |
| OX | 2.82 (0.53) | 1.37 (0.29) | 1.4 | 1.0, 1.9 | <0.001 |
| TSY | 1.11 (0.40) | 2.03 (0.13) | -0.92 | -1.2, -0.62 | <0.001 |

¹ Mean (SD)
² Welch Two Sample t-test
³ CI = Confidence Interval
To ensure that numerical values in the summary table are displayed with three decimal places, the following code is used.
D1 |> tbl_summary(by = Group,
                  statistic = ~"{mean} ({sd})",
                  digits = ~ 3) |>
  add_difference()

| Characteristic | Control¹ | Experimental¹ | Difference² | 95% CI²,³ | p-value² |
|---|---|---|---|---|---|
| CP | 1.642 (0.206) | 0.965 (0.106) | 0.68 | 0.52, 0.84 | <0.001 |
| AR | 6.221 (0.858) | 14.638 (0.997) | -8.4 | -9.3, -7.5 | <0.001 |
| OX | 2.819 (0.534) | 1.370 (0.294) | 1.4 | 1.0, 1.9 | <0.001 |
| TSY | 1.109 (0.404) | 2.027 (0.130) | -0.92 | -1.2, -0.62 | <0.001 |

¹ Mean (SD)
² Welch Two Sample t-test
³ CI = Confidence Interval
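In the table above, only the Mean (SD) cells change to three decimals; the Difference and 95% CI columns keep their default rounding. As a way to check what a three-decimal cell should contain, the "{mean} ({sd})" pattern can be reproduced by hand in base R. The sketch uses the first ten CP values from the glimpse() output, which correspond to the Control group; `cell` is an illustrative name.

```r
# First ten CP values from the glimpse() output (Control group)
cp_control <- c(1.84, 1.67, 1.68, 1.62, 1.49, 1.96, 1.70, 1.21, 1.74, 1.51)

# Rebuild the "{mean} ({sd})" cell with three decimals using sprintf()
cell <- sprintf("%.3f (%.3f)", mean(cp_control), sd(cp_control))
print(cell)
```

The result, "1.642 (0.206)", matches the Control cell for CP in the three-decimal table.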
To save the table as a Word document titled “summary_table” in the “results” folder, you can use the following code.
tb <- D1 |> tbl_summary(by = Group,
                        statistic = ~"{mean} ({sd})") |>
  add_difference()
tb1 <- as_flex_table(tb)
save_as_docx(tb1, path = here("results", "summary_table.docx"))
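The save step writes into a "results" folder, and the write is likely to fail if that folder does not exist yet (an assumption worth checking on your own system). A small base-R safeguard is to create the folder first. In the sketch below, a temporary directory stands in for the project root so that running it leaves the project untouched; in a real project the path would come from here("results"). `out_dir` and `out_file` are illustrative names.

```r
# Stand-in for here("results"): a folder under the session's temporary directory
out_dir <- file.path(tempdir(), "results")

# Create the folder only if it is missing; recursive = TRUE also creates parents
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Full path of the Word file to be written
out_file <- file.path(out_dir, "summary_table.docx")
print(out_file)

# With the folder in place, the table can be saved as in the code above, e.g.
# save_as_docx(tb1, path = out_file)
```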