Case Study 1

About the Data

This data set was simulated in R. Cancer treatment research often involves evaluating how potential drugs affect key cellular processes such as proliferation, apoptosis, and gene expression. The simulated experiment is designed to assess the effect of a drug treatment on cancer cells by comparing biological responses in a control group and an experimental group before and after treatment. The objective is to compare proliferation, apoptosis, and gene expression between the control and experimental groups.

To download the data used in this analysis, click the link below: case_study1.csv

The variables used in this study are as follows: CP stands for Cell Proliferation, AR refers to Apoptosis Rate, OX represents Oncogene X Expression, and TSY denotes Tumor Suppressor Y Expression.

Loading R Packages

To use the following R packages in your code, you’ll need to install them first. Installation requires an active internet connection. Once installed, you do not need an internet connection to load the packages in future sessions.
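As a minimal sketch of the installation step, the snippet below lists which of the packages used in this case study are not yet installed. The helper name missing_packages is hypothetical (it is not part of any package), and the actual install.packages() call is left commented out because it needs an internet connection.

```r
# Packages loaded in this case study
pkgs <- c("here", "tidyverse", "ggplot2", "patchwork",
          "moments", "gtsummary", "flextable")

# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install (requires internet)
} else {
  message("All packages are already installed.")
}
```

Running this once before the library() calls avoids reinstalling packages that are already available.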

# data
library(here)      # project-relative file paths for import/export
library(tidyverse) # data management

# graphs
library(ggplot2)   # graphs (also attached by tidyverse)
library(patchwork) # combine graphs

# analysis
library(moments)   # skewness and kurtosis

# publication-ready table
library(gtsummary) # table generation
library(flextable) # save the table

Data Import

To import the data file named ‘case_study1.csv’, which is stored in the ‘data’ folder of your project directory, use the following R code. Let us read the CSV file into R and provide an overview of the data set, including missing values and summary statistics.

D <- read.csv(here("data", "case_study1.csv"))
glimpse(D)
Rows: 20
Columns: 6
$ ID    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
$ Group <chr> "Control", "Control", "Control", "Control", "Control", "Control"…
$ CP    <dbl> 1.84, 1.67, 1.68, 1.62, 1.49, 1.96, 1.70, 1.21, 1.74, 1.51, 0.99…
$ AR    <dbl> 6.25, 5.97, 5.96, 7.37, 5.77, 7.52, 4.45, 6.58, 6.12, 6.22, 14.5…
$ OX    <dbl> 3.10, 2.87, 2.72, 2.29, 3.28, 2.30, 3.69, 3.37, 2.48, 2.09, 1.27…
$ TSY   <dbl> 1.53, 1.24, 1.11, 0.97, 0.48, 1.44, 0.66, 1.32, 1.67, 0.67, 2.16…
sum(is.na(D)) # number of missing values
[1] 0
summary(D)
       ID           Group                 CP              AR        
 Min.   : 1.00   Length:20          Min.   :0.820   Min.   : 4.450  
 1st Qu.: 5.75   Class :character   1st Qu.:1.005   1st Qu.: 6.195  
 Median :10.50   Mode  :character   Median :1.145   Median :10.105  
 Mean   :10.50                      Mean   :1.304   Mean   :10.430  
 3rd Qu.:15.25                      3rd Qu.:1.673   3rd Qu.:14.562  
 Max.   :20.00                      Max.   :1.960   Max.   :16.030  
       OX             TSY       
 Min.   :0.850   Min.   :0.480  
 1st Qu.:1.427   1st Qu.:1.208  
 Median :1.915   Median :1.735  
 Mean   :2.095   Mean   :1.568  
 3rd Qu.:2.757   3rd Qu.:2.002  
 Max.   :3.690   Max.   :2.200  
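sum(is.na(D)) returns a single total, so it does not show where missing values occur. The sketch below, which uses a small made-up data frame (toy is a hypothetical stand-in for D, with invented values), shows how missing values could be counted per column and how incomplete rows could be listed, using base R only.

```r
# Hypothetical stand-in for D; the values are invented for illustration
toy <- data.frame(
  ID    = 1:4,
  Group = c("Control", "Control", "Experimental", "Experimental"),
  CP    = c(1.8, NA, 1.0, 0.9),
  AR    = c(6.2, 6.0, NA, NA)
)

na_per_column <- colSums(is.na(toy))   # missing values in each column
group_sizes   <- table(toy$Group)      # number of rows in each group
incomplete    <- toy[!complete.cases(toy), ]  # rows with at least one NA

print(na_per_column)
print(group_sizes)
print(incomplete)
```

For the case study data itself, the same calls can be applied to D; since sum(is.na(D)) is 0, every column count would be 0 as well.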

Descriptive Statistics and Assessment of Normality

In this analysis, we calculate key descriptive statistics for each variable within each group of the data set. These statistics include the mean, standard deviation (SD), median, skewness, and kurtosis, which provide insights into the central tendency, spread, and shape of the data distribution. Note that kurtosis() from the moments package reports Pearson's kurtosis, for which a normal distribution has a value of 3 rather than 0. Additionally, the Shapiro-Wilk test is applied to assess the normality of each variable within each group. A Shapiro-Wilk p-value below 0.05 is conventionally taken as evidence that the data deviate from a normal distribution.

tb_CP <-  D |> group_by(Group) |> 
  summarise(mean = mean(CP),
            sd = sd(CP),
            median = median(CP),
            skew = skewness(CP),
            kurt = kurtosis(CP),
            pvalue = shapiro.test(CP)$p.value)

tb_AR <-  D |> group_by(Group) |> 
  summarise(mean = mean(AR),
            sd = sd(AR),
            median = median(AR),
            skew = skewness(AR),
            kurt = kurtosis(AR),
            pvalue = shapiro.test(AR)$p.value)

tb_OX <-  D |> group_by(Group) |> 
  summarise(mean = mean(OX),
            sd = sd(OX),
            median = median(OX),
            skew = skewness(OX),
            kurt = kurtosis(OX),
            pvalue = shapiro.test(OX)$p.value)

tb_TSY <-  D |> group_by(Group) |> 
  summarise(mean = mean(TSY),
            sd = sd(TSY),
            median = median(TSY),
            skew = skewness(TSY),
            kurt = kurtosis(TSY),
            pvalue = shapiro.test(TSY)$p.value)

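The code above produces one tibble per variable, each with one row per group, containing the mean, standard deviation, median, skewness, kurtosis, and the p-value from the Shapiro-Wilk test for normality. In the output below, every Shapiro-Wilk p-value exceeds 0.05, so none of the groups shows a statistically significant departure from normality.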

print(tb_CP)
# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control      1.64  0.206   1.67 -0.604  3.24 0.729 
2 Experimental 0.965 0.106   1    -0.288  1.38 0.0634
print(tb_AR)
# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control       6.22 0.858   6.17 -0.358  3.31  0.306
2 Experimental 14.6  0.997  14.6  -0.314  2.79  0.679
print(tb_OX)
# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control       2.82 0.534   2.80  0.180  1.78  0.747
2 Experimental  1.37 0.294   1.38 -0.292  1.97  0.667
print(tb_TSY)
# A tibble: 2 × 7
  Group         mean    sd median   skew  kurt pvalue
  <chr>        <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 Control       1.11 0.404   1.18 -0.226  1.75  0.658
2 Experimental  2.03 0.130   2.02 -0.227  1.91  0.514
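To make the skewness and kurtosis columns easier to interpret, the following sketch recomputes them by hand from central moments in base R. It assumes that the first ten CP values shown in the glimpse() output are the Control group (their mean is 1.642, which matches the reported group mean); the helper central_moment is a hypothetical name introduced only for this illustration.

```r
# First ten CP values from the glimpse() output (assumed to be the Control group)
cp_control <- c(1.84, 1.67, 1.68, 1.62, 1.49, 1.96, 1.70, 1.21, 1.74, 1.51)

# Hypothetical helper: k-th central moment, dividing by n (not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Moment-based skewness: m3 / m2^(3/2)
skew_manual <- central_moment(cp_control, 3) / central_moment(cp_control, 2)^1.5

# Pearson's kurtosis: m4 / m2^2 (equals 3 for a normal distribution)
kurt_manual <- central_moment(cp_control, 4) / central_moment(cp_control, 2)^2

print(round(c(skew = skew_manual, kurt = kurt_manual), 3))

# Shapiro-Wilk test on the same values (base R, stats package)
print(shapiro.test(cp_control)$p.value)
```

The hand-computed values should agree with the Control row of tb_CP up to rounding, which confirms that the reported kurtosis is on the scale where 3 indicates normal-like tails.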

Visualization

ggplot(D, aes(x = Group, y = CP,col = Group)) +
  geom_boxplot() + 
  labs(title = "Boxplot of Cell Proliferation by Group",        
    x = "Group",                            
    y = "Cell Proliferation") +
  theme_minimal()

ggplot(D, aes(x = Group, y = AR,col = Group)) +
  geom_boxplot() + 
  labs(title = "Boxplot of Apoptosis Rate by Group",        
       x = "Group",                            
       y = "Apoptosis Rate") +
  theme_minimal()

ggplot(D, aes(x = Group, y = OX,col = Group)) +
  geom_boxplot() + 
  labs(title = "Boxplot of Oncogene X Expression by Group",        
       x = "Group",                            
       y = "Oncogene X Expression") +
  theme_minimal()

ggplot(D, aes(x = Group, y = TSY,col = Group)) +
  geom_boxplot() + 
  labs(title = "Boxplot of Tumor Suppressor Y Expression by Group",        
       x = "Group",                            
       y = "Tumor Suppressor Y Expression") +
  theme_minimal()

To combine all four graphs into a single figure, you can use the following code. This task requires the patchwork package, which allows for easy composition of multiple plots into one layout.

g1 <-  ggplot(D, aes(x = Group, y = CP,col = Group)) +
  geom_boxplot() + 
  labs(title = "Cell Proliferation",        
       x = "Group",                            
       y = "Cell Proliferation") +
  theme_minimal()

g2 <-  ggplot(D, aes(x = Group, y = AR,col = Group)) +
  geom_boxplot() + 
  labs(title = "Apoptosis Rate",        
       x = "Group",                            
       y = "Apoptosis Rate") +
  theme_minimal()

g3 <-  ggplot(D, aes(x = Group, y = OX,col = Group)) +
  geom_boxplot() + 
  labs(title = "Oncogene X Expression",        
       x = "Group",                            
       y = "Oncogene X Expression") +
  theme_minimal()

g4 <- ggplot(D, aes(x = Group, y = TSY,col = Group)) +
  geom_boxplot() + 
  labs(title = "Tumor Suppressor Y Expression",        
       x = "Group",                            
       y = "Tumor Suppressor Y Expression") +
  theme_minimal()

g1+g2+g3+g4

To collect the four legends into a single legend and add an overall title to the figure, use the following code.

g1+g2+g3+g4+
  plot_layout(guides = "collect")+
  plot_annotation(title = "Distribution of Cellular Parameters")
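Because the four boxplots differ only in the variable plotted and its label, the repetition can be reduced with a small helper function. The sketch below is one possible way to do this, not the method used above: group_boxplot and toy are hypothetical names, and toy holds invented values with the same column names as D so that the example runs on its own.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for D; the values are invented for illustration
toy <- data.frame(
  Group = rep(c("Control", "Experimental"), each = 3),
  CP    = c(1.8, 1.7, 1.6, 1.0, 0.9, 1.1),
  AR    = c(6.2, 6.0, 7.4, 14.5, 15.0, 13.9),
  OX    = c(3.1, 2.9, 2.7, 1.3, 1.4, 1.2),
  TSY   = c(1.5, 1.2, 1.1, 2.2, 2.0, 1.9)
)

# Variable names mapped to their display labels
labels <- c(CP  = "Cell Proliferation",
            AR  = "Apoptosis Rate",
            OX  = "Oncogene X Expression",
            TSY = "Tumor Suppressor Y Expression")

# Hypothetical helper: boxplot of one variable by group
group_boxplot <- function(data, var, label) {
  ggplot(data, aes(x = Group, y = .data[[var]], colour = Group)) +
    geom_boxplot() +
    labs(title = label, x = "Group", y = label) +
    theme_minimal()
}

# One plot per variable, returned as a named list
plots <- Map(function(v, l) group_boxplot(toy, v, l), names(labels), labels)

# Arrange in a 2 x 2 grid with a shared legend and an overall title
combined <- wrap_plots(plots, ncol = 2) +
  plot_layout(guides = "collect") +
  plot_annotation(title = "Distribution of Cellular Parameters")

print(combined)
```

Replacing toy with D in the Map() call would produce the same layout for the case study data.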

Two Group Comparison Using t test

The following code performs independent two-sample t-tests to compare the group means of each variable. The t-test assesses whether the difference between the two group means is statistically significant. By default, t.test() runs Welch's t-test, which does not assume equal variances (var.equal = FALSE); to assume equal variances and run Student's t-test instead, set var.equal = TRUE.

t.test(CP~Group, data = D)

    Welch Two Sample t-test

data:  CP by Group
t = 9.2241, df = 13.461, p-value = 3.451e-07
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
 0.518991 0.835009
sample estimates:
     mean in group Control mean in group Experimental 
                     1.642                      0.965 
t.test(AR~Group, data = D)

    Welch Two Sample t-test

data:  AR by Group
t = -20.24, df = 17.606, p-value = 1.226e-13
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
 -9.292083 -7.541917
sample estimates:
     mean in group Control mean in group Experimental 
                     6.221                     14.638 
t.test(OX~Group, data = D)

    Welch Two Sample t-test

data:  OX by Group
t = 7.5245, df = 13.993, p-value = 2.782e-06
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
 1.03596 1.86204
sample estimates:
     mean in group Control mean in group Experimental 
                     2.819                      1.370 
t.test(TSY~Group, data = D)

    Welch Two Sample t-test

data:  TSY by Group
t = -6.8337, df = 10.834, p-value = 3.05e-05
alternative hypothesis: true difference in means between group Control and group Experimental is not equal to 0
95 percent confidence interval:
 -1.2142207 -0.6217793
sample estimates:
     mean in group Control mean in group Experimental 
                     1.109                      2.027 
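To show where the Welch statistic and its fractional degrees of freedom come from, the sketch below recomputes the CP comparison from the group summaries alone. It uses the means from the t.test() output and the rounded standard deviations from the descriptive table, so the results should be close to, but not exactly equal to, the values printed above.

```r
# Group summaries for CP (means from the t.test output, rounded SDs from tb_CP)
m1 <- 1.642; s1 <- 0.206; n1 <- 10   # Control
m2 <- 0.965; s2 <- 0.106; n2 <- 10   # Experimental

# Squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: difference in means over the unpooled standard error
se     <- sqrt(v1 + v2)
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The non-integer degrees of freedom in the t.test() output arise from this approximation, which weights each group's variance separately instead of pooling them.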

Publication Ready Table

To perform descriptive statistics and calculate the mean difference between groups for the selected variables (CP, AR, OX, TSY), the following code is used. It summarizes the data by group, providing the mean and standard deviation for each variable, and then calculates the mean difference between the groups.

D1 <- D |> select(Group, CP, AR, OX, TSY)
D1 |>
  tbl_summary(by = Group,
              statistic = ~"{mean} ({sd})") |>
  add_difference()

Characteristic   Control, N = 10¹   Experimental, N = 10¹   Difference²   95% CI²,³      p-value²
CP               1.64 (0.21)        0.97 (0.11)             0.68          0.52, 0.84     <0.001
AR               6.2 (0.9)          14.6 (1.0)              -8.4          -9.3, -7.5     <0.001
OX               2.82 (0.53)        1.37 (0.29)             1.4           1.0, 1.9       <0.001
TSY              1.11 (0.40)        2.03 (0.13)             -0.92         -1.2, -0.62    <0.001

¹ Mean (SD)
² Welch Two Sample t-test
³ CI = Confidence Interval

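To display the group summary statistics (mean and SD) with three decimal places, add the digits argument as in the following code. The difference, confidence interval, and p-value columns produced by add_difference() keep their default formatting.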

D1 |>
  tbl_summary(by = Group,
              statistic = ~"{mean} ({sd})",
              digits = ~3) |>
  add_difference()

Characteristic   Control, N = 10¹   Experimental, N = 10¹   Difference²   95% CI²,³      p-value²
CP               1.642 (0.206)      0.965 (0.106)           0.68          0.52, 0.84     <0.001
AR               6.221 (0.858)      14.638 (0.997)          -8.4          -9.3, -7.5     <0.001
OX               2.819 (0.534)      1.370 (0.294)           1.4           1.0, 1.9       <0.001
TSY              1.109 (0.404)      2.027 (0.130)           -0.92         -1.2, -0.62    <0.001

¹ Mean (SD)
² Welch Two Sample t-test
³ CI = Confidence Interval

To save the table as a Word document titled “summary_table” in the “results” folder, you can use the following code.

tb <- D1 |> tbl_summary(by = Group,
                        statistic = ~"{mean} ({sd})")|> 
  add_difference()

tb1 <- as_flex_table(tb)
save_as_docx(tb1, path = here("results", "summary_table.docx"))
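Saving will typically fail if the target folder does not exist, so it can help to create it first. The sketch below uses a temporary directory as a stand-in for here("results") so that it runs anywhere; out_dir and out_file are hypothetical names used only for this illustration.

```r
# Stand-in for here("results"); a temporary folder keeps the example self-contained
out_dir <- file.path(tempdir(), "results")

# Create the folder (and any missing parent folders) only if it is absent
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Full path that would be passed to save_as_docx(tb1, path = out_file)
out_file <- file.path(out_dir, "summary_table.docx")
print(out_file)
```

In the project itself, replacing file.path(tempdir(), "results") with here("results") would create the folder inside the project directory before the table is saved.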