data <- read.csv("DosagePilot.csv") # Read the wide-format OD600 data
ChatGPT input: I need to go from wide to long format in R. Columns are samples (A, B,C) and dose (0,5,10,25). It appended a number for replications (A.10.1, A.10.2) So this would be Sample A, dose of 10, replicates 1 and 2. Please provide code to go to long format. Include a screenshot of the data.
We converted the dataset from wide to long format to make it easier to analyze and visualize in R with tidyverse tools. Originally, each column represented a unique experimental condition, and the column name encoded the sample identity (A, B, or C), the dosage level (0, 5, 10, or 25), and the replicate number (e.g., .1 or .2, the suffix appended to distinguish repeated column names). Using pivot_longer(), we stacked all measurement columns into a single response column (OD) while keeping the Time variable. We then used separate() to split the original column names into three new variables: Sample, Dose, and Replicate. Columns without a replicate suffix were assigned replicate 0, so that A.10, A.10.1, and A.10.2 are interpreted consistently as replicates 0, 1, and 2 of Sample A at dose 10. The result is a tidy dataset in which each row is a single observation at one time point, which makes downstream statistical analysis and ggplot visualization much more straightforward.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyverse) # Loads the tidyverse collection of data wrangling packages
data_long <- data %>% # Start with the original dataset called "data"
pivot_longer( # Convert data from wide format to long format
cols = -Time, # Use all columns except the Time column
names_to = "well_name", # Store original column names in a new column called well_name
values_to = "OD" # Store OD600 values in a new column called OD
) %>%
separate( # Split the well_name column into multiple columns
well_name, # Column being split
into = c("Sample", "Dose", "Replicate"), # Names of new columns created
sep = "\\.", # Split wherever there is a period "."
fill = "right" # If the replicate piece is missing, fill Replicate with NA instead of warning
) %>%
mutate( # Modify and clean columns
Dose = as.numeric(Dose), # Convert Dose from text to numeric
Replicate = replace_na(Replicate, "0"), # Replace missing replicate values with 0
Replicate = as.numeric(Replicate) # Convert Replicate from text to numeric
)
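The column-name convention can be checked without the real data. Below is a minimal base-R sketch, assuming (as the ChatGPT prompt above describes) that the raw file repeated a header such as A.10 once per replicate and that read.csv() made the duplicates unique by appending .1, .2, and so on. The header values are hypothetical, and the strsplit() parsing only mirrors what separate() and replace_na() do in the pipeline above.

```r
# Hypothetical raw headers: each sample.dose label repeated once per replicate
raw_names <- c("Time", "A.10", "A.10", "A.10", "B.25", "B.25")

# read.csv(check.names = TRUE) de-duplicates names the way make.unique() does:
# "A.10", "A.10.1", "A.10.2", ...
unique_names <- make.unique(raw_names)
print(unique_names)

# Split one column name into Sample, Dose and Replicate; a missing third
# piece becomes replicate 0 (like fill = "right" plus replace_na(..., "0"))
parse_well_name <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)[[1]]
  if (length(parts) < 3) parts <- c(parts, "0")
  data.frame(Sample = parts[1],
             Dose = as.numeric(parts[2]),
             Replicate = as.numeric(parts[3]),
             stringsAsFactors = FALSE)
}

well_cols <- setdiff(unique_names, "Time")
parsed <- do.call(rbind, lapply(well_cols, parse_well_name))
print(parsed)
```

The first, unsuffixed copy of each header is what ends up as replicate 0, which is why replace_na(Replicate, "0") is needed in the tidyverse version.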
# Statistical Comparison
The goal of this code is to prepare the long-format growth data for statistical analysis and then test whether dosage changes growth over time. Because each sample was measured repeatedly across many timepoints, the observations are not fully independent. A linear mixed model is useful here because it works like an ANOVA in the sense that it tests whether groups differ, but it can also include a continuous variable like time and account for repeated measurements from the same sample. The key part of the model is the Dose * Time_hours interaction, which tests whether the slope of growth over time differs among dosage groups.
library(hms)
##
## Attaching package: 'hms'
## The following object is masked from 'package:lubridate':
##
## hms
data_long <- data_long %>% # Start with the long-format dataset
mutate( # Create or modify columns
Dose = factor(Dose), # Treat Dose as a categorical variable instead of a continuous number
Replicate = factor(Replicate), # Treat Replicate as a categorical identifier
Sample_ID = paste(Sample, Dose, Replicate, sep = "_") # Create a unique ID for each sample-dose-replicate combination
)
library(tidyverse) # Load tools for data cleaning, reshaping, and plotting
library(lubridate) # Load tools for working with dates and times
data_long <- data_long %>% # Start with the long-format dataset again
mutate( # Create or modify columns
Time = trimws(as.character(Time)), # Convert Time to text and remove extra spaces
Time_hours = period_to_seconds(lubridate::hms(Time)) / 3600 # Convert Time from HH:MM:SS format into numeric hours
)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Time_hours = period_to_seconds(lubridate::hms(Time))/3600`.
## Caused by warning in `.parse_hms()`:
## ! Some strings failed to parse
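The warning means that some Time strings could not be converted, so those rows receive an NA Time_hours. A minimal base-R sketch for finding them is below. It assumes times are stored as H:MM:SS text; the regular expression is a stricter, hypothetical stand-in for lubridate::hms() (which accepts more formats), and the example strings are made up.

```r
# Convert "H:MM:SS" strings to numeric hours; anything that does not match
# the pattern becomes NA so it can be inspected rather than silently lost
to_hours <- function(x) {
  x <- trimws(as.character(x))
  ok <- !is.na(x) & grepl("^[0-9]+:[0-5][0-9]:[0-5][0-9]$", x)
  hours <- rep(NA_real_, length(x))
  if (any(ok)) {
    hms_parts <- matrix(as.numeric(unlist(strsplit(x[ok], ":", fixed = TRUE))),
                        ncol = 3, byrow = TRUE)
    hours[ok] <- hms_parts[, 1] + hms_parts[, 2] / 60 + hms_parts[, 3] / 3600
  }
  hours
}

# Hypothetical Time values; the blank entry is one that fails to parse
times <- c("0:00:00", "0:15:00", "1:30:00", "12:00:00", "")
hours <- to_hours(times)
print(hours)
print(times[is.na(hours)])  # the strings that need attention
```

In the real data, the equivalent check is `unique(data_long$Time[is.na(data_long$Time_hours)])`. lmer() drops rows with NA Time_hours by default, but when Summary_data is built later those rows form their own NA-time group for each of the four doses, which is probably the source of the "Removed 4 rows containing missing values" warnings in the plots.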
library(lme4) # Load functions for running mixed effects models
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(lmerTest) # Add p-values and significance tests to lme4 models
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
model <- lmer( # Run a linear mixed effects model
OD ~ Dose * Time_hours + (1 | Sample_ID), # Predict OD using Dose, Time, their interaction, and repeated measures by Sample_ID
data = data_long # Tell R which dataset to use
)
summary(model) # Show the model results, including estimates, standard errors, t-values, and p-values
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: OD ~ Dose * Time_hours + (1 | Sample_ID)
## Data: data_long
##
## REML criterion at convergence: -463.8
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.4400 -0.4324  0.1834  0.7378  1.3642
##
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  Sample_ID (Intercept) 0.00173  0.04159
##  Residual              0.04221  0.20546
## Number of obs: 1728, groups:  Sample_ID, 36
##
## Fixed effects:
##                     Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)        8.554e-01  2.390e-02 1.220e+02  35.792   <2e-16 ***
## Dose5             -4.074e-03  3.380e-02 1.220e+02  -0.121   0.9043
## Dose10             1.546e-03  3.380e-02 1.220e+02   0.046   0.9636
## Dose25             2.664e-02  3.380e-02 1.220e+02   0.788   0.4321
## Time_hours         2.408e-02  1.427e-03 1.688e+03  16.871   <2e-16 ***
## Dose5:Time_hours  -1.628e-04  2.018e-03 1.688e+03  -0.081   0.9357
## Dose10:Time_hours -2.192e-03  2.018e-03 1.688e+03  -1.086   0.2776
## Dose25:Time_hours -4.534e-03  2.018e-03 1.688e+03  -2.247   0.0248 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##             (Intr) Dose5  Dose10 Dose25 Tm_hrs Ds5:T_ D10:T_
## Dose5       -0.707
## Dose10      -0.707  0.500
## Dose25      -0.707  0.500  0.500
## Time_hours  -0.702  0.496  0.496  0.496
## Dos5:Tm_hrs  0.496 -0.702 -0.351 -0.351 -0.707
## Ds10:Tm_hrs  0.496 -0.351 -0.702 -0.351 -0.707  0.500
## Ds25:Tm_hrs  0.496 -0.351 -0.351 -0.702 -0.707  0.500  0.500
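The model tests whether OD600 differs among doses at time zero (Dose), whether OD600 changes over time (Time_hours), and whether the rate of change over time differs among doses (Dose × Time_hours), while accounting for repeated measurements from the same sample.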
A linear mixed effects model was used to evaluate how metformin dosage influenced microbial growth over time while accounting for repeated measurements from the same biological replicates. Linear mixed models function similarly to ANOVA in that they test whether groups differ statistically; however, they are more flexible because they can simultaneously analyze continuous predictors, interactions among variables, and repeated observations from the same subjects or samples. In this analysis, OD600 was modeled as a function of dosage (Dose), time (Time_hours), and the interaction between dosage and time (Dose × Time_hours). A random intercept for Sample_ID was included to account for repeated measurements collected from the same culture replicate over time.
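To see how the Dose × Time_hours terms map onto growth-curve slopes, here is a small simulation with known, hypothetical slopes for each dose. It uses base R's lm() with the same fixed-effect formula but without the (1 | Sample_ID) random intercept, so it illustrates how R codes the coefficients (treatment contrasts with dose 0 as the reference level), not the lmer() fit itself.

```r
set.seed(42)

# Hypothetical per-dose growth rates (OD600 per hour); dose 0 is the control
true_slope <- c("0" = 0.024, "5" = 0.024, "10" = 0.022, "25" = 0.019)

# Hypothetical design: 4 doses x 3 wells x 13 hourly readings
sim <- expand.grid(Time_hours = 0:12, Well = 1:3, Dose = names(true_slope),
                   stringsAsFactors = FALSE)
sim$Dose <- factor(sim$Dose, levels = names(true_slope))  # "0" is the reference level

# Straight-line growth from a common starting OD, plus small measurement noise
sim$OD <- 0.85 + unname(true_slope[as.character(sim$Dose)]) * sim$Time_hours +
  rnorm(nrow(sim), sd = 0.002)

fit <- lm(OD ~ Dose * Time_hours, data = sim)
est <- coef(fit)

# With treatment contrasts, Time_hours is the control slope and each
# DoseX:Time_hours term is (slope at dose X) - (control slope)
print(round(est["Time_hours"], 4))
slope_diff <- est[c("Dose5:Time_hours", "Dose10:Time_hours", "Dose25:Time_hours")]
print(round(slope_diff, 4))
```

The fitted interaction terms should land close to the simulated differences (0, -0.002, and -0.005), which is the same reading applied to the interaction rows of the real model summary.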
The fixed effect labeled (Intercept) represents the expected OD600 value for the reference condition, which in this model is the control dosage (Dose = 0) at time zero (Time_hours = 0). This value serves as the baseline from which all other effects are interpreted. The intercept estimate (β = 0.855, p < 0.001) indicates that control cultures started the experiment at an OD600 of roughly 0.86. Its significance only shows that this baseline differs from zero, which is expected for any inoculated culture, so it carries no information about metformin.
The fixed effects labeled Dose5, Dose10, and Dose25 represent the model-estimated difference in OD600 between each treatment group and the control group at time zero. These effects test whether cultures exposed to different metformin concentrations differed from the control before growth over time is taken into account. None of these dosage main effects was statistically significant (p ≥ 0.43 for all comparisons), indicating no detectable difference in baseline OD600 between the treatment groups and the control at the beginning of the experiment.
The Time_hours fixed effect represents the average rate of OD600 increase over time in the control group, that is, the slope of the control growth curve. The significant positive coefficient for Time_hours (β = 0.0241 OD600 units per hour, p < 0.001) indicates that OD600 increased across the experimental time course, consistent with normal microbial proliferation.
The interaction terms (Dose5:Time_hours, Dose10:Time_hours, and Dose25:Time_hours) represent how much the slope of growth over time in each treatment group differed from the control slope. These interaction effects are the most biologically important components of the model because they test whether metformin altered the trajectory of microbial growth rather than simply shifting OD600 values uniformly upward or downward. The interaction coefficient for the 5-dose treatment was not significant (β = -0.00016, p = 0.936), indicating that the growth trajectory at this concentration did not differ detectably from the control. The 10-dose interaction was also not significant (β = -0.00219, p = 0.278); its coefficient was negative, but the estimate is not distinguishable from zero, so it should not be read as evidence of reduced growth.
In contrast, the interaction between the 25-dose treatment and time was statistically significant (β = -0.00453 ± 0.00202 SE, t = -2.247, p = 0.0248). Because this interaction coefficient was negative, it indicates that OD600 increased more slowly over time in the 25-dose treatment relative to the control group. Statistically, this means that the slope of the growth curve for the highest metformin concentration was significantly reduced compared to the slope observed in untreated cultures. Biologically, this suggests that the highest metformin concentration produced a measurable inhibitory effect on microbial growth dynamics over the course of the experiment.
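The interaction estimates translate directly into a slope for each dose: under treatment coding, a dose's slope is the control slope (Time_hours) plus its Dose:Time_hours term. The arithmetic below uses only the point estimates copied from the summary above; putting a standard error on a combined slope would also require the covariance between the two estimates (for example from vcov(model)), which is not shown here.

```r
# Fixed-effect estimates copied from summary(model) above (OD600 per hour)
control_slope <- 2.408e-02
interaction <- c(Dose5 = -1.628e-04, Dose10 = -2.192e-03, Dose25 = -4.534e-03)

# Each dose's slope = control slope + its Dose:Time_hours interaction
slopes <- c(Dose0 = control_slope, control_slope + interaction)

# Percent change in slope relative to the control
pct_change <- 100 * interaction / control_slope

print(round(slopes, 5))
print(round(pct_change, 1))
```

With these numbers the 25-dose slope is about 0.0195 OD600 units per hour, roughly 19% below the control slope of 0.0241, while the 5-dose and 10-dose slopes are within about 1% and 9% of the control, respectively.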
# Calculate the mean and SE of OD600 for each dose at each time point (pooled across samples A-C and their replicates)
Summary_data <- data_long %>% # Start with the long-format dataset
group_by(Time_hours, Dose) %>% # Group rows by both Time and Dose
summarise( # Calculate summary statistics for each group
mean_OD = mean(OD, na.rm = TRUE), # Calculate the mean OD value
se_OD = sd(OD, na.rm = TRUE) / sqrt(sum(!is.na(OD))), # Standard error = SD / sqrt(number of non-missing OD values)
.groups = "drop" # Remove grouping after summarising
)
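Because the mean and SD are taken with na.rm = TRUE, the SE should divide by the number of non-missing readings rather than by every row in the group. This minimal base-R sketch uses a hypothetical single time point with three wells per dose and one missing reading to show SE = SD / √n with n counted as non-missing values.

```r
# Hypothetical readings for one time point: three wells per dose, one missing
d <- data.frame(Dose = factor(c(0, 0, 0, 25, 25, 25)),
                OD = c(1.00, 1.10, 1.20, 0.90, 1.00, NA))

# Standard error of the mean using only non-missing values
se_mean <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

by_dose <- split(d$OD, d$Dose)
summ <- data.frame(Dose = names(by_dose),
                   mean_OD = sapply(by_dose, mean, na.rm = TRUE),
                   se_OD = sapply(by_dose, se_mean),
                   n = sapply(by_dose, function(x) sum(!is.na(x))))
print(summ)
```

For dose 25 the SE is 0.05 with n = 2; dividing by all three rows instead would understate it as about 0.041.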
Summary_data$Dose <- as.factor(Summary_data$Dose) # Make sure Dose is a factor (it was already converted earlier)
ggplot(Summary_data, aes(x = Time_hours, y = mean_OD, color = Dose, group = Dose)) + # Create the plot and map variables
geom_line(linewidth = 1) + # Draw lines connecting the mean values over time
geom_point(size = 2) + # Add points at each timepoint
geom_errorbar( # Add error bars showing standard error
aes(ymin = mean_OD - se_OD, ymax = mean_OD + se_OD), # Define lower and upper limits of error bars
width = 0.1 # Set the width of the horizontal error bar caps
) +
theme_classic() + # Use a clean classic plot theme
labs( # Add axis labels and plot title
x = "Time (hours)", # Label for x-axis
y = "Mean OD600 ± SE", # Label for y-axis
color = "Dosage", # Label for legend
title = "Growth over time by dosage" # Plot title
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggplot(Summary_data,
aes(x = Time_hours,
y = mean_OD,
color = Dose,
group = Dose)) +
geom_line(linewidth = 1.2) +
geom_errorbar(
aes(ymin = mean_OD - se_OD,
ymax = mean_OD + se_OD),
width = 0.1
) +
coord_cartesian( # Visually zoom into a specific region of the plot without removing data
xlim = c(4, 12), # Only display the x-axis region from 4 to 12 hours
ylim = c(1.15, 1.28) # Only display the y-axis region from OD600 = 1.15 to 1.28
) +
theme_classic(base_size = 14) +
labs(
x = "Time (hours)",
y = "Mean OD600 ± SE",
color = "Dosage",
title = "Growth curves (zoomed view)"
)
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
This graph shows microbial growth over time for cultures exposed to different metformin dosages. The x-axis represents time in hours, and the y-axis represents mean OD600, a common proxy for microbial density. Each colored line corresponds to a dosage group, with the red line representing the control condition (0 dose) and the remaining lines representing increasing metformin concentrations (5, 10, and 25). Because the graph is “zoomed in” to 4 to 12 hours and to OD600 values between 1.15 and 1.28, small differences among treatment groups are easier to see.
The lines represent the mean OD600 across all samples and replicates of each dose at each time point. As time progresses, all groups initially increase in OD600, indicating that the cultures are growing. However, the higher dosage groups, particularly the 25-dose treatment, appear to plateau at slightly lower OD600 values than the control group. This visual pattern is consistent with the statistical model, which indicated that the highest metformin concentration significantly reduced growth over time relative to the control.
The vertical bars extending above and below each point are error bars, which in this graph represent the standard error (SE) of the mean. Error bars provide information about variability and uncertainty in the estimated mean at each timepoint. Small error bars indicate that replicate cultures produced very similar OD600 values, while larger error bars indicate greater variability among replicates. In general, when error bars between groups overlap heavily, it suggests that differences among groups may be small relative to the variability in the data. However, overlap alone is not a formal statistical test. Statistical significance is determined more rigorously using the linear mixed effects model, which evaluates differences across all timepoints simultaneously while accounting for repeated measurements within samples.
In this figure, the error bars for the 5-dose and 10-dose groups overlap substantially with the control group throughout most of the experiment, consistent with the nonsignificant interaction terms observed in the model. In contrast, the 25-dose group consistently trends lower than the control and shows reduced growth across time, supporting the statistically significant negative Dose × Time interaction observed for that treatment.
In humans, type 2 diabetes is characterized by impaired glucose regulation, altered insulin signaling, and disruptions in how cells sense and use energy. Metformin helps lower blood glucose primarily by suppressing hepatic glucose production and improving cellular energy balance. However, many of its effects occur through broader nutrient-sensing and metabolic stress pathways that are evolutionarily conserved across organisms, including microbes.
At the cellular level, metformin partially inhibits mitochondrial respiration, reducing ATP production and creating a low-energy state within the cell. In humans, this activates AMPK, a major energy-sensing pathway that helps cells respond to nutrient limitation and energetic stress. AMPK activation shifts metabolism away from energy-consuming anabolic processes and toward energy conservation and stress adaptation. Many microorganisms possess analogous nutrient-sensing and metabolic adaptation systems, meaning they can exhibit similar physiological responses when exposed to metformin.
The nutrient deprivation and refeeding experiments model these metabolic stress conditions in a simplified system. During nutrient deprivation, cells must survive by conserving energy, reorganizing metabolism, and often storing or mobilizing lipids. Refeeding then tests how efficiently cells recover once nutrients become available again. These processes are highly relevant to human metabolic physiology because diabetes, obesity, fasting, insulin resistance, and even exercise all involve changes in nutrient availability and energy regulation.
Nile Red staining adds another layer of metabolic relevance because lipid storage is tightly connected to metabolic health in humans. In diabetes and metabolic syndrome, cells often accumulate abnormal lipid stores, reflecting disruptions in nutrient utilization and energy balance. Similarly, if metformin-treated microbial cells alter lipid accumulation patterns during starvation or refeeding, this suggests that the drug is changing how cells process and store energy. While microbes obviously do not develop diabetes in the human sense, they can still serve as powerful model systems for understanding fundamental metabolic principles such as nutrient sensing, stress adaptation, mitochondrial function, and energy allocation.
Thus, this project uses a simplified microbial system to explore how metformin influences cellular responses to energetic stress. By measuring growth, recovery from nutrient deprivation, and lipid accumulation simultaneously, the study models several core features of metabolic regulation that are also central to human diabetes biology.
Mechanism of metformin: https://pubmed.ncbi.nlm.nih.gov/28776086/
AMPK, what it does: https://pubmed.ncbi.nlm.nih.gov/25259783/
Nile Red staining: https://www.researchgate.net/publication/230869693_An_improved_high-throughput_Nile_red_fluorescence_assay_for_estimating_intracellular_lipids_in_a_variety_of_yeast_species
Nile Red staining video: https://www.youtube.com/watch?v=dUEBxJrxaQM