Reference Study: DOI:10.3389/fmicb.2025.1534446
Data Source: NCBI GEO GSE282747 / SRA PRJNA1190357
This report details the independent bioinformatic re-analysis of
RNA-seq data from a study investigating the role of the
post-transcriptional regulator CsrA in Geobacter
sulfurreducens. The primary goal was to validate the reported
transcriptomic impact of CsrA deletion and to characterize how its
regulatory role differs between standard biofilm conditions and
electrogenic microbial fuel cell (MFC) environments. Using a
standardized RNA-seq pipeline (quality control, alignment,
quantification, and differential expression analysis), I confirmed that
growth environment (MFC vs. inert glass) is the dominant factor
shaping the global transcriptome. Within this strong
environmental effect, the deletion of csrA induces distinct
expression profiles, supporting its role as a global regulator. I
identified a subset of genes, including GS_RS02340
(downregulated in MFC) and GS_RS01010 (upregulated in
MFC), whose response to the electrogenic environment appears modulated
by CsrA. A notable technical observation was a potential sample
preparation issue in one replicate (SRR31487053). Overall, this analysis
independently corroborates the original study’s findings and provides a
detailed, reproducible workflow for examining environment-dependent
transcriptional regulation in electroactive bacteria.
This report details the independent bioinformatic re-analysis of
RNA-seq data from a study investigating the role of the
post-transcriptional regulator CsrA in Geobacter
sulfurreducens. The primary goal was to validate the reported
transcriptomic impact of CsrA deletion and to characterize how its
regulatory role differs between standard biofilm conditions and
electrogenic microbial fuel cell (MFC) environments. Using a
standardized RNA-seq pipeline (quality control, alignment,
quantification, and differential expression analysis), I confirmed that
growth environment (MFC vs. inert glass) is the dominant factor
shaping the global transcriptome. Within this strong
environmental effect, the deletion of csrA induces distinct
expression profiles, supporting its role as a global regulator. I
identified a subset of genes, including GS_RS02340
(downregulated in MFC) and GS_RS01010 (upregulated in
MFC), whose response to the electrogenic environment appears modulated
by CsrA. A notable technical observation was a potential sample
preparation issue in one replicate (SRR31487053). Overall, this analysis
independently corroborates the original study’s findings and provides a
detailed, reproducible workflow for examining environment-dependent
transcriptional regulation in electroactive bacteria.
Biofilms are structured communities of microorganisms encased in a self-produced extracellular matrix that adheres to biotic or abiotic surfaces. In G. sulfurreducens, biofilm formation is integral to its ability to perform extracellular electron transfer (EET), as biofilms facilitate close contact between cells and insoluble electron acceptors, such as metal oxides or electrodes in microbial fuel cells (MFCs).
Another aspect of the Biofilms, that can help us to understand better
why there could be trascriptome differences are related to the gradient
of the nutrients and how metabolically active organism cohexist in
different layers with dormant cells that are metabolically less active,
then with reduced production of trascriptomes.
CsrA (carbon storage regulator A) is a highly conserved RNA-binding protein that functions as a global post-transcriptional regulator in a wide range of bacteria CsrA is known to regulate diverse cellular processes, including carbon metabolism, motility, biofilm formation, and virulence in various bacterial species. In Escherichia coli, for example, CsrA represses the synthesis of extracellular polysaccharides and curli fibers, thereby inhibiting biofilm formation. Conversely, in Pseudomonas aeruginosa, CsrA positively regulates biofilm development by modulating the expression of genes involved in exopolysaccharide production.
| Category | Technique / Material | Purpose / What It Measured |
|---|---|---|
| Genetic & Microbiological | Markerless deletion system (pK18mobsacB, conjugation) | Construction of the clean ΔcsrA mutant strain. |
| Anaerobic culturing (NBAF, acetate-Fe(III) media) | Standard growth and physiological assays under relevant conditions. | |
| Complementation (pRG5.1-RRflg-csrA plasmid) | To confirm phenotypes were due to csrA loss. | |
| Crystal Violet Staining | Basic quantification of adhered biofilm biomass. | |
| Agglutination Assay | Indirect assessment of pili production and cell clumping. | |
| Microscopy & Imaging | Confocal Laser Scanning Microscopy (CLSM) | High-resolution 3D imaging of biofilm structure. |
| LIVE/DEAD BacLight Stain (SYTO9/PI) | Differentiation and quantification of live vs. dead cells in biofilms. | |
| FTO (Fluorine-doped Tin Oxide) Electrodes | Conductive support for growing electroactive biofilms for CLSM. | |
| Image Analysis (Comstat2, Fiji) | Quantification of biofilm parameters: thickness, roughness, coverage. | |
| Electrochemical | Two-chamber H-cell Microbial Fuel Cell (MFC) | Measuring bioelectrochemical current generation by the bacteria. |
| Graphite Stick Anodes & Potentiostat | Electrode material and instrument to control potential/measure current. | |
| Molecular Biology | RNA Extraction (RNeasy kit) | Isolation of high-quality total RNA from biofilms. |
| RT-qPCR | Validation of differential gene expression from RNA-seq data. | |
| Sequencing & Bioinformatics | RNA-seq (Illumina NextSeq 500) | Primary tool: Genome-wide transcriptome profiling of WT vs. ΔcsrA biofilms on glass and graphite. |
| Ribosomal RNA depletion (RiboMinus kit) | Enrichment for mRNA prior to sequencing. | |
| Differential Expression Analysis (edgeR, DESeq2, NOISeq, limma via IDEAmeX) | Statistical identification of significantly up/down-regulated genes. | |
| Functional Annotation (KEGG) | Categorizing differentially expressed genes into biological pathways. | |
| Motif Discovery (MEME suite) | In silico identification of conserved CsrA binding sequences in 5’-UTRs. | |
| Principal Component Analysis (PCA) | Visualizing global transcriptome differences between samples. | |
| Data Repository (NCBI GEO, GSE282747) | Public archiving of raw and processed RNA-seq data. |
This project aimed to independently process and analyze the publicly deposited RNA-seq data from the above study (GSE282747) with the following objectives:
The datasets utilized in this study were derived from RNA sequencing (RNA-seq) experiments conducted on Geobacter sulfurreducens wild-type (WT) and ΔcsrA mutant strains. Biofilms were cultivated under two distinct conditions: on inert glass supports and on graphite electrodes within microbial fuel cells (MFCs). The RNA-seq data provided comprehensive transcriptomic profiles, enabling the identification of differentially expressed genes associated with the deletion of the csrA gene. The raw and processed RNA-seq data have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession number GSE282747, ensuring accessibility for further analysis and validation by the scientific community.Here the links: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA1190357&o=acc_s%3Aa and the genome used for the mapping of the genes has been collected from https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000007985.2/.
| # Index | Run | BioSample | Bases | Bytes | Experiment | Genotype | Library Name | Creation date | Sample Names | Treatment |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SRR31487053 | SAMN45032779 | 2.61 G | 976.57 Mb | SRX26555218 | DL1 wild type strain | GSM8649882 | 2024-11-25 10:04:00Z | GSM8649882 | Biofilm grown on graphite electrode in microbial fuel cell operated for 2 weeks |
| 2 | SRR31487054 | SAMN45032780 | 2.78 G | 1.02 Gb | SRX26555217 | DL1 wild type strain | GSM8649881 | 2024-11-25 10:05:00Z | GSM8649881 | Biofilm grown on graphite electrode in microbial fuel cell operated for 2 weeks |
| 3 | SRR31487055 | SAMN45032781 | 2.57 G | 961.02 Mb | SRX26555216 | DL1 with csrA-deletion | GSM8649880 | 2024-11-25 10:06:00Z | GSM8649880 | Biofilm grown on graphite electrode in microbial fuel cell operated for 2 weeks |
| 4 | SRR31487056 | SAMN45032782 | 2.51 G | 943.12 Mb | SRX26555215 | DL1 wild type strain | GSM8649879 | 2024-11-25 10:07:00Z | GSM8649879 | Biofilm grown on glass support in NBAF medium for 48 h at 25 oC |
| 5 | SRR31487057 | SAMN45032783 | 2.71 G | 1,007.41 Mb | SRX26555214 | DL1 wild type strain | GSM8649877 | 2024-11-25 10:08:00Z | GSM8649877 | Biofilm grown on glass support in NBAF medium for 48 h at 25 oC |
| 6 | SRR31487058 | SAMN45032784 | 2.33 G | 867.88 Mb | SRX26555213 | DL1 wild type strain | GSM8649876 | 2024-11-25 10:09:00Z | GSM8649876 | Biofilm grown on glass support in NBAF medium for 48 h at 25 oC |
| 7 | SRR31487059 | SAMN45032785 | 3.09 G | 1.13 Gb | SRX26555212 | DL1 with csrA-deletion | GSM8649875 | 2024-11-25 10:10:00Z | GSM8649875 | Biofilm grown on glass support in NBAF medium for 48 h at 25 oC |
| 8 | SRR31487060 | SAMN45032786 | 2.87 G | 1.06 Gb | SRX26555211 | DL1 with csrA-deletion | GSM8649875 | 2024-11-25 10:11:00Z | GSM8649875 | Biofilm grown on glass support in NBAF medium for 48 h at 25 oC |
The two treatments in the table represent two distinct experimental conditions used to cultivate Geobacter sulfurreducens biofilms for RNA-seq analysis. The first treatment involves growing biofilms on a graphite electrode within a functioning microbial fuel cell (MFC) for two weeks. This condition replicates an applied, electroactive environment where the bacteria are respiring by transferring electrons directly to a solid anode to produce electrical current. The second treatment involves growing biofilms on an inert glass support in standard growth medium (NBAF) for 48 hours at 25°C. This is a non-electrogenic, static control condition designed to study biofilm formation without the influence of an electrochemical potential or current flow. By comparing these two setups, the researchers could distinguish the general transcriptional role of the CsrA regulator in biofilm development (glass support) from its specific role in regulating genes essential for extracellular electron transfer and current production (MFC electrode).
On the server Sanger, the following folders have been created to organize the analysis:
The bioinformatic analysis of the RNA-seq data in this study was conducted using a combination of specialized software tools and programming languages. Key tools included:
| Tool | Version | Where installed/used |
|---|---|---|
| FastQC | 0.12.1 | Fedora 43 on WSL |
| bcftools | 1.23 | Fedora 43 on WSL |
| htslib | 1.23 | Fedora 43 on WSL |
| samtools | 1.23 | Fedora 43 on WSL |
| IGV | 2.19.7 | Windows 11 25H2 |
| R | 4.5.2 | Windows 11 25H2 |
| R | 4.5.2 | Fedora 43 |
| MultiQC | 1.32 | Sanger Server/ CentOS Stream 9 |
| Trimmomatic | 0.39 | Sanger Server/ CentOS Stream 9 |
| hisat2 | 2.2.1 | Sanger Server/ CentOS Stream 9 |
| FastQC | 0.12.1 | Sanger Server/ CentOS Stream 9 |
| HTSeq | 0.12.3 | Sanger Server/ CentOS Stream 9 |
An initial quality control (QC) assessment was performed on the raw sequencing reads downloaded from the NCBI SRA. This section details the iterative process of identifying and addressing quality issues, including adapter contamination and ribosomal RNA (rRNA) carryover.
FastQC analysis was conducted on all paired-end samples, and the results were aggregated using MultiQC. The initial report indicated high per-base sequence quality (Q > 30), meeting the standard threshold for reliable downstream analysis.
In the fact, I had to use the script provided in Sanger to run FastQC on all the samples in the folder “1_sra” as Paired Ends.
However, the report flagged a significant level of sequence duplication. The “Overrepresented sequences” module identified specific sequences with exceptionally high counts, warranting further investigation to determine their origin.
To determine if the duplicated sequences represented contamination or technical artifacts, they were subjected to BLASTn analysis against the Geobacter sulfurreducens genome and related taxa.
The three most overrepresented sequences identified were: 1.
Sequence 1
(GTTTGATCCTGGCTCAGAACGAACGCTGGCGGCGTGCCTAACACATGCAA):
This sequence showed a perfect match to the 16S ribosomal RNA
(rRNA) gene, a highly conserved region common across
Geobacter species and other prokaryotes. Its presence is
expected in total RNA samples, as rRNA typically comprises the majority
of cellular RNA, even after depletion protocols. 2. Sequence 2
(GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG):
This homopolymer sequence showed no significant match to the
Geobacter genome. It was identified as a common
Illumina adapter sequence, representing a technical
artifact from the library preparation or sequencing process on the
Illumina NextSeq 500 platform. 3. Sequence 3
(CCCCATTCGGACACCCCCGGATCAAAGCCTGTTTAGCGGCTCCCCGAGGC):
This sequence also matched the 5S rRNA gene, similar to
Sequence 1, confirming its biological origin. 3. Sequence 4
(CCTCACGGTACTTGTGCACTATCGGTCAGAGAGTAGTATTTAGCCTTGGG):
This sequence also matched the 23S rRNA gene, similar
to Sequence 1, confirming its biological origin.Plus as per microbiome
analysis we know that rRNA16 presents some regions highly preservered
among different species, so it is possible that some of these sequences
can be from other species present in the biofilm sample.So, out of
curiosity, I run a BLAST on it and checked the tree view of the species
matched with the BLASTed sequence.
Based on the FastQC findings, Trimmomatic was employed to trim adapter sequences and low-quality bases from the reads. The following parameters were used:
As adaptor file, initially I used the one provided in “/opt/Trimmomatic-0.39/adapters/adapters.fa” then, I thought to add the rRNA sequences with high duplication in the adapter file, and created a new file called “adapters_+_geo_sulf_rna.fna” in my home directory.
This has paid later in the quality checks after the trimmomatic step.
java -jar /opt/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 2 -phred33 /home/$USER/1_sra/SRR314870XX_1.fastq.gz /home/$USER/1_sra/SRR314870XX_2.fastq.gz /home/$USER/3.1_trimmed2_sra/SRR314870xx_1P.fastq.gz /home/$USER/3.1_trimmed2_sra/SRR314870XX_2P.fastq.gz /home/$USER/3.1_trimmed2_sra/SRR314870XX_1U.fastq.gz /home/$USER/3.1_trimmed2_sra/SRR314870XX_2P.fastq.gz ILLUMINACLIP:/home/antoniodg/adapters_+_geo_sulf_rna.fna:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35
Post-trimming FastQC analysis confirmed the successful removal of adapter sequences and improved overall read quality, with no remaining overrepresented sequences.
FastQC was rerun on the trimmed reads to assess the effectiveness of the trimming process. The post-trimming FastQC reports indicated that the overrepresented sequences had been successfully removed, and the overall quality metrics remained high, with per-base sequence quality scores consistently above Q30+ across all samples. This confirmed that the reads were now suitable for downstream alignment and analysis.
MultiQC was used to aggregate the FastQC reports from all samples before and after trimming. The consolidated report provided a comprehensive overview of the quality metrics across the dataset, highlighting improvements in sequence quality and the elimination of overrepresented sequences post-trimming. The number of duplicated sequences for the major overrepresented reads was reduced to below 1 million occurrences. While traces of Geobacter rRNA sequences persist—an inherent characteristic of biofilm-derived total RNA samples where diverse microbial rRNA may be present—the data was deemed of sufficient quality for robust alignment and differential expression analysis. They are available on the server /home/antoniodg/ folder and they have been made for the dataset raw, for the dataset with only adapter trimming, and for the dataset with adapter + rRNA trimming.
The reference genome for Geobacter sulfurreducens used in this study was obtained from the NCBI database (accession number GCF_000007985.2_ASM798v2). The genome was downloaded in FASTA format along with the corresponding annotation file in GTF/GFF format. Prior to mapping, the reference genome was indexed using HISAT2 to facilitate efficient alignment of RNA-seq reads.
I have downloaded the genome from the NCBI datasets website with wget and the unzipped it in my home directory, in a folder called “Geobacter_sulfurreducens”, then made a new folder called GenomeIndex where I have indexed the genome with hisat2-build as per below:
hisat2-build GCF_000007985.2_ASM798v2_genomic.fna GenomeIndex/genome
This has generated files genome.1.ht2 genome.2.ht2 genome.3.ht2 genome.4.ht2 genome.5.ht2 genome.6.ht2 genome.7.ht2 genome.8.ht2.
During the initial attempts to generate BAM and SAM files from the aligned reads, several errors were encountered. These issues were primarily related to file format inconsistencies and incorrect genome indexing related probably to a syntax error in the command.
The first attempt then generated files with 0 bytes, indicating that the alignment process did not complete successfully. After reviewing the HISAT2 documentation and troubleshooting the command syntax, the errors were resolved by ensuring that the correct paths to the indexed genome and trimmed read files were specified.
[antoniodg@sanger 5_hisat2_trimmed2_genome]$ ll
total 0
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487053.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487053_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487053_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487054.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487054_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487054_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487055.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487055_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487055_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487056.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487056_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487056_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487057.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487057_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487057_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487058.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487058_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487058_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487059.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487059_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487059_unmapped.2.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487060.bam
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487060_unmapped.1.fastq
-rw-r--r--. 1 antoniodg antoniodg 0 Jan 17 19:38 SRR31487060_unmapped.2.fastq
Then correcting the syntax errors as per below, I obtained files usable in the IGV visualization step.
./bit09-mapping-hisat2-PE.sh /home/antoniodg/3.1_trimmed2_sra/ /home/antoniodg/Geobacter_sulfurreducens/GenomeIndex/genome /home/antoniodg/5_hisat2_trimmed2_genome
The generated BAM files from HISAT2 were further processed using Samtools to ensure they were sorted and indexed for efficient access during visualization and downstream analysis.
In this way I got sorted and filter BAM files ready for visualization in IGV. That could provide me more insights about the expression of the CsrA gene under different growth conditions, and more in general the expressions differences between the wildtype and the ΔcsrA mutant strains under the same growth conditions. Seeing if besides the genetic difference that can explain the regulation also the growth in environment that improve the electrogenic-biofilm formation affects the expression too compared with a standard growth medium.
The aligned reads were visualized using the Integrative Genomics Viewer (IGV) to inspect the quality of alignments and to verify the expression patterns of key genes, including csrA.
The first figure show that there has been probably some error into
sampling preparation as SRR31487053 it seems pretty under expressed
compared with the same wildtype sample SRR31487054 grown under the same
treatment conditons. This can be
related to the fact that there could have been some mistakes during the
RNA extraction or library preparation steps, leading to lower read
counts for that particular sample. Further investigation would be needed
to confirm the exact cause of this discrepancy with the team responsible
for the experimental work. Or it can be that even if the RNA extraction
were done properly, during the sequencing step there could have been
some issues that led to a lower yield of reads for that sample.
Another interesting finding is related to the expression of the gene GS_RS14920, one of the 6 genes that encodes for the rubretythrin family protein, that seems to be overexpressed in the csrA deletion mutant in MFC condition as per below.
In
this case most probably the deletion of the CsrA gene is affecting the
expression more than the environment itself. So, to confirm this
hypothesis I have related these two samples with the same genotypes in
different growth conditions, to see if the expression is affected more
by the genotype or by the environment.
In this case, the expression of GS_RS14920 is consistently higher in the
ΔcsrA mutant (SRR31487055 and SRR31487059) compared to the
wild-type (SRR31487054 and SRR31487058), regardless of the growth
condition. This suggests that the deletion of csrA has a more
pronounced effect on the expression of this gene than the environmental
conditions, indicating that CsrA likely plays a direct regulatory role
over GS_RS14920 expression.
The protein encoded by GS_RS14920 is part of the rubrerythrin family, which is known to be involved in oxidative stress response. The upregulation of this gene in the ΔcsrA mutant could imply that CsrA normally represses its expression, and in its absence, the cell may be compensating for increased oxidative stress or altered redox conditions, particularly relevant in the context of extracellular electron transfer in MFCs.
HTSeq was used to count the number of reads mapping to each gene in the Geobacter sulfurreducens genome. The following command was executed for each BAM file using th script present in the Sanger server.
I have run the script with the CDS option as my organism doesn’t have exons and introns like for the Eukaryotes.So, I have used the coding sequence, although probably I could have assumed the count like the entire genome is an big exon. Then I have tried to use the option intersect-strict and union to see if there were differences in the count.
| SAMPLE | ORIGINAL TOTAL READINGS | READINGS AFTER TRIMMING | % Retention | No Feature Union | No Feature Intersection-strict | Quality Union (%) | Quality Intersection-strict (%) |
|---|---|---|---|---|---|---|---|
| SRR31487053 | 34,277,504 | 17,144,376 | 50.00% | 55 | 57 | 99.9997% | 99.9997% |
| SRR31487054 | 36,571,658 | 18,283,944 | 50.00% | 86,770 | 94,004 | 99.525% | 99.486% |
| SRR31487055 | 33,776,904 | 16,888,355 | 50.00% | 126,857 | 136,080 | 99.249% | 99.194% |
| SRR31487056 | 32,982,970 | 16,491,485 | 50.00% | 85,218 | 91,003 | 99.483% | 99.448% |
| SRR31487057 | 35,637,552 | 17,833,808 | 50.00% | 37,712 | 41,325 | 99.789% | 99.768% |
| SRR31487058 | 30,592,208 | 15,290,104 | 50.00% | 32,036 | 34,688 | 99.791% | 99.773% |
| SRR31487059 | 40,614,636 | 24,812,085 | 61.10% | 71,547 | 76,554 | 99.712% | 99.691% |
| SRR31487060 | 37,826,652 | 23,718,026 | 62.70% | 83,045 | 89,005 | 99.650% | 99.625% |
Overall the difference is slight, but the intersection-strict option is more conservative, leading to slightly fewer reads being counted for some genes. This could be important for downstream differential expression analysis, as it may affect the sensitivity and specificity of detecting true changes in gene expression.Furthermore in the output files made with the union parameter, I have found some ambiguous reads, that I have not found in the intersection-strict output files. So, I have decided to use the intersection-strict output files for the downstream analysis to be more conservative and avoid ambiguous reads that could affect the results.
Another interesting aspect to note is the reading before trimming and after trimming (trimmed2 with adapter + rRNA). The percentage of retention is around 50% for most of the samples, except for the samples SRR31487059 and SRR31487060 that have a retention of 61.10% and 62.70% respectively. This indicates that the trimming process removed a significant portion of the reads, likely due to the presence of adapter sequences and rRNA contamination. The higher retention in these two samples could suggest that they had less contamination or better quality reads to begin with.
Using the scripts in R provided, I have built the files needed for the Differental Expression analysis, specifically all_counts.txt and summary_counts.txt. I have done this process on Windows 11 25H2, so I needed to tune the script for the folder path as it seems that “” is not well interpreted in R on Windows, so I have used “/” instead to identify the path, and in this way the script worked.
The count data generated by HTSeq was imported into R for differential expression analysis. The data was organized into a matrix format, with genes as rows and samples as columns. Normalization of the count data was performed using the TMM (Trimmed Mean of M-values) method implemented in the edgeR package to account for differences in library sizes and sequencing depth across samples.
The script provided, needed to be adjusted to my specific case, as in this report I have actually 4 conditions and 2 samples each, but in order to fit with the requirements of the report, as suggested by the supervisor, I needed to change the point of view on the treatment, and in that case I 2 treatments with 4 samples each. So, when I built the metadata columns I have added an extra column for the treatments and a new identifier as per below.
In this way, I can
have a column called “treatment” with 2 levels: “MFC” and “STD”, and a
column called “genotype” with 2 levels: “WT” and “DT” (deletion type).
This will allow me to perform the differential expression analysis based
on the treatment conditions, while still accounting for the genotype
differences.
So, here below the data samples organised by treatment.
Below a further insight on the data count per million, and a comparison between the raw data and the filtered data. The measure used is Log-CPM (Counts Per Million) that is a common way to normalize RNA-seq data for differences in library size.
Another
interesting aspect is to compare the distribution of the data, before
the normalisation as it is crucial that the normalisation doesn’t affect
that in order to preserve a better biological meaning of the data.
Multidimensional scaling (MDS) analysis of RNA-seq data revealed clear separation of samples primarily by growth condition along treatment parameter, which explained the majority of transcriptional variance. Samples clustered into two distinct groups: biofilm grown on graphite electrodes in microbial fuel cells (MFC) versus biofilm grown on glass supports in NBAF medium. Within each treatment group, secondary separation by genotype (wild type vs csrA-deletion mutant) was observed along Dimension 2. This visualization confirms that growth environment is the dominant factor shaping global gene expression profiles, with genetic background exerting additional but secondary influence.This demonstrates that the shift to an electrogenic, electrode-associated biofilm state drives profound transcriptional reprogramming.
In order to start with this part of the analysis, I had to modify a
bit the script as per below.
In this way, I can compare the two treatments directly. The next step was to remove the heteroscedasticity from the data, in order to have a better statistical power in the analysis.
This step is crucial as it
ensures that the variance of gene expression levels is consistent across
different conditions, which is a key assumption for many statistical
tests used in differential expression analysis. By applying this
transformation, we can improve the reliability of our results and reduce
the likelihood of false positives or negatives.
Another interesting way to have a better overview of the Differental Expression Analysis of the genes is the volcanic plot, that gives a better idea of the genes that are upregulated and downregulated between the two treatments, as per below.
The fold
changes range from -5.8 to +4.0, indicating substantial variation in
gene expression levels between compared conditions. However, all
p-values appear to be non-significant (p = 1.0), as reflected by the
-log₁₀(P-value) values of zero. This suggests either that no
statistically significant differences were detected under the given
thresholds, or that there may be an artifact in the statistical
computation or data representation requiring further validation that we
can investigate
A heatmap was generated to visualize the expression patterns of the top 10 differentially expressed genes across all samples. This visualization highlights the relative expression levels of these genes, allowing for easy identification of patterns associated with each treatment condition.
Key Findings from the Heatmap: - GS_RS02340: Showed consistent downregulation across MFC conditions. - GS_RS01010: Showed specific upregulation. - GS_RS02335 & GS_RS05390: Exhibited clear directional changes (increase and decrease, respectively) in the MFC environment. - Six other genes in the top 10 list maintained near-baseline expression, indicating they are not responsive to this environmental shift.
This re-analysis successfully reproduced key findings from the original study and provided additional layers of interpretation.
Environment is the Primary Transcriptional Driver: The stark separation by condition in the MDS plot underscores that G. sulfurreducens undergoes a major metabolic and physiological rewiring when forming an electrogenic biofilm on an anode. Any study of EET must account for this dominant effect.
CsrA as a Context-Dependent Modulator: While environment is primary, the consistent separation by genotype within each cluster proves CsrA’s regulatory influence is real and active in both conditions. It does not trigger a new state but modifies the transcriptional program within a given environment. This aligns with the original study’s phenotypic data where biofilm amount and current increased, but fundamental respiration pathways remained functional.
Functional Implications of Identified Genes: The downregulation of GS_RS02340 in MFCs warrants investigation into its annotation; if it is involved in a non-essential pathway under electrode respiration, its repression could be a metabolic streamlining strategy. The upregulation of the oxidative stress-related GS_RS14920 in the ΔcsrA mutant provides a plausible mechanistic link: loss of CsrA repression may enhance the cell’s ability to manage reactive oxygen species, potentially facilitating more robust biofilm formation and higher current densities, as observed phenotypically.
Technical Considerations & Limitations: The
identified outlier sample (SRR31487053) highlights the importance of
rigorous QC. Furthermore, an initial simplistic contrast (all MFC
vs. all STD) failed to detect statistically significant DEGs due to
cross-cutting genotype effects. This underscores the necessity of using
factorial models (~ Genotype + Condition) or interaction
tests to properly dissect complex experimental designs.
This project provides a validated, reproducible bioinformatics pipeline for the analysis of Geobacter sulfurreducens RNA-seq data. The analysis confirms that: - The MFC growth environment is the strongest determinant of global gene expression. - The CsrA regulator significantly modifies the transcriptome within both standard and electrogenic conditions. - Specific candidate genes (e.g., GS_RS02340, GS_RS14920) have been identified as being under environmental and CsrA-dependent control.