This article has Open Peer Review reports available.
RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study
© The Author(s). 2017
Received: 28 May 2017
Accepted: 22 August 2017
Published: 5 September 2017
Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization.
Here, we present a novel method, moose 2 , which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose 2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
The proposed RNA-seq normalization method, moose 2 , is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
KeywordsRNA-seq Transcriptomics Normalization Gene expression DNA damage Stress response
To validate this method, we examined the bacterial response (in E. coli) to the chemotherapeutic drug mitomycin C (MMC), which we investigated at early and late time-points by RNA-seq. MMC is a potent DNA crosslinker that will ultimately generate double-stranded breaks (DSB) in DNA and thereby activate the so-called SOS response. The SOS response is initiated whenever DNA damage occurs. This generates single-stranded DNA (ssDNA) which is bound by the RecA protein. RecA-nucleofilaments subsequently trigger autocleavage of the LexA repressor that controls a regulon of >50 genes in E. coli, many of which have functions in DNA repair [9–12]. The response to DNA damage has been intensively studied by microarray analysis of E. coli cells that have been treated with UV light, MMC, or quinolone antibiotics [13–17]. However, none of these studies followed the response to high levels of DNA damage over an extended period of time, nor did they capture possible, more subtle changes in gene expression. Here, relative changes in transcript levels were calculated from an RNA-seq study of E. coli cells treated with a high dose of MMC for up to 90 min. Moose 2 performed better than the other tested normalization methods in terms of (i) the Euclidean distances, which are lower in within-replicate comparisons than in cross-condition comparisons, as is to be expected; and (ii) minimizing the variation of expression values across samples. Thus, the moose 2 results could be used to predict the expression profiles of functional groups, such as the LexA regulon, which gave exciting new insights into the bacterial response after prolonged DNA damage. In addition to this bacterial system, we also applied the approach to three eukaryotic data sets and compared the results to other methods.
Cultivation of bacteria and sampling
Escherichia coli MG1655 cells, obtained from CGSC (Coli Genetic Stock Center) at Yale University (http://cgsc.biology.yale.edu) were grown aerobically in Luria broth (LB) at 37 °C. Triplicate overnight cultures were diluted 1:100 into fresh LB and grown for 2 h to reach an OD600 of ~0.35. Mitomycin C (MMC) was added at a final concentration of 2.5 μg/ml to induce DNA damage. Samples were withdrawn at 0, 30, and 90 min, and immediately mixed with 0.25 vol of RNA stop solution (95% ethanol, 5% phenol) on ice. Cells were pelleted by centrifugation and frozen in liquid nitrogen. Pellets were thawed on ice and immediately processed for RNA extraction.
RNA extraction and quality control
Total RNA was prepared by the hot acid-phenol method . Cells were resuspended in lysis buffer (100 mM Tris pH 7.5, 40 mM EDTA, 200 mM NaCl, 0.5% SDS) and incubated at 65 °C for 5 min. After adding acidic phenol (pH 4.0) to the cell lysate, extraction mixtures were incubated at 65 °C for 3 min, frozen in liquid nitrogen, and centrifuged for phase separation. RNA was precipitated from supernatants using isopropanol and pelleted by centrifugation. RNA was washed in 75% ethanol and resuspended in RNase-free water. Samples were treated with DNase I (Thermo Scientific) and extracted with phenol/chloroform, followed by precipitation with isopropanol as before. PCR using primers BB1 (GCT TTA CAG GGG AGA CAA) and BB2 (AAC CCG CAC GCT AAA TAT) was applied to test for DNA contamination. Absorbance ratios of A260/A280 and A260/A230 were determined using a NanoDrop ND-1000 spectrophotometer to assure purity of RNA. RNA integrity was assessed on 1% agarose gels containing 25 mM guanidinium thiocyanate and by analysis with an Agilent 2100 Bioanalyzer (Agilent Technologies) using the Agilent RNA 6000 Pico Kit for total prokaryotic RNA.
Library preparation, RNA-sequencing and read mapping
Preparation of cDNA libraries
Sequencing libraries were prepared with the Encore Complete Prokaryotic RNA-Seq DR Multiplex System (NuGEN) using 200 ng total RNA as input. After cDNA synthesis, cDNA was fragmented by ultra-sonication using the Covaris S-Series System according to the recommendations of the Encore protocol. Adapters containing unique barcode sequences and target sites for Illumina sequencing primers were ligated to cDNA fragments. After strand selection and adapter cleavage, cDNA was amplified to yield strand-specific ready-to-use sequencing libraries. Length distribution of amplified cDNA fragments was validated by Agilent 2100 Bioanalyzer (Agilent Technologies) using the Agilent High Sensitivity DNA Kit. Ultra-sonication and length distribution analysis were performed by the SNP&SEQ Technology Platform in Uppsala, Sweden (www.sequencing.se).
Quality control of sequencing libraries
The quality of the libraries was evaluated using the Advanced Analytical Technologies Fragment Analyzer and a DNA-kit (DNF910). The adapter-ligated fragments were quantified by qPCR using the Library quantification kit for Illumina (KAPA Biosystems) on a StepOnePlus instrument (Applied Biosystems/Life Technologies) prior to cluster generation and sequencing.
Cluster generation and sequencing
An 11 pM solution of sequencing library was subjected to cluster generation and paired-end sequencing with 100 bp read length on the HiSeq 2500 system (Illumina Inc.) using v3 chemistry according to the manufacturer’s protocols. Base calling was done by RTA 18.104.22.168 and the resulting.bcl files were demultiplexed and converted to fastq format with tools provided by CASAVA 1.8.4 (Illumina Inc.), allowing for one mismatch in the index sequence. Additional statistics on sequence quality were compiled with an in-house script from the fastq-files, RTA and CASAVA output files. Sequencing was performed by the SNP&SEQ Technology Platform in Uppsala, Sweden (www.sequencing.se).
Reads were mapped against the E. coli K-12 MG1655 genome sequence using Papaya (http://sourceforge.net/projects/satsuma/) from the Satsuma  package, and counted against the NCBI GenBank annotations (NC_000913), requiring a minimum alignment length of 60 nt and identity >0.98.
RNA concentrations were determined with a Qubit 2.0 Fluorometer (Invitrogen) using the Qubit RNA HS Assay Kit (Molecular Probes). Primers for qRT-PCR were optimized using primer design software (Additional file 1). The Brilliant III Ultra-Fast SYBR Green QRT-PCR Master Mix (Agilent Technologies) was applied to perform an ultra-fast one-step protocol on a StepOnePlus Real-Time PCR System (Applied Biosystems/Life Technologies) using the following settings for amplification: 50 °C - 10 min, 95 °C - 3 min, 45× (95 °C - 5 s, 60 °C - 10 s). Initial RNA concentrations were set to 1 ng/μl, except for ssrA and rrsA (set to 10 pg/μl). Melting curves were recorded to monitor amplification specificity. All samples were measured as technical triplicates. Cycle threshold (Ct) values were automatically determined in the linear amplification phase as implemented in the StepOne Software v2.3. Relative fold changes of gene expression were calculated according to the 2-ΔΔCt method . The average Ct values of the six reference genes cysG, idnT, hcaT, ihfB, ssrA, and rrsA were used for normalization.
Identifying putative invariant genes
In order to speed up the computation, the linear program in Eq. (1) is solved by applying a dynamic programming algorithm. Also, in this way, the costs c ij , given by the expression in Eq. (2), do not have to be pre-computed. The specific values of the reward sinks b i = − r = − 800 (r ≫ 1 in order to ensure that known reference genes are included in the path), which is roughly the number of genes divided by the number of reference genes. The penalties m = 4.0 and h = 5.0, which control the number of identified in silico reference genes, are chosen to produce about 30 reference genes that are roughly evenly spaced in log(expression) space (for parameter choice, see Additional file 2 and the moose 2 manual on the web site at http://grabherr.github.io/moose2/. In case no reference genes are provided, the algorithm selects the statistically best estimate from the experimental data.
Non-linear (polynomial) correction
The algorithms are implemented in C++ and are publicly available from https://github.com/grabherr/moose2 under the General Public License (GPL).
Linear (global) normalization
For linear normalization we used four different algorithms: RPKM , UQ , DESeq2 , and TMM . DEseq2 and TMM normalization were performed using the R statistical language (http://www.r-project.org/) and Bioconductor (http://www.bioconductor.org/) packages ‘DESeq2’  and ‘edgeR’ . In the ‘DESeq2’ package, functions estimateSizeFactors and sizeFactors were called to receive sample-specific normalization factors. In the ‘edgeR’ package, function calcNormFactors was called to output scaling factors together with the original library size. The product of the original library size and the scaling factor, the so-called effective library size, was used as a sample-specific normalization factor. All fold-changes were calculated as direct log2 ratios from the normalized read counts.
For identification of differentially expressed genes the moose 2 -corrected counts were first transformed using the voom function then subjected to the actual gene expression analysis using Limma . In short, voom estimates mean-variance relationships of log-counts for the samples and Limma identifies differentially expressed genes in a linear modelling framework using an empirical Bayesian method to moderate standard errors and estimate fold changes between conditions. From these values, moderated t-statistic and the corresponding p-values are estimated. Finally, Benjamini-Hochberg correction of p-values was used to control the false discovery rate.
Hierarchical cluster analysis of RNA-Seq samples was performed using the R statistical language (http://www.r-project.org/). Function dist (method = euclidean) was called to generate distance matrices based on log2-transformed read counts (with a pseudocount of 1). Alternatively, the ‘DESeq2’ package provides log-transformation schemes that are based on per-gene dispersion estimates. Expression data, normalized by DESeq’s sizeFactors, were applied to the regularized log transformation (function rlog) and variance stabilizing transformation (VST; function varianceStabilizingTransformation), using dispersion estimates calculated with function estimateDispersions. Distance matrices were subsequently generated. All distance matrices were used as input for function hclust (method = ward.D2) to generate cluster trees.
Expression cluster analysis of time-series data based on fuzzy c-means was performed using the R statistical language (http://www.r-project.org/) and Bioconductor (http://www.bioconductor.org/) package ‘Mfuzz’ . For soft clustering of the Top-1000 list (see main text), the fuzzification parameter was set to m = 2 and the number of clusters to c = 6. Results are displayed in Additional file 3.
Functional annotation clustering of genes was performed using the DAVID bioinformatics database  (http://david.abcc.ncifcrf.gov/home.jsp). Medium classification stringency with default settings was used to generate clusters of gene ontology (GO) terms based on biological process (BP), cellular component (CC), and molecular function (MF). Pathway clustering was performed in a similar way using the KEGG (http://www.genome.jp/kegg/) pathway option. Results are displayed in Additional file 4.
The original RNA-seq datasets for E. coli are distributed with moose 2 (https://github.com/grabherr/moose2). Moose 2 -normalized data including significance analysis can be found as Additional file 5.
Distortion of RNA-seq read counts depends on transcript abundance
In silico reference genes allow for non-linear transformation of expression values
Expression-invariant genes in E. coli during DNA damage as identified by the DP scheme and used for moose 2
uroporphyrin III C-methyltransferase
dipeptide/tripeptide:H+ symporter DtpB
Cell division protein ftsX
Cell division protein ftsY
DNA gyrase, subunit B
putative transport protein, major facilitator superfamily (MFS)
L-idonate / 5-ketogluconate / gluconate transporter IdnT
integration host factor (IHF), beta subunit
member of ATP-dependent helicase superfamily II
formamidopyrimidine DNA glycosylase
A/G-specific adenine glycosylase
nucleoside diphosphate kinase
iron-sulfur cluster scaffold protein
polynucleotide phosphorylase monomer
RhsB protein in rhs element
30S ribosomal subunit protein S21
16S ribosomal RNA (rrsA)
16S ribosomal RNA (rrsE)
16S ribosomal RNA (rrsG)
Rac prophage; predicted tail fiber assembly protein
Rac prophage; cold shock protein, function unknown
zinc, cadmium and lead efflux system
heavy metal divalent cation transporter ZupT
We next investigated the individual contributions stemming from (a) predicted in silico reference genes; (b) the quadratic correction term; and (c) guiding the in silico prediction by established reference genes. We thus eliminated each feature individually, and found that the sample grouping was only correct when including both in silico prediction and non-linear correction based on the quadratic term (Additional file 9). To verify, we normalized the data with DESeq2 based on (a) the six established reference genes, and (b) the 33 predicted invariant genes, confirming that a non-linear correction term is required for correct sample grouping (Additional file 10).
To assess the role of accurate reference genes, we ran moose 2 with (a) six genes that were randomly selected over the expression range; and (b) the six most differentially expressed genes (Additional file 9). Interestingly, the resulting sample groupings are correct even in the second case, due to moose 2 rejecting five out of the six genes as too costly to use in the dynamic programming step, reverting to a different set of predictions. In either experiment, however, the RLE boxplots are not as closely centered on zero and/or the variance is larger than when incorporating the six established reference genes used in the full analysis (Additional file 11).
In-silico predicted expression-invariant transcripts contain housekeeping genes
Genes with stable expression patterns across a variety of conditions often serve housekeeping functions, such as transcription, translation, or replication. The DP scheme in the moose 2 pipeline predicted 27 expression-invariant genes, many of which have indeed housekeeping functions (Table 1). The most dominant group comprises genes with a function in translation, including ribosomal RNAs, one ribosomal protein, transfer RNAs, and one aminoacyl tRNA synthetase. Furthermore, the predicted invariant genes were enriched for functions in DNA replication and repair. These findings support the accuracy of the DP scheme for the identification of reference genes.
We tested the 33 reference genes (six established and 27 predicted) for their correlation to each other across all RNA-seq samples. Since normalization is expected to reduce systematic errors in read counts, such as library size effects, one would expect more unstructured correlation patterns for expression-invariant genes after normalization. While global normalization (e.g. by DESeq2; Additional file 12) produced a structured correlation pattern with many estimated coefficients (Pearson’s Rho) clearly divergent from zero, moose 2 produced a less structured pattern, which would be expected, and more coefficients close to zero (Additional file 13). This analysis suggested that systematic biases are efficiently reduced by moose 2 .
Accurate prediction of expression changes reveals new features of the bacterial response to DNA damage
For further validation, we selected 19 E. coli genes, eight being members of the SOS response, for qRT-PCR, widely accepted as the gold-standard for assessing gene expression changes [2, 27]. While the Pearson correlation coefficients between qRT-PCR and moose 2 were comparable to the TMM and DESeq2 methods (Additional file 14), linear regression showed that the constant term, describing the global shift of the data, was closest to zero for moose 2 (−0.072 and −0.021 respectively), compared to TMM (0.264 and −0.381) and DESeq2 (0.204 and −0.413). Even though expression ratios predicted by moose 2 might be slightly underestimated in some cases (Additional file 14), the overall qRT-PCR results were reliably reproduced by moose 2 .
Moose 2 can be applied to complex eukaryotic samples
Adequate normalization of RNA-seq data is an essential step required to reliably predict differentially expressed genes . The correct choice of a normalization method depends on the assumptions that are valid for the particular biological system under investigation. For example, the RPKM method (normalization by library size) assumes that the RNA amount per cell is not changed between conditions, an assumption that is easily violated by differential expression of highly expressed genes [2, 5]. Methods such as DESeq2 and TMM assume that the number of up- and down-regulated genes are balanced between conditions, i.e. expression changes are symmetric. These methods might, therefore, perform poorly when the symmetry of expression changes is skewed towards one direction . The genomes of bacteria comprise relatively few genes (e.g. E. coli: ~4500) compared to other organisms, thus any response to outside stimuli might involve a large fraction of genes. This can be exemplified by bacteria exposed to extreme stresses [13, 32] or environments such as macrophages [33, 34], where hundreds of genes are differentially expressed, representing up to one fourth of the whole genome. Even though expression changes are not necessarily asymmetric, many experiments show a clear trend towards either side of regulation. As an alternative to the aforementioned methods, expression data can be normalized by applying control genes, which are either external controls (spike-ins) [7, 35, 36], or invariant genes [2, 37]. The underlying assumption for the latter is that at least a small number of genes exist that are not subject to changes in expression in the given experiment. Here, we applied a new method, moose 2 , which first identifies such a small subset of invariant genes in silico, and further applies different normalization factors based on the expression strength of each individual gene. In an experiment exposing E. coli to high doses of the DNA damaging agent MMC for an extended period of time, global changes in gene expression can be expected [13, 15], and were validated here (Additional file 16). Importantly, gene expression changes might not be symmetric, since the log2 ratio distributions (Fig. 2d) are skewed towards up- and down-regulation for the 30-to-0-min and 90-to-0-min comparisons, respectively. The assumption of symmetric gene expression changes is therefore unjustified, thus necessitating an approach that relies on a different assumption, as the existence of invariant genes. The dynamic programming step in the moose 2 pipeline, guided by six established reference genes, predicted 27 additional genes to be expressed at stable levels across experiments. The majority of these invariant genes are known to perform housekeeping functions. We do note, however, that the term “invariant” only applies to genes that do not change expression in any given experiment, so that the selection of these genes depends on the conditions. While the prediction of in silico genes appears stable in the experiments presented here, we caution that there are cases in which this scheme might perform poorly, notably when analyzing large numbers of conditions, or expression across species. Furthermore, in case invariant genes do not exist, the main assumption of moose 2 is violated and alternative methods are preferable. For example, global changes in gene expression, where most of the genes are up-regulated, have been observed in tumor cells and termed transcriptional amplification . In this special case, invariant genes do not exist and external controls (spike-ins) are needed for adequate normalization of RNA-seq data: in the study of Lovén et al.  cyclic loess normalization on the spike-ins was successfully applied. Hence, there are limitations to the usage of moose 2 , even though it is expected to perform well for most experimental settings. However, the experimentalist should in every case carefully check the justification of the assumptions before deciding on a normalization method.
Reduction of in-between replicate variation by non-linear correction schemes has already been suggested for microarray experiments [39–41], and our data indicate the general strength of such methods in removing technical bias in expression data as well. Interestingly, for our E. coli data set, we found that the quadratic correction term used in the moose 2 pipeline performs best, when based on in silico prediction of invariant genes (Additional file 9). The in silico prediction step is clearly a reasonable basis for a non-linear transformation. The accuracy of subsequently calculated log2 ratios was experimentally verified through qRT-PCR for selected genes (Additional file 14). Moose 2 , therefore, allows for precisely identifying differentially expressed genes, e.g. using Limma , or other methods that are continuously being developed and refined in parallel to advances in RNA-seq technologies .
The costs for high-throughput sequencing have rapidly decreased over the years, and sequencing of any prokaryotic or eukaryotic organism has become achievable. However, for many organisms, well-annotated reference genomes are unavailable. De novo transcriptome assembly represents an attractive strategy to assess non-sequenced organisms, despite being a bigger informatics challenge than reference-based transcriptome assembly [43, 44]. Furthermore, for downstream analyses such as qRT-PCR, robust reference genes are often needed, but generally not known for non-sequenced organisms. Identification of invariant genes by moose 2 might help to establish reference genes for accurate normalization of qRT-PCR . Different methods have been described for data-driven identification of reference genes , and ideally, these methods should be combined with moose 2 to define a reliable set of reference genes. We predict that applying de novo transcriptome assembler together with reference gene identification will benefit the establishment of new model organisms.
As a showcase, we used the moose 2 approach to investigate the response of E. coli to prolonged DNA damage caused by MMC. Since there are many treatments that can evoke DNA damage, like ionizing radiation, UV light, DNA gyrase inhibitors, and DNA crosslinkers, the gene expression changes presented here are considered as the MMC-specific response to DNA damage. Also, in a comparative study, aiming to define a global network scheme based on compilations of microarrays, it was found that the SOS response is the only transcriptional response that is consistently triggered upon DNA damage regardless of the toxic agent . As expected and observed here, the degree of induction of several SOS response genes relies on the HI value of the corresponding LexA-box: the lower the HI value, the higher the induction (Fig. 4b). The 90-to-30-min comparison however revealed that most LexA-dependent genes clearly decrease in expression level at the late time-point, which cannot be attributed to their HI values. Since most of the LexA-dependent genes solely depend on LexA and Sigma70 for transcription , it is likely that transcript stability and other post-transcriptional mechanisms are pivotal. The strong expression increase of the toxin gene tisB (Fig. 4a) is of particular interest, since TisB targets the inner membrane to impair the proton motive force, which then contributes to persister cell formation under DNA-damaging conditions [46–49]. Persisters are transiently drug-tolerant cells that are arrested in their growth due to the action of toxins. The TisB-dependent growth arrest might be accompanied by downstream expression changes, relevant to the persister phenomenon. Gene expression at an early time-point of DNA damage (here 30 min) generally represents the effort to counteract the stressful condition, i.e. inhibiting cell division and repairing DNA damages. The situation changes at late stages (here 90 min), when a fraction of cells has experienced a high level of DNA damage and consequently died, while the surviving subpopulation (i.e. persisters) have only faced moderate DNA damage . So, the SOS response is expected to decline, and this is exactly what we observe (Fig. 4a). Since our RNA-seq data are based on bulk experiments, conclusions have to be drawn cautiously, and some of the gene expression changes at 90 min may reflect the surviving subpopulation. The clear up-regulation of transporters and enzymes involved in purine and amino acid metabolism (Fig. 5b) might factor into long-term survival strategies of the bacteria. Interestingly, the same GO terms have been found to be under-represented during short periods of DNA damage , and might therefore be highly specific to late adaptation processes.
In summary, we present a novel method, moose 2 , and show that it corrects for systematic bias in RNA-seq expression data from a bacterial data set by normalizing expression values against a set of genes that were predicted as invariant in silico. Moreover, when applied to more complex eukaryotic data sets, the method performs consistently as well as, or better than, other RNA-seq normalization methods, indicating that its algorithm is also applicable to a wider set of organisms. The software is modular and can easily be integrated with other methods that require a set of invariant genes for normalization. Moose 2 is written in C++ and freely available as source code under the General Public License from http://grabherr.github.io/moose2/.
We thank Mirthe Hoekzema for comments on the manuscript and Klev Diamanti for support with R statistical language. Sequencing was performed by the SNP&SEQ Technology Platform in Uppsala. The facility is part of the National Genomics Infrastructure (NGI) Sweden and Science for Life Laboratory. The SNP&SEQ Platform is also supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation.
This work was in part supported by a Swedish Research Council Formas grant to M.G.G. We further acknowledge the European Molecular Biology Organization (EMBO) for a long-term fellowship [ALTF 62–2012 to B.A.B.] and the German Research Foundation (DFG) for a research fellowship [BE 5210/1–1 and BE 5210/2–1 to B.A.B.]. Lab work was funded through the Swedish Research Council [VR 621–2010-5233 to E.G.H.W.].
Availability of data and materials
Project name: moose 2 .
Project home page: http://grabherr.github.io/moose2/
Operating system: Linux.
Programming language: C++.
License: GNU GPL.
The datasets generated and analyzed during the current study, including the read counts, are publicly available from https://github.com/grabherr/moose2 under the General Public License (GPL), or are included in this published article and its additional information files.
BAB, EGHW, and MGG designed the work. BAB performed the experiments. BAB, TK, TKäl, and MGG analyzed the data. All authors contributed to writing the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–8.View ArticlePubMedGoogle Scholar
- Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010;11:94.View ArticlePubMedPubMed CentralGoogle Scholar
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.View ArticlePubMedPubMed CentralGoogle Scholar
- Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25.View ArticlePubMedPubMed CentralGoogle Scholar
- Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2012;14:671–83.View ArticlePubMedGoogle Scholar
- Evans C, Hardin J, Stoebel DM. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief Bioinform. 2017. doi:10.1093/bib/bbx008. [Epub ahead of print]
- Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotech. 2014;32:896–902.View ArticleGoogle Scholar
- Zhuo B, Emerson S, Chang JH, Di Y. Identifying stably expressed genes from multiple RNA-Seq data sets. PeerJ. 2016;4:e2791.View ArticlePubMedPubMed CentralGoogle Scholar
- Little JW. Mechanism of specific LexA cleavage: autodigestion and the role of RecA coprotease. Biochimie. 1991;73:411–22.View ArticlePubMedGoogle Scholar
- Lewis L, Harlow GR, Gregg-Jolly LA, Mount DW. Identification of high affinity binding sites for LexA which define new DNA damage-inducible genes in Escherichia Coli. J Mol Biol. 1994;241:507–23.View ArticlePubMedGoogle Scholar
- Fernandez De Henestrosa AR, Ogi T, Aoyagi S, Chafin D, Hayes JJ, Ohmori H, et al. identification of additional genes belonging to the LexA regulon in Escherichia Coli. Mol Microbiol. 2000;35:1560–72.View ArticlePubMedGoogle Scholar
- Wade JT, Reppas NB, Church GM, Struhl K. Genomic analysis of LexA binding reveals the permissive nature of the Escherichia Coli genome and identifies unconventional target sites. Genes Dev. 2005;19:2619–30.View ArticlePubMedPubMed CentralGoogle Scholar
- Khil PP, Camerini-Otero RD. Over 1000 genes are involved in the DNA damage response of Escherichia Coli. Mol Microbiol. 2002;44:89–105.View ArticlePubMedGoogle Scholar
- Courcelle J, Khodursky A, Peter B, Brown PO, Hanawalt PC. Comparative gene expression profiles following UV exposure in wild-type and SOS-deficient Escherichia Coli. Genetics. 2001;158:41–64.PubMedPubMed CentralGoogle Scholar
- Kyeong SJ, Xie Y, Hiasa H, Khodursky AB. Analysis of pleiotropic transcriptional profiles: a case study of DNA gyrase inhibition. PLoS Genet. 2006;2:1464–76.Google Scholar
- Hong J, Ahn JM, Kim BC, Gu MB. Construction of a functional network for common DNA damage responses in Escherichia Coli. Genomics. 2009;93:514–24.View ArticlePubMedGoogle Scholar
- Sangurdekar DP, Srienc F, Khodursky AB. A classification based framework for quantitative description of large-scale microarray data. Genome Biol. 2006;7:R32.View ArticlePubMedPubMed CentralGoogle Scholar
- Blomberg P, Wagner EG, Nordström K. Control of replication of plasmid R1: the duplex between the antisense RNA, CopA, and its target, CopT, is processed specifically in vivo and in vitro by RNase III. EMBO J. 1990;9:2331–40.PubMedPubMed CentralGoogle Scholar
- Grabherr MG, Russell P, Meyer M, Mauceli E, Alföldi J, Palma F Di, et al. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics 2010;26:1145–1151.Google Scholar
- Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(−Delta Delta C(T)) method. Methods. 2001;25:402–8.View ArticlePubMedGoogle Scholar
- Griva I, Nash SG, Sofer A. Linear and nonlinear optimization: second edition. SIAM 2009. ISBN: 978-0-898716-61-0Google Scholar
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.View ArticlePubMedGoogle Scholar
- Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.View ArticlePubMedPubMed CentralGoogle Scholar
- Kumar L. E Futschik M. Mfuzz: a software package for soft clustering of microarray data. Bioinformation. 2007;2:5–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang DW, Lempicki RA, Sherman BT. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57.View ArticleGoogle Scholar
- Zhou K, Zhou L, Lim Q, Zou R, Stephanopoulos G, Too H-P. Novel reference genes for quantifying transcriptional responses of Escherichia Coli to protein overexpression by quantitative PCR. BMC Mol Biol. 2011;12:18.View ArticlePubMedPubMed CentralGoogle Scholar
- Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol. 2006;24:1115–22.View ArticlePubMedGoogle Scholar
- Salgado H, Peralta-Gil M, Gama-Castro S, Santos-Zavaleta A, Muñiz-Rascado L, García-Sotelo JS, et al. RegulonDB v8.0: Omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 2013;41(Database issue):D203–D213.View ArticlePubMedGoogle Scholar
- Lee JV, Carrer A, Shah S, Snyder NW, Wei S, Venneti S, et al. Akt-dependent metabolic reprogramming regulates tumor cell histone acetylation. Cell Metab. 2014;20:306–19.View ArticlePubMedPubMed CentralGoogle Scholar
- Tian B, Li X, Kalita M, Widen SG, Yang J, Bhavnani SK, et al. Analysis of the TGFβ-induced program in primary airway epithelial cells shows essential role of NF-κB/RelA signaling network in type II epithelial mesenchymal transition. BMC Genomics. 2015;16:529.View ArticlePubMedPubMed CentralGoogle Scholar
- Hoeppner MP, Lundquist A, Pirun M, Meadows JRS, Zamani N, Johnson J, et al. An improved canine genome and a comprehensive catalogue of coding genes and non-coding transcripts. PLoS One. 2014;9:e91172.View ArticlePubMedPubMed CentralGoogle Scholar
- Kröger C, Colgan A, Srikumar S, Händler K, Sivasankaran SK, Hammarlöf DL, et al. An infection-relevant transcriptomic compendium for salmonella enterica serovar typhimurium. Cell Host Microbe. 2013;14:683–95.View ArticlePubMedGoogle Scholar
- Tolman JS, Valvano MA. Global changes in gene expression by the opportunistic pathogen Burkholderia cenocepacia in response to internalization by murine macrophages. BMC Genomics. 2012;13:63.View ArticlePubMedPubMed CentralGoogle Scholar
- Srikumar S, Kröger C, Hébrard M, Colgan A, Owen SV, Sivasankaran SK, et al. RNA-seq Brings New Insights to the Intra-Macrophage Transcriptome of Salmonella Typhimurium. PLoS Pathog. 2015;11(11).Google Scholar
- Lovén J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, et al. Revisiting global gene expression analysis. Cell. 2012;151:476–82.View ArticlePubMedPubMed CentralGoogle Scholar
- Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011;21:1543–51.View ArticlePubMedPubMed CentralGoogle Scholar
- Mar JC, Kimura Y, Schroder K, Irvine KM, Hayashizaki Y, Suzuki H, et al. Data-driven normalization strategies for high-throughput quantitative RT-PCR. BMC Bioinformatics. 2009;10:110.View ArticlePubMedPubMed CentralGoogle Scholar
- Lin CY, Lovén J, Rahl PB, Paranal RM, Burge CB, Bradner JE, et al. Transcriptional amplification in tumor cells with elevated c-Myc. Cell. 2012;151:56–67.View ArticlePubMedPubMed CentralGoogle Scholar
- Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielser HB, et al. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 2002;3:research0048.View ArticlePubMedPubMed CentralGoogle Scholar
- Edwards D. Non-linear normalization and background correction in one-channel cDNA microarray studies. Bioinformatics. 2003;19:825–33.View ArticlePubMedGoogle Scholar
- Faller D, Voss HU, Timmer J, Hobohm U. Normalization of DNA-microarray data by nonlinear correlation maximization. J Comput Biol. 2003;10:751–62.View ArticlePubMedGoogle Scholar
- Huang HC, Niu Y, Qin LX. Differential expression analysis for RNA-Seq: an overview of statistical methods and computational software. Cancer Inform. 2015;14:57–67.PubMedPubMed CentralGoogle Scholar
- Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011;12:671–82.View ArticlePubMedGoogle Scholar
- Moreton J, Izquierdo A, Emes RD. Assembly, assessment and availability of de novo generated eukaryotic transcriptomes. Front Genet. 2015;6:1–9.Google Scholar
- Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002;3:RESEARCH0034.View ArticlePubMedPubMed CentralGoogle Scholar
- Unoson C, Wagner EGH. A small SOS-induced toxin is targeted against the inner membrane in Escherichia Coli. Mol Microbiol. 2008;70:258–70.View ArticlePubMedGoogle Scholar
- Dörr T, Vulic M, Lewis K. Ciprofloxacin causes persister formation by inducing the TisB toxin in Escherichia Coli. PLoS Biol. 2010;8:e1000317.View ArticlePubMedPubMed CentralGoogle Scholar
- Gurnev PA, Ortenberg R, Dörr T, Lewis K, Bezrukov SM. Persister-promoting bacterial toxin TisB produces anion-selective pores in planar lipid bilayers. FEBS Lett. 2012;586:2529–34.View ArticlePubMedPubMed CentralGoogle Scholar
- Berghoff BA, Hoekzema M, Aulbach L, Wagner EGH. Two regulatory RNA elements affect TisB-dependent depolarization and persister formation. Mol Microbiol. 2017;103:1020–33.View ArticlePubMedGoogle Scholar
- Dörr T, Lewis K, Vulic M. SOS response induces persistence to fluoroquinolones in Escherichia coli. PLoS Genet. 2009;5:e1000760.View ArticlePubMedPubMed CentralGoogle Scholar