miFRame: analysis and visualization of miRNA sequencing data in neurological disorders
© Backes et al. 2015
Received: 22 February 2015
Accepted: 2 July 2015
Published: 14 July 2015
While in the past decades nucleic acid analysis has been predominantly carried out using quantitative low- and high-throughput approaches such as qRT-PCR and microarray technology, next-generation sequencing (NGS) with its single base resolution is now frequently applied in DNA and RNA testing. Especially for small non-coding RNAs such as microRNAs there is a need for analysis and visualization tools that facilitate interpretation of the results also for clinicians.
We developed miFRame, which supports the analysis of human small RNA NGS data. Our tool carries out different data analyses for known as well as predicted novel mature microRNAs from known precursors and presents the results in a well interpretable manner. Analyses include among others expression analysis of precursors and mature miRNAs, detection of novel precursors and detection of potential iso-microRNAs. Aggregation of results from different users moreover allows for evaluation whether remarkable results, such as novel mature miRNAs, are indeed specific for the respective experimental set-up or are frequently detected across a broad range of experiments.
We demonstrate the capabilities of miFRame, which is freely available at http://www.ccb.uni-saarland.de/miframe, on two studies: circulating biomarker screening for Multiple Sclerosis (cohort includes clinically isolated syndrome, relapse remitting MS, matched controls) and for Alzheimer Disease (cohort includes Alzheimer Disease, Mild Cognitive Impairment, matched controls). Here, our tool improved biomarker discovery by identifying likely false positive marker candidates.
During the past three decades, molecular analysis of DNA and RNA has become increasingly important. In the 1990s, low-throughput technologies such as Western Blot and quantitative polymerase chain reaction were augmented by high-throughput technologies, namely microarrays. While microarrays allowed for profiling thousands of molecules in parallel, producing orders of magnitude more data, qRT-PCR remained the gold standard. All these technologies are intended for quantitative analysis of molecular markers. Around one decade ago, next-generation sequencing (NGS) became available. With its single base resolution, NGS outperformed microarrays by several orders of magnitude in terms of information content. This flood of data requires sophisticated computational approaches for quantitative as well as qualitative analysis of NGS data. Especially for DNA and mRNA, a multitude of analysis and visualization toolkits, web-based as well as stand-alone, are available. For miRNAs, which are known to be key regulators and valuable biomarker candidates for various human pathologies, respective tools are still under development.
One key challenge in small RNA sequencing analysis is the detection of so far unknown molecules. To address this topic, several tools have been developed. A prominent example is miRExpress, a stand-alone application for detecting known miRNAs along with novel miRNA candidates [1]. Another example, miRanalyzer, is available as a stand-alone solution but also offers the option to carry out the analyses online in a web-based version [2, 3]. Besides the detection of novel miRNAs, miRanalyzer identifies differentially expressed miRNAs and predicts miRNA targets. Likewise available as stand-alone tool and web service, SeqBuster allows for detection of variants of known miRNAs and can moreover be applied to discover dys-regulated miRNAs and to predict targets [4].
Besides these tools, specialized algorithms such as mirTools [5, 6], DARIO [7], WapRNA [8], eRNA [9] or E-miR [10] have been proposed. Among the most widely applied miRNA NGS tools is miRDeep/miRDeep2, allowing for the prediction of novel miRNAs [11, 12]. Respective tools are now stitched together in pipelines. An example of a comprehensive set of analysis tools is CPSS, including analysis of length distribution and genome mapping, quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, and functional enrichment, e.g. of GO terms [13]. A second example is omiRas, which detects dys-regulated miRNAs and reveals insights into molecular mechanisms by annotation, comparison and visualization of interaction networks of ncRNAs [14].
The above-mentioned tools, which represent just a selection of available NGS analysis solutions, focus on prediction of miRNAs and on complex analyses that put miRNAs into a larger context. Solutions that specifically analyze and visualize reads across the precursor molecules of specific biomarkers are not common. One example in this direction is MISIS, a tool to visualize and analyze maps of small RNAs derived from viruses [15]. MISIS displays RNA reads as a histogram along a given reference sequence.
However, especially for small non-coding RNAs, visual inspection of reads across the precursor can help to discover potentially false positive biomarker candidates, to select the right candidates for further experiments and to interpret the results of high-throughput experiments. We developed miFRame, a tool for the analysis of single miRNAs as well as miRNA sets from human NGS data that can be easily applied by researchers. Comparable to other tools, miFRame quantifies the expression of miRNA precursors and mature forms of miRNAs, and allows for discovering differentially expressed markers. Beyond this functionality, miFRame also discovers potentially dys-regulated novel mature miRNAs and differentially regulated iso-miRNAs, and performs a per-base expression analysis. Moreover, miFRame offers an aggregation analysis. Here, results of one user are compared to other users’ results, e.g., to provide evidence whether a novel miRNA isoform is actually specific for an experiment or frequently detected in other NGS analyses. If the latter is the case, an NGS artifact may be more likely than an actual finding, allowing for improved prioritization of replication/validation experiments. Among the most important features of miFRame, besides easy data input and sophisticated analyses along miRNA precursors, is a concise representation of results: for each miRNA separately, key findings are represented in various manners, enabling users to inspect the most important markers visually.
miFRame is a freely available web service for analyzing miRNA NGS data (http://www.ccb.uni-saarland.de/miframe/). The front-end is implemented in PHP and substantial parts of the secondary miRNA analysis are carried out using proprietary R and Python scripts. No additional packages beyond the core packages in R version 3.0.2 were used. The analyses implemented in miFRame are described in the Results section of the manuscript. miFRame relies on miRBase (http://www.mirbase.org/), one of the most frequently used miRNA repositories. From miRBase, the hsa.gff file containing the genomic coordinates for all human miRNAs and the mature.fa file containing all mature miRNA sequences (for all organisms) are used. Currently, miFRame is restricted to human data, thus all non-human miRNAs are filtered out of the analysis. A detailed step-by-step tutorial is available from the web resource.
To demonstrate the functionality of miFRame, we applied our tool to two data sets on Multiple Sclerosis and Alzheimer consisting of a total of 133 samples [16, 17]. In brief, the following cohorts were included: Alzheimer Disease (AD) patients (n = 54), Mild Cognitive Impairment (MCI) patients (n = 20), Alzheimer control set (n = 22), Multiple Sclerosis patients (n = 15), Multiple Sclerosis control set (n = 22). All samples were sequenced on Illumina HiSeq systems. As source, whole blood has been used. Details on library preparation and sequencing are presented in [16, 17]. The data that have been used for this analysis are available for download from the web resource. Beyond the standard parameters we also tested the influence of different parameter sets on the results, specifically the influence of the window sizes is described in detail in the Results section. The pre-calculated examples (Alzheimer and MS) are available from the miFRame homepage without password protection.
miFRame takes pre-processed NGS data along with a parameter set as input. The parameters can be entered via the web interface, and the following three input options are supported. First, a previous study ID can be re-entered to access the results of an analysis that has been performed earlier (option 1). In this way, miFRame also allows for sharing analysis results with collaborators: besides the unique study ID, users get a password to protect their data, and with that ID and the password collaboration partners can directly access the analysis results. Second, arf files, the standard output of miRDeep2, a popular miRNA analysis tool, can be uploaded (option 2). Besides the arf files, the user has to provide a file containing the total number of reads per sample that should be used for normalization. Here, either the total reads from the NGS run or the read counts after matching to the genome of H. sapiens can be used; alternatively, normalization can be omitted. As third input file, the assignment of the arf files to classes has to be uploaded as a tab-delimited data file (file name and class for each sample in a separate row). Example input is provided for all three files on the miFRame homepage. Third, it is possible to use any other miRNA annotation tool and to generate the annotation files for upload to miFRame (option 3). The latter option offers the largest flexibility and requires the transfer of only small data sets in the range of a few megabytes, but substantial local pre-processing by the user is required. A more detailed description of the different input formats is available on the miFRame main webpage (http://www.ccb.uni-saarland.de/miframe/).
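The grouping file and the read-count normalization described above can be illustrated with a short Python sketch. It is not taken from miFRame itself: the function names, the sample file names and the counts are hypothetical, and the scaling to one million total reads is assumed from the normalization used in the application section.

```python
import csv
import io


def parse_grouping(text):
    """Parse tab-delimited rows 'file name<TAB>class' into a dict."""
    groups = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        groups[row[0].strip()] = row[1].strip()
    return groups


def normalize_counts(counts, total_reads, scale=1_000_000):
    """Scale the raw read counts of one sample to `scale` total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: c * scale / total_reads for name, c in counts.items()}


# Hypothetical input: two arf files assigned to two classes.
grouping = parse_grouping("sample_A.arf\tcase\nsample_B.arf\tcontrol\n")

# Hypothetical raw counts for one sample with 2,000,000 total reads.
normalized = normalize_counts(
    {"hsa-mir-21": 500, "hsa-let-7a-1": 250}, total_reads=2_000_000
)
```

Passing the genome-matched read count instead of the raw total as `total_reads` corresponds to the second normalization choice; skipping the call corresponds to no normalization.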
As parameters, the user can specify the significance threshold, select whether just single positions or many bases across the precursor have to be significantly changed (to observe differential regulation of one mature miRNA, at least around 15 positions should be significant), set a window size for the detection of iso-microRNAs (typical values are 2–4 bases) and choose the analyses to be carried out. Moreover, the user can specify the statistical test to be applied (parametric t test for normally distributed data or Wilcoxon Mann–Whitney test) and the graphics output format (pdf vector graphics, jpeg or png). Finally, the user can decide whether the data should be used for the aggregation analysis. In this case, the user receives the number of significant results that others obtained for the respective miRNAs, and the user's own data are added to the data pool.
Here, miFRame tests the hypothesis that all reads mapping to the precursor are differentially expressed between a case and a control cohort. The mean read count per sample across the precursor is calculated. By applying the hypothesis test specified by the user, the average counts in the case and control cohorts are compared and a p-value per miRNA precursor is calculated.
Mature expression analysis
In this step, miFRame tests the hypothesis that mature miRNAs are differentially expressed between cases and controls. To this end, the average read count per sample on each mature 3′ and 5′ miRNA is calculated and, by applying the hypothesis test specified by the user, a p-value is calculated.
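The comparison of per-sample average counts between two cohorts can be sketched in Python as follows. This is only an illustration of the Wilcoxon Mann–Whitney option under stated assumptions, not miFRame's R implementation: it uses the normal approximation with tie correction and without continuity correction, and the function names and example counts are hypothetical. For very small cohorts an exact test would be preferable.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_p(cases, controls):
    """Two-sided p-value via the normal approximation (tie-corrected)."""
    n1, n2 = len(cases), len(controls)
    if n1 == 0 or n2 == 0:
        raise ValueError("both cohorts need at least one sample")
    combined = list(cases) + list(controls)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0  # all values identical, no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical normalized mean read counts per sample for one miRNA.
case_means = [10.0, 12.0, 14.0, 16.0, 18.0]
control_means = [1.0, 2.0, 3.0, 4.0, 5.0]
p_value = mann_whitney_p(case_means, control_means)
```

The same function applies to the precursor analysis and to each mature form; only the region over which the per-sample mean is taken differs.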
3′ to 5′ expression analysis ratio
Here, miFRame searches for miRNAs that show either opposite or the same differential expression between cases and controls on the 3′ compared to the 5′ mature forms of miRNAs. First, miRNAs with just a single mature form are omitted. Then, for each remaining miRNA the fold changes and p-values for the 3′ and 5′ forms are determined analogously to the mature expression analysis. From these two fold changes, the quotient of the 5′ and the 3′ mature form is calculated. miRNA precursors with resulting expression quotients close to 1 show the same up- or down-regulation of both mature forms, while miRNAs with quotients ≫1 are more up-regulated in the 5′ mature form and miRNAs with quotients ≪1 are more up-regulated in the 3′ mature form.
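A small worked example may help to read the resulting values. The Python sketch below assumes that the fold change is the ratio of mean expression in cases to mean expression in controls and that the 5′ fold change is divided by the 3′ fold change, which is the orientation implied by the interpretation of values ≫1 and ≪1; the function names and numbers are hypothetical.

```python
from statistics import mean


def fold_change(cases, controls):
    """Ratio of mean expression in cases to mean expression in controls."""
    control_mean = mean(controls)
    if control_mean == 0:
        raise ValueError("control mean is zero, fold change undefined")
    return mean(cases) / control_mean


def arm_quotient(fc_5p, fc_3p):
    """Quotient of the 5' fold change and the 3' fold change."""
    if fc_3p == 0:
        raise ValueError("3' fold change is zero, quotient undefined")
    return fc_5p / fc_3p


# Hypothetical normalized counts for the two mature forms of one precursor.
fc_5p = fold_change(cases=[200, 220], controls=[100, 110])  # 5' form doubles
fc_3p = fold_change(cases=[50, 50], controls=[100, 100])    # 3' form halves
quotient = arm_quotient(fc_5p, fc_3p)
```

In this made-up case the quotient is well above 1, indicating that the 5′ form is more up-regulated in cases than the 3′ form.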
Per-base expression analysis
In this analysis, miFRame checks how many base positions across the precursor of each miRNA are significantly differentially expressed in cases compared to controls. Most computational approaches just consider either maximal or average expression across the mature forms of miRNAs. In several cases, however, just few positions, especially at the 3′ or the 5′ end of the mature miRNA(s) are significantly different between two cohorts. These sites are detected by the per-base expression analysis but are usually not discovered by considering the full precursor or mature miRNAs (see also window analysis).
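To make the idea concrete, the following Python sketch derives a per-base coverage profile along a precursor from mapped reads and counts the positions whose p-value falls below a threshold. The read representation (0-based, half-open intervals relative to the precursor, with a read count), the function names and the numbers are assumptions made for illustration only; the per-position p-values would come from whichever hypothesis test the user selected.

```python
def per_base_coverage(reads, precursor_length):
    """Coverage per precursor position from (start, end, count) reads.

    Intervals are 0-based and half-open; reads extending beyond the
    precursor are clipped to its boundaries.
    """
    diff = [0] * (precursor_length + 1)
    for start, end, count in reads:
        start = max(start, 0)
        end = min(end, precursor_length)
        if start >= end:
            continue  # read lies entirely outside the precursor
        diff[start] += count
        diff[end] -= count
    coverage = []
    running = 0
    for delta in diff[:precursor_length]:
        running += delta
        coverage.append(running)
    return coverage


def count_significant_positions(p_values, alpha=0.05):
    """Number of base positions with a p-value below alpha."""
    return sum(1 for p in p_values if p < alpha)


# Two hypothetical reads on a precursor of length 6.
coverage = per_base_coverage([(0, 3, 2), (2, 5, 1)], precursor_length=6)

# Hypothetical per-position p-values from a case/control comparison.
n_significant = count_significant_positions([0.01, 0.2, 0.04, 0.5])
```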
Novel mature miRNA analysis
For a substantial number of miRNAs in miRBase, just a single mature form (either the 3′ or the 5′) is known. However, in many cases a significant number of reads maps exactly to positions matching the respective second mature form (5′ or 3′) of known miRNAs. For those miRNAs where just a single mature form is known, miFRame therefore checks whether reads match the respective missing 3′ or 5′ mature form. Additionally, miFRame calculates whether this mature form shows significantly different expression in cases as compared to controls and outputs how many reads are located on this potential novel mature miRNA.
As described in the per-base analysis section, it is often the case that only a few single positions, predominantly at the 3′ or 5′ end of the mature miRNA(s) belonging to one precursor, are significantly changed or at least show a lower p-value than the remaining part of the precursor or mature form(s) of the miRNA. These sites are frequently not taken into account in statistical analyses or are down-weighted by averaging across the mature miRNAs. miFRame allows the user to select a window size in the parameter selection step. Then, it calculates the percentage of bases covered in this window with respect to the 3′ and 5′ end of the 3′ and 5′ mature miRNA form. In consequence, for each precursor with two mature forms four percentages are calculated in cases and in controls, along with the difference between the two cohorts, while for precursors with just one mature form only two percentages are calculated. Additionally, miFRame outputs the respective motifs at the 3′ and 5′ end of the mature miRNAs. Thus, mature miRNAs with isoforms in specific traits can be readily detected.
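One possible reading of this window computation is sketched below in Python. The exact window definition is not spelled out in the text, so the sketch makes assumptions: the window covers the given number of bases immediately outside each annotated end of a mature form, a base counts as covered if at least one read overlaps it, and coordinates are 0-based and half-open. All names and numbers are hypothetical.

```python
def percent_covered(coverage, lo, hi, min_reads=1):
    """Percentage of positions in [lo, hi) with at least min_reads reads."""
    lo = max(lo, 0)
    hi = min(hi, len(coverage))
    if lo >= hi:
        return 0.0  # window falls outside the precursor
    covered = sum(1 for c in coverage[lo:hi] if c >= min_reads)
    return 100.0 * covered / (hi - lo)


def end_window_percentages(coverage, start, end, window):
    """Covered percentage in the windows flanking both ends of a mature form.

    The mature form occupies [start, end) on the precursor.
    """
    return {
        "5p_end": percent_covered(coverage, start - window, start),
        "3p_end": percent_covered(coverage, end, end + window),
    }


# Hypothetical per-base coverage of a short precursor in cases and controls;
# the mature form is annotated at positions 3..6, the window size is 2.
case_cov = [0, 3, 5, 9, 9, 9, 9, 4, 0, 0]
control_cov = [0, 0, 0, 9, 9, 9, 9, 0, 0, 0]

case_pct = end_window_percentages(case_cov, start=3, end=7, window=2)
control_pct = end_window_percentages(control_cov, start=3, end=7, window=2)
difference = {k: case_pct[k] - control_pct[k] for k in case_pct}
```

A large difference between cohorts at one end, as in this made-up example, would point to an extended isoform that is present mainly in one group.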
In case the user decided to carry out the aggregation analysis, just the relevant findings from the current study are stored anonymously in a local database. In return, the relevant findings from that user are compared to the database such that it becomes clear whether e.g. a novel mature miRNA is specific for a certain trait or is reported in many experiments, making a false positive finding by NGS artifacts more likely. This analysis may help to prioritize the replication and validation experiments by selecting findings that are less frequently discovered.
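The bookkeeping behind such an aggregation can be sketched in a few lines of Python. The in-memory class below stands in for the database and is purely illustrative: the class and method names, the placeholder miRNA names and the example findings are hypothetical and do not describe miFRame's actual storage.

```python
class AggregationPool:
    """Counts, per miRNA, how many studies tested it and found it significant."""

    def __init__(self):
        self._tested = {}
        self._significant = {}

    def add_study(self, findings):
        """Add one anonymized study: a dict of miRNA name -> significant?"""
        for mirna, is_significant in findings.items():
            self._tested[mirna] = self._tested.get(mirna, 0) + 1
            if is_significant:
                self._significant[mirna] = self._significant.get(mirna, 0) + 1

    def summary(self, mirna):
        """Return (number of significant studies, number of studies tested)."""
        return self._significant.get(mirna, 0), self._tested.get(mirna, 0)


pool = AggregationPool()
# Three hypothetical earlier studies with placeholder miRNA names.
pool.add_study({"mirna_a": True, "mirna_b": False})
pool.add_study({"mirna_a": True, "mirna_b": False})
pool.add_study({"mirna_a": True, "mirna_b": True})

frequent = pool.summary("mirna_a")
rare = pool.summary("mirna_b")
unseen = pool.summary("mirna_c")
```

A candidate that is significant in nearly every study, like the first placeholder here, would be a weaker candidate for validation than one that is significant only in the current experiment.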
Output of results
As second output, miFRame presents detailed results for all analyses that lead to significant findings. The result tables consist of two parts. The left-hand part contains key characteristics such as read counts, p-values, fold changes, etc. All entries can be sorted in increasing or decreasing order of the respective features. The right-hand part of the result tables consists of three thumbnails for each miRNA. Clicking each of them opens a high-resolution graphical representation of the respective results. The three representations include the following plots: (1) Significant bases across the precursor molecules. For each miRNA, the negative decadic logarithm of the p-values is presented, with a separate bar drawn for each base of the precursor. Green bars represent down-regulation in cases while red bars represent up-regulation in cases as compared to controls. (2) Read distribution across the precursor. Each sample is presented as a line plot along the miRNA precursor. Here, red lines correspond to cases while green lines correspond to controls. (3) Pileup plots for each miRNA. These plots show the precursor miRNA sequence in the middle. Mature forms are colored in blue and red, respectively, and the area where reads map to known miRNAs is shaded in blue. Above the precursor sequence, the consensus sequence per case individual is presented, while below the precursor sequence the consensus per control individual is drawn. Interpreting these results allows for easy detection of miRNAs that show inaccurate mappings, indicating potentially false positive findings, and for discovering iso-miRNAs. In case the user decided to carry out the aggregation analysis, a fourth plot is generated, showing in how many different experiments the respective miRNA has been included and in how many of these experiments a significant finding was obtained.
Besides the web-based representation of the results, all findings can be downloaded as a compressed folder. The download includes not only the Manhattan plots, bar diagrams, read distribution graphics and pileup plots but also tab-delimited flat files that can be imported into standard spreadsheet software.
Application of miFRame to Alzheimer and multiple sclerosis
To demonstrate the usability of miFRame, we applied the tool to two publicly available data sets, which can also be downloaded from the miFRame homepage, containing circulating miRNA profiles from whole blood of Alzheimer and Multiple Sclerosis patients [16, 17]. The data sets are sketched in the “Methods” section. A detailed description can be found in the respective original publications. For both data sets the full analysis has been carried out using the standard parameters of miFRame, and analysis results are available from the miFRame homepage (reference ID and password are provided in the “Methods” section). All analyses have been carried out on samples with read counts normalized to one million total reads.
While these analyses could in general have been done using qRT-PCR or microarray data, the NGS data offer the additional avenue of a detailed per-base data analysis.
Analysis of mapped reads
Influence of the window size
Influence of the window size on the results computation of miFRame
As described in the Introduction, analysis tools for next generation sequencing data of small non-coding RNAs are constantly improved and novel tools are implemented. While a substantial portion of these tools focuses on the detection of novel miRNAs from NGS reads, other tools aim at improved understanding of pathogenic processes by integrating miRNA data to biochemical pathways and linking them with target genes.
Besides furthering our understanding of human pathogenic processes, miRNAs however offer themselves as biomarkers to detect disease in time or monitor disease progression. With miFRame, we developed a tool that supports the investigation and graphical analysis of biomarker candidates. Besides classical quantitative analyses, miFRame also provides qualitative analyses such as the detection of novel mature forms of miRNAs or the discovery of potential iso-miRNAs, specifically extended mature forms of miRNAs.
A key functionality of miFRame is the concise graphical representation of the high-throughput data. For each analysis, an overview plot showing the significant results along the genome is generated, a representation well known from Genome Wide Association Studies. More importantly, three plots are generated for each miRNA, showing the significance and the read distribution along the miRNA precursor, respectively, together with pileup plots representing the consensus sequence of all reads and all samples mapping to this precursor.
The potential of miFRame is demonstrated on Multiple Sclerosis and Alzheimer Disease whole blood NGS data. Beyond the known markers for these diseases, novel markers were identified, especially in the case of Alzheimer Disease. The increased significance in AD compared to MS may be partially due to the larger cohort size of the AD study. Beyond discovering differentially regulated miRNAs, our analysis also revealed miRNAs with different fold changes on the 3′ and 5′ forms in AD as compared to controls, novel potential mature forms of known miRNAs as well as potential isoforms. Remarkably, in the case of the Alzheimer data set, we originally reported a 12-miRNA signature, all members of which were tested by qRT-PCR. While in 10 cases the results matched well, for 2 miRNAs the qRT-PCR data did not correspond to the NGS results (miR-26a-5p and miR-1285-5p). For miR-26a-5p, miFRame did not reveal a potential source leading to a false positive biomarker. In contrast, for miR-1285-5p, miFRame revealed an unusual read mapping to the mature 5′ form on the precursor. The results are shown in detail in Additional file 1: Figure S1. Only a few bases at the 5′ end of the miRNA were highly significant, the difference between Alzheimer patients and controls decreased towards the middle of that miRNA, and many reads mapped outside the annotated mature 5′ form. This points to a potential mapping issue for this biomarker, which may explain the divergent qRT-PCR result.
We developed miFRame, a tool that takes miRNA NGS data and carries out statistical analysis of read counts as well as detection of novel mature miRNAs and iso-miR analysis. Compared to other tools, miFRame has multiple additional features. For example, users can share their aggregated and anonymized results and in return learn how many other researchers observed similar results, which highlights potential systematic errors and helps to prioritize candidates for validation studies. The very detailed graphical representation of the results allows researchers to interpret and evaluate relevant findings from their experiments. Thus, our tool augments current miRNA NGS analysis pipelines that are tailored for the detection of novel miRNAs, the prediction of target genes or functional miRNA enrichment analysis.
CB contributed to study design and data analysis, JH contributed to data collection, PL contributed to sample processing, KF contributed to data collection, TG contributed to implementing the web interface, KR provided patients and clinical information, BM contributed to manuscript writing and data collection, EM contributed to manuscript writing and data collection, AK wrote the manuscript and contributed to data analysis. All authors read and approved the final manuscript.
This work has been funded by the EU FP7 Project BestAgeing.
Compliance with ethical guidelines
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Wang WC, Lin FM, Chang WC, Lin KY, Huang HD, Lin NS (2009) miRExpress: analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinform 10:328
- Hackenberg M, Rodriguez-Ezpeleta N, Aransay AM (2011) miRanalyzer: an update on the detection and analysis of microRNAs in high-throughput sequencing experiments. Nucleic Acids Res 39:W132–W138
- Hackenberg M, Sturm M, Langenberger D, Falcon-Perez JM, Aransay AM (2009) miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res 37:W68–W76
- Pantano L, Estivill X, Marti E (2010) SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells. Nucleic Acids Res 38:e34
- Wu J, Liu Q, Wang X, Zheng J, Wang T, You M et al (2013) mirTools 2.0 for non-coding RNA discovery, profiling, and functional annotation based on high-throughput sequencing. RNA Biol 10:1087–1092
- Zhu E, Zhao F, Xu G, Hou H, Zhou L, Li X et al (2010) mirTools: microRNA profiling and discovery based on high-throughput sequencing. Nucleic Acids Res 38:W392–W397
- Fasold M, Langenberger D, Binder H, Stadler PF, Hoffmann S (2011) DARIO: a ncRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res 39(Web Server issue):W112–W117. doi:10.1093/nar/gkr357
- Zhao W, Liu W, Tian D, Tang B, Wang Y, Yu C et al (2011) wapRNA: a web-based application for the processing of RNA sequences. Bioinformatics 27:3076–3077
- Yuan T, Huang X, Dittmar RL, Du M, Kohli M, Boardman L et al (2014) eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing. BMC Genom 15:176
- Buermans HP, Ariyurek Y, van Ommen G, den Dunnen JT, 't Hoen PA (2010) New methods for next generation sequencing based microRNA expression profiling. BMC Genom 11:716
- Friedlander MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S et al (2008) Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotech 26:407–415
- Friedländer MR, Mackowiak SD, Li N, Chen W, Rajewsky N (2012) miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res 40(1):37–52. doi:10.1093/nar/gkr688
- Zhang Y, Xu B, Yang Y, Ban R, Zhang H, Jiang X et al (2012) CPSS: a computational platform for the analysis of small RNA deep sequencing data. Bioinformatics 28:1925–1927
- Muller S, Rycak L, Winter P, Kahl G, Koch I, Rotter B (2013) omiRas: a Web server for differential expression analysis of miRNAs derived from small RNA-Seq data. Bioinformatics 29:2651–2652
- Seguin J, Otten P, Baerlocher L, Farinelli L, Pooggin MM (2014) MISIS: a bioinformatics tool to view and analyze maps of small RNAs derived from viruses and genomic loci generating multiple small RNAs. J Virol Methods 195:120–122
- Leidinger P, Backes C, Deutscher S, Schmitt K, Mueller SC, Frese K et al (2013) A blood based 12-miRNA signature of Alzheimer disease patients. Genome Biol 14:R78
- Keller A, Leidinger P, Steinmeyer F, Stahler C, Franke A, Hemmrich-Stanisak G et al (2014) Comprehensive analysis of microRNA profiles in multiple sclerosis including next-generation sequencing. Mult Scler 20:295–303
- Londin E, Loher P, Telonis AG, Quann K, Clark P, Jing Y et al (2015) Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs. Proc Natl Acad Sci USA 112:E1106–E1115