Fig. 6 | Journal of Translational Medicine

Fig. 6

From: Unraveling metagenomics through long-read sequencing: a comprehensive review

Overview of the functional annotation workflow. Functional annotation uses data from the previous steps to identify genes and map them against databases, elucidating the function of each gene and its respective host microbe.

A The workflow of EggNOG-mapper v2 consists of gene prediction, search, orthology inference, and annotation stages. Gene prediction uses assembled contigs as input for Prodigal. The search stage aligns the input sequences using HMMER, DIAMOND, or MMseqs2. During orthology inference, a taxonomic scope filter is applied to obtain the desired orthologs. Lastly, in the annotation stage, the selected orthologs are passed through the eggNOG annotation database and other annotation tools, resulting in annotated orthologs.

B MEGAN-LR starts by aligning the input reads against NCBI-nr with DNA-to-protein alignment using the LAST alignment tool. LAST outputs a MAF file, which is converted into a DAA file. Meganizer takes the DAA file, performs taxonomic and functional binning, and appends the results to the DAA file. The updated DAA file is then opened in MEGAN-LR for visualization and analysis.

C MetaErg uses assembled contigs as input and identifies CRISPR regions and non-coding regions, which include tRNA and rRNA. Prodigal uses the outcome to predict the protein-coding regions, or ORFs. The predicted ORFs are run through various functional categories, similarity searches, and databases such as GenomeDB, Casgene HMM, Metabolic HMM, Swiss-Prot, FOAM, Pfam-A, and TIGRFAMs, among others. Once functional annotation is complete, the output and visualizations can be returned in various formats.

D The Nanopore pipeline starts by converting fast5 data into fastq files through base calling and demultiplexing. Taxonomic annotation takes the fastq files and annotates them using two different methods. The first method uses Centrifuge to perform taxonomic binning and minimap2 to remove erroneous taxonomic assignments; sequences with a mapQ score of 5 or higher are kept. The second method uses IGC and minimap2, and only the sequence with the highest mapQ score is kept. The gene count table is constructed by counting the number of sequences mapped by ONT reads. Metagenomic species abundance is then estimated as the mean value of the 50 most connected genes in the gene count table. For functional annotation, taxonomic results from Centrifuge use the KEGG API to retrieve KO content, while those from IGC use the IGC reference to obtain KO content.
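
The filtering and abundance steps in panel D can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. The alignment records (dictionaries with hypothetical `read`, `target`, and `mapq` keys), the function names, and the assumption that each species' gene list is already ordered from most to least connected are all introduced here for illustration. The sketch also assumes the gene count table holds one count per gene.

```python
from collections import Counter
from statistics import mean


def filter_by_mapq(alignments, min_mapq=5):
    """First method: keep alignments whose mapQ is at least min_mapq."""
    return [a for a in alignments if a["mapq"] >= min_mapq]


def best_hit_per_read(alignments):
    """Second method: keep only the highest-mapQ alignment for each read."""
    best = {}
    for a in alignments:
        current = best.get(a["read"])
        if current is None or a["mapq"] > current["mapq"]:
            best[a["read"]] = a
    return list(best.values())


def gene_count_table(alignments):
    """Count how many retained alignments hit each target gene."""
    return Counter(a["target"] for a in alignments)


def species_abundance(counts, genes_by_connectivity, top_n=50):
    """Mean count over the top_n genes of one species.

    genes_by_connectivity is assumed (hypothetically) to be sorted from
    most to least connected; genes with no mapped reads count as 0.
    """
    top_genes = genes_by_connectivity[:top_n]
    if not top_genes:
        return 0.0
    return mean(counts.get(g, 0) for g in top_genes)


if __name__ == "__main__":
    # Toy alignment records, invented for this example.
    alignments = [
        {"read": "r1", "target": "g1", "mapq": 60},
        {"read": "r1", "target": "g2", "mapq": 3},
        {"read": "r2", "target": "g1", "mapq": 5},
        {"read": "r3", "target": "g3", "mapq": 4},
    ]
    kept = best_hit_per_read(alignments)
    counts = gene_count_table(kept)
    print(dict(counts))
    print(species_abundance(counts, ["g1", "g3", "g9"]))
```

In practice the alignment records would come from parsing minimap2 output, and the connectivity ordering would come from the reference catalogue; both are outside the scope of this sketch.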
