# QUAST

Quality Assessment Tool for Genome Assemblies by CAB
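
The N50, L50, NG50, and N75 definitions in the glossary below can be illustrated with a short sketch. This is not QUAST's own implementation; the function name `nx_lx` and its parameters are hypothetical and chosen only for illustration. It sorts contig lengths in descending order and accumulates them until the requested fraction of the total is reached. Passing the reference length as `total` gives the NG/LG variants.

```python
def nx_lx(lengths, fraction=0.5, total=None):
    """Return (Nx, Lx) for a list of contig lengths.

    Hypothetical helper, not part of QUAST.
    fraction: 0.5 for N50/L50, 0.75 for N75/L75.
    total: defaults to the assembly length; pass the reference
           length instead to obtain NGx/LGx.
    """
    if total is None:
        total = sum(lengths)
    target = total * fraction
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            # length is the largest x such that contigs of length >= x
            # cover at least the target; count is the matching Lx.
            return length, count
    # The contigs cover less than the requested fraction of `total`.
    return None, None


contigs = [80, 70, 50, 40, 30, 20]       # assembly length 290
n50, l50 = nx_lx(contigs)                 # half of 290 is 145
n75, l75 = nx_lx(contigs, fraction=0.75)  # 75% of 290 is 217.5
ng50, lg50 = nx_lx(contigs, total=400)    # half of a 400 bp reference is 200
```

For these example lengths, the two longest contigs (80 + 70 = 150) are the first to reach half of the assembly, so N50 is 70 and L50 is 2. Against the assumed 400 bp reference, three contigs are needed, so NG50 is 50 and LG50 is 3.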

{ "# contigs" : "is the total number of contigs in the assembly.", "Largest contig" : "is the length of the longest contig in the assembly.", "Total length" : "is the total number of bases in the assembly.", "Reference length" : "is the total number of bases in the reference.", "# contigs (>= 0 bp)" : "is the total number of contigs in the assembly that have size greater than or equal to 0 bp.", "Total length (>= 0 bp)" : "is the total number of bases in the contigs having size greater than or equal to 0 bp.", "N50" : "is the contig length such that using longer or equal length contigs produces half (50%) of the bases of the assembly. Usually there is no value that produces exactly 50%, so the technical definition is the maximum length x such that using contigs of length at least x accounts for at least 50% of the total assembly length.", "NG50" : "is the contig length such that using longer or equal length contigs produces half (50%) of the bases of the reference genome. This metric is computed only if a reference genome is provided.", "N75" : "is the contig length such that using longer or equal length contigs produces 75% of the bases of the assembly. Usually there is no value that produces exactly 75%, so the technical definition is the maximum length x such that using contigs of length at least x accounts for at least 75% of the total assembly length.", "NG75" : "is the contig length such that using longer or equal length contigs produces 75% of the bases of the reference genome. This metric is computed only if a reference genome is provided.", "L50" : "is the minimum number of contigs that produce half (50%) of the bases of the assembly. In other words, it's the number of contigs of length at least N50.", "LG50" : "is the minimum number of contigs that produce half (50%) of the bases of the reference genome. In other words, it's the number of contigs of length at least NG50. 
This metric is computed only if a reference genome is provided.", "L75" : "is the minimum number of contigs that produce 75% of the bases of the assembly. In other words, it's the number of contigs of length at least N75.", "LG75" : "is the minimum number of contigs that produce 75% of the bases of the reference genome. In other words, it's the number of contigs of length at least NG75. This metric is computed only if a reference genome is provided.", "NA50" : "is N50 where the lengths of aligned blocks are counted instead of contig lengths. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces. This metric is computed only if a reference genome is provided.", "NGA50" : "is NG50 where the lengths of aligned blocks are counted instead of contig lengths. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces. This metric is computed only if a reference genome is provided.", "NA75" : "is N75 where the lengths of aligned blocks are counted instead of contig lengths. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces. This metric is computed only if a reference genome is provided.", "NGA75" : "is NG75 where the lengths of aligned blocks are counted instead of contig lengths. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces. This metric is computed only if a reference genome is provided.", "LA50" : "is L50 where aligned blocks are counted instead of contigs. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces.", "LGA50" : "is LG50 where aligned blocks are counted instead of contigs. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces.", "LA75" : "is L75 where aligned blocks are counted instead of contigs. 
I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces.", "LGA75" : "is LG75 where aligned blocks are counted instead of contigs. I.e., if a contig has a misassembly with respect to the reference, the contig is broken into smaller pieces.", "Average %IDY" : "is the average alignment identity percentage (Nucmer measure of alignment accuracy) among all contigs.", "# misassemblies" : "is the number of positions in the assembled contigs where the left flanking sequence aligns over 1 kbp away from the right flanking sequence on the reference (relocation) or they overlap by more than 1 kbp (relocation) or flanking sequences align on different strands (inversion) or different chromosomes (translocation).", "# large block misassemblies" : "is the number of misassemblies between alignments with length greater than or equal to 3 kbp and with the misassembly threshold equal to 5 kbp (instead of the default 1 kbp for regular misassemblies).", "# misassembled contigs" : "is the number of contigs that contain misassembly events.", "Misassembled contigs length" : "is the total number of bases contained in all contigs that have one or more misassemblies.", "# relocations" : "is the number of relocation events among all misassembly events. Relocation is a misassembly where the left flanking sequence aligns over 1 kbp away from the right flanking sequence on the reference, or they overlap by more than 1 kbp and both flanking sequences align on the same chromosome.", "# translocations" : "is the number of translocation events among all misassembly events. Translocation is a misassembly where the flanking sequences align on different chromosomes.", "# interspecies translocations" : "is the number of interspecies translocation events among all misassembly events. 
Interspecies translocation is a misassembly where the flanking sequences align on different references (based on alignments to the combined reference).", "# inversions" : "is the number of inversion events among all misassembly events. Inversion is a misassembly that is not a relocation and whose flanking sequences align on opposite strands of the same chromosome.", "# large relocations" : "is the number of relocation events among all large block misassemblies. Relocation is a misassembly where the left flanking sequence aligns over 5 kbp away from the right flanking sequence on the reference, or they overlap by more than 5 kbp and both flanking sequences align on the same chromosome.", "# large translocations" : "is the number of translocation events among all large block misassemblies. Translocation is a misassembly where the flanking sequences align on different chromosomes.", "# large i/s translocations" : "is the number of interspecies translocation events among all large block misassemblies. Interspecies translocation is a misassembly where the flanking sequences align on different references (based on alignments to the combined reference).", "# large inversions" : "is the number of inversion events among all large block misassemblies. Inversion is a misassembly that is not a relocation and whose flanking sequences align on opposite strands of the same chromosome.", "# local misassemblies" : "is the number of local misassemblies. We define a local misassembly breakpoint as a breakpoint that satisfies these conditions:
1. Two or more distinct alignments cover the breakpoint.
2. The gap between left and right flanking sequences is less than 1 kbp.
3. The left and right flanking sequences are both on the same strand of the same chromosome of the reference genome.
", "# scaffold gap size mis." : "is the number of scaffold gap size misassemblies. We define a scaffold gap size misassembly as a breakpoint where the flanking sequences are combined in a scaffold at the wrong distance. These misassemblies are not included in the total number of misassemblies. ", "# possibly misassembled contigs": "is the number of contigs that contain a large unaligned fragment (default minimum length is 500 bp) and thus could possibly contain an interspecies translocation with an unknown reference.", "# possible misassemblies" : "is the number of putative interspecies translocations in possibly misassembled contigs if each large unaligned fragment is assumed to be a fragment of an unknown reference.", "# intergenomic misassemblies" : "is the number of all found and putative (possible) interspecies translocations.", "# structural variations" : "is the number of misassemblies matched with structural variations.", "# possible MGEs" : "is the number of misassemblies possibly caused by mobile genetic elements (MGEs). We define a possible MGE as an event that satisfies these conditions:
1. There are two misassemblies of the same type around a short alignment (less than 6 kbp).
2. The gap between the two long flanking sequences on the sides of the short alignment is less than 6 kbp.
3. The long flanking sequences are both on the same strand of the same chromosome of the reference genome.