Difference between revisions of "Sequence assembly"

From PGI

 
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
<p><span style="color: #000000">In bioinformatics, <b>sequence assembly</b> refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence. This is needed because DNA sequencing technology cannot read whole genomes in one go; it reads only small pieces of between 20 and 1000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or gene transcripts (ESTs).</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">In bioinformatics, <b>sequence assembly</b> refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence. </span></span></p>
<p><span style="color: #000000">The problem of sequence assembly can be compared to taking many copies of a book, passing them all through a shredder, and piecing a copy of the book back together from only shredded pieces. The book may have many repeated paragraphs, and some shreds may be modified to have typos. Excerpts from another book may be added in, and some shreds may be completely unrecognizable.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">This is needed because DNA sequencing technology cannot read whole genomes in one go; it reads only small pieces of between 20 and 1000 bases. </span></span></p>
<p><span style="color: #000000">
+
<p><span style="font-size: small"><span style="color: #000000">Typically the short fragments, called <b>reads</b>, result from shotgun sequencing of genomic DNA or gene transcripts ([[EST]]s).</span></span></p>
 +
<p><b><span style="font-size: small"><span style="color: #000000">Sequence assembly as reconstructing a book</span></span></b></p>
 +
<p>&nbsp;</p>
 +
<p><span style="font-size: small"><span style="color: #000000">The problem of sequence assembly can be compared to taking many copies of a book, passing them all through a shredder, and piecing a copy of the book back together from only shredded pieces. The book may have many repeated paragraphs, and some shreds may be modified to have typos. Excerpts from another book may be added in, and some shreds may be completely unrecognizable.</span></span></p>
 +
<p>&nbsp;</p>
 
<h2><span style="color: #000000"><span id="Genome_assemblers" class="mw-headline">Genome assemblers</span></span></h2>
 
<h2><span style="color: #000000"><span id="Genome_assemblers" class="mw-headline">Genome assemblers</span></span></h2>
<p><span style="color: #000000">The first sequence assemblers began to appear in the late 1980s and early 1990s as variants of simpler sequence alignment programs to piece together vast quantities of fragments generated by automated sequencing instruments called DNA sequencers. As the sequenced organisms grew in size and complexity (from small viruses through plasmids to bacteria and finally eukaryotes), the assembly programs needed increasingly sophisticated strategies to handle:</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">The first sequence assemblers began to appear in the late 1980s and early 1990s as variants of simpler sequence alignment programs to piece together vast quantities of fragments generated by automated sequencing instruments called DNA sequencers. </span></span></p>
 +
<p><span style="font-size: small"><span style="color: #000000">As the sequenced organisms grew in size and complexity, the assembly programs needed increasingly sophisticated strategies to handle:</span></span></p>
 
<ul>
 
<ul>
     <li><span style="color: #000000">terabytes of sequencing data which need processing on computing clusters;</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">terabytes of data which need processing on computing clusters;</span> </span></li>
     <li><span style="color: #000000">identical and nearly identical sequences (known as <i>repeats</i>) which can, in the worst case, increase the time and space complexity of algorithms exponentially;</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">identical and nearly identical sequences (known as <i>repeats</i>) which can, in the worst case, increase the time and space complexity of algorithms exponentially;</span> </span></li>
     <li><span style="color: #000000">and errors in the fragments from the sequencing instruments, which can confound assembly.</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">and errors in the fragments from the sequencing instruments, which can confound assembly.</span> </span></li>
 
</ul>
 
</ul>
<p><span style="color: #000000">Faced with the challenge of assembling the first larger eukaryotic genomes, the fruit fly Drosophila melanogaster, in 2000 and the human genome just a year later, scientists developed assemblers like Celera Assembler<sup id="cite_ref-0" class="reference"><font size="2">[1]</font></sup> and Arachne<sup id="cite_ref-1" class="reference"><font size="2">[2]</font></sup> able to handle genomes of 100-300 million base pairs. Subsequent to these efforts, several other groups, mostly at the major genome sequencing centers, built large-scale assemblers, and an open source effort known as AMOS<sup id="cite_ref-2" class="reference"><font size="2">[3]</font></sup> was launched to bring together all the innovations in genome assembly technology under the open source framework.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">Faced with the challenge of assembling the first larger eukaryotic genomes, the fruit fly [[Drosophila melanogaster]], in <b>2000 </b>and the human genome in <b>2001</b>, scientists developed assemblers&nbsp;such as&nbsp;Celera Assembler<sup id="cite_ref-0" class="reference">[1]</sup> and [[Arachne]]<sup id="cite_ref-1" class="reference">[2]</sup> able to handle genomes of <b>100,000,000 - 300,000,000</b> base pairs. Subsequent to these efforts, several other groups, mostly at the major genome sequencing centers, built large-scale assemblers, and an open source effort known as [[AMOS]]<sup id="cite_ref-2" class="reference">[3]</sup> was launched to bring together all the innovations in genome assembly technology under the open source framework.</span></span></p>
<h2><span style="color: #000000"><span id="EST_assemblers" class="mw-headline">EST assemblers</span></span></h2>
+
<h2><span style="font-size: medium"><span style="color: #000000"><span id="EST_assemblers" class="mw-headline">EST assemblers</span></span></span></h2>
<p><span style="color: #000000">EST assembly differs from genome assembly in several ways. The sequences for EST assembly are the transcribed mRNA of a cell and represent only a subset of the whole genome. At first glance, the underlying algorithmic problems differ between genome and EST assembly. For instance, genomes often have large amounts of repetitive sequence, mainly in the intergenic regions. Since ESTs represent gene transcripts, they will not contain these repeats. On the other hand, cells tend to have a certain number of genes that are constantly expressed at very high levels (housekeeping genes), which again leads to the problem of similar sequences being present in large numbers in the data set to be assembled.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">EST assembly differs from genome assembly in several ways. The sequences for EST assembly are the transcribed mRNA of a cell and represent only a subset of the whole genome. At first glance, the underlying algorithmic problems differ between genome and EST assembly. For instance, genomes often have large amounts of repetitive sequence, mainly in the intergenic regions. Since ESTs represent gene transcripts, they will not contain these repeats. On the other hand, cells tend to have a certain number of genes that are constantly expressed at very high levels (housekeeping genes), which again leads to the problem of similar sequences being present in large numbers in the data set to be assembled.</span></span></p>
<p><span style="color: #000000">Furthermore, genes sometimes overlap in the genome (sense-antisense transcription), and should ideally still be assembled separately. EST assembly is also complicated by features like (cis-) alternative splicing, trans-splicing, single-nucleotide polymorphism, recoding, and post-transcriptional modification.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">Furthermore, genes sometimes overlap in the genome (sense-antisense transcription), and should ideally still be assembled separately. EST assembly is also complicated by features like (cis-) alternative splicing, trans-splicing, single-nucleotide polymorphism, recoding, and post-transcriptional modification.</span></span></p>
<h2><span style="color: #000000"><span id="De-novo_vs._mapping_assembly" class="mw-headline">De-novo vs. mapping assembly</span></span></h2>
+
<h2><span style="font-size: small"><span style="color: #000000"><span id="De-novo_vs._mapping_assembly" class="mw-headline">De-novo vs. mapping assembly</span></span></span></h2>
<p><span style="color: #000000">In sequence assembly, two different types can be distinguished:</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">In sequence assembly, two different types can be distinguished:</span></span></p>
 
<ol>
 
<ol>
     <li><span style="color: #000000">de-novo: assembling reads together so that they form a new, previously unknown sequence</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">de-novo: assembling reads together so that they form a new, previously unknown sequence</span> </span></li>
     <li><span style="color: #000000">mapping: assembling reads against an existing backbone sequence, building a sequence that is similar but not necessarily identical to the backbone sequence</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">mapping: assembling reads against an existing backbone sequence, building a sequence that is similar but not necessarily identical to the backbone sequence</span> </span></li>
 
</ol>
 
</ol>
<p><span style="color: #000000">In terms of complexity and time requirements, de-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly because the assembly algorithm needs to compare every read with every other read (an operation that has a complexity of O(<var>n</var><sup><font size="2">2</font></sup>) but can be reduced to O(<var>n</var> log(<var>n</var>))). Referring to the comparison drawn to shredded books in the introduction: while for mapping assemblies one would have a very similar book as a template (perhaps with the names of the main characters and a few locations changed), de-novo assemblies are harder in the sense that one would not know beforehand whether the result would be a science book, a novel, a catalogue, and so on.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">In terms of complexity and time requirements, de-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly because the assembly algorithm needs to compare every read with every other read (an operation that has a complexity of O(<var>n</var><sup>2</sup>) but can be reduced to O(<var>n</var> log(<var>n</var>))). Referring to the comparison drawn to shredded books in the introduction: while for mapping assemblies one would have a very similar book as a template (perhaps with the names of the main characters and a few locations changed), de-novo assemblies are harder in the sense that one would not know beforehand whether the result would be a science book, a novel, a catalogue, and so on.</span></span></p>
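<p><span style="font-size: small"><span style="color: #000000">One common way to avoid comparing every read with every other read is to index short substrings (k-mers) and align only reads that share at least one of them. The sketch below illustrates that idea under stated assumptions: the function name <code>candidate_pairs</code>, the parameter <code>k</code>, and the example reads are hypothetical, and this is not claimed to be the specific O(<var>n</var> log(<var>n</var>)) method the paragraph above refers to.</span></span></p>

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(reads, k=4):
    """Return index pairs (i, j), i < j, of reads sharing at least one k-mer.

    Only these pairs would then be passed to a real alignment step,
    instead of aligning all n * (n - 1) / 2 possible pairs.
    """
    index = defaultdict(set)  # k-mer -> indices of reads containing it
    for i, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].add(i)

    pairs = set()
    for ids in index.values():
        # Every two reads that share this k-mer become a candidate pair.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy reads: the first two share the k-mer "GACC".
reads = ["ATTAGACC", "GACCTGCC", "TTTTTTTT"]
print(candidate_pairs(reads))
```

<p><span style="font-size: small"><span style="color: #000000">Note that a k-mer occurring in many reads, as happens with repeats, still produces many candidate pairs, so the worst case remains quadratic.</span></span></p>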
<h2><span style="color: #000000"><span id="Influence_of_technological_changes" class="mw-headline">Influence of technological changes</span></span></h2>
+
<h2><span style="font-size: small"><span style="color: #000000"><span id="Influence_of_technological_changes" class="mw-headline">Influence of technological changes</span></span></span></h2>
<p><span style="color: #000000">The complexity of sequence assembly is driven by two major factors: the number of fragments and their lengths. While more and longer fragments allow better identification of sequence overlaps, they also pose problems, as the underlying algorithms show quadratic or even exponential complexity with respect to both the number of fragments and their length. And while shorter sequences are faster to align, they also complicate the layout phase of an assembly, as shorter reads are more difficult to place in the presence of repeats or near-identical repeats.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">The complexity of sequence assembly is driven by two major factors: the number of fragments and their lengths. While more and longer fragments allow better identification of sequence overlaps, they also pose problems, as the underlying algorithms show quadratic or even exponential complexity with respect to both the number of fragments and their length. And while shorter sequences are faster to align, they also complicate the layout phase of an assembly, as shorter reads are more difficult to place in the presence of repeats or near-identical repeats.</span></span></p>
<p><span style="color: #000000">In the earliest days of DNA sequencing, scientists could obtain only a few short sequences (some dozen bases) after weeks of laboratory work. Hence, these sequences could be aligned by hand in a few minutes.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">In the earliest days of DNA sequencing, scientists could obtain only a few short sequences (some dozen bases) after weeks of laboratory work. Hence, these sequences could be aligned by hand in a few minutes.</span></span></p>
<p><span style="color: #000000">In 1975, the dideoxy termination method (also known as <i>Sanger sequencing</i>) was invented, and by shortly after 2000 the technology had been improved to the point where fully automated machines could churn out sequences in a highly parallelised mode 24 hours a day. Large genome centers around the world housed complete farms of these sequencing machines, which in turn made it necessary to optimise assemblers for sequences from whole-genome shotgun sequencing projects, where the reads</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">In 1975, the dideoxy termination method (also known as <i>Sanger sequencing</i>) was invented, and by shortly after 2000 the technology had been improved to the point where fully automated machines could churn out sequences in a highly parallelised mode 24 hours a day. Large genome centers around the world housed complete farms of these sequencing machines, which in turn made it necessary to optimise assemblers for sequences from whole-genome shotgun sequencing projects, where the reads</span></span></p>
 
<ul>
 
<ul>
     <li><span style="color: #000000">are about 800&ndash;900 bases long</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">are about 800&ndash;900 bases long</span> </span></li>
     <li><span style="color: #000000">contain sequencing artifacts like sequencing and cloning vectors</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">contain sequencing artifacts like sequencing and cloning vectors</span> </span></li>
     <li><span style="color: #000000">have error rates between 0.5 and 10%</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">have error rates between 0.5 and 10%</span> </span></li>
 
</ul>
 
</ul>
<p><span style="color: #000000">With the Sanger technology, bacterial projects with 20,000 to 200,000 reads could easily be assembled on one computer. Larger ones, like the human genome with approximately 35 million reads, already needed large computing farms and distributed computing.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">With the Sanger technology, bacterial projects with 20,000 to 200,000 reads could easily be assembled on one computer. Larger ones, like the human genome with approximately 35 million reads, already needed large computing farms and distributed computing.</span></span></p>
<p><span style="color: #000000">By 2004 / 2005, pyrosequencing had been brought to commercial viability by 454 Life Sciences. This new sequencing method generated reads much shorter than those from Sanger sequencing: initially about 100 bases, now 400 bases and expected to grow to 1000 bases by the end of 2010. However, due to its much higher throughput and lower cost compared with Sanger sequencing, the adoption of this technology by genome centers pushed the development of sequence assemblers able to deal with this new type of sequence. The sheer amount of data, coupled with technology-specific error patterns in the reads, delayed the development of assemblers; at the beginning, in 2004, only the Newbler assembler from 454 was available. Presented in mid 2007<sup id="cite_ref-3" class="reference"><font size="2">[4]</font></sup>, the hybrid version of the MIRA assembler by Chevreux et al. was the first freely available assembler that could assemble 454 reads as well as mixtures of 454 reads and Sanger reads; using sequences from different sequencing technologies was subsequently termed <i>hybrid assembly</i>.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">By 2004 / 2005, pyrosequencing had been brought to commercial viability by 454 Life Sciences. This new sequencing method generated reads much shorter than those from Sanger sequencing: initially about 100 bases, now 400 bases and expected to grow to 1000 bases by the end of 2010. However, due to its much higher throughput and lower cost compared with Sanger sequencing, the adoption of this technology by genome centers pushed the development of sequence assemblers able to deal with this new type of sequence. The sheer amount of data, coupled with technology-specific error patterns in the reads, delayed the development of assemblers; at the beginning, in 2004, only the Newbler assembler from 454 was available. Presented in mid 2007<sup id="cite_ref-3" class="reference">[4]</sup>, the hybrid version of the MIRA assembler by Chevreux et al. was the first freely available assembler that could assemble 454 reads as well as mixtures of 454 reads and Sanger reads; using sequences from different sequencing technologies was subsequently termed <i>hybrid assembly</i>.</span></span></p>
<p><span style="color: #000000">Ironically, the technological development of sequencing continued to move in the wrong direction (from a sequence assembly point of view). Since 2006, the Solexa technology has been available and is heavily used, generating roughly 100 million reads per day on a single sequencing machine. Compare this to the 35 million reads of the human genome project, which took several years to produce on hundreds of sequencing machines. The downside is that these reads have a length of only 36 bases (expected to grow to 50 bases by the end of 2008). This makes sequence alignment an even more daunting task. Presented at the end of 2007, the SHARCGS assembler<sup id="cite_ref-4" class="reference"><font size="2">[5]</font></sup> by Dohm et al. was the first published assembler used for an assembly with Solexa reads, quickly followed by a number of others.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">Ironically, the technological development of sequencing continued to move in the wrong direction (from a sequence assembly point of view). Since 2006, the Solexa technology has been available and is heavily used, generating roughly 100 million reads per day on a single sequencing machine. Compare this to the 35 million reads of the human genome project, which took several years to produce on hundreds of sequencing machines. The downside is that these reads have a length of only 36 bases (expected to grow to 50 bases by the end of 2008). This makes sequence alignment an even more daunting task. Presented at the end of 2007, the SHARCGS assembler<sup id="cite_ref-4" class="reference">[5]</sup> by Dohm et al. was the first published assembler used for an assembly with Solexa reads, quickly followed by a number of others.</span></span></p>
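<p><span style="font-size: small"><span style="color: #000000">The throughput comparison above can be made concrete with a little arithmetic on the figures quoted in this section. The sketch below is a rough, hedged estimate: it assumes a Sanger read length of 850 bases (the midpoint of the 800&ndash;900 range given earlier) and assumes that this length applies to all 35 million reads of the human genome project; the variable names are illustrative only.</span></span></p>

```python
# Figures quoted in the text.
SOLEXA_READS_PER_DAY = 100_000_000  # reads per day on one machine
SOLEXA_READ_LENGTH = 36             # bases per read
HGP_READS = 35_000_000              # reads of the human genome project

# Assumption (not stated for this project in the text):
# midpoint of the 800-900 base range given for Sanger reads.
SANGER_READ_LENGTH = 850


def total_bases(n_reads: int, read_length: int) -> int:
    """Raw number of sequenced bases for n_reads reads of equal length."""
    return n_reads * read_length


solexa_bases_per_day = total_bases(SOLEXA_READS_PER_DAY, SOLEXA_READ_LENGTH)
hgp_bases = total_bases(HGP_READS, SANGER_READ_LENGTH)

# Days one Solexa machine would need to match the raw base count.
days_to_match = hgp_bases / solexa_bases_per_day

print(f"Solexa, one machine: {solexa_bases_per_day:,} bases per day")
print(f"Human genome project: {hgp_bases:,} bases in total")
print(f"Days to match: {days_to_match:.1f}")
```

<p><span style="font-size: small"><span style="color: #000000">Raw base counts say nothing about how easy the reads are to assemble: as noted above, 36-base reads are much harder to place than reads of several hundred bases.</span></span></p>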
<h2><span style="color: #000000"><span id="Greedy_algorithm" class="mw-headline">Greedy algorithm</span></span></h2>
+
<h2><span style="font-size: small"><span style="color: #000000"><span id="Greedy_algorithm" class="mw-headline">Greedy algorithm</span></span></span></h2>
<p><span style="color: #000000">Given a set of sequence fragments, the objective is to find the shortest common supersequence.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">Given a set of sequence fragments, the objective is to find the shortest common supersequence.</span></span></p>
 
<ol>
 
<ol>
     <li><span style="color: #000000">calculate pairwise alignments of all fragments</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">calculate pairwise alignments of all fragments</span> </span></li>
     <li><span style="color: #000000">choose two fragments with the largest overlap</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">choose two fragments with the largest overlap</span> </span></li>
     <li><span style="color: #000000">merge chosen fragments</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">merge chosen fragments</span> </span></li>
     <li><span style="color: #000000">repeat steps 2 and 3 until only one fragment is left</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">repeat steps 2 and 3 until only one fragment is left</span> </span></li>
 
</ol>
 
</ol>
<p><span style="color: #000000">The result is a suboptimal solution to the problem.</span></p>
+
<p><span style="font-size: small"><span style="color: #000000">The result is a suboptimal solution to the problem.</span></span></p>
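<p><span style="font-size: small"><span style="color: #000000">The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than how any of the assemblers listed below work: it assumes error-free fragments from a single strand, uses exact suffix-prefix matches in place of real pairwise alignments, and recomputes all overlaps in every round instead of once up front. The function names <code>overlap</code> and <code>greedy_assemble</code> and the example fragments are hypothetical.</span></span></p>

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str]) -> str:
    """Merge fragments greedily by largest exact overlap (steps 1-4)."""
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        # Steps 1 and 2: score all ordered pairs, keep the largest overlap.
        best_len, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j

        # Step 3: merge the chosen pair, writing the shared part only once.
        merged = frags[best_i] + frags[best_j][best_len:]

        # Step 4: replace the pair by the merged fragment and repeat.
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)

    return frags[0] if frags else ""


# Hypothetical toy fragments.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

<p><span style="font-size: small"><span style="color: #000000">Because each round commits to the locally best merge, a misleading overlap caused by a repeat can never be undone later, which is one reason the greedy result may be suboptimal.</span></span></p>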
<h2><span style="color: #000000"><span id="Available_assemblers" class="mw-headline">Available assemblers</span></span></h2>
+
<h2><span style="font-size: small"><span style="color: #000000"><span id="Available_assemblers" class="mw-headline">Available assemblers</span></span></span></h2>
<p><span style="color: #000000">The following table lists assemblers that have a de-novo assembly capability on at least one of the supported technologies.<sup id="cite_ref-5" class="reference"><font size="2">[6]</font></sup></span></p>
+
<p><span style="font-size: small"><span style="color: #000000">The following table lists assemblers that have a de-novo assembly capability on at least one of the supported technologies.<sup id="cite_ref-5" class="reference">[6]</sup></span></span></p>
 
<p>
 
<p>
 
<table class="wikitable" border="1">
 
<table class="wikitable" border="1">
 
     <tbody>
 
     <tbody>
 
         <tr>
 
         <tr>
             <th><span style="color: #000000">Name</span></th>
+
             <th><span style="font-size: small"><span style="color: #000000">Name</span></span></th>
             <th><span style="color: #000000">Type</span></th>
+
             <th><span style="font-size: small"><span style="color: #000000">Type</span></span></th>
             <th><span style="color: #000000">Technologies</span></th>
+
             <th><span style="font-size: small"><span style="color: #000000">Technologies</span></span></th>
             <th><span style="color: #000000">Author</span></th>
+
             <th><span style="font-size: small"><span style="color: #000000">Author</span></span></th>
             <th><span style="color: #000000">Presented / </span>
+
             <th><span style="font-size: small"><span style="color: #000000">Presented / </span></span>
             <p><span style="color: #000000">Last updated</span></p>
+
             <p><span style="font-size: small"><span style="color: #000000">Last updated</span></span></p>
 
             </th>
 
             </th>
             <th><span style="color: #000000">Licence*</span></th>
+
             <th><span style="font-size: small"><span style="color: #000000">Licence*</span></span></th>
             <th><span style="color: #000000">Homepage</span></th>
+
             <th><span style="font-size: small"><span style="color: #000000">Homepage</span></span></th>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">ABySS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">ABySS</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Solexa, SOLiD</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa, SOLiD</span></span></td>
             <td><span style="color: #000000">Simpson, J. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Simpson, J. et al.</span></span></td>
             <td><span style="color: #000000">2008 / 2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2008 / 2010</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">AMOS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">AMOS</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454</span></span></td>
             <td><span style="color: #000000">Salzberg, S. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Salzberg, S. et al.</span></span></td>
             <td><span style="color: #000000">2002? / 2008?</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2002? / 2008?</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Celera WGA Assembler / CABOG</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Celera WGA Assembler / CABOG</span></span></td>
             <td><span style="color: #000000">(large) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(large) genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454, Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454, Solexa</span></span></td>
             <td><span style="color: #000000">Myers, G. et al.; Miller G. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Myers, G. et al.; Miller G. et al.</span></span></td>
             <td><span style="color: #000000">2004 / 2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2004 / 2010</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">CLC Genomics Workbench</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">CLC Genomics Workbench</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454, Solexa, SOLiD</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454, Solexa, SOLiD</span></span></td>
             <td><span style="color: #000000">CLC bio</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">CLC bio</span></span></td>
             <td><span style="color: #000000">2008 / 2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2008 / 2010</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Edena</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Edena</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa</span></span></td>
             <td><span style="color: #000000">D. Hernandez, P. Fran&ccedil;ois, L. Farinelli, M. Osteras, and J. Schrenzel.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">D. Hernandez, P. Fran&ccedil;ois, L. Farinelli, M. Osteras, and J. Schrenzel.</span></span></td>
             <td><span style="color: #000000">2008</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2008</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Euler</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Euler</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454 (Solexa?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454 (Solexa?)</span></span></td>
             <td><span style="color: #000000">Pevzner, P. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Pevzner, P. et al.</span></span></td>
             <td><span style="color: #000000">2001 / 2006?</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2001 / 2006?</span></span></td>
             <td><span style="color: #000000">(C / NC-A?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(C / NC-A?)</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Euler-sr</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Euler-sr</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">454, Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">454, Solexa</span></span></td>
             <td><span style="color: #000000">Chaisson, MJ. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Chaisson, MJ. et al.</span></span></td>
             <td><span style="color: #000000">2008</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2008</span></span></td>
             <td><span style="color: #000000">NC-A</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">NC-A</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Forge</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Forge</span></span></td>
             <td><span style="color: #000000">(large) genomes, EST, metagenomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(large) genomes, EST, metagenomes</span></span></td>
             <td><span style="color: #000000">454, Solexa, SOLiD, Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">454, Solexa, SOLiD, Sanger</span></span></td>
             <td><span style="color: #000000">Platt, DM, Evers, D.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Platt, DM, Evers, D.</span></span></td>
             <td><span style="color: #000000">2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2010</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Geneious</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Geneious</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454, Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454, Solexa</span></span></td>
             <td><span style="color: #000000">Biomatters Ltd</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Biomatters Ltd</span></span></td>
             <td><span style="color: #000000">2009 / 2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2009 / 2010</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">IDBA (Iterative De Bruijn graph short read Assembler)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">IDBA (Iterative De Bruijn graph short read Assembler)</span></span></td>
             <td><span style="color: #000000">(large) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(large) genomes</span></span></td>
             <td><span style="color: #000000">Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger</span></span></td>
             <td><span style="color: #000000">Yu Peng, Henry C. M. Leung, Siu-Ming Yiu, Francis Y. L. Chin</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Yu Peng, Henry C. M. Leung, Siu-Ming Yiu, Francis Y. L. Chin</span></span></td>
             <td><span style="color: #000000">2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2010</span></span></td>
             <td><span style="color: #000000">(C / NC-A?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(C / NC-A?)</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">MIRA (Mimicking Intelligent Read Assembly)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">MIRA (Mimicking Intelligent Read Assembly)</span></span></td>
             <td><span style="color: #000000">genomes, ESTs</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes, ESTs</span></span></td>
             <td><span style="color: #000000">Sanger, 454, Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454, Solexa</span></span></td>
             <td><span style="color: #000000">Chevreux, B.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Chevreux, B.</span></span></td>
             <td><span style="color: #000000">1998 / 2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">1998 / 2010</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">NextGENe</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">NextGENe</span></span></td>
             <td><span style="color: #000000">(small genomes?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small genomes?)</span></span></td>
             <td><span style="color: #000000">454, Solexa, SOLiD</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">454, Solexa, SOLiD</span></span></td>
             <td><span style="color: #000000">Softgenetics</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Softgenetics</span></span></td>
             <td><span style="color: #000000">2008</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2008</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Newbler</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Newbler</span></span></td>
             <td><span style="color: #000000">genomes, ESTs</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes, ESTs</span></span></td>
             <td><span style="color: #000000">454, Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">454, Sanger</span></span></td>
             <td><span style="color: #000000">454/Roche</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">454/Roche</span></span></td>
             <td><span style="color: #000000">2009</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2009</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Phrap</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Phrap</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454</span></span></td>
             <td><span style="color: #000000">Green, P.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Green, P.</span></span></td>
             <td><span style="color: #000000">2002 / 2003 / 2008</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2002 / 2003 / 2008</span></span></td>
             <td><span style="color: #000000">C / NC-A</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C / NC-A</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">TIGR Assembler</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">TIGR Assembler</span></span></td>
             <td><span style="color: #000000">genomic</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomic</span></span></td>
             <td><span style="color: #000000">Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger</span></span></td>
             <td><span style="color: #000000">-</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">-</span></span></td>
             <td><span style="color: #000000">1995 / 2003</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">1995 / 2003</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Ray<sup id="cite_ref-6" class="reference"><font size="2">[7]</font></sup></span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Ray<sup id="cite_ref-6" class="reference">[7]</sup></span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Illumina, mix of Illumina and 454, paired or not</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Illumina, mix of Illumina and 454, paired or not</span></span></td>
             <td><span style="color: #000000">S&eacute;bastien Boisvert, Fran&ccedil;ois Laviolette &amp; Jacques Corbeil.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">S&eacute;bastien Boisvert, Fran&ccedil;ois Laviolette &amp; Jacques Corbeil.</span></span></td>
             <td><span style="color: #000000">2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2010</span></span></td>
             <td><span style="color: #000000">OS [GNU General Public License]</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS [GNU General Public License]</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Sequencher</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sequencher</span></span></td>
             <td><span style="color: #000000">(small) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small) genomes</span></span></td>
             <td><span style="color: #000000">Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger</span></span></td>
             <td><span style="color: #000000">Gene Codes Corporation</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Gene Codes Corporation</span></span></td>
             <td><span style="color: #000000">1991 / 2009</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">1991 / 2009</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">SeqMan NGen</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">SeqMan NGen</span></span></td>
             <td><span style="color: #000000">(small) genomes, ESTs</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small) genomes, ESTs</span></span></td>
             <td><span style="color: #000000">Sanger, 454, Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454, Solexa</span></span></td>
             <td><span style="color: #000000">DNASTAR</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">DNASTAR</span></span></td>
             <td><span style="color: #000000">? / 2008</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">? / 2008</span></span></td>
             <td><span style="color: #000000">C</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">C</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">SHARCGS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">SHARCGS</span></span></td>
             <td><span style="color: #000000">(small) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small) genomes</span></span></td>
             <td><span style="color: #000000">Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa</span></span></td>
             <td><span style="color: #000000">Dohm et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Dohm et al.</span></span></td>
             <td><span style="color: #000000">2007 / 2007</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2007 / 2007</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">SOPRA</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">SOPRA</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Solexa, SOLiD, Sanger, 454</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa, SOLiD, Sanger, 454</span></span></td>
             <td><span style="color: #000000">Dayarian, A. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Dayarian, A. et al.</span></span></td>
             <td><span style="color: #000000">2010 / 2010</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2010 / 2010</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">SSAKE</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">SSAKE</span></span></td>
             <td><span style="color: #000000">(small) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small) genomes</span></span></td>
             <td><span style="color: #000000">Solexa (SOLiD? Helicos?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa (SOLiD? Helicos?)</span></span></td>
             <td><span style="color: #000000">Warren, R. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Warren, R. et al.</span></span></td>
             <td><span style="color: #000000">2007 / 2007</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2007 / 2007</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">SOAPdenovo</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">SOAPdenovo</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa</span></span></td>
             <td><span style="color: #000000">Li, R. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Li, R. et al.</span></span></td>
             <td><span style="color: #000000">2009 / 2009</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2009 / 2009</span></span></td>
             <td><span style="color: #000000">Closed</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Closed</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Staden gap4 package</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Staden gap4 package</span></span></td>
             <td><span style="color: #000000">BACs (small genomes?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">BACs (small genomes?)</span></span></td>
             <td><span style="color: #000000">Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger</span></span></td>
             <td><span style="color: #000000">Staden et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Staden et al.</span></span></td>
             <td><span style="color: #000000">1991 / 2008</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">1991 / 2008</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">VCAKE</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">VCAKE</span></span></td>
             <td><span style="color: #000000">(small) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small) genomes</span></span></td>
             <td><span style="color: #000000">Solexa (SOLiD?, Helicos?)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Solexa (SOLiD?, Helicos?)</span></span></td>
             <td><span style="color: #000000">Jeck, W. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Jeck, W. et al.</span></span></td>
             <td><span style="color: #000000">2007 / 2007</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2007 / 2007</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Phusion assembler</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Phusion assembler</span></span></td>
             <td><span style="color: #000000">(large) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(large) genomes</span></span></td>
             <td><span style="color: #000000">Sanger</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger</span></span></td>
             <td><span style="color: #000000">Mullikin JC, et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Mullikin JC, et al.</span></span></td>
             <td><span style="color: #000000">2003</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2003</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Quality Value Guided SRA (QSRA)</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Quality Value Guided SRA (QSRA)</span></span></td>
             <td><span style="color: #000000">genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">genomes</span></span></td>
             <td><span style="color: #000000">Sanger, Solexa</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, Solexa</span></span></td>
             <td><span style="color: #000000">Bryant DW, et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Bryant DW, et al.</span></span></td>
             <td><span style="color: #000000">2009</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2009</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td><span style="color: #000000">Velvet</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Velvet</span></span></td>
             <td><span style="color: #000000">(small) genomes</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">(small) genomes</span></span></td>
             <td><span style="color: #000000">Sanger, 454, Solexa, SOLiD</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Sanger, 454, Solexa, SOLiD</span></span></td>
             <td><span style="color: #000000">Zerbino, D. et al.</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">Zerbino, D. et al.</span></span></td>
             <td><span style="color: #000000">2007 / 2009</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">2007 / 2009</span></span></td>
             <td><span style="color: #000000">OS</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">OS</span></span></td>
             <td><span style="color: #000000">link</span></td>
+
             <td><span style="font-size: small"><span style="color: #000000">link</span></span></td>
 
         </tr>
 
         </tr>
 
         <tr>
 
         <tr>
             <td style="border-top: #333 1px solid" colspan="7"><span style="color: #000000"><small><font size="2">*<b>Licences:</b> OS = Open Source; C = Commercial; C / NC-A = Commercial but free for non-commercial and academic use; Brackets = unclear, but most likely C / NC-A</font></small></span></td>
+
             <td style="border-top: #333 1px solid" colspan="7"><span style="font-size: small"><span style="color: #000000"><small>*<b>Licences:</b> OS = Open Source; C = Commercial; C / NC-A = Commercial but free for non-commercial and academic use; Brackets = unclear, but most likely C / NC-A</small></span></span></td>
 
         </tr>
 
         </tr>
 
     </tbody>
 
     </tbody>
 
</table>
 
</table>
 
</p>
 
</p>
<h2><span style="color: #000000"><span id="See_also" class="mw-headline">See also</span></span></h2>
+
<h2><span style="font-size: large"><span style="color: #000000"><span id="See_also" class="mw-headline">See also</span></span></span></h2>
 
<ul>
 
<ul>
     <li><span style="color: #000000">Sequence alignment</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">[[Sequence alignment]]</span> </span></li>
     <li><span style="color: #000000">Genome assembly</span></li>
+
     <li><span style="font-size: small"><span style="color: #000000">Genome assembly</span> </span></li>
 
</ul>
 
</ul>
<h2><span id="References" class="mw-headline">References</span></h2>
+
<h2><span style="font-size: large"><span id="References" class="mw-headline">References</span></span></h2>
 
<ol class="references">
 
<ol class="references">
     <li id="cite_note-0"><b><a href="#cite_ref-0"><font color="#0645ad">^</font></a></b> <span class="citation Journal">Myers EW, Sutton GG, Delcher AL, <i>et al.</i> (March 2000). <a class="external text" href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=10731133" rel="nofollow"><font color="#3366bb">&quot;A whole-genome assembly of Drosophila&quot;</font></a>. <i>Science</i> <b>287</b> (5461): 2196&ndash;204. <a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a>&nbsp;<a class="external text" href="http://www.ncbi.nlm.nih.gov/pubmed/10731133" rel="nofollow"><font color="#3366bb">10731133</font></a><span class="printonly">. <a class="external free" href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=10731133" rel="nofollow"><font color="#3366bb">http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=10731133</font></a></span>.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=A+whole-genome+assembly+of+Drosophila&amp;rft.jtitle=Science&amp;rft.aulast=Myers+EW%2C+Sutton+GG%2C+Delcher+AL%2C+%27%27et+al.%27%27&amp;rft.au=Myers+EW%2C+Sutton+GG%2C+Delcher+AL%2C+%27%27et+al.%27%27&amp;rft.date=March+2000&amp;rft.volume=287&amp;rft.issue=5461&amp;rft.pages=2196%E2%80%93204&amp;rft_id=info:pmid/10731133&amp;rft_id=http%3A%2F%2Fwww.sciencemag.org%2Fcgi%2Fpmidlookup%3Fview%3Dlong%26pmid%3D10731133&amp;rfr_id=info:sid/en.wikipedia.org:Sequence_assembly"><span style="display: none">&nbsp;</span></span></li>
+
     <li id="cite_note-0"><span style="font-size: small"><b><a href="#cite_ref-0"><font color="#0645ad">^</font></a></b></span><span style="font-size: small"> <span class="citation Journal">Myers EW, Sutton GG, Delcher AL, <i>et al.</i> (March 2000). </span></span><span class="citation Journal"><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=10731133"><font color="#3366bb">&quot;A whole-genome assembly of Drosophila&quot;</font></a></span><span style="font-size: small">. <i>Science</i> <b>287</b> (5461): 2196&ndash;204. </span><span style="font-size: small"><a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a></span><span style="font-size: small">&nbsp;</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.ncbi.nlm.nih.gov/pubmed/10731133"><font color="#3366bb">10731133</font></a></span><span style="font-size: small"><span class="printonly">. 
</span></span></span><span style="font-size: large"><span class="citation Journal"><span class="printonly"><a class="external free" rel="nofollow" href="http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=10731133"><span style="font-size: small"><font color="#3366bb">http://www.sciencemag.org/cgi/pmidlookup?view=long&amp;pmid=10731133</font></span></a></span></span></span><span style="font-size: small"><span class="citation Journal">.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=A+whole-genome+assembly+of+Drosophila&amp;rft.jtitle=Science&amp;rft.aulast=Myers+EW%2C+Sutton+GG%2C+Delcher+AL%2C+%27%27et+al.%27%27&amp;rft.au=Myers+EW%2C+Sutton+GG%2C+Delcher+AL%2C+%27%27et+al.%27%27&amp;rft.date=March+2000&amp;rft.volume=287&amp;rft.issue=5461&amp;rft.pages=2196%E2%80%93204&amp;rft_id=info:pmid/10731133&amp;rft_id=http%3A%2F%2Fwww.sciencemag.org%2Fcgi%2Fpmidlookup%3Fview%3Dlong%26pmid%3D10731133&amp;rfr_id=info:sid/en.wikipedia.org:Sequence_assembly"><span style="display: none">&nbsp;</span></span> </span></li>
     <li id="cite_note-1"><b><a href="#cite_ref-1"><font color="#0645ad">^</font></a></b> <span class="citation Journal">Batzoglou S, Jaffe DB, Stanley K, <i>et al.</i> (January 2002). <a class="external text" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=11779843" rel="nofollow"><font color="#3366bb">&quot;ARACHNE: a whole-genome shotgun assembler&quot;</font></a>. <i>Genome Res.</i> <b>12</b> (1): 177&ndash;89. <a title="Digital object identifier" href="/wiki/Digital_object_identifier"><font color="#0645ad">doi</font></a>:<a class="external text" href="http://dx.doi.org/10.1101%2Fgr.208902" rel="nofollow"><font color="#3366bb">10.1101/gr.208902</font></a>. <a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a>&nbsp;<a class="external text" href="http://www.ncbi.nlm.nih.gov/pubmed/11779843" rel="nofollow"><font color="#3366bb">11779843</font></a>. <a title="PubMed Central" href="/wiki/PubMed_Central"><font color="#0645ad">PMC</font></a>&nbsp;<a class="external text" href="http://www.pubmedcentral.gov/articlerender.fcgi?tool=pmcentrez&amp;artid=155255" rel="nofollow"><font color="#3366bb">155255</font></a><span class="printonly">. 
<a class="external free" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=11779843" rel="nofollow"><font color="#3366bb">http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=11779843</font></a></span>.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=ARACHNE%3A+a+whole-genome+shotgun+assembler&amp;rft.jtitle=Genome+Res.&amp;rft.aulast=Batzoglou+S%2C+Jaffe+DB%2C+Stanley+K%2C+%27%27et+al.%27%27&amp;rft.au=Batzoglou+S%2C+Jaffe+DB%2C+Stanley+K%2C+%27%27et+al.%27%27&amp;rft.date=January+2002&amp;rft.volume=12&amp;rft.issue=1&amp;rft.pages=177%E2%80%9389&amp;rft_id=info:doi/10.1101%2Fgr.208902&amp;rft_id=info:pmid/11779843&amp;rft_id=http%3A%2F%2Fwww.genome.org%2Fcgi%2Fpmidlookup%3Fview%3Dlong%26pmid%3D11779843&amp;rfr_id=info:sid/en.wikipedia.org:Sequence_assembly"><span style="display: none">&nbsp;</span></span></li>
+
     <li id="cite_note-1"><span style="font-size: small"><b><a href="#cite_ref-1"><font color="#0645ad">^</font></a></b></span><span style="font-size: small"> <span class="citation Journal">Batzoglou S, Jaffe DB, Stanley K, <i>et al.</i> (January 2002). </span></span><span class="citation Journal"><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=11779843"><font color="#3366bb">&quot;ARACHNE: a whole-genome shotgun assembler&quot;</font></a></span><span style="font-size: small">. <i>Genome Res.</i> <b>12</b> (1): 177&ndash;89. </span><span style="font-size: small"><a title="Digital object identifier" href="/wiki/Digital_object_identifier"><font color="#0645ad">doi</font></a></span><span style="font-size: small">:</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://dx.doi.org/10.1101%2Fgr.208902"><font color="#3366bb">10.1101/gr.208902</font></a></span><span style="font-size: small">. </span><span style="font-size: small"><a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a></span><span style="font-size: small">&nbsp;</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.ncbi.nlm.nih.gov/pubmed/11779843"><font color="#3366bb">11779843</font></a></span><span style="font-size: small">. </span><span style="font-size: small"><a title="PubMed Central" href="/wiki/PubMed_Central"><font color="#0645ad">PMC</font></a></span><span style="font-size: small">&nbsp;</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.pubmedcentral.gov/articlerender.fcgi?tool=pmcentrez&amp;artid=155255"><font color="#3366bb">155255</font></a></span><span style="font-size: small"><span class="printonly">. 
</span></span></span><span style="font-size: large"><span class="citation Journal"><span class="printonly"><a class="external free" rel="nofollow" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=11779843"><span style="font-size: small"><font color="#3366bb">http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=11779843</font></span></a></span></span></span><span style="font-size: small"><span class="citation Journal">.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=ARACHNE%3A+a+whole-genome+shotgun+assembler&amp;rft.jtitle=Genome+Res.&amp;rft.aulast=Batzoglou+S%2C+Jaffe+DB%2C+Stanley+K%2C+%27%27et+al.%27%27&amp;rft.au=Batzoglou+S%2C+Jaffe+DB%2C+Stanley+K%2C+%27%27et+al.%27%27&amp;rft.date=January+2002&amp;rft.volume=12&amp;rft.issue=1&amp;rft.pages=177%E2%80%9389&amp;rft_id=info:doi/10.1101%2Fgr.208902&amp;rft_id=info:pmid/11779843&amp;rft_id=http%3A%2F%2Fwww.genome.org%2Fcgi%2Fpmidlookup%3Fview%3Dlong%26pmid%3D11779843&amp;rfr_id=info:sid/en.wikipedia.org:Sequence_assembly"><span style="display: none">&nbsp;</span></span> </span></li>
     <li id="cite_note-2"><b><a href="#cite_ref-2"><font color="#0645ad">^</font></a></b> <a class="external text" href="http://amos.sourceforge.net/" rel="nofollow"><font color="#3366bb">AMOS page</font></a> with links to various papers</li>
+
     <li id="cite_note-2"><span style="font-size: small"><b><a href="#cite_ref-2"><font color="#0645ad">^</font></a></b></span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://amos.sourceforge.net/"><font color="#3366bb">AMOS page</font></a></span><span style="font-size: small"> with links to various papers </span></li>
     <li id="cite_note-3"><b><a href="#cite_ref-3"><font color="#0645ad">^</font></a></b> Copy in Google groups of the <a class="external text" href="http://groups.google.com/group/bionet.software/browse_thread/thread/b34b348011d04f0e?fwc=1" rel="nofollow"><font color="#3366bb">post announcing MIRA 2.9.8 hybrid version</font></a> in the bionet.software Usenet group</li>
+
     <li id="cite_note-3"><span style="font-size: small"><b><a href="#cite_ref-3"><font color="#0645ad">^</font></a></b></span><span style="font-size: small"> Copy in Google groups of the </span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://groups.google.com/group/bionet.software/browse_thread/thread/b34b348011d04f0e?fwc=1"><font color="#3366bb">post announcing MIRA 2.9.8 hybrid version</font></a></span><span style="font-size: small"> in the bionet.software Usenet group </span></li>
     <li id="cite_note-4"><b><a href="#cite_ref-4"><font color="#0645ad">^</font></a></b> <span class="citation Journal">Dohm JC, Lottaz C, Borodina T, Himmelbauer H (November 2007). <a class="external text" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17908823" rel="nofollow"><font color="#3366bb">&quot;SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing&quot;</font></a>. <i>Genome Res.</i> <b>17</b> (11): 1697&ndash;706. <a title="Digital object identifier" href="/wiki/Digital_object_identifier"><font color="#0645ad">doi</font></a>:<a class="external text" href="http://dx.doi.org/10.1101%2Fgr.6435207" rel="nofollow"><font color="#3366bb">10.1101/gr.6435207</font></a>. <a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a>&nbsp;<a class="external text" href="http://www.ncbi.nlm.nih.gov/pubmed/17908823" rel="nofollow"><font color="#3366bb">17908823</font></a>. <a title="PubMed Central" href="/wiki/PubMed_Central"><font color="#0645ad">PMC</font></a>&nbsp;<a class="external text" href="http://www.pubmedcentral.gov/articlerender.fcgi?tool=pmcentrez&amp;artid=2045152" rel="nofollow"><font color="#3366bb">2045152</font></a><span class="printonly">. 
<a class="external free" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17908823" rel="nofollow"><font color="#3366bb">http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17908823</font></a></span>.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=SHARCGS%2C+a+fast+and+highly+accurate+short-read+assembly+algorithm+for+de+novo+genomic+sequencing&amp;rft.jtitle=Genome+Res.&amp;rft.aulast=Dohm+JC%2C+Lottaz+C%2C+Borodina+T%2C+Himmelbauer+H&amp;rft.au=Dohm+JC%2C+Lottaz+C%2C+Borodina+T%2C+Himmelbauer+H&amp;rft.date=November+2007&amp;rft.volume=17&amp;rft.issue=11&amp;rft.pages=1697%E2%80%93706&amp;rft_id=info:doi/10.1101%2Fgr.6435207&amp;rft_id=info:pmid/17908823&amp;rft_id=http%3A%2F%2Fwww.genome.org%2Fcgi%2Fpmidlookup%3Fview%3Dlong%26pmid%3D17908823&amp;rfr_id=info:sid/en.wikipedia.org:Sequence_assembly"><span style="display: none">&nbsp;</span></span></li>
+
     <li id="cite_note-4"><span style="font-size: small"><b><a href="#cite_ref-4"><font color="#0645ad">^</font></a></b></span><span style="font-size: small"> <span class="citation Journal">Dohm JC, Lottaz C, Borodina T, Himmelbauer H (November 2007). </span></span><span class="citation Journal"><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17908823"><font color="#3366bb">&quot;SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing&quot;</font></a></span><span style="font-size: small">. <i>Genome Res.</i> <b>17</b> (11): 1697&ndash;706. </span><span style="font-size: small"><a title="Digital object identifier" href="/wiki/Digital_object_identifier"><font color="#0645ad">doi</font></a></span><span style="font-size: small">:</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://dx.doi.org/10.1101%2Fgr.6435207"><font color="#3366bb">10.1101/gr.6435207</font></a></span><span style="font-size: small">. </span><span style="font-size: small"><a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a></span><span style="font-size: small">&nbsp;</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.ncbi.nlm.nih.gov/pubmed/17908823"><font color="#3366bb">17908823</font></a></span><span style="font-size: small">. </span><span style="font-size: small"><a title="PubMed Central" href="/wiki/PubMed_Central"><font color="#0645ad">PMC</font></a></span><span style="font-size: small">&nbsp;</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.pubmedcentral.gov/articlerender.fcgi?tool=pmcentrez&amp;artid=2045152"><font color="#3366bb">2045152</font></a></span><span style="font-size: small"><span class="printonly">. 
</span></span></span><span style="font-size: large"><span class="citation Journal"><span class="printonly"><a class="external free" rel="nofollow" href="http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17908823"><span style="font-size: small"><font color="#3366bb">http://www.genome.org/cgi/pmidlookup?view=long&amp;pmid=17908823</font></span></a></span></span></span><span style="font-size: small"><span class="citation Journal">.</span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.genre=article&amp;rft.atitle=SHARCGS%2C+a+fast+and+highly+accurate+short-read+assembly+algorithm+for+de+novo+genomic+sequencing&amp;rft.jtitle=Genome+Res.&amp;rft.aulast=Dohm+JC%2C+Lottaz+C%2C+Borodina+T%2C+Himmelbauer+H&amp;rft.au=Dohm+JC%2C+Lottaz+C%2C+Borodina+T%2C+Himmelbauer+H&amp;rft.date=November+2007&amp;rft.volume=17&amp;rft.issue=11&amp;rft.pages=1697%E2%80%93706&amp;rft_id=info:doi/10.1101%2Fgr.6435207&amp;rft_id=info:pmid/17908823&amp;rft_id=http%3A%2F%2Fwww.genome.org%2Fcgi%2Fpmidlookup%3Fview%3Dlong%26pmid%3D17908823&amp;rfr_id=info:sid/en.wikipedia.org:Sequence_assembly"><span style="display: none">&nbsp;</span></span> </span></li>
     <li id="cite_note-5"><b><a href="#cite_ref-5"><font color="#0645ad">^</font></a></b> <a class="external text" href="http://seqanswers.com/forums/showthread.php?t=43" rel="nofollow"><font color="#3366bb">list of software including mapping assemblers in the SeqAnswers discussion forum.</font></a></li>
+
     <li id="cite_note-5"><span style="font-size: small"><b><a href="#cite_ref-5"><font color="#0645ad">^</font></a></b></span><span style="font-size: medium"><a class="external text" rel="nofollow" href="http://seqanswers.com/forums/showthread.php?t=43"><span style="font-size: small"><font color="#3366bb">list of software including mapping assemblers in the SeqAnswers discussion forum.</font></span></a></span></li>
     <li id="cite_note-6"><b><a href="#cite_ref-6"><font color="#0645ad">^</font></a></b> <span class="citation Journal">Boisvert S, Laviolette F, Corbeil J. (October 2010). <a class="external text" href="http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238" rel="nofollow"><font color="#3366bb">&quot;Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.&quot;</font></a>. <i>J Comput Biol.</i> <b>17</b> (11): 1519-33. <a title="Digital object identifier" href="/wiki/Digital_object_identifier"><font color="#0645ad">doi</font></a>:<a class="external text" href="http://dx.doi.org/10.1089%2Fcmb.2009.0238" rel="nofollow"><font color="#3366bb">10.1089/cmb.2009.0238</font></a>. <a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a>&nbsp;<a class="external text" href="http://www.ncbi.nlm.nih.gov/pubmed/20958248" rel="nofollow"><font color="#3366bb">20958248</font></a><span class="printonly">. <a class="external free" href="http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238" rel="nofollow"><font color="#3366bb">http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238</font></a></span>.</span></li>
+
     <li id="cite_note-6"><span style="font-size: small"><b><a href="#cite_ref-6"><font color="#0645ad">^</font></a></b></span><span style="font-size: small"> <span class="citation Journal">Boisvert S, Laviolette F, Corbeil J. (October 2010). </span></span><span class="citation Journal"><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238"><font color="#3366bb">&quot;Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.&quot;</font></a></span><span style="font-size: small">. <i>J Comput Biol.</i> <b>17</b> (11): 1519-33. </span><span style="font-size: small"><a title="Digital object identifier" href="/wiki/Digital_object_identifier"><font color="#0645ad">doi</font></a></span><span style="font-size: small">:</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://dx.doi.org/10.1089%2Fcmb.2009.0238"><font color="#3366bb">10.1089/cmb.2009.0238</font></a></span><span style="font-size: small">. </span><span style="font-size: small"><a class="mw-redirect" title="PubMed Identifier" href="/wiki/PubMed_Identifier"><font color="#0645ad">PMID</font></a></span><span style="font-size: small">&nbsp;</span><span style="font-size: small"><a class="external text" rel="nofollow" href="http://www.ncbi.nlm.nih.gov/pubmed/20958248"><font color="#3366bb">20958248</font></a></span><span style="font-size: small"><span class="printonly">. </span></span></span><span style="font-size: large"><span class="citation Journal"><span class="printonly"><a class="external free" rel="nofollow" href="http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238"><span style="font-size: small"><font color="#3366bb">http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238</font></span></a></span></span></span><span style="font-size: small"><span class="citation Journal">.</span> </span></li>
 
</ol>
 
</ol>

Latest revision as of 00:13, 19 December 2010

In bioinformatics, sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.

This is needed because DNA sequencing technology cannot read whole genomes in one go; it reads only small pieces of between 20 and 1000 bases.

Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or of gene transcripts (ESTs).

Sequence assembly as reconstructing a book

 

The problem of sequence assembly can be compared to taking many copies of a book, passing them all through a shredder, and piecing a copy of the book back together from only shredded pieces. The book may have many repeated paragraphs, and some shreds may be modified to have typos. Excerpts from another book may be added in, and some shreds may be completely unrecognizable.
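The shredding analogy can be mimicked in a few lines. The following is only an illustrative sketch: the function name `shred`, its parameters and the toy sequence are assumptions made for this example, and real sequencing instruments have far more complex error behaviour than uniform random substitutions.

```python
import random

def shred(genome, read_len, n_reads, error_rate=0.0, seed=0):
    """Sample n_reads substrings of length read_len from random positions.

    Each base is substituted by a different base with probability
    error_rate, which plays the role of the "typos" in the analogy.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads

# Toy "book" (hypothetical sequence) shredded into error-free 8-base reads.
book = "ACGTTGCAAGGCTTACGATCGGATCCATGA"
reads = shred(book, read_len=8, n_reads=20)
```

The assembler only ever sees `reads`; the positions they came from are lost, which is what makes the reconstruction hard.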

 

Genome assemblers

The first sequence assemblers began to appear in the late 1980s and early 1990s as variants of simpler sequence alignment programs to piece together vast quantities of fragments generated by automated sequencing instruments called DNA sequencers.

As the sequenced organisms grew in size and complexity, assembly programs needed increasingly sophisticated strategies to handle:

  • terabytes of data which need processing on computing clusters;
  • identical and nearly identical sequences (known as repeats) which can, in the worst case, increase the time and space complexity of algorithms exponentially;
  • and errors in the fragments from the sequencing instruments, which can confound assembly.

Faced with the challenge of assembling the first larger eukaryotic genomes, the fruit fly Drosophila melanogaster, in 2000 and the human genome in 2001, scientists developed assemblers such as Celera Assembler[1] and Arachne[2] able to handle genomes of 100,000,000 - 300,000,000 base pairs. Subsequent to these efforts, several other groups, mostly at the major genome sequencing centers, built large-scale assemblers, and an open source effort known as AMOS[3] was launched to bring together all the innovations in genome assembly technology under the open source framework.

EST assemblers

EST assembly differs from genome assembly in several ways. The sequences for EST assembly are the transcribed mRNA of a cell and represent only a subset of the whole genome. At first glance, the underlying algorithmic problems differ between genome and EST assembly. For instance, genomes often have large amounts of repetitive sequences, mainly in the inter-genic parts. Since ESTs represent gene transcripts, they will not contain these repeats. On the other hand, cells tend to have a certain number of genes that are constantly expressed in very high amounts (housekeeping genes), which again leads to the problem of similar sequences present in high amounts in the data set to be assembled.

Furthermore, genes sometimes overlap in the genome (sense-antisense transcription), and should ideally still be assembled separately. EST assembly is also complicated by features like (cis-) alternative splicing, trans-splicing, single-nucleotide polymorphism, recoding, and post-transcriptional modification.

De-novo vs. mapping assembly

In sequence assembly, two different types can be distinguished:

  1. de-novo: assembling reads together so that they form a new, previously unknown sequence
  2. mapping: assembling reads against an existing backbone sequence, building a sequence that is similar but not necessarily identical to the backbone sequence
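A mapping assembly can be sketched as follows. This is a deliberately simplified illustration and not how any particular assembler works: it assumes ungapped placement by Hamming distance, reads no longer than the backbone, and a hypothetical `max_mismatches` threshold. Positions not covered by any read keep the backbone base.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, backbone):
    """Return (offset, mismatches) of the best ungapped placement."""
    offsets = range(len(backbone) - len(read) + 1)
    best = min(offsets, key=lambda i: hamming(read, backbone[i:i + len(read)]))
    return best, hamming(read, backbone[best:best + len(read)])

def mapping_assembly(reads, backbone, max_mismatches=2):
    """Place each read on the backbone and take a per-column majority vote."""
    columns = [Counter() for _ in backbone]
    for read in reads:
        offset, mismatches = map_read(read, backbone)
        if mismatches <= max_mismatches:  # discard reads that fit poorly
            for i, base in enumerate(read):
                columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else ref
                   for col, ref in zip(columns, backbone))

# The reads come from a sample that differs from the backbone at one base.
backbone = "GATTACAGGCTTCA"
reads = ["GATTATAG", "TTATAGGC", "TAGGCTTCA"]
consensus = mapping_assembly(reads, backbone)
```

The result is similar but not identical to the backbone, as described in item 2 above: the consensus carries the sample's base wherever the reads consistently disagree with the template.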

In terms of complexity and time requirements, de-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly because the assembly algorithm needs to compare every read with every other read, an operation that has a complexity of O(n²) but can be reduced to O(n log(n)). Referring to the comparison with shredded books in the introduction: for mapping assemblies one would have a very similar book as a template (perhaps with the names of the main characters and a few locations changed), whereas de-novo assemblies are harder in the sense that one would not know beforehand whether the result would be a science book, a novel, a catalogue, etc.
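One common way to avoid comparing every read with every other read is to index the reads first and only align pairs that share a short exact substring (a k-mer). The sketch below illustrates that idea with a hash-based index; it is an assumption chosen for illustration, not necessarily the technique behind the bound quoted above, and the function name `candidate_pairs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(reads, k):
    """Return index pairs (i, j), i < j, of reads sharing at least one k-mer."""
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for p in range(len(read) - k + 1):
            index[read[p:p + k]].add(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

reads = ["ACGTAC", "GTACGG", "TTTTTT", "CGGAAA"]
pairs = candidate_pairs(reads, k=3)  # only these pairs need a full alignment
```

Here only two of the six possible pairs survive the filter. Note that a k-mer occurring in very many reads (for example inside a repeat) still produces a number of candidate pairs that is quadratic in the size of that group.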

Influence of technological changes

The complexity of sequence assembly is driven by two major factors: the number of fragments and their lengths. While more and longer fragments allow better identification of sequence overlaps, they also pose problems, as the underlying algorithms show quadratic or even exponential complexity with respect to both the number of fragments and their length. And while shorter sequences are faster to align, they also complicate the layout phase of an assembly, as shorter reads are harder to place within repeats or near-identical repeats.
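Why short reads struggle with repeats can be shown on a toy sequence: a read that is no longer than a repeat may come from either copy, so its placement is ambiguous. The sketch below lists every substring of a given read length that occurs more than once; the sequence and the function name `repeated_substrings` are made up for this illustration.

```python
from collections import Counter

def repeated_substrings(genome, read_len):
    """Return the substrings of length read_len that occur more than once."""
    counts = Counter(genome[i:i + read_len]
                     for i in range(len(genome) - read_len + 1))
    return {s for s, c in counts.items() if c > 1}

# Toy sequence containing the 6-base repeat AGCTTG twice.
genome = "AAGCTTGCAGCTTGGA"
ambiguous_6 = repeated_substrings(genome, 6)  # {"AGCTTG"}
ambiguous_7 = repeated_substrings(genome, 7)  # empty: 7-base reads are unique
```

With 6-base reads one read cannot be placed unambiguously; one extra base is enough to reach into the unique flanking sequence and resolve it.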

In the earliest days of DNA sequencing, scientists could only gain a few sequences of short length (some dozen bases) after weeks of work in laboratories. Hence, these sequences could be aligned in a few minutes by hand.

In 1977, the dideoxy termination method (also known as Sanger sequencing) was invented, and until shortly after 2000 the technology was improved up to a point where fully automated machines could churn out sequences in a highly parallelised mode 24 hours a day. Large genome centers around the world housed complete farms of these sequencing machines, which in turn made it necessary to optimise assemblers for sequences from whole-genome shotgun sequencing projects where the reads

  • are about 800–900 bases long
  • contain sequencing artifacts like sequencing and cloning vectors
  • have error rates between 0.5 and 10%

With the Sanger technology, bacterial projects with 20,000 to 200,000 reads could easily be assembled on one computer. Larger ones, like the human genome with approximately 35 million reads, already needed large computing farms and distributed computing.

By 2004 / 2005, pyrosequencing had been brought to commercial viability by 454 Life Sciences. This new sequencing method generated reads much shorter than those from Sanger sequencing: initially about 100 bases, now 400 bases and expected to grow to 1000 bases by the end of 2010. However, due to the much higher throughput and lower cost than Sanger sequencing, the adoption of this technology by genome centers pushed the development of sequence assemblers able to deal with this new type of sequence. The sheer amount of data, coupled with technology-specific error patterns in the reads, delayed the development of assemblers; at the beginning in 2004, only the Newbler assembler from 454 was available. Presented in mid 2007[4], the hybrid version of the MIRA assembler by Chevreux et al. was the first freely available assembler that could assemble 454 reads as well as mixtures of 454 reads and Sanger reads; using sequences from different sequencing technologies was subsequently termed hybrid assembly.

Ironically, the technological development of sequencing continued in the wrong direction (from a sequence assembly point of view). Since 2006, the Solexa technology has been available and is heavily used to generate roughly 100 million reads per day on a single sequencing machine. Compare this to the 35 million reads of the human genome project, which took several years to produce on hundreds of sequencing machines. The downside is that these reads have a length of only 36 bases (expected to grow to 50 bases by the end of 2008). This makes sequence alignment an even more daunting task. Presented at the end of 2007, the SHARCGS assembler[5] by Dohm et al. was the first published assembler that was used for an assembly with Solexa reads, quickly followed by a number of others.
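The figures quoted in this section can be put side by side with a little arithmetic. The calculation below is a rough sketch: the Sanger read length of 850 bases is an assumed midpoint of the 800–900 range given earlier, and the result ignores quality, paired-end information and everything else that distinguishes the two technologies.

```python
def total_bases(n_reads, read_len):
    """Raw number of sequenced bases for n_reads reads of read_len bases."""
    return n_reads * read_len

# Human genome project with Sanger reads (850 bases is an assumed midpoint).
sanger_bases = total_bases(35_000_000, 850)    # 29,750,000,000 bases
# One day on a single Solexa machine with 36-base reads.
solexa_bases = total_bases(100_000_000, 36)    # 3,600,000,000 bases

# Days one Solexa machine needs to match the raw base count (a bit over 8).
days_to_match = sanger_bases / solexa_bases
```

In raw bases a single machine catches up in little more than a week, but it does so with almost three times as many reads per day as the whole earlier project produced, each of them more than twenty times shorter.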

Greedy algorithm

Given a set of sequence fragments, the objective is to find the shortest common supersequence.

  1. calculate pairwise alignments of all fragments
  2. choose two fragments with the largest overlap
  3. merge chosen fragments
  4. repeat steps 2 and 3 until only one fragment is left

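The result is, in general, a suboptimal solution to the problem: the greedy choice of the largest overlap at each step does not guarantee the shortest overall result.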
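The four steps above can be sketched as follows. This is a minimal illustration under simplifying assumptions: overlaps are exact suffix-prefix matches (no sequencing errors, no reverse complements), fragments contained in other fragments are dropped up front, and the overlaps are simply recomputed in every round instead of being calculated once. The names `overlap` and `greedy_assemble` are chosen for this example.

```python
def overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):       # steps 1 and 2: best ordered pair
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        merged = frags[best_i] + frags[best_j][best_n:]   # step 3: merge
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)                # step 4: repeat with one fewer
    return frags[0] if frags else ""

fragments = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
contig = greedy_assemble(fragments)
```

When no pair overlaps at all, the sketch simply concatenates two fragments, so it always terminates with a single string that contains every input fragment.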

Available assemblers

The following table lists assemblers that have a de-novo assembly capability on at least one of the supported technologies.[6]

Name | Type | Technologies | Author | Presented / Last updated | Licence* | Homepage
ABySS | genomes | Solexa, SOLiD | Simpson, J. et al. | 2008 / 2010 | OS | link
AMOS | genomes | Sanger, 454 | Salzberg, S. et al. | 2002? / 2008? | OS | link
Celera WGA Assembler / CABOG | (large) genomes | Sanger, 454, Solexa | Myers, G. et al.; Miller G. et al. | 2004 / 2010 | OS | link
CLC Genomics Workbench | genomes | Sanger, 454, Solexa, SOLiD | CLC bio | 2008 / 2010 | C | link
Edena | genomes | Solexa | D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel. | 2008 | C | link
Euler | genomes | Sanger, 454 (, Solexa?) | Pevzner, P. et al. | 2001 / 2006? | (C / NC-A?) | link
Euler-sr | genomes | 454, Solexa | Chaisson, MJ. et al. | 2008 | NC-A | link
Forge | (large) genomes, EST, metagenomes | 454, Solexa, SOLiD, Sanger | Platt, DM, Evers, D. | 2010 | OS | link
Geneious | genomes | Sanger, 454, Solexa | Biomatters Ltd | 2009 / 2010 | C | link
IDBA (Iterative De Bruijn graph short read Assembler) | (large) genomes | Sanger | Yu Peng, Henry C. M. Leung, Siu-Ming Yiu, Francis Y. L. Chin | 2010 | (C / NC-A?) | link
MIRA (Mimicking Intelligent Read Assembly) | genomes, ESTs | Sanger, 454, Solexa | Chevreux, B. | 1998 / 2010 | OS | link
NextGENe | (small genomes?) | 454, Solexa, SOLiD | Softgenetics | 2008 | C | link
Newbler | genomes, ESTs | 454, Sanger | 454/Roche | 2009 | C | link
Phrap | genomes | Sanger, 454 | Green, P. | 2002 / 2003 / 2008 | C / NC-A | link
TIGR Assembler | genomic | Sanger | - | 1995 / 2003 | OS | link
Ray[7] | genomes | Illumina, mix of Illumina and 454, paired or not | Sébastien Boisvert, François Laviolette & Jacques Corbeil. | 2010 | OS [GNU General Public License] | link
Sequencher | (small) genomes | Sanger | Gene Codes Corporation | 1991 / 2009 | C | link
SeqMan NGen | (small) genomes, ESTs | Sanger, 454, Solexa | DNASTAR | ? / 2008 | C | link
SHARCGS | (small) genomes | Solexa | Dohm et al. | 2007 / 2007 | OS | link
SOPRA | genomes | Solexa, SOLiD, Sanger, 454 | Dayarian, A. et al. | 2010 / 2010 | OS | link
SSAKE | (small) genomes | Solexa (SOLiD? Helicos?) | Warren, R. et al. | 2007 / 2007 | OS | link
SOAPdenovo | genomes | Solexa | Li, R. et al. | 2009 / 2009 | Closed | link
Staden gap4 package | BACs (, small genomes?) | Sanger | Staden et al. | 1991 / 2008 | OS | link
VCAKE | (small) genomes | Solexa (SOLiD?, Helicos?) | Jeck, W. et al. | 2007 / 2007 | OS | link
Phusion assembler | (large) genomes | Sanger | Mullikin JC, et.al. | 2003 | OS | link
Quality Value Guided SRA (QSRA) | genomes | Sanger, Solexa | Bryant DW, et.al. | 2009 | OS | link
Velvet (algorithm) | (small) genomes | Sanger, 454, Solexa, SOLiD | Zerbino, D. et al. | 2007 / 2009 | OS | link
*Licences: OS = Open Source; C = Commercial; C / NC-A = Commercial but free for non-commercial and academics; Brackets = unclear, but most likely C / NC-A

See also

References

  1. Myers EW, Sutton GG, Delcher AL, et al. (March 2000). "A whole-genome assembly of Drosophila". Science 287 (5461): 2196–204. PMID 10731133. http://www.sciencemag.org/cgi/pmidlookup?view=long&pmid=10731133.
  2. Batzoglou S, Jaffe DB, Stanley K, et al. (January 2002). "ARACHNE: a whole-genome shotgun assembler". Genome Res. 12 (1): 177–89. doi:10.1101/gr.208902. PMID 11779843. PMC 155255. http://www.genome.org/cgi/pmidlookup?view=long&pmid=11779843.
  3. AMOS page with links to various papers.
  4. Copy in Google groups of the post announcing MIRA 2.9.8 hybrid version in the bionet.software Usenet group.
  5. Dohm JC, Lottaz C, Borodina T, Himmelbauer H (November 2007). "SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing". Genome Res. 17 (11): 1697–706. doi:10.1101/gr.6435207. PMID 17908823. PMC 2045152. http://www.genome.org/cgi/pmidlookup?view=long&pmid=17908823.
  6. List of software including mapping assemblers in the SeqAnswers discussion forum.
  7. Boisvert S, Laviolette F, Corbeil J (October 2010). "Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies". J Comput Biol. 17 (11): 1519–33. doi:10.1089/cmb.2009.0238. PMID 20958248. http://www.liebertonline.com/doi/abs/10.1089/cmb.2009.0238.