|Mail to a friend · Print this article · Previous Columns|
Tony Phillips' Take on Math in the Media
A monthly survey of math news
"The sequence and de novo assembly of the giant panda genome" appeared in the January 21 2010 Nature. The over 100 authors were led by a 5-person squad based in Shenzhen, Beijing and Shanghai, with first author Ruiqiang Li. As the authors remark, this is only the second genome of the order Carnivora to be sequenced (the first was the dog). They call special attention to the method they used: "our demonstration that next-generation sequencing technology can allow accurate de novo assembly of the giant panda genome will have far-reaching implications for promoting the construction of reference sequences for other animal and plant genomes in an efficient and cost-effective way." The main drawback of the next-generation technology was the much shorter "read length" of the decoded sequences it produces (in comparison with the methods used in the human genome sequencing of the last decade). One of the tools used to overcome this limitation was an application of the de Bruijn graph algorithm, which allowed the team to handle the vexing problem of repeats: identical stretches of code which occur in different places in the genome. Since the reconstruction of the genome from the huge number of fragments is done by matching congruent stretches, the existence of repeats forces the consideration of many spurious correspondences, which have to be eliminated by hand.
Schematic illustration of the de Bruijn graph algorithm applied to genome sequencing. Here a represents a portion of the genome to be analysed, along with the fragments generated by the first step in the analysis (amplification via polymerase chain reaction). This portion contains a repeated stretch (red). The task is to reconstruct the original configuration. The possible matchings between fragments are schematized in b. Note the complication caused by the repeats. The de Bruijn algorithm consolidates all the appearances of a repeat into one, producing an oriented graph (c) with looping. The solution now will be an Eulerian circuit of this graph: one that traverses each edge at least once. Image adapted from Pevzner, Tang and Waterman, An Eulerian path approach to DNA fragment assembly, PNAS 98 (2001) 9748-9753. Those authors remark that the solution of the b graph would be a Hamiltonian circuit (visiting each vertex exactly once) and that "efficient algorithms for solving this problem are unknown." Whereas "the Eulerian path problem is easy to solve even for graphs with millions of vertices, because there exist linear-time Eulerian path algorithms."
It's February again, and Barry Cipra is reporting in Science (February 19, 2010) from the Joint Mathematics Meetings, in San Francisco this year.
A nice vignette, thanks to Ian Stewart, pops up in John Gribbin's review of Seeing Further: The Story of Science and the Royal Society (Nature, January 28, 2010). "As an example of the lack of understanding of the role of mathematics even among scientists, Stewart cites a member of NASA's Jet Propulsion Laboratory. On commenting that in the Mars Rover programme 'we don't really use any abstract algebra, group theory, and that sort', the researcher was confounded when told of the importance of finite, or Galois, fields in the channel coding used to reduce errors in the flow of information--in messages from Mars, as well as in reproducing sound from a compact disk. Scientists ought to know better. "