### Tony Phillips' Take Blog on Math Blogs

 Mail to a friend · Print this article · Previous Columns Tony Phillips' Take on Math in the Media A monthly survey of math news

# This month's topics:

## Eulerian circuits in the giant panda genome

"The sequence and de novo assembly of the giant panda genome" appeared in the January 21 2010 Nature. The over 100 authors were led by a 5-person squad based in Shenzhen, Beijing and Shanghai, with first author Ruiqiang Li. As the authors remark, this is only the second genome of the order Carnivora to be sequenced (the first was the dog). They call special attention to the method they used: "our demonstration that next-generation sequencing technology can allow accurate de novo assembly of the giant panda genome will have far-reaching implications for promoting the construction of reference sequences for other animal and plant genomes in an efficient and cost-effective way." The main drawback of the next-generation technology was the much shorter "read length" of the decoded sequences it produces (in comparison with the methods used in the human genome sequencing of the last decade). One of the tools used to overcome this limitation was an application of the de Bruijn graph algorithm, which allowed the team to handle the vexing problem of repeats: identical stretches of code which occur in different places in the genome. Since the reconstruction of the genome from the huge number of fragments is done by matching congruent stretches, the existence of repeats forces the consideration of many spurious correspondences, which have to be eliminated by hand.

Schematic illustration of the de Bruijn graph algorithm applied to genome sequencing. Here a represents a portion of the genome to be analysed, along with the fragments generated by the first step in the analysis (amplification via polymerase chain reaction). This portion contains a repeated stretch (red). The task is to reconstruct the original configuration. The possible matchings between fragments are schematized in b. Note the complication caused by the repeats. The de Bruijn algorithm consolidates all the appearances of a repeat into one, producing an oriented graph (c) with looping. The solution now will be an Eulerian circuit of this graph: one that traverses each edge at least once. Image adapted from Pevzner, Tang and Waterman, An Eulerian path approach to DNA fragment assembly, PNAS 98 (2001) 9748-9753. Those authors remark that the solution of the b graph would be a Hamiltonian circuit (visiting each vertex exactly once) and that "efficient algorithms for solving this problem are unknown." Whereas "the Eulerian path problem is easy to solve even for graphs with millions of vertices, because there exist linear-time Eulerian path algorithms."

## The Joint Mathematics Meetings covered in Science

It's February again, and Barry Cipra is reporting in Science (February 19, 2010) from the Joint Mathematics Meetings, in San Francisco this year.

• Politics as (Un)usual: "Anna Popova, a graduate student in psychology at the University of Illinois, Urbana-Champaign, described an analysis of ranked ballots from 8 years of voting for the presidency of the American Psychological Association (APA). She and her adviser, Michel Regenwetter, found that different voting methods gave the same result much more often than not." Cipra interviewed other voting experts who questioned whether elections in professional societies (where slates are often rigged) are fair representatives of the process.
• Perfection in a Box: A perfect cuboid would be a rectangular parallelipiped such that all 3 side lengths, all three side diagonals and the main diagonal are integers. "Computers have looked at all possible bricks with edge lengths up to 10 billion without finding a single one that's perfect," according to Ezra Brown (Number Theory, VPI). But suppose we relax "rectangular"? Then there are more diagonals to worry about (13 lengths in all) but perfect ones turn out to be quite plentiful. "Clifford Reiter of Lafayette College in Easton, Pennsylvania, and Jorge Sawyer, an undergraduate at Lafayette, have found the first examples of perfect parallelepipeds." The first example, found by a computer-aided search, has sides of length 271, 106, and 103. The 271x106 face has diagonals of length 255 and 323; the 271x103 face has diagonals 312 and 266; the 106x103 face has diagonals 101 and 183.
• What Comes Next? "One of mathematicians' most beloved Web sites is getting ready for a makeover. The Online Encyclopedia of Integer Sequences, established by Neil Sloane at AT&T Labs Research in 1996 and run largely as a one-man shop, is poised to go "wiki," with 50 associate editors taking over much of the workload." If you come across a sequence of integers in your research, you look it up in the Encyclopedia. Cipra quotes Doron Zeilberger (Combinatorics, Rutgers): it is "one of the most useful tools available online for the working mathematician. It also is a great tool for determining the novelty of a new sequence. If it is not in Sloane, it is most likely to be new!" The Encyclopedia has currently over 200,000 integer sequences. "In 2009 alone, the total increased by 18,709 sequences, or more than 50 a day."

## "We don't use algebra"

A nice vignette, thanks to Ian Stewart, pops up in John Gribbin's review of Seeing Further: The Story of Science and the Royal Society (Nature, January 28, 2010). "As an example of the lack of understanding of the role of mathematics even among scientists, Stewart cites a member of NASA's Jet Propulsion Laboratory. On commenting that in the Mars Rover programme 'we don't really use any abstract algebra, group theory, and that sort', the researcher was confounded when told of the importance of finite, or Galois, fields in the channel coding used to reduce errors in the flow of information--in messages from Mars, as well as in reproducing sound from a compact disk. Scientists ought to know better. "

Tony Phillips
Stony Brook University
tony at math.sunysb.edu