## Mathematics and the Genome: The Wider Picture and the Future

7. The Wider Picture and the Future

Once the sequencing machines have finished their work, whether they started with DNA from a mouse, a bacterium, or a human, biologists try to find out what the active parts of the DNA do. This means locating the genes hidden in the DNA and trying to figure out what these genes do. Even when one is reasonably certain that a section of DNA codes for a particular protein, it is not always apparent what the protein is for and, hence, what the gene does. If the organism being sequenced is a prokaryote, this task is not simple, but because the genome of such an organism is relatively small, and because the functions of some genes in similar organisms are known, the task is not totally daunting. For eukaryotes, however, the situation is far more challenging. Despite all we know, it is common to have no idea what a stretch of DNA that we have good reason to believe is a gene actually does.

How do we locate stretches of DNA that are good candidates for being genes? We build on elementary knowledge of molecular biology. If a DNA stretch codes for a protein, it must begin with a start codon and terminate with a stop codon, so it makes sense to look for such stretches of DNA as potential candidates for genes. Remember, however, that sequencing machines are not always letter-perfect, and, especially during the reassembly phase, one may not have a 100 percent accurate reconstruction. How can one tell that a putative gene really is one? The task of gene finding has relied on a wide array of mathematical tools. Hidden Markov models, which have also found application in speech recognition, have proved useful, as has work in pattern recognition, probability, and database analysis.
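The idea of scanning for a stretch that begins with a start codon and ends with a stop codon can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the start codon ATG and the stop codons TAA, TAG, and TGA of the standard genetic code, it reads only the given strand, and it ignores introns and sequencing errors. The names `find_orfs` and `min_codons` are hypothetical, chosen for this sketch. Practical gene finders are far more elaborate.

```python
# Assumption: standard genetic code, forward strand only, no introns.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return a list of (start, end, frame) tuples for candidate genes.

    start is the 0-based index of the start codon, end is the index just
    past the stop codon, and frame (0, 1, or 2) is the reading frame.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first unmatched start codon
        # Step through the sequence one codon (three letters) at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                # Length in codons, counting both start and stop codons.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATAGCC"
    for start, end, frame in find_orfs(sequence):
        print(frame, start, end, sequence[start:end])
```

Raising `min_codons` discards very short candidates, which are more likely to arise by chance than to be real genes. Deciding which of the remaining candidates are genuine is where statistical tools enter.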

No hint has been given above of other whole areas where mathematics is being used in genome-related work. Examples include using graph theory, the theory of braids, and knot theory to study coiled DNA (DNA sometimes comes in circular form rather than in linear strands). Differential geometry has been used to study the relation between writhe, twist, and linking number. Probabilists and statisticians are using their skills to examine questions about the evolution of the genetic code. Ordinary and partial differential equations are being used to study diffusion questions. Linear algebra, information theory, and computer graphics have found many applications. Much work is also being done to put DNA to forensic uses.

One area that will require many new ideas is protein folding. After proteins are manufactured, they fold into a wide range of configurations in 3-dimensional space, and these configurations determine how they do their work. Even with the fastest computers, determining a protein's structure is very time-consuming. Speed cannot compensate for the need for better insight. Mathematicians will continue to work on the many unresolved issues discussed above. What is certain is that mathematicians and biologists will continue to be inspired by each other's work to obtain future insights.

Joseph Malkevitch
York College (CUNY)

Email: malkevitch@york.cuny.edu