Mathematics and the Genome
7. The Wider Picture and the Future
When the sequencing machines have finished their work, whether they started
with DNA from a mouse, a bacterium, or a human, biologists try to find out what
the active parts of the DNA do. This means locating the genes hidden in the DNA
and trying to figure out what these genes do. Even when one is reasonably
certain that a section of DNA codes for a particular protein, it is not always
apparent what the protein is for and, hence, what the gene does. If the organism
being sequenced is a prokaryote, this task is not simple, but because the genome
of such an organism is relatively small, and because we know the functions of
some genes in similar organisms, the task is not totally daunting. For
eukaryotes, however, the situation is surprisingly challenging. Despite all we
know, it is common to have no clue what a stretch of DNA that we have good
reason to believe is a gene actually does.
How do we locate stretches of DNA that are good candidates for being genes?
We build on elementary knowledge of molecular biology. If a DNA stretch codes
for a protein, it must begin with a start codon and terminate with a stop
codon, so it makes sense to look for such stretches of DNA as potential
candidates for genes. Remember, however, that sequencing machines are not
always letter perfect and that, especially during the reassembly phase, one may
not have a 100 percent accurate reconstruction. How can one tell that a
putative gene really is one? The task of gene finding has relied on a wide array
of mathematical tools. Hidden Markov models, which have also found application
in speech recognition, have proved useful, as has work in pattern recognition,
probability, and database analysis.
No hint has been given above of other whole areas where mathematics is being
used in genome-related work. Examples include using graph theory, the theory
of braids, and knot theory to study coiled DNA (DNA sometimes comes in circular
form rather than linear strands). Differential geometry has been used to study
the relation between writhe, twist, and linking number. Probabilists and statisticians
are attempting to use their skills to examine questions about the evolution
of the genetic code. Differential equations and partial differential equations
are being used to study diffusion questions. Linear algebra, information theory,
and computer graphics have found many applications. Much work is being done
to put DNA to forensic uses. One area that will require many new ideas is
protein folding. After proteins are manufactured, they do their work through
the wide range of configurations they form in 3-dimensional space. Even with
the fastest computers, determining a protein structure is very time consuming.
Speed cannot compensate for the need for better insight. Mathematicians will
continue to work on the many unresolved issues discussed above. What is certain
is that mathematicians and biologists will continue to be inspired by each
other's work to obtain future insights.
Joseph Malkevitch
York College (CUNY)
Email: malkevitch@york.cuny.edu
- Introduction
- Mathematics and Classical Genetics (The Early Days)
- Mathematics and Classical Genetics (1900-1953)
- Molecular Genetics (1953-Present)
- Near and Far (Strings)
- Near and Far (Trees)
- The Wider Picture and the Future
- References