Mathematics and the Genome
Mathematics and the Genome:
Molecular Genetics (1950-Present)
Feature Column Archive
4. Molecular Genetics
(1953-Present)
Given the short history of the evolution of molecular genetics, the support mathematics
has provided has been equally dramatic. Intriguingly, history has repeated itself
to some extent in this saga: new methods to deal with fast developing problems
are being found, but there has also been an available reservoir of tools developed
without specific reference to the needs of molecular genetics that are proving
valuable off the shelf. Many of the ways that mathematics has been put to use
is that DNA can be thought of as a string made up of an alphabet of 4 letters,
and a protein can be thought of as a string whose letters are the amino acids
which make it up.
Problems that have attracted mathematical interest include:
1. Finding an exact match for a given string within a long string.
2. Finding an approximate match for a given string within a long string.
3. Finding the best match for a string in a list of strings with respect to some
optimization criterion.
4. Finding the best approximate match for a string in a list of strings with respect
to some optimization criterion.
5. Finding the largest common string in a list of strings.
6. Given a set of strings with information about the overlap properties of these
strings, to reconstruct a string which has the original strings as substrings.
7. Given a set of strings, find the shortest string which has the given strings
as a substring.
8. Given a set of strings, find the shortest string which has the given strings
or the reversals of these strings as a substring.
9. Given a pair of strings, what is the distance between them (see next section).
10. Finding trees that help represent relationships between objects.
Some of these problems have been shown to have polynomial time complexity but
others have been shown to be NP-complete. Since the problems that biologists are
interested in are quite large, computers have been used from the very early days.
However, the implications in some of the problems listed for having approximate
answers in a biological setting are more problematical than in some other situations.
For example, if one is designing a spell checker, one wants to locate strings
in a dictionary that are close to a string (which presumably is not a true word)
which is not in the dictionary. The consequence of not finding the word that the
typist meant to enter is rather different from reaching the wrong conclusion about
relationships between species.
Methods exist for taking a long stretch of DNA whose structure is not known and
cleaving it at particular sites. One then needs to sequence the resulting pieces
and to be able to reconstruct the original DNA from these pieces. In the traditional
method of sequencing, one locates particular stretches of DNA as being associated
with known sections of DNA from specific chromosomes. In the so-called shotgun
method the whole genome is in small pieces and one reconstructs the structure
of all of the chromosomes from the small pieces. Many people have been involved
in various parts of the designing of algorithms for reconstruction of long stretches
of DNA from shorter pieces. However, the work of E. W. Myers is especially noteworthy
in promoting this approach, whose success from a biological viewpoint is still
being debated. However, much new has been learned in a mathematical sense.
- Introduction
- Mathematics
and Classical Genetics (The Early Days)
- Mathematics
and Clasical Genetics (1900-1953)
- Molecular
Genetics (1953-Present)
- Near
and Far (Strings)
- Near
and Far (Trees)
- The
Wider Picture and the Future
- References