(One in a series of six articles on Mathematics and Medicine being distributed by the Joint Policy Board for Mathematics in celebration of Mathematics Awareness Week 1994.)
Twenty years ago, mathematician Michael Waterman became interested in biology and also learned of a parlor game. The game went something like this: Given a certain word, what is the smallest number of steps needed to transform it into another word, if each step must be a permitted letter substitution, insertion, or deletion? (For example, the word WORM can be changed to the word WORSE in two steps: WORM, WORE, WORSE.) This game caused Waterman to consider DNA strands, which, over time, can similarly change and evolve -- and whose structures are often spoken of as combinations of "words" and "letters."
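The word game can be made concrete with a short program. The sketch below is an illustration only, not a description of Waterman's own methods: it assumes that every substitution, insertion, or deletion counts as one step and that intermediate strings need not be real words. The function name edit_distance is chosen here for the example.

```python
def edit_distance(source: str, target: str) -> int:
    """Fewest single-letter substitutions, insertions, or deletions
    needed to turn `source` into `target` (assumed cost: 1 per step)."""
    # previous[j] holds the distance from the letters of `source` read so far
    # (initially none) to the first j letters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning i letters into an empty word takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitute = previous[j - 1] + (s_char != t_char)  # free if letters match
            delete = previous[j] + 1      # drop s_char from the source
            insert = current[j - 1] + 1   # add t_char to the source
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


print(edit_distance("WORM", "WORSE"))  # 2, as in the example above
```

The same function works unchanged on strings of DNA "letters" such as A, C, G, and T.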
DNA strands comprise thousands of joined molecules known as nucleotides. Certain groups of these molecules perform specific functions and are called genes. Residing in the nuclei of cells, genes contain codes or instructions for manufacturing key substances (proteins) for the body. By coding for proteins, genes determine the chains of events that produce the body's traits -- from blue eyes, to blond hair, to allergies. When genes are passed on through generations, traits are passed on as well. When genes are altered, traits are altered, sometimes with harmful results. Interestingly, sections of DNA often break and reunite, shuffling the order of genes. Because such changes occur with measurable probabilities, certain aspects of DNA structure can be deduced from those probabilities. Here is a case of mathematics, especially statistics, meeting biology.
There are many potential applications of such knowledge. One of Waterman's current lines of research promises to help biologists who are locating new genes. Waterman uses statistical analysis to compare new genes and corresponding proteins with previously studied proteins catalogued in a database. (Researchers routinely enter data about DNA and proteins into a common database.) If a new gene matches an already characterized sequence, then effort can be saved, as researchers exploit prior knowledge in their study of the new gene.
"It's as if you are trying to detect plagiarisms," says Waterman, a professor at the University of Southern California. In other words, the idea is to establish whether a new gene resembles a previously studied sequence so closely that it can be considered to be the same sequence. This is trickier than it sounds. Genes are large and complex, information about them can be sketchy, and parts of different genes can be similar. However, statistical procedures can increase the certainty that two segments are identical. "It's the statistics of coincidence," says Waterman. "You'll find pieces that look alike because of randomness. But how do you tell biology from coincidence?" It turns out that you can determine that two segments are probably identical by applying some fairly simple mathematics.
Sam Karlin, a Stanford University professor, applies this mathematics of coincidence to genetic research in a related way. He searches for patterns in the molecules strung together in DNA strands. "The goal of this 'molecular sequence analysis' is to make sense out of all those sequences that are being discovered by the biological community," says Karlin. His reference is to the search by biologists for genes in human DNA.
"Given a DNA sequence that has been determined, we are interested in any anomalies," says Karlin. "Are certain types of segments distributed uniformly? Or are they concentrated in certain regions -- clustering? Are certain special [types of segments] too frequent or too rare?" The point is that discovery of patterns in DNA strands would help genetic researchers, including those looking for genes.
Like Waterman, Karlin develops statistical methods and algorithms (mathematical rules) to match genetic sequences. The research of these and other mathematicians influences many aspects of genetic research. Another example of the use of the "statistics of coincidence" is in genetic linkage studies, which determine where genes reside in DNA and what they do in the human body. Family studies of inherited genes and traits rely on statistical analysis to determine whether genes and traits are inherited together and linked causally.