Math Awareness Month - April 2002

Microarrays, Mathematics, and Medicine


How much math will people doing medical research need to know in the coming decades? A safe answer would be "a lot more than they need to know now."

The rapidly developing area of microarrays is just one example of why this is going to happen. The microarray is a relatively new invention that lets scientists measure something that they could not measure before.

The DNA of each cell contains the instructions for all of the proteins that cells somewhere in the body might need to manufacture. The actual manufacturing is done using messenger RNA. Some proteins are made by all cells, but many are made only in certain places in the body, or only in certain circumstances. When two individuals differ in the DNA sequence that encodes a given protein, they may make different versions of that protein, or one person may make the protein and another may not.

What a microarray does is measure how much messenger RNA of a given type is being made in a sample of tissue at a given moment. This gives a good idea of how much of the corresponding protein is being made. A single microarray "chip" can do this for around 15,000 proteins at one time, so experiments using microarrays produce an enormous amount of data.

This microarray represents the correlation of 81 mRNA samples with respect to the expression of 4682 genes in a lymphoma study.
Image and abstract courtesy of S. Dudoit, J. Fridlyand and T. Speed.

How might a microarray be used? Cancer researchers have compared microarray data for tumor samples from a variety of patients. The word "cancer" is used to describe many quite different diseases, each involving some form of breakdown in the control of cellular machinery. Each tumor involves an individualized set of mutations that have made its cellular machinery go off-track and have led to uncontrolled cell division. A microarray reading of how much of each of 15,000 proteins is being produced in the tumor cells gives a "signature" of the tumor. Researchers are trying to use these signatures to diagnose what kind of tumor it is. In some cases, they have succeeded in doing this.

In deciding which of the available medical treatments to use on a tumor, oncologists must first diagnose what kind of tumor it is. In some cases, this is easy. But there are many tumors that look the same under a microscope but behave differently when treated -- some respond well to chemotherapy, some do not. The hope is that the extra information available in the microarray signature of the tumor might provide new clues about how the tumor is going to behave. There have been some tantalizing positive results in this direction -- some tumors that have similar microarray signatures also behave similarly under treatment.

Where does mathematics come into this picture? The signature of each tumor is 15,000 numbers. A fruitful way to think of this is that the signature is a point in a 15,000-dimensional space. The data from, say, 50 tumor samples gives 50 points in this enormous space. What researchers are looking for is whether these 50 points fall into clusters -- groups of tumors that are in some sense close to each other in this big space. No one can visualize that many dimensions, and most of them are usually irrelevant in trying to classify the tumors. Statistics comes to the rescue in the form of a collection of algorithms for finding clusters. Finding the best way to detect clusters is still an open mathematical problem, but the methods we already have are enough to detect clusters in many situations. In some cases, clusters of tumors detected in this way turned out to behave similarly under treatment.
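
To make the idea of clusters concrete, here is a minimal sketch of one simple clustering method, written in Python. It is only an illustration, not the method used in any particular study: it assumes that "close" means ordinary straight-line (Euclidean) distance, it repeatedly merges the two groups whose members are closest on average, and it uses six made-up signatures of four numbers each in place of real signatures of 15,000 numbers. The function names and the data are invented for this example.

```python
import math


def distance(a, b):
    """Euclidean distance between two signatures (lists of numbers)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(distance(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(points, k):
    """Start with one cluster per point, then repeatedly merge the two
    closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index a, index b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up data: 6 tumor samples, 4 measurements each (illustrative only).
signatures = [
    [9.1, 8.7, 1.2, 0.9],
    [8.8, 9.0, 1.0, 1.1],
    [9.3, 8.5, 0.8, 1.3],
    [1.1, 0.9, 9.2, 8.8],
    [0.8, 1.2, 8.9, 9.1],
    [1.3, 1.0, 9.0, 8.6],
]

groups = cluster(signatures, k=2)
print(groups)  # prints [[0, 1, 2], [3, 4, 5]]
```

In this toy example the first three samples end up in one group and the last three in another. With real microarray data the hard questions are the ones this sketch assumes away: which notion of distance to use, how many clusters to look for, and which of the thousands of measurements actually matter.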

This opens the door to a totally new way for oncologists to diagnose tumors, and to make what is often a life-or-death decision about what treatment to use. Microarrays are only one among a rapidly emerging set of technologies that allow researchers to measure quantities that could not be measured before. Mountains of data are being generated by new experiments, and this poses an opportunity and a challenge -- how do we analyze this data and make the most of it?

Solving these problems effectively and efficiently involves heavy use of high-speed computation. This requires wisely designed algorithms, which are sets of instructions for how to carry out the computations. Because of the vast quantity of data produced by microarray experiments, all methods of analyzing the data use computers, and the people who do the analysis need to be quite ingenious in designing strategies to sift through all of this information.

People who work in this area hope that it will lead to an era of individualized medicine. Your doctor will have your DNA sequence available on the computer. When you are sick, he or she might use microarrays or another technology to diagnose the problem and to predict how you as an individual are likely to respond to the treatments available.

If this dream is realized, those who work as health care providers will need to be more sophisticated about mathematics and statistics, and those who do medical research will need to know even more. New partnerships between experts in mathematics and statistics and medical researchers will develop.

Already, some universities have begun to offer graduate programs to train people for this kind of work. This rapidly expanding area goes by a variety of names -- computational biology and bioinformatics are among the most common. The curriculum covers both molecular biology and mathematics. People who know both of these fields are currently in great demand to work in universities, medical research, and biotech startup companies.
