Peder A. Olsen

Research Staff Member
IBM, T. J. Watson Research Center


I work in the Human Language Technology (HLT) group at IBM's T.J. Watson Research Center, located in Westchester County, north of New York City. The group invents technology for speech recognition, speaker identification, speech synthesis, and natural language understanding. It has approximately 70 members, nearly all of whom hold a Ph.D. A variety of fields are represented: physics, electrical engineering, computer science, mathematics, and phonetics. In contrast to a university mathematics department, where everyone has their own specialty, everyone in the HLT department has worked or is working on the same problems that I am working on.

My role in the department has been to develop algorithms and ideas to improve the quality of automatic speech recognition. This includes coming up with a mathematical formulation and solution, implementing the concept, and testing it thoroughly in multiple environments. One thing I like about working in industry is that it allows me to see the connection between mathematics and its applications. I love mathematics for the sheer fun of it, and the reward of solving real-world problems is an added incentive to working in industry.

All our systems are based on statistical modeling techniques, and many mathematical problems are central to the workings of a speech recognition system. Examples are density estimation in high dimensions, clustering of points in high-dimensional spaces, signal processing from acoustics to phonetic features, linear discriminant analysis, smoothing and interpolation of discrete and continuous probability distributions, maximum entropy modeling, and so forth. My department participates in multiple government-sponsored evaluations where the leading edge of speech recognition is developed. The keys to success in these evaluations are new mathematical techniques, good software engineering, and massive computing power.

Some recent mathematical innovations from my department are: the use of the Bayesian Information Criterion to segment acoustic data and cluster acoustically similar environments (e.g., by speaker, or telephone vs. microphone); the use of Linear Discriminant Analysis and Factor Analysis to find and model similarities in covariance structures between acoustic features of different sounds; and the use of mixtures of power exponentials to improve the statistical models for each sound. Several successful mathematical techniques that were pioneered for use in speech recognition and are now used worldwide came out of IBM's Human Language Technology group. Examples are maximum entropy language modeling and the use of Hidden Markov Models in modeling acoustics.

I have a master's degree from the Norwegian Institute of Technology and a Ph.D. from the University of Michigan. I found the position at T.J. Watson through IBM's job posting on the web and have been here for four years. I started as a postdoc and am now a Research Staff Member.

Some courses I recommend for students wanting to pursue speech recognition are probability, statistics, linear analysis, functional analysis, and computer programming courses (algorithms, C++ programming). I also encourage students to try out a summer internship. The pay is good, and it's a great way to make connections. The problems are also usually very interesting and well prepared.

