Date:    13 March 1998
To:      ISO/IEC JTC1/SC2/WG2
         Unicode Technical Committee
From:    STIX Project of the STIPUB Consortium
         (a consortium of scientific societies
         and scientific/technical publishers)
Subject: Request for assignment of codes to mathematical and technical
         symbols that do not appear in ISO/IEC 10646 or Unicode 2.0
The members of the STIPUB Consortium are all active publishers of
mathematical, scientific, and technical books and journals. The
following organizations and representatives have contributed to the
STIX project:
American Mathematical Society (AMS)
    Barbara Beeton (bnb@ams.org)
    Patrick D. F. Ion (ion@math.ams.org)
American Institute of Physics (AIP)
    Chris Hamlin (chamlin@aip.org)
American Physical Society (APS)
    Arthur Smith (apsmith@aps.org)
American Chemical Society (ACS)
    Joe Yurvati (jyurvati@cas.org)
Institute of Electrical and Electronics Engineers (IEEE)
    Ira Polans (i.polans@ieee.org)
Elsevier Science
    Nico Poppelier (n.poppelier@elsevier.nl)
    Fred Veldmeijer (f.veldmeijer@elsevier.nl)
    J. Friederich (j.friederich@elsevier.nl)
Wolfram Research Institute
AMS is taking the lead in the STIX effort. Ralph Youngen (rey@ams.org),
Director for Electronic Product Development, is the AMS liaison with
STIPUB. The two active AMS participants, Barbara Beeton and Patrick Ion,
have experience both with the font requirements of mathematical publishing
and with standards work in that area.
Barbara Beeton was, until last year, the AMS representative to ISO/IEC
JTC1/SC18/WG8; that working group has since been reorganized as JTC1/WG4.
Patrick Ion is co-chair of the HTML-Math Working Group of the World Wide
Web Consortium (W3C).
The charter of the STIX project is to create one comprehensive set of
fonts for scientific and technical publishing. This set of fonts should
be adopted and supported by all major scientific, technical, and medical
(STM) publishers, and will also be made available for general use under
license but free of charge, with the explicit aim of easing and fostering
the uninhibited flow, exchange, and linking of scientific information.
The rationale is that the availability of a universal font set
will benefit scientific and technical publishing. For example, it will
eliminate certain legal problems with distributing PDF files and publishing
on the World Wide Web, and will ease the exchange of documents from
different publishers.
Scientific communication and publication via the Web are currently hindered
by the absence both of suitable symbol fonts and of recognized methods of
indicating particular symbols and their relationships to one another.
The font problems of ordinary text, which are considerable irrespective of
language, have so far been addressed essentially only by the introduction
of the ISO 10646/Unicode standard. The special problems of handling
technical texts have been examined by the W3C HTML-Math Working Group,
and their MathML proposal, which is interdependent with this request,
has recently been put forward to W3C as a Proposed Recommendation [see
http://www.w3.org/TR/PR-Math]. The work of the HTML-Math WG is also
related to the work of the OpenMath consortium.
The STIX group has agreed that a suitable font set should contain at least:
1. Latin alphabet (for instance Times) in 2 shapes (upright, italic)
   and 2 weights (medium, bold), including suitable punctuation
2. Greek alphabet that blends well with 1 in 2 shapes (upright, "italic")
3. Cyrillic alphabet that blends well with 1
4. Sans serif Latin alphabet (for instance Helvetica) in 2 shapes
   (upright, italic) and 2 weights (medium, bold), including suitable
   punctuation
5. Letters a-z A-Z, numerals 0-9, and delimiters () [] {} in openface
6. Letters a-z A-Z in script
7. Letters a-z A-Z in fraktur
8. Mathematical and technical symbols of ISO 10646/Unicode vers. 2.0
9. Additional mathematical and technical symbols found in use by STIX
We have also agreed that the best basis for the organization of such a font
set would be ISO 10646/Unicode. Some arguments in favor of ISO 10646/Unicode
are: it is the basis for XML, and therefore for MathML, and it is the
character set of the programming language Java and the operating system
Windows NT.
In XML documents, and most importantly for use in MathML, we need to be
able to identify all characters, either by numerical character reference or
by entity reference. Numerical character references are necessarily
ISO 10646/Unicode numbers, since that is the character set underlying XML.
If entity names are used, they must still be mapped to something that
applications will be able to handle and render.
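The two ways of identifying a character can be illustrated with a short
Python sketch. It is only an illustration under stated assumptions: the
three-entry table and the function names are hypothetical, and a real
application would take its entity definitions from a DTD or a published
entity set rather than from a hand-written dictionary.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative table only (an assumption for this sketch):
# entity name -> ISO 10646/Unicode code number.
ENTITY_TO_CODE = {
    "alpha": 0x03B1,  # GREEK SMALL LETTER ALPHA
    "sum": 0x2211,    # N-ARY SUMMATION
    "le": 0x2264,     # LESS-THAN OR EQUAL TO
}

# The five entities that XML itself predefines are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_numeric(text):
    """Rewrite known entity references as numerical character references.

    Unknown names are left as they are, so a parser will still report them.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in ENTITY_TO_CODE:
            return match.group(0)
        return "&#x{:04X};".format(ENTITY_TO_CODE[name])

    return _ENTITY_REF.sub(replace, text)


def element_text(fragment):
    """Parse a one-element XML fragment and return its character content."""
    return ET.fromstring(entities_to_numeric(fragment)).text


# A numerical character reference needs no table: the number is the code.
by_number = element_text("<mi>&#x03B1;</mi>")
# An entity reference is usable only once it has been mapped to a code.
by_name = element_text("<mi>&alpha;</mi>")
```

Both calls yield the same single character, which is the point of the
paragraph above: a symbol without an ISO 10646/Unicode code has nothing
for either kind of reference to resolve to.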
In the documentation that accompanies this request, the following
information is provided:
1. The "names" by which a symbol is known, and the STIX participants
   reporting it in their current symbol sets
2. The symbol "class":
   - N: normal or ordinary, e.g. a symbol used as a variable
   - A: alphabetic; subclass of ordinary
   - D: diacritic
   - P: punctuation
   - B: binary operator, e.g. a + b
   - R: relation, e.g. a = b
   - L: large operator, e.g. sum, product
   - O: opening delimiter
   - C: closing delimiter
3. A description of the symbol
4. A sample glyph
5. ISO 10646/Unicode codes of possibly related symbols or symbols with
   similar shape
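For readers who wish to process the tables mechanically, the five items
above could be held in a record such as the following Python sketch. The
type and field names are hypothetical, chosen for this illustration only,
and the example record is an invented placeholder rather than an entry
from the STIX tables.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SymbolClass(Enum):
    """The one-letter class codes of item 2."""
    NORMAL = "N"
    ALPHABETIC = "A"
    DIACRITIC = "D"
    PUNCTUATION = "P"
    BINARY_OPERATOR = "B"
    RELATION = "R"
    LARGE_OPERATOR = "L"
    OPENING_DELIMITER = "O"
    CLOSING_DELIMITER = "C"


@dataclass
class SymbolRecord:
    names: List[str]              # item 1: names by which the symbol is known
    reported_by: List[str]        # item 1: participants reporting the symbol
    symbol_class: SymbolClass     # item 2
    description: str              # item 3
    sample_glyph: str             # item 4: hypothetical path to a glyph image
    related_codes: List[int] = field(default_factory=list)  # item 5

    def is_ordinary(self) -> bool:
        """Alphabetic (A) is a subclass of ordinary (N)."""
        return self.symbol_class in (SymbolClass.NORMAL,
                                     SymbolClass.ALPHABETIC)

    def related_as_text(self) -> List[str]:
        """Format the related codes in the usual U+XXXX notation."""
        return ["U+{:04X}".format(code) for code in self.related_codes]


# Invented placeholder record, not taken from the STIX tables.
example = SymbolRecord(
    names=["example-relation"],
    reported_by=["participant-A", "participant-B"],
    symbol_class=SymbolClass("R"),  # look up the class by its letter code
    description="placeholder description of a relation symbol",
    sample_glyph="glyphs/example-relation.gif",
    related_codes=[0x2264],
)
```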
The documentation we are sending consists of two tables of symbols that the
STIX team have identified as in use in their publishing production. The
first contains the mathematical and technical symbol characters already
found in ISO 10646/Unicode. The second contains the symbols for which we
could find no existing codes. These lists were developed over the Web by a
group of widely separated collaborators. The tables are available online,
and the glyph samples look best at 72 dpi screen resolution; the printed
forms are necessarily less effective. The high-quality fonts that are
intended have, naturally, not yet been made.
We believe that all possible valid associations have been made between the
symbols in our combined collections and existing ISO 10646/Unicode codes,
and that the residue are good candidates for new codes. We would be happy
to work with people from WG2 and the Unicode group to find the right blocks
and ranges into which to put new codes; the assignments in the private
zones that we have used for processing are in no way a suggestion as to the
numbering sequences to be used. We also welcome suggestions for the text
of the formal ISO 10646/Unicode names, which are not yet present, from
anyone who has more experience with the naming conventions than we do.
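Characters that still carry a private-zone assignment can be recognized
mechanically, which may help when checking a document before interchange.
The following Python sketch relies on the standard unicodedata module,
where private-use characters have general category "Co". Note that the
module reflects the Unicode version shipped with Python, not version 2.0,
and the function names are hypothetical.

```python
import unicodedata


def is_private_use(ch):
    """True if the character lies in a private-use zone (category 'Co')."""
    return unicodedata.category(ch) == "Co"


def private_use_characters(text):
    """List (position, code) pairs for every private-use character in text."""
    return [
        (index, "U+{:04X}".format(ord(ch)))
        for index, ch in enumerate(text)
        if is_private_use(ch)
    ]


# U+2211 has a standard code; U+E123 is an arbitrary private-zone
# code chosen for this illustration.
found = private_use_characters("a \u2211 b \ue123")
```

A code reported by such a check has no meaning outside the group that
assigned it.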
Whether or not our proposals for additions to ISO 10646/Unicode are
accepted, the scientific community will continue using these symbols.
This could mean that the STIX project will have to put the rejected
symbols into the private zone of ISO 10646/Unicode, which defeats the whole
purpose of this exercise. Wolfram Research has already done this with
their symbol set, and if more users do it independently, the initiative
that ISO 10646/Unicode has taken to tame Babel will not have worked for
the world of scientific communication.