4.4 Adobe Glyph List
In the previous article we recommended that glyphs should be named in accordance with the standard set and maintained by Adobe. What is this standard and how does it work?
Adobe Glyph List
Adobe maintains this standard in two public GitHub repositories under their adobe-type-tools umbrella:
The various files in these two repos work together and should be thought of as one standard — and that standard is often called, simply, the AGL.
How does it work?
There are actually a number of components involved, specifying:
- allowed form for glyph names, including
  - what letters are allowed
  - permitted length
- a complete list of names that PostScript® interpreters/engines are guaranteed to recognize
- a subset of the above that is recommended for new fonts
- a general mechanism to generate names for every character that is, or ever will be, in the Unicode standard
- how to combine names to denote ligatures
- how to denote glyph variants (swash, small caps, etc.)
Requirements for all glyph names
At the minimum, to conform to AGL requirements, a glyph name:
- can be no longer than 31 characters, and
- must consist only of characters from the following set:
  - A-Z (uppercase Latin letters)
  - a-z (lowercase Latin letters)
  - 0-9 (digits)
  - . (period, U+002E FULL STOP)
  - _ (underscore, U+005F LOW LINE)
In a font project, working glyph names should at least meet these two minimum requirements.
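The two minimum requirements are easy to check mechanically. The following is a minimal sketch, not an official validator: the function name is hypothetical, the 31-character limit is the one stated in this article, and the allowed set is assumed to be ASCII letters and digits plus period and underscore, as in Adobe's AGL specification.

```python
import re

# Assumption: allowed characters are A-Z, a-z, 0-9, period and underscore.
_ALLOWED = re.compile(r"[A-Za-z0-9._]+")
MAX_LENGTH = 31  # limit stated in this article


def meets_minimum_requirements(name: str) -> bool:
    """Return True if `name` passes the length and character-set checks."""
    return len(name) <= MAX_LENGTH and _ALLOWED.fullmatch(name) is not None
```

With this sketch, a working name such as `a.swsh` passes, while `a-b` (hyphen not allowed) and any name longer than 31 characters fail.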
Glyph names can be thought of as having two parts, which we will call basename and suffix. These parts are identified as follows:
- If the glyph name does not contain a period (.) then the entire name is the basename.
- If the glyph name contains at least one period (.), then the first period is the separator: everything before it is the basename and everything after it is the suffix.
The suffix can be about anything; its purpose is to identify variant glyphs. For example, a .swsh suffix might be used to indicate a swash variant and .smcp to indicate small caps.
NB: Some modern tools such as the GlyphsApp font editor understand commonly used suffixes and will automatically build font smarts for them.
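The basename/suffix rule described above amounts to splitting at the first period. A minimal sketch (the function name is hypothetical):

```python
def split_glyph_name(name: str) -> tuple[str, str]:
    """Split a glyph name into (basename, suffix) at the first period.

    If there is no period, the suffix is the empty string.
    """
    basename, _, suffix = name.partition(".")
    return basename, suffix
```

For example, `split_glyph_name("a.swsh")` gives `("a", "swsh")`, and a name with several periods such as `uni0041.smcp.alt` splits into `("uni0041", "smcp.alt")`, because only the first period acts as the separator.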
For working names the basename can be about anything. For production names, however, it is essential that the basename be constructed so that it identifies the Unicode character(s) that the glyph represents. This mapping, from glyph name to Unicode character sequence, is the essence of the AGL specification.
Historically, in its own fonts, Adobe has used a lot of names that are no longer recommended. For this reason there are several different glyph lists in the AGL. For new fonts, we recommend using only the names that are in the Adobe Glyph List for New Fonts [AGLFN].
Basename for arbitrary Unicode character(s)
What if a needed basename is not included in the AGLFN? In this case a special naming convention using the Unicode Scalar Value [USV] of the character(s) should be used.
Characters in Unicode’s Basic Multilingual Plane (BMP) may be represented by either of the formats uni<CODE> or u<CODE>. Characters in Unicode’s supplemental planes may be represented only by the format u<CODE>. <CODE> is the Unicode Scalar Value of the character, an uppercase hexadecimal number four to six digits long. There must be no leading zeros, unless the code value would have fewer than four hexadecimal digits, in which case it must be padded to four digits. Surrogate code values (U+D800 to U+DFFF, inclusive) and the two noncharacter code values (U+FFFE and U+FFFF) are prohibited.
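These rules can be expressed as a small decoder. The sketch below is one reading of the paragraph above, not a reference implementation: it assumes that uni is always followed by exactly four digits (since it covers only the BMP), that u takes four to six digits, and that any code value above 0x10FFFF is rejected. The function names are hypothetical.

```python
import re
from typing import Optional

_UNI = re.compile(r"uni([0-9A-F]{4})")  # BMP only: exactly four digits
_U = re.compile(r"u([0-9A-F]{4,6})")    # any plane: four to six digits


def _is_allowed(cp: int) -> bool:
    """Reject out-of-range values, surrogates, and U+FFFE / U+FFFF."""
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return False
    return cp not in (0xFFFE, 0xFFFF)


def code_point_from_name(basename: str) -> Optional[int]:
    """Decode a uni<CODE> or u<CODE> basename, or return None."""
    m = _UNI.fullmatch(basename)
    if m is None:
        m = _U.fullmatch(basename)
        if m is None:
            return None
        # Leading zeros are only allowed as padding up to four digits.
        if len(m.group(1)) > 4 and m.group(1).startswith("0"):
            return None
    cp = int(m.group(1), 16)
    return cp if _is_allowed(cp) else None
```

Under these assumptions `uni0041` and `u0041` both decode to U+0041 and `u1F600` decodes to U+1F600, while `u123` (too few digits), `u01F600` (leading zero on five digits), `uFFFE`, `uniD800` and `u110000` are all rejected.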
Caution: while both the uni<CODE> and u<CODE> notations are likely to be supported in all modern tools, there may be older applications that do not recognize the u<CODE> names. For that reason, for BMP characters, we recommend using uni<CODE>.
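Going the other way, a build script can generate a conforming name for a character that has no AGLFN name. The following sketch uses the uni form for BMP characters and the u form for everything else; the function name is hypothetical, and it should only be applied after an AGLFN lookup has failed.

```python
def name_for_code_point(cp: int) -> str:
    """Build a uni<CODE> or u<CODE> glyph name for a Unicode scalar value."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):
        raise ValueError(f"prohibited code value: U+{cp:04X}")
    if cp <= 0xFFFF:
        # BMP: zero-padded to exactly four uppercase hex digits.
        return f"uni{cp:04X}"
    # Supplemental planes: five or six digits, no leading zeros.
    return f"u{cp:X}"
```

For instance, `name_for_code_point(0x41)` returns `uni0041` and `name_for_code_point(0x1F600)` returns `u1F600`.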
Ligature or other decomposition sequences that contain only BMP characters may be represented by either of the following formats:
- Underscore-separated: In this format, the underscore (_) separates component names. Component names may be AGL, uni<CODE>, or u<CODE> names. For example:
- Code-concatenated: In this format, the glyph name is expressed as uni followed by two or more BMP <CODE>s, which indicate the code values of the components. <CODE> follows the same specification as for uni<CODE> names. For example, uni12345678 represents <U+1234, U+5678>.
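Both sequence formats can be decoded with one routine that splits on underscores and then decodes each component. This is a simplified sketch under several assumptions: AGL names are looked up in a caller-supplied dictionary (`agl`, a hypothetical stand-in for the real AGLFN data), a concatenated uni run is also accepted as an underscore-separated component, and any unrecognized component makes the whole name unrecognized.

```python
import re
from typing import Dict, Optional, Tuple

_UNI_RUN = re.compile(r"uni((?:[0-9A-F]{4})+)")  # one or more 4-digit codes
_U = re.compile(r"u([0-9A-F]{4,6})")


def _ok(cp: int) -> bool:
    """Reject out-of-range values, surrogates, and U+FFFE / U+FFFF."""
    return (cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF
            and cp not in (0xFFFE, 0xFFFF))


def _component(name: str, agl: Dict[str, int]) -> Optional[Tuple[int, ...]]:
    if name in agl:  # AGL name, via the caller's table
        return (agl[name],)
    m = _UNI_RUN.fullmatch(name)
    if m:  # uni followed by 4-digit groups
        d = m.group(1)
        cps = tuple(int(d[i:i + 4], 16) for i in range(0, len(d), 4))
        return cps if all(_ok(cp) for cp in cps) else None
    m = _U.fullmatch(name)
    if m:  # single u<CODE>
        d = m.group(1)
        if len(d) > 4 and d.startswith("0"):
            return None
        cp = int(d, 16)
        return (cp,) if _ok(cp) else None
    return None


def decode_basename(basename: str,
                    agl: Optional[Dict[str, int]] = None
                    ) -> Optional[Tuple[int, ...]]:
    """Return the code points a basename represents, or None."""
    result = []
    for part in basename.split("_"):
        cps = _component(part, agl or {})
        if cps is None:
            return None
        result.extend(cps)
    return tuple(result)
```

With this sketch, `decode_basename("uni12345678")` yields `(0x1234, 0x5678)`, and `decode_basename("f_i", {"f": 0x66, "i": 0x69})` yields `(0x66, 0x69)` when given that two-entry table.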
Ligature or other decomposition sequences that contain a supplemental character may be represented only by the underscore-separated format. For example:
No two glyph names in a font should yield the same (non-variant) Unicode character on analysis. If they do (e.g. u1234 and uni1234), the results are unspecified.
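A collision of this kind can be caught with a simple pre-build check. The sketch below groups names by decoded value and reports any value claimed by more than one name. It reads "non-variant" as "has no suffix", which is an assumption, and it takes the decoder as a parameter; `demo_decode` is a hypothetical stand-in that handles only the uni and u forms and does none of the validation a real decoder needs.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional


def find_collisions(
    names: Iterable[str],
    decode: Callable[[str], Optional[Hashable]],
) -> Dict[Hashable, List[str]]:
    """Map each decoded value to the names that share it (two or more)."""
    by_value: Dict[Hashable, List[str]] = defaultdict(list)
    for name in names:
        if "." in name:  # assumption: suffixed names are variants, skip them
            continue
        value = decode(name)
        if value is not None:
            by_value[value].append(name)
    return {v: ns for v, ns in by_value.items() if len(ns) > 1}


def demo_decode(name: str) -> Optional[int]:
    """Stand-in decoder for illustration: uni/u plus 4-6 hex digits only."""
    m = re.fullmatch(r"u(?:ni)?([0-9A-F]{4,6})", name)
    return int(m.group(1), 16) if m else None
```

Calling `find_collisions(["uni1234", "u1234", "uni1234.alt", "uni0041"], demo_decode)` reports U+1234 as claimed by both `uni1234` and `u1234`, while the `.alt` variant is ignored.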
|Glyph name||Unicode character(s)|
||UNRECOGNIZED (zero padding required for < 4 digits)|
||UNRECOGNIZED (zero padding not allowed for 5 digits)|
||UNRECOGNIZED (U+FFFE not allowed)|
||UNRECOGNIZED (U+FFFF not allowed)|
||U+1023, U+4510, U+6789|
||UNRECOGNIZED (<CODE> must be <= 0x10FFFF)|
||UNRECOGNIZED (surrogates not allowed)|