axslFont: Font Encoding

Encoding Selection

Type 1 fonts can address a maximum of about 224 glyphs at one time (256 is the maximum that can be addressed with 1 byte, and the first 32 are considered control codes). However, a Type 1 font can contain many more glyphs than that. This is possible because a different set of glyphs can be addressed by selecting different encoding vectors. When using a Type 1 font, you may select which encoding vector should be used by specifying the "encoding" attribute on a font-description element in the font-configuration.

Support of various encoding schemes is application dependent. However, the following named encoding vectors are hard-coded, well-documented encoding schemes that many applications will make available:

Encoding Vector Name     Predefined Encoding For   Notes
--------------------     -----------------------   -----
AdobeStandardEncoding    PostScript                Can also be referenced as "StandardEncoding". Closely matches ISO 8859-1.
ISOLatin1Encoding        PostScript
WinAnsiEncoding          PDF                       Matches Microsoft Windows code page 1252.
SymbolEncoding           n/a
ZapfDingbatsEncoding     n/a
CEEncoding               n/a                       Matches Microsoft Windows code page 1250.
MacRomanEncoding         PDF
PDFDocEncoding           n/a
MacExpertEncoding        PDF
ExpertEncoding           n/a
ExpertSubsetEncoding     n/a
MacStandardEncoding                                The "Standard Macintosh Ordering" that is used within some TrueType fonts.

Note that selecting an encoding that is not predefined for the output medium means that the encoding must be written into the output document. The only drawbacks are the (probably unnoticeable) performance penalty for writing the encoding into the output, and the somewhat larger output itself.
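The rule above can be sketched as a simple lookup. This is a minimal illustration in Python, not the application's actual logic: the PREDEFINED table is transcribed from the list above, and the function name must_embed_encoding is hypothetical.

```python
# Predefined encodings per output medium, transcribed from the list above.
PREDEFINED = {
    "PostScript": {"AdobeStandardEncoding", "StandardEncoding", "ISOLatin1Encoding"},
    "PDF": {"WinAnsiEncoding", "MacRomanEncoding", "MacExpertEncoding"},
}


def must_embed_encoding(encoding_name, output_medium):
    """Return True if the encoding would have to be written into the output.

    Hypothetical helper: an encoding that is not predefined for the
    medium (or a medium not listed at all) is treated as "must embed".
    """
    return encoding_name not in PREDEFINED.get(output_medium, set())


print(must_embed_encoding("WinAnsiEncoding", "PDF"))         # False
print(must_embed_encoding("WinAnsiEncoding", "PostScript"))  # True
```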

Custom Encodings

Custom encodings can also be defined in the font-configuration file. Once that is done, they are available to be used in font-description entries just like the predefined encodings.

To create a custom encoding, first place the data defining the encoding into a text file. (The files needed to create the predefined encodings can be copied and pasted from the PostScript and PDF documentation.) The format of the file containing a custom encoding is best documented by example. Here is an excerpt from the Latin encodings for PostScript:

#                         -- Encoding  --
# Character Name          STD   ISO   CE
# --------------          ---   ---   ---

A                         101   101   101
AE                        341   306   -
Aacute                    -     301   301

Each line of the file is a record. Each record consists of two or more fields delimited by whitespace. The first field is the glyph name. Each additional field is the glyph index to which the glyph name points within a specific encoding. Multiple encodings can be described within a single encoding file by using more than two columns. The illustration above includes encoding information for "StandardEncoding", "ISOLatin1Encoding", and "CEEncoding", which share enough common characters to make it useful to place them in the same file. Blank lines and lines starting with an octothorpe (#) are ignored. When a specific column of the file is processed, a hyphen in that column indicates that the record should be ignored for that encoding.
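As a rough illustration of these rules, the following Python sketch reads one encoding column from a file in this format. It is not the application's parser. The function name parse_encoding and the 1-based numbering of its encoding_column parameter (counting only the index columns, so 1 is STD in the sample) are assumptions made for the example, and may not match how column-to-parse is counted in the font-configuration.

```python
SAMPLE = """\
#                         -- Encoding  --
# Character Name          STD   ISO   CE
# --------------          ---   ---   ---

A                         101   101   101
AE                        341   306   -
Aacute                    -     301   301
"""


def parse_encoding(text, encoding_column, radix=8):
    """Map glyph name -> glyph index for one encoding column.

    encoding_column is 1-based and counts only the index columns
    (1 = STD, 2 = ISO, 3 = CE in the sample). This numbering is an
    assumption of the sketch.
    """
    mapping = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = stripped.split()
        if encoding_column >= len(fields):
            continue  # record has no field in the requested column
        value = fields[encoding_column]
        if value == "-":
            continue  # hyphen: glyph is not part of this encoding
        mapping[fields[0]] = int(value, radix)
    return mapping


std = parse_encoding(SAMPLE, 1)
iso = parse_encoding(SAMPLE, 2)
print(std)  # {'A': 65, 'AE': 225}
print(iso)  # {'A': 65, 'AE': 198, 'Aacute': 193}
```

Keeping the column choice as a parameter mirrors the idea of sharing one file between several encodings: the same text is parsed once per encoding, each time with a different column.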

The glyph index value may be specified using any radix. The examples above are in octal (base- or radix-8). Encoding schemes are also sometimes expressed in hexadecimal (radix-16) or decimal (radix-10). When creating a custom encoding in the font-configuration, you must specify the radix that should be used for parsing.
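To see why the radix must be stated, consider the index for the glyph "A" from the sample file. The short Python sketch below uses the built-in int(text, radix) conversion; the wrapper name parse_glyph_index is hypothetical and only serves the illustration.

```python
def parse_glyph_index(text, radix):
    """Convert a glyph index field to an integer using the given radix."""
    return int(text, radix)


# The glyph "A" from the sample file, written in three radixes.
print(parse_glyph_index("101", 8))   # 65
print(parse_glyph_index("41", 16))   # 65
print(parse_glyph_index("65", 10))   # 65

# Parsing with the wrong radix silently yields a different index.
print(parse_glyph_index("101", 10))  # 101
```

The last call shows the risk: an octal file parsed as decimal produces valid-looking but wrong glyph indexes rather than an error.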

After creating the encoding file, make it available to your application by creating an encoding element in the font-configuration file:

<encoding
    name="my-encoding"
    glyph-lists="my-glyph-list another-glyph-list"
    file="file://C:/some/path/encoding.txt"
    column-to-parse="2"
    radix="8"/>

Glyph List Selection

In order to create an EncodingVector instance, an application needs to have some way of mapping a glyph name to a Unicode code point. A "glyph list" is used to define this mapping. Adobe provides two standard glyph lists, both of which are readily available, and possibly hard-coded into your application. If so, they are available by reference to their names: "AGL" (the Adobe Glyph List), and "ZapfDingbats". Unless the font you are using has a very unusual character set, these should be sufficient for creation of custom encodings.
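The role of the glyph list can be sketched in a few lines of Python. This is only an illustration of the idea, not the EncodingVector implementation: the function name build_encoding_vector and the dictionary shapes are assumptions. The glyph indexes are the ISO column values from the sample encoding file, and the code points are the Adobe Glyph List values for those names.

```python
# Hypothetical inputs: an encoding (glyph name -> glyph index) and a
# glyph list (glyph name -> Unicode code point).
encoding = {"A": 0o101, "AE": 0o306, "Aacute": 0o301}
glyph_list = {"A": 0x0041, "AE": 0x00C6, "Aacute": 0x00C1}


def build_encoding_vector(encoding, glyph_list):
    """Return {glyph index: Unicode code point}.

    Glyph names missing from the glyph list are skipped, because no
    code point can be determined for them.
    """
    vector = {}
    for name, index in encoding.items():
        code_point = glyph_list.get(name)
        if code_point is not None:
            vector[index] = code_point
    return vector


vector = build_encoding_vector(encoding, glyph_list)
print({index: chr(cp) for index, cp in vector.items()})
# {65: 'A', 198: 'Æ', 193: 'Á'}
```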

Custom Glyph Lists

If a custom glyph list is needed (to handle an unusual character set), first create the file defining the glyph list. The format of this file is best documented by an excerpt from the Adobe Glyph List:

B;0042
Bcircle;24B7

Each line of the file is a record. Each record consists of two fields delimited by a semicolon. The first field is the glyph name. The second is the hexadecimal value of the Unicode code point to which that glyph name maps. Blank lines and lines starting with an octothorpe (#) are ignored.
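A minimal Python sketch of a reader for this format follows. It is an illustration under the rules just described, not the application's parser; the function name parse_glyph_list is hypothetical, and the sketch assumes exactly one code point per record.

```python
SAMPLE = """\
# excerpt in the glyph list format
B;0042
Bcircle;24B7
"""


def parse_glyph_list(text):
    """Map glyph name -> Unicode code point.

    Assumes one hexadecimal code point per record, as described in
    the text; records listing several code points are not handled.
    """
    mapping = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are ignored
        name, hex_value = stripped.split(";", 1)
        mapping[name] = int(hex_value, 16)
    return mapping


glyphs = parse_glyph_list(SAMPLE)
print(glyphs)  # {'B': 66, 'Bcircle': 9399}
```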

The custom glyph list is then made available to your application by creating a glyph-list element in the font-configuration file:

<glyph-list
    name="my-glyph-list"
    file="file://C:/some/path/my-glyph-list.txt"/>