genlayouttbl(1)
NAME
genlayouttbl - generate layout table for complex text layout
SYNOPSIS
genlayouttbl [-o outfile] [infile]
DESCRIPTION
The genlayouttbl utility accepts a locale's layout defini-
tion in a flat text file and writes a binary layout table
file that can be used in the complex text layout of the
locale.
OPTIONS
The following option is supported:
-o outfile
Writes the binary layout table to outfile.
OPERANDS
The following operand is supported:
infile
A path name of an input file. If no input file is
specified, genlayouttbl reads from the standard input
stream.
OUTPUT AND SYMBOLIC LINKS
If no outfile is specified, genlayouttbl writes output to
the standard output stream.
Before the system can use it, the generated output file must
be moved to the following directory and named layout.dat:
/usr/lib/locale/locale/LO_LTYPE/layout.dat
The locale should also have a symbolic link,
/usr/lib/locale/locale/LO_LTYPE/locale.layout.so.1, to the
32-bit Universal Multiscript Layout Engine (UMLE),
/usr/lib/locale/common/LO_LTYPE/umle.layout.so.1.
For proper operation on 64-bit platforms, the locale should
also have a 64-bit symbolic link. For instance, on the 64-bit
SPARC platform, the link is
/usr/lib/locale/locale/LO_LTYPE/sparcv9/locale.layout.so.1,
which points to the 64-bit UMLE,
/usr/lib/locale/common/LO_LTYPE/sparcv9/umle.layout.so.1.
In these path names, locale is the name of the locale that
you are providing and that uses the layout functionality you
defined.
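The path names above can be derived mechanically from a locale
name. The following Python sketch only computes the paths; it
does not touch the file system. It assumes that both the
directory component and the prefix of locale.layout.so.1 are
replaced by the locale name. The function name
layout_install_plan and the locale name "ar" are hypothetical.

```python
from pathlib import PurePosixPath

LOCALE_ROOT = PurePosixPath("/usr/lib/locale")


def layout_install_plan(locale, arch64=None):
    """Return (layout.dat path, [(link, target), ...]) for a locale name.

    Assumption: "locale" in the documented paths stands for the locale
    name in both the directory and the link file name. arch64 is the
    64-bit subdirectory, for example "sparcv9", or None for 32-bit only.
    """
    ltype = LOCALE_ROOT / locale / "LO_LTYPE"
    common = LOCALE_ROOT / "common" / "LO_LTYPE"
    links = [(ltype / f"{locale}.layout.so.1", common / "umle.layout.so.1")]
    if arch64:
        links.append((ltype / arch64 / f"{locale}.layout.so.1",
                      common / arch64 / "umle.layout.so.1"))
    return ltype / "layout.dat", links


if __name__ == "__main__":
    dat, links = layout_install_plan("ar", "sparcv9")
    print(dat)
    for link, target in links:
        print(f"{link} -> {target}")
```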
INPUT FILE FORMAT
A layout definition file to genlayouttbl contains three dif-
ferent sections of definitions:
o Layout attribute definition
o Bidirectional data and character type data definition
o Shaping data definition
For appropriate complex text layout support, all three sec-
tions need to be defined in the layout definition file.
The Lexical Conventions
The following lexical conventions are used in the layout
definition:
NAME A string of printable ASCII characters. DECIMAL and
HEXADECIMAL values are also valid NAMEs.
Examples: test, a1_src, b32, 123.
HEXADECIMAL_BYTE
Two-digit hexadecimal number. The number starts with a
hexadecimal digit followed by another hexadecimal
digit. Examples: e0, E1, a7, fe.
HEXADECIMAL
A hexadecimal number. The representation consists of
the prefix character '0', followed by 'x' or 'X' and
one or more hexadecimal digits.
Examples: 0x0, 0x1, 0x1a, 0xA, 0x1b3.
DECIMAL
A decimal number, represented by one or more decimal
digits. Examples: 0, 123, 2165.
Each comment must start with '#'. The comment ends at the
end of the line.
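The token classes overlap: e0 is both a HEXADECIMAL_BYTE and a
NAME, and 123 is both a DECIMAL and a NAME. The following Python
sketch makes that explicit by returning every class a token
matches. It is not the lexer used by genlayouttbl. The NAME
pattern is an assumption (letters, digits, '_', '.', and '-'),
chosen so that the reserved symbols stay separate tokens.

```python
import re

PATTERNS = {
    "HEXADECIMAL_BYTE": re.compile(r"[0-9A-Fa-f]{2}"),
    "HEXADECIMAL": re.compile(r"0[xX][0-9A-Fa-f]+"),
    "DECIMAL": re.compile(r"[0-9]+"),
    # Assumption: NAME excludes the reserved symbols such as '=' and ';'.
    "NAME": re.compile(r"[A-Za-z0-9_.\-]+"),
}


def token_classes(text):
    """Return the set of lexical classes that the whole token matches."""
    return {name for name, pat in PATTERNS.items() if pat.fullmatch(text)}


def strip_comment(line):
    """Drop a '#' comment, which runs to the end of the line."""
    return line.split("#", 1)[0].rstrip()


if __name__ == "__main__":
    for token in ("test", "e0", "0x1b3", "2165"):
        print(token, sorted(token_classes(token)))
```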
The following keywords are reserved:
active_directional, active_shape_editing, AL,
ALGORITHM_BASIC, ALGORITHM_IMPLICIT, AN, BN, check_mode,
context, CONTEXT_LTR, CONTEXT_RTL, CS, EN, END, ES, ET, FALSE,
FILE_CODE_REPRESENTATION, implicit_algorithm, keep, L,
LAYOUT_ATTRIBUTES, LAYOUT_BIDI_CHAR_TYPE_DATA,
LAYOUT_SHAPE_DATA, LRE, LRO, MODE_EDIT, MODE_STREAM, NSM,
national_numerals, numerals, NUMERALS_CONTEXTUAL,
NUMERALS_NATIONAL, NUMERALS_NOMINAL, ON, orientation,
ORIENTATION_CONTEXTUAL, ORIENTATION_LTR, ORIENTATION_RTL,
ORIENTATION_TTBLR, ORIENTATION_TTBRL, PDF,
PROCESS_CODE_REPRESENTATION, PS, R, repeat*, repeat+, RLE, RLO, S,
shape_charset, shape_charset_size, shape_context_size, swapping,
SWAPPING_NO, swapping_pairs, SWAPPING_YES, TEXT_EXPLICIT,
TEXT_IMPLICIT, TEXT_NOMINAL, TEXT_SHAPED, text_shaping, TEXT_VISUAL,
TRUE, type_of_text, WS
The following symbols are also reserved as tokens:
( ) [ ] , : ; ... = -> +
Layout Attribute Definition
The layout attribute definition section defines the layout
attributes and their associated values.
The definition starts with a keyword, LAYOUT_ATTRIBUTES, and
ends with END LAYOUT_ATTRIBUTES:
LAYOUT_ATTRIBUTES
# Layout attributes here.
:
:
END LAYOUT_ATTRIBUTES
There are a total of eight layout attribute value trios that
can be defined in this section:
o orientation
o context
o type_of_text
o implicit_algorithm
o swapping
o numerals
o text_shaping
o shape_context_size
There are also five layout attribute value pairs that can be
defined in this section:
o active_directional
o active_shape_editing
o shape_charset
o shape_charset_size
o check_mode
Each attribute value trio will have an attribute name, an
attribute value for the input buffer, and an attribute value
for the output buffer, as in the following example:
# Orientation layout attribute value trio. The input and output
# attribute values are separated by a colon and the left one
# is the input attribute value:
orientation ORIENTATION_LTR:ORIENTATION_LTR
Each attribute value pair will have an attribute name and an
associated attribute value, as in the following example:
# Shape charset attribute value pair:
shape_charset ISO8859-6
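As an illustration of the two forms, the following Python sketch
splits one attribute line into its parts. It assumes one
attribute per line with the name separated from the value by
white space, as in the examples above. The function name
parse_attribute is hypothetical.

```python
def parse_attribute(line):
    """Split an attribute line into (name, value) or (name, first, second).

    A value that contains a colon is treated as a value trio; anything
    else is treated as a value pair.
    """
    name, value = line.split(None, 1)
    value = value.strip()
    if ":" in value:
        first, second = (part.strip() for part in value.split(":", 1))
        return name, first, second
    return name, value


if __name__ == "__main__":
    print(parse_attribute("orientation ORIENTATION_LTR:ORIENTATION_LTR"))
    print(parse_attribute("shape_charset ISO8859-6"))
```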
The orientation value trio defines the global directional
text orientation. The possible values are:
ORIENTATION_LTR
Left-to-right horizontal rows that progress from top
to bottom.
ORIENTATION_RTL
Right-to-left horizontal rows that progress from top
to bottom.
ORIENTATION_TTBRL
Top-to-bottom vertical columns that progress from
right to left.
ORIENTATION_TTBLR
Top-to-bottom vertical columns that progress from left
to right.
ORIENTATION_CONTEXTUAL
The global orientation is set according to the direc-
tion of the first significant (strong) character. If
there are no strong characters in the text and the
attribute is set to this value, the global orientation
of the text is set according to the value of the
attribute context. This value is meaningful only for
bidirectional text.
If no value or value trio is defined, the default is
ORIENTATION_LTR.
The context value trio is meaningful only if the attribute
orientation is set to ORIENTATION_CONTEXTUAL. It defines
what orientation is assumed when no strong character appears
in the text. The possible values are:
CONTEXT_LTR
In the absence of characters with strong directional-
ity in the text, orientation is assumed to be left-
to-right rows progressing from top to bottom.
CONTEXT_RTL
In the absence of characters with strong directional-
ity in the text, orientation is assumed to be right-
to-left rows progressing from top to bottom.
If no value or value trio is specified, the default is
CONTEXT_LTR.
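The interaction between ORIENTATION_CONTEXTUAL and the context
attribute can be sketched in Python as follows. This is an
illustration only: it takes character types from Python's own
Unicode database rather than from a layout table, and it treats
only the L, R, and AL types as strong, ignoring the embedding
and override controls. The function name resolve_orientation is
hypothetical.

```python
import unicodedata


def resolve_orientation(text, context="CONTEXT_LTR"):
    """Pick a global orientation from the first strong character.

    Falls back to the context value when the text has no strong
    character at all.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ORIENTATION_LTR"
        if bidi in ("R", "AL"):
            return "ORIENTATION_RTL"
    if context == "CONTEXT_RTL":
        return "ORIENTATION_RTL"
    return "ORIENTATION_LTR"


if __name__ == "__main__":
    print(resolve_orientation("12 abc"))
    print(resolve_orientation("12 ", "CONTEXT_RTL"))
```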
The type_of_text value trio specifies the ordering of the
directional text. The possible values are:
TEXT_VISUAL
Code elements are provided in visually ordered seg-
ments, which can be rendered without any segment
inversion.
TEXT_IMPLICIT
Code elements are provided in logically ordered seg-
ments. Logically ordered means that the order in which
the characters are provided is the same as the order
in which the characters are pronounced when reading
the presented text or the order in which characters
would be entered from a keyboard.
TEXT_EXPLICIT
Code elements are provided in logically ordered seg-
ments with a set of embedded controls. Some examples
of such embedded controls from ISO/IEC 10646-1 are:
LEFT-TO-RIGHT EMBEDDING (LRE)
RIGHT-TO-LEFT EMBEDDING (RLE)
RIGHT-TO-LEFT OVERRIDE (RLO)
LEFT-TO-RIGHT OVERRIDE (LRO)
POP DIRECTIONAL FORMAT (PDF)
If no value or value trio is specified, the default is
TEXT_IMPLICIT.
The implicit_algorithm value trio specifies the type of
bidirectional implicit algorithm used in reordering and
shaping of directional or context-dependent text. The
possible values are:
ALGORITHM_IMPLICIT
Directional code elements will be reordered using an
implementation-defined implicit algorithm.
ALGORITHM_BASIC
Directional code elements will be reordered using a
basic implicit algorithm defined in the Unicode stan-
dard.
Although two values are allowed for implicit_algorithm, the
Solaris implementation-defined implicit algorithm is based on
the Unicode standard, so ALGORITHM_IMPLICIT and
ALGORITHM_BASIC behave identically for this attribute.
The default value is ALGORITHM_IMPLICIT.
The swapping value trio specifies whether symmetric swapping
is applied to the text. The possible values are:
SWAPPING_YES
The text conforms to symmetric swapping.
SWAPPING_NO
The text does not conform to symmetric swapping.
If no value or value trio is specified, the default is
SWAPPING_NO.
The numerals value trio specifies the shaping of numerals.
The possible values are:
NUMERALS_NOMINAL
Nominal shaping of numerals using the Arabic numbers
of the portable character set (in Solaris, ASCII
digits).
NUMERALS_NATIONAL
National shaping of numerals based on the script of
the locale. For instance, Thai digits in the Thai
locale.
NUMERALS_CONTEXTUAL
Contextual shaping of numerals depending on the con-
text script of surrounding text, such as Hindi numbers
in Arabic text and Arabic numbers otherwise.
If no value or value trio is specified, the default is
NUMERALS_NOMINAL.
The text_shaping value trio specifies the shaping; that is,
choosing (or composing) the correct shape of the input or
output text. The possible values are:
TEXT_SHAPED
The text has presentation form shapes.
TEXT_NOMINAL
The text is in basic form.
If no value or value trio is specified, the default is
TEXT_NOMINAL for input and TEXT_SHAPED for output.
The shape_context_size value trio specifies the size of the
context (surrounding code elements) that must be accounted
for when performing active shape editing. If not defined,
the default value 0 is used for the number of surrounding
code elements at both front and rear:
# The shape_context_size for both the front and rear surrounding
# code elements is zero:
shape_context_size 0:0
The front and rear attribute values are separated by a
colon, with the front value to the left of the colon.
The active_directional value pair specifies whether the
current locale requires (bi-)directional processing. The
possible values are:
TRUE Requires (bi-)directional processing.
FALSE Does not require (bi-)directional processing.
The active_shape_editing value pair specifies whether the
current locale requires context-dependent shaping for
presentation. The possible values are:
TRUE Requires context-dependent shaping.
FALSE Does not require context-dependent shaping.
The shape_charset value pair specifies the current locale's
shape charset on which the complex text layout is based.
There are two different kinds of shape charset values that
can be specified:
o A single shape charset
o Multiple shape charsets
A single shape charset is defined by using NAME as defined in
the Lexical Conventions section above. Multiple shape
charsets, however, must follow the syntax given below in
extended BNF form:
multiple_shape_charset
: charset_list
;
charset_list : charset
| charset_list ';' charset
;
charset : charset_name '=' charset_id
;
charset_name : NAME
;
charset_id : HEXADECIMAL_BYTE
;
For instance, the following is a valid multiple shape char-
sets value for the shape_charset attribute:
# Multi-shape charsets:
shape_charset tis620.2533=e4;iso8859-8=e5;iso8859-6=e6
The shape_charset must be specified.
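A minimal Python sketch of the charset_list grammar follows. It
returns the name unchanged for a single shape charset and a
mapping from charset_name to charset_id for multiple shape
charsets. It assumes that the presence of '=' is what
distinguishes the two forms. The function name
parse_shape_charset is hypothetical.

```python
import re


def parse_shape_charset(value):
    """Return a str for a single charset, or {name: id} for a charset list."""
    if "=" not in value:
        return value
    charsets = {}
    for item in value.split(";"):
        name, _, charset_id = item.partition("=")
        # charset_id must be a HEXADECIMAL_BYTE: exactly two hex digits.
        if not name or not re.fullmatch(r"[0-9A-Fa-f]{2}", charset_id):
            raise ValueError(f"malformed charset entry: {item!r}")
        charsets[name] = int(charset_id, 16)
    return charsets


if __name__ == "__main__":
    print(parse_shape_charset("tis620.2533=e4;iso8859-8=e5;iso8859-6=e6"))
```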
The shape_charset_size value pair specifies the encoding
size of the current shape_charset. The valid value is a
positive integer from 1 to 4. If the multiple shape charsets
value is defined for the shape_charset attribute, the
shape_charset_size must be 4.
The shape_charset_size must be specified.
The check_mode value pair specifies the level of checking of
the elements in the input buffer for shaping and reordering
purposes. The possible values are:
MODE_STREAM
The string in the input buffer is expected to have
valid combinations of characters or character ele-
ments.
MODE_EDIT
The shaping of input text may vary depending on
locale-specific validation or assumption.
If no value or value pair is specified, the default value is
MODE_STREAM.
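The defaults and requirements described in this section can be
collected into one check, sketched below in Python. Two points
are assumptions rather than documented behavior: a default
stated as a single value is applied to both halves of a value
trio, and multiple shape charsets are recognized by the '=' in
the value. No default is listed for active_directional or
active_shape_editing because none is stated above. The function
name effective_attributes is hypothetical.

```python
DEFAULTS = {
    "orientation": "ORIENTATION_LTR:ORIENTATION_LTR",
    "context": "CONTEXT_LTR:CONTEXT_LTR",
    "type_of_text": "TEXT_IMPLICIT:TEXT_IMPLICIT",
    "implicit_algorithm": "ALGORITHM_IMPLICIT:ALGORITHM_IMPLICIT",
    "swapping": "SWAPPING_NO:SWAPPING_NO",
    "numerals": "NUMERALS_NOMINAL:NUMERALS_NOMINAL",
    "text_shaping": "TEXT_NOMINAL:TEXT_SHAPED",  # input:output
    "shape_context_size": "0:0",                 # front:rear
    "check_mode": "MODE_STREAM",
}
REQUIRED = ("shape_charset", "shape_charset_size")


def effective_attributes(defined):
    """Merge defined attributes over the defaults and validate them."""
    attrs = {**DEFAULTS, **defined}
    missing = [name for name in REQUIRED if name not in attrs]
    if missing:
        raise ValueError("missing required attributes: " + ", ".join(missing))
    size = int(attrs["shape_charset_size"])
    if not 1 <= size <= 4:
        raise ValueError("shape_charset_size must be from 1 to 4")
    if "=" in attrs["shape_charset"] and size != 4:
        raise ValueError("multiple shape charsets require shape_charset_size 4")
    return attrs


if __name__ == "__main__":
    attrs = effective_attributes(
        {"shape_charset": "ISO8859-6", "shape_charset_size": "1"})
    for name in sorted(attrs):
        print(name, attrs[name])
```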
Bidirectional Data And Character Type Data Definition
This section defines the bidirectional and other character
types that will be used in the Unicode Bidirectional Algo-
rithm and the shaping algorithm part of the UMLE.
The definition starts with a keyword
LAYOUT_BIDI_CHAR_TYPE_DATA and ends with END
LAYOUT_BIDI_CHAR_TYPE_DATA:
LAYOUT_BIDI_CHAR_TYPE_DATA
# Layout bidi definitions here.
:
:
END LAYOUT_BIDI_CHAR_TYPE_DATA
The bidirectional data and character type data definition
should be defined for the two different kinds of text shape
forms, TEXT_SHAPED and TEXT_NOMINAL, depending on the
text_shaping attribute value and also for the two different
kinds of text representations, file code representation and
process code representation (that is, wide character
representation):
LAYOUT_BIDI_CHAR_TYPE_DATA
FILE_CODE_REPRESENTATION
TEXT_SHAPED
# TEXT_SHAPED bidi and character type data
# definition in file code representation here.
:
:
END TEXT_SHAPED
TEXT_NOMINAL
# TEXT_NOMINAL bidi and character type data
# definition in file code representation here.
:
:
END TEXT_NOMINAL
END FILE_CODE_REPRESENTATION
PROCESS_CODE_REPRESENTATION
TEXT_SHAPED
# TEXT_SHAPED bidi and character type data
# definition in process code representation here.
:
:
END TEXT_SHAPED
TEXT_NOMINAL
# TEXT_NOMINAL bidi and character type data
# definition in process code representation here.
:
:
END TEXT_NOMINAL
END PROCESS_CODE_REPRESENTATION
END LAYOUT_BIDI_CHAR_TYPE_DATA
Each bidi and character type data definition can have the
following definitions:
o Bidirectional data type definition
o swapping_pairs character type definition
o national_numerals character type definition
There are nineteen different bidirectional data types that
can be defined, as in the following table:
Keyword   Category    Description
L         Strong      Left-to-right
LRE       Strong      Left-to-right embedding
LRO       Strong      Left-to-right override
R         Strong      Right-to-left
AL        Strong      Right-to-left Arabic
RLE       Strong      Right-to-left embedding
RLO       Strong      Right-to-left override
PDF       Weak        Pop directional format
EN        Weak        European number
ES        Weak        European number separator
ET        Weak        European number terminator
AN        Weak        Arabic number
CS        Weak        Common number separator
PS        Separator   Paragraph separator
S         Separator   Segment separator
WS        Neutral     White space
ON        Neutral     Other neutrals
NSM       Weak        Non-spacing mark
BN        Weak        Boundary neutral
If not defined in this section, the characters belong to the
other neutrals type, ON.
Each keyword listed above is accompanied by one or more
HEXADECIMAL ranges of characters that belong to the
bidirectional character type. The syntax is as follows:
bidi_char_type : bidi_keyword ':' range_list
;
bidi_keyword : 'L'
| 'LRE'
| 'LRO'
| 'R'
| 'AL'
| 'RLE'
| 'RLO'
| 'PDF'
| 'EN'
| 'ES'
| 'ET'
| 'AN'
| 'CS'
| 'PS'
| 'S'
| 'WS'
| 'ON'
| 'NSM'
| 'BN'
;
range_list : range
| range_list ',' range
;
range : HEXADECIMAL
| HEXADECIMAL '...' HEXADECIMAL
;
For example:
# Bidi character type definitions:
L: 0x26, 0x41...0x5a, 0xc380...0xc396, 0xe285a0...0xe28682
WS: 0x20, 0xc2a0, 0xe28080...0xe28086
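The range_list syntax and the ON fallback can be illustrated
with the following Python sketch. It parses one definition line
into inclusive (start, end) pairs and looks up a code value. It
assumes that a multibyte value such as 0xc380 is compared as a
plain integer. The function names parse_bidi_line and bidi_type
are hypothetical.

```python
def parse_bidi_line(line):
    """Parse 'KEYWORD: range, range, ...' into (keyword, [(start, end)])."""
    keyword, _, ranges = line.partition(":")
    spans = []
    for item in ranges.split(","):
        low, sep, high = item.strip().partition("...")
        start = int(low, 16)
        # A single HEXADECIMAL is a range of one value.
        end = int(high, 16) if sep else start
        spans.append((start, end))
    return keyword.strip(), spans


def bidi_type(code, table):
    """Return the keyword whose ranges contain code, or ON if none does."""
    for keyword, spans in table.items():
        if any(start <= code <= end for start, end in spans):
            return keyword
    return "ON"


if __name__ == "__main__":
    table = dict([
        parse_bidi_line("L: 0x26, 0x41...0x5a, 0xc380...0xc396"),
        parse_bidi_line("WS: 0x20, 0xc2a0, 0xe28080...0xe28086"),
    ])
    for code in (0x42, 0x20, 0x21):
        print(hex(code), bidi_type(code, table))
```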
The swapping_pairs definition specifies the list of swappable
characters that applies when SWAPPING_YES is specified in the
swapping value trio. The syntax of swapping_pairs is as
follows:
swapping_pair_list : swapping_keyword ':' swap_pair_list
;
swapping_keyword : 'swapping_pairs'
;
swap_pair_list : swap_pair
| swap_pair_list ',' swap_pair
;
swap_pair : '(' HEXADECIMAL ',' HEXADECIMAL ')'
;
For example:
# Swapping pair definitions:
swapping_pairs: (0x28, 0x29), (0x7b, 0x7d)
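A swapping_pairs definition can be turned into a lookup table,
as in the following Python sketch. It assumes that swapping is
symmetric, so each member of a pair maps to the other. The
function name parse_swapping_pairs is hypothetical.

```python
import re

PAIR = re.compile(r"\(\s*(0[xX][0-9A-Fa-f]+)\s*,\s*(0[xX][0-9A-Fa-f]+)\s*\)")


def parse_swapping_pairs(line):
    """Parse 'swapping_pairs: (a, b), ...' into a two-way mapping."""
    keyword, _, body = line.partition(":")
    if keyword.strip() != "swapping_pairs":
        raise ValueError("expected the swapping_pairs keyword")
    swap = {}
    for left, right in PAIR.findall(body):
        a, b = int(left, 16), int(right, 16)
        # Assumption: each member of a pair is replaced by the other.
        swap[a], swap[b] = b, a
    return swap


if __name__ == "__main__":
    swap = parse_swapping_pairs("swapping_pairs: (0x28, 0x29), (0x7b, 0x7d)")
    print({hex(k): hex(v) for k, v in swap.items()})
```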
The national_numerals specifies the list of national digits
that can be converted as the numerals value trio specifies.
The syntax of the national_numerals is as follows:
national_numerals_list
: numerals_keyword ':'
numerals_list ';' contextual_range_list
;
numerals_keyword : 'national_numerals'
;
numerals_list : '(' zero ',' one ',' two ',' three ','
four ',' five ',' six ',' seven ','
eight ',' nine ')'
;
zero : HEXADECIMAL
;
one : HEXADECIMAL
;
two : HEXADECIMAL
;
three : HEXADECIMAL
;
four : HEXADECIMAL
;
five : HEXADECIMAL
;
six : HEXADECIMAL
;
seven : HEXADECIMAL
;
eight : HEXADECIMAL
;
nine : HEXADECIMAL
;
contextual_range_list
: contextual_range
| contextual_range_list ',' contextual_range
;
contextual_range : HEXADECIMAL
| HEXADECIMAL '...' HEXADECIMAL
;
For instance:
# National numerals definition. The national numbers that replace
# Arabic numbers 0 to 9 are 0x0, 0x41, 0x42, and so on.
# The contextual surrounding characters are 0x20 to 0x40 and
# 0x50 to 0x7f:
national_numerals:
(0x0, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49)
; 0x20...0x40, 0x50...0x7f
Unless NUMERALS_CONTEXTUAL is the value of the numerals
attribute, the contextual range list definition is meaning-
less.
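The effect of the ten-entry numerals list can be sketched in
Python as a substitution table. The sketch covers only the
NUMERALS_NATIONAL case and ignores the contextual range list.
It assumes that the Arabic numbers being replaced are the ASCII
digits 0x30 to 0x39. The function names national_digit_map and
to_national are hypothetical.

```python
def national_digit_map(digits):
    """Map ASCII digits 0x30..0x39 to the ten national digit codes."""
    if len(digits) != 10:
        raise ValueError("exactly ten digits, zero through nine, are required")
    return {ord("0") + i: code for i, code in enumerate(digits)}


def to_national(codes, digit_map):
    """Replace each ASCII digit code; leave every other code unchanged."""
    return [digit_map.get(code, code) for code in codes]


if __name__ == "__main__":
    # The digit list from the example definition above.
    digit_map = national_digit_map(
        [0x0, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49])
    print([hex(c) for c in to_national([0x31, 0x32, 0x20], digit_map)])
```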
Shaping Data Definition
The shaping data definition section defines the context-
dependent shaping rules that will be used in the shaping
algorithm of the UMLE.
The definition starts with a keyword, LAYOUT_SHAPE_DATA, and
ends with END LAYOUT_SHAPE_DATA:
LAYOUT_SHAPE_DATA
# Layout shaping data definitions here.
:
:
END LAYOUT_SHAPE_DATA
The shaping data definition should be defined for the two
different kinds of text shape forms, TEXT_SHAPED and
TEXT_NOMINAL, depending on the text_shaping attribute value
and also for the two different kinds of text
representations, file code representation and process code
representation (that is, wide character representation):
LAYOUT_SHAPE_DATA
FILE_CODE_REPRESENTATION
TEXT_SHAPED
# TEXT_SHAPED shaping data definition in file code
# representation here.
:
:
END TEXT_SHAPED
TEXT_NOMINAL
# TEXT_NOMINAL shaping data definition in file code
# representation here.
:
:
END TEXT_NOMINAL
END FILE_CODE_REPRESENTATION
PROCESS_CODE_REPRESENTATION
TEXT_SHAPED
# TEXT_SHAPED shaping data definition in process code
# representation here.
:
:
END TEXT_SHAPED
TEXT_NOMINAL
# TEXT_NOMINAL shaping data definition in process
# code representation here.
:
:
END TEXT_NOMINAL
END PROCESS_CODE_REPRESENTATION
END LAYOUT_SHAPE_DATA
Each shaping data definition consists of one or more of the
shaping sequence definitions. Each shaping sequence defini-
tion is a representation of a series of state transitions
triggered by an input character and the current state at
each transition.
The syntax of the shaping sequence definition is as follows:
shaping_sequence : initial_state '+' input '->' next_state_list
;
initial_state : '()'
;
input : HEXADECIMAL
;
next_state_list : next_state
| next_state_list '+' input '->' next_state
| '(' next_state_list '+' input ')' 'repeat+'
| '(' next_state_list '+' input ')' 'repeat*'
;
next_state : '(' out_buffer ',' in2out ',' out2in ','
property ')'
;
out_buffer : '[' out_char_list ']'
;
out_char_list : HEXADECIMAL
| '(' HEXADECIMAL ')' 'repeat+'
| out_char_list ';' HEXADECIMAL
;
in2out : '[' i2o_list ']'
;
i2o_list : DECIMAL
| '(' DECIMAL ')' 'repeat+'
| i2o_list ';' DECIMAL
;
out2in : '[' o2i_list ']'
;
o2i_list : DECIMAL
| '(' DECIMAL ')' 'repeat+'
| o2i_list ';' DECIMAL
;
property : '[' prop_list ']'
;
prop_list : HEXADECIMAL
| '(' HEXADECIMAL ')' 'repeat+'
| prop_list ';' HEXADECIMAL
;
For example, the following shaping sequences can be defined:
# A simple shaping sequence:
() + 0x21 ->
( [0x0021], [0], [0;0], [0x80] ) + 0x22 ->
( [0x0021;0x0022], [0;1], [0;0;1;1], [0x80;0x80] ) + 0xc2a0 ->
( [0x0021;0x0022;0xe030], [0;1;2], [0;0;1;1;2;2],
[0x80;0x80;0x80] )
# A repeating shaping sequence:
() + 0x21 ->
(
( [0x0021], [0], [0;0], [0x80] ) + 0x22 ->
( [0x0021;0x0022], [0;1], [0;0;1;1], [0x80;0x80] ) + 0xc2a2
) repeat+
The first example shows a shaping sequence in which input
buffer contents of 0x21, 0x22, and 0xc2a0 are converted into
an output buffer containing 0x0021, 0x0022, and 0xe030; an
input-to-output buffer containing 0, 1, and 2; an
output-to-input buffer containing 0, 0, 1, 1, 2, and 2; and a
property buffer containing 0x80, 0x80, and 0x80.
The second example shows a repeating shaping sequence in which
the first input code element is 0x21 and the second and third
input code elements are 0x22 and 0xc2a2, respectively. The
repeat+ keyword marks the parenthesized part of the sequence
as repeating.
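To make the four parts of a next_state concrete, the following
Python sketch splits one next_state into its out_buffer, in2out,
out2in, and property lists. It handles only plain lists and
does not support the repeat+ form inside a list. The function
name parse_next_state is hypothetical.

```python
import re


def parse_next_state(text):
    """Split '( [..], [..], [..], [..] )' into its four named lists."""
    groups = re.findall(r"\[([^\]]*)\]", text)
    if len(groups) != 4:
        raise ValueError("a next_state needs exactly four bracketed lists")

    def numbers(group, base):
        # List members are separated by ';'.
        return [int(item, base) for item in group.split(";")]

    return {
        "out_buffer": numbers(groups[0], 16),  # HEXADECIMAL values
        "in2out": numbers(groups[1], 10),      # DECIMAL values
        "out2in": numbers(groups[2], 10),      # DECIMAL values
        "property": numbers(groups[3], 16),    # HEXADECIMAL values
    }


if __name__ == "__main__":
    state = parse_next_state(
        "( [0x0021;0x0022], [0;1], [0;0;1;1], [0x80;0x80] )")
    for name, values in state.items():
        print(name, values)
```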
EXIT STATUS
The following exit values are returned:
0 No errors occurred and the output file was success-
fully created.
1 Command line options were used incorrectly or an
unknown command line option was specified.
2 An invalid input or output file was specified.
3 The layout definitions are not correctly defined.
4 No more system resources are available.
6 Internal error.
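A build script might translate these exit values into messages,
as in the following Python sketch. The message strings
paraphrase the list above. The wrapper assumes that genlayouttbl
is on the search path and uses the -o outfile infile form from
the SYNOPSIS. The function names describe_exit and
run_genlayouttbl are hypothetical.

```python
import subprocess

EXIT_MESSAGES = {
    0: "output file created",
    1: "command line options used incorrectly",
    2: "invalid input or output file",
    3: "layout definitions not correctly defined",
    4: "no more system resources",
    6: "internal error",
}


def describe_exit(status):
    """Return a short message for a genlayouttbl exit value."""
    return EXIT_MESSAGES.get(status, f"unrecognized exit status {status}")


def run_genlayouttbl(infile, outfile):
    """Run genlayouttbl and return (status, message).

    Raises FileNotFoundError if the utility is not installed.
    """
    result = subprocess.run(["genlayouttbl", "-o", outfile, infile])
    return result.returncode, describe_exit(result.returncode)


if __name__ == "__main__":
    for status in (0, 3, 5):
        print(status, describe_exit(status))
```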
FILES
/usr/lib/locale/common/LO_LTYPE/umle.layout.so.1
The Universal Multiscript Layout Engine for 32-bit
platforms.
/usr/lib/locale/common/LO_LTYPE/sparcv9/umle.layout.so.1
The Universal Multiscript Layout Engine for 64-bit
SPARC platform.
/usr/lib/locale/common/LO_LTYPE/ia64/umle.layout.so.1
The Universal Multiscript Layout Engine for 64-bit
Intel platform.
/usr/lib/locale/locale/LO_LTYPE/layout.dat
The binary layout table file for the locale.
ATTRIBUTES
See attributes(5) for descriptions of the following attri-
butes:
____________________________________________________________
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
|_____________________________|_____________________________|
| Availability | SUNWglt |
|_____________________________|_____________________________|
SEE ALSO
m_create_layout(3LAYOUT), m_destroy_layout(3LAYOUT),
m_getvalues_layout(3LAYOUT), m_setvalues_layout(3LAYOUT),
m_transform_layout(3LAYOUT), m_wtransform_layout(3LAYOUT),
attributes(5), environ(5)
International Language Environments Guide
Unicode Technical Report #9: The Bidirectional Algorithm
from http://www.unicode.org/unicode/reports/