charmap(5)




NAME

     charmap - character set description file


DESCRIPTION

     A character set description file, or charmap, defines
     characteristics for a coded character set. Other information
     about the coded character set may also be in the file. Coded
     character set character values are defined using symbolic
     character names followed by character encoding values.

     The character set description file provides:

        o  The capability to describe  character  set  attributes
           (such   as   collation  order  or  character  classes)
           independent of character set encoding, and using  only
           the  characters  in  the  portable character set. This
           makes  it  possible  to  create  generic  localedef(1)
           source  files for all codesets that share the portable
           character set.

        o  Standardized symbolic names for all characters in  the
           portable character set, making it possible to refer to
           any such character regardless of encoding.

  Symbolic Names
     Each symbolic name of the portable character set is included
     in the file and is mapped to a unique encoding value (except
     for those symbolic names that are shown with identical
     glyphs). If the control characters commonly associated with
     the symbolic names in the following table are supported by
     the implementation, the symbolic names and their
     corresponding encoding values are included in the file. Some
     of the encodings associated with the symbolic names in this
     table may be the same as characters in the portable
     character set table.

      _______________________________________________________________________
     |   <ACK>       <DC2>       <ENQ>       <FS>        <IS4>       <SOH>   |
     |   <BEL>       <DC3>       <EOT>       <GS>        <LF>        <STX>   |
     |   <BS>        <DC4>       <ESC>       <HT>        <NAK>       <SUB>   |
     |   <CAN>       <DEL>       <ETB>       <IS1>       <RS>        <SYN>   |
     |   <CR>        <DLE>       <ETX>       <IS2>       <SI>        <US>    |
     |   <DC1>       <EM>        <FF>        <IS3>       <SO>        <VT>    |
     |_______________________________________________________________________|

  Declarations
     The following declarations can precede the character
     definitions. Each must consist of the symbol shown in the
     following list, starting in column 1, including the
     surrounding brackets, followed by one or more blank
     characters, followed by the value to be assigned to the
     symbol.

     <code_set_name>
           The name of the coded  character  set  for  which  the
           character set description file is defined.

     <mb_cur_max>
           The maximum number of bytes in a multi-byte character.
           This defaults to 1.

     <mb_cur_min>
           A positive integer that defines the minimum number of
           bytes in a character for the encoded character set.

     <escape_char>
           The escape character used to indicate that the
           characters following it are to be interpreted in a
           special way, as defined later in this section. This
           defaults to backslash (\), which is the character glyph
           used in all the following text and examples, unless
           otherwise noted.

     <comment_char>
           The character that, when placed in column 1 of a
           charmap line, indicates that the line is to be ignored.
           The default character is the number sign (#).
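
     The following C fragment is a minimal sketch of how a reader
     might apply these declarations, assuming one declaration per
     line and a value that ends at the first blank. The structure
     charmap_decls and the functions decls_init and decls_apply are
     hypothetical names, not part of any system interface. The
     default of 1 for <mb_cur_min> is also an assumption, since no
     default is stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the header declarations. */
struct charmap_decls {
    char code_set_name[64];
    int  mb_cur_max;     /* defaults to 1 */
    int  mb_cur_min;     /* assumption: 1 when not declared */
    char escape_char;    /* defaults to backslash */
    char comment_char;   /* defaults to number sign */
};

/* Fill in the documented defaults. */
static void decls_init(struct charmap_decls *d)
{
    memset(d, 0, sizeof *d);
    d->mb_cur_max = 1;
    d->mb_cur_min = 1;
    d->escape_char = '\\';
    d->comment_char = '#';
}

/* Apply one line such as "<mb_cur_max> 3". Returns 1 if the line is a
 * recognized declaration, 0 otherwise. */
static int decls_apply(struct charmap_decls *d, const char *line)
{
    char key[32];
    const char *end, *val;
    size_t klen, vlen;

    assert(d != NULL && line != NULL);
    if (line[0] != '<')                      /* symbol must be in column 1 */
        return 0;
    end = strchr(line, '>');
    if (end == NULL || (size_t)(end - line) + 1 >= sizeof key)
        return 0;
    klen = (size_t)(end - line) + 1;         /* keyword with its brackets */
    memcpy(key, line, klen);
    key[klen] = '\0';

    val = end + 1;
    if (*val != ' ' && *val != '\t')         /* at least one blank required */
        return 0;
    while (*val == ' ' || *val == '\t')
        val++;
    vlen = strcspn(val, " \t\n");            /* value ends at blank/newline */

    if (strcmp(key, "<code_set_name>") == 0) {
        if (vlen >= sizeof d->code_set_name)
            vlen = sizeof d->code_set_name - 1;
        memcpy(d->code_set_name, val, vlen);
        d->code_set_name[vlen] = '\0';
    } else if (strcmp(key, "<mb_cur_max>") == 0) {
        d->mb_cur_max = atoi(val);
    } else if (strcmp(key, "<mb_cur_min>") == 0) {
        d->mb_cur_min = atoi(val);
    } else if (strcmp(key, "<escape_char>") == 0) {
        d->escape_char = val[0];
    } else if (strcmp(key, "<comment_char>") == 0) {
        d->comment_char = val[0];
    } else {
        return 0;
    }
    return 1;
}
```

     For example, after decls_init, applying the line
     <mb_cur_max> 3 sets mb_cur_max to 3, while a line that does
     not start in column 1 is not treated as a declaration.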

  Format
     The character set mapping definitions consist of all the
     lines immediately following an identifier line containing
     the string CHARMAP starting in column 1, and preceding a
     trailer line containing the string END CHARMAP starting in
     column 1. Empty lines and lines containing a <comment_char>
     in the first column are ignored. Each non-comment line of
     the character set mapping definition (that is, between the
     CHARMAP and END CHARMAP lines of the file) must be in one of
     two forms:

          "%s %s %s\n",<symbolic-name>,<encoding>,<comments>

     or

          "%s...%s   %s    %s\n",<symbolic-name>,<symbolic-name>,
          <encoding>,<comments>
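
     As an illustration only, the C sketch below splits a mapping
     line of either form into its fields. The names mapping_line
     and split_mapping_line are hypothetical. The sketch assumes
     that fields are separated by blanks and recognizes the range
     form by the text >...< in the first field, so it does not
     handle a > that is escaped inside a symbolic name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result of splitting one mapping line. The pointers refer
 * into the caller's buffer, which is modified in place. */
struct mapping_line {
    char *first;      /* first (or only) symbolic name */
    char *last;       /* second name of a range; NULL for a single name */
    char *encoding;   /* encoding field, e.g. "\d129\d254" */
    char *comment;    /* rest of the line, or NULL if there is none */
};

/* Split a line of either form. Returns 0 on success, -1 if the name or
 * the encoding field is missing. */
static int split_mapping_line(char *line, struct mapping_line *m)
{
    const char *blanks = " \t\n";
    char *name, *dots;

    assert(line != NULL && m != NULL);
    name = strtok(line, blanks);
    m->encoding = strtok(NULL, blanks);
    if (name == NULL || m->encoding == NULL)
        return -1;

    m->comment = strtok(NULL, "\n");       /* everything after the encoding */
    if (m->comment != NULL) {
        m->comment += strspn(m->comment, " \t");
        if (*m->comment == '\0')
            m->comment = NULL;             /* only trailing blanks */
    }

    m->first = name;
    m->last = NULL;
    dots = strstr(name, ">...<");          /* range form: <a>...<b> */
    if (dots != NULL) {
        dots[1] = '\0';                    /* end the first name after '>' */
        m->last = dots + 4;                /* second name starts at its '<' */
    }
    return 0;
}
```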

     In the first format, the line in the character set mapping
     definition defines a single symbolic name and a
     corresponding encoding. A character following an escape
     character is interpreted as itself; for example, the
     sequence <\\\>> represents the symbolic name \> enclosed
     between angle brackets.
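
     The escape rule can be sketched in C as follows. This is a
     minimal sketch, not a system interface: decode_symbolic_name
     is a hypothetical function that copies each character after
     the escape character literally and stops at the first
     unescaped >, so that the file text <\\\>> yields the name \>.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a bracketed symbolic name, as written in the file, into its bare
 * name. A character following the escape character is copied as itself;
 * the first unescaped '>' ends the name. Returns the number of input
 * characters consumed, or 0 on error. */
static size_t decode_symbolic_name(const char *in, char esc,
                                   char *out, size_t outsz)
{
    size_t i = 1, n = 0;

    assert(in != NULL && out != NULL);
    if (in[0] != '<' || outsz == 0)
        return 0;
    while (in[i] != '\0') {
        char c = in[i++];
        if (c == '>') {                /* unescaped closing bracket */
            out[n] = '\0';
            return i;
        }
        if (c == esc) {
            if (in[i] == '\0')
                return 0;              /* escape character at end of input */
            c = in[i++];               /* take the next character literally */
        }
        if (n + 1 >= outsz)
            return 0;                  /* output buffer too small */
        out[n++] = c;
    }
    return 0;                          /* no closing '>' found */
}
```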

     In the second format, the line in the character set mapping
     definition defines a range of one or more symbolic names. In
     this form, the symbolic names must consist of zero or more
     non-numeric characters, followed by an integer formed by one
     or more decimal digits. The characters preceding the integer
     must be identical in the two symbolic names, and the integer
     formed by the digits in the second symbolic name must be
     equal to or greater than the integer formed by the digits in
     the first name. This is interpreted as a series of symbolic
     names formed from the common part and each of the integers
     between the first and the second integer, inclusive. As an
     example, <j0101>...<j0104> is interpreted as the symbolic
     names <j0101>, <j0102>, <j0103>, and <j0104>, in that order.
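
     A C sketch of the name expansion follows, working on bare
     names without the angle brackets. The function names and the
     RANGE_NAME_LEN limit are hypothetical. Zero-padding the
     generated integers to the digit count of the first name is an
     assumption chosen to reproduce the <j0101>...<j0104> example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RANGE_NAME_LEN 32   /* hypothetical per-name buffer size */

/* Return the length of the non-numeric prefix, or -1 if the name does not
 * end in decimal digits or the prefix itself contains a digit. */
static int split_prefix(const char *name)
{
    size_t len = strlen(name), p = len;

    while (p > 0 && name[p - 1] >= '0' && name[p - 1] <= '9')
        p--;
    if (p == len)
        return -1;                        /* no trailing integer */
    for (size_t i = 0; i < p; i++)
        if (name[i] >= '0' && name[i] <= '9')
            return -1;                    /* prefix must be non-numeric */
    return (int)p;
}

/* Expand first...last into out[0..n-1]. Returns n, or -1 if the prefixes
 * differ, last < first, or the range does not fit. */
static long expand_name_range(const char *first, const char *last,
                              char out[][RANGE_NAME_LEN], long max)
{
    int p1, p2, width;
    long lo, hi, n;

    assert(first != NULL && last != NULL && out != NULL);
    p1 = split_prefix(first);
    p2 = split_prefix(last);
    if (p1 < 0 || p2 < 0 || p1 != p2 ||
        strncmp(first, last, (size_t)p1) != 0)
        return -1;                        /* common part must be identical */
    lo = strtol(first + p1, NULL, 10);
    hi = strtol(last + p2, NULL, 10);
    if (hi < lo || hi - lo + 1 > max)
        return -1;
    width = (int)strlen(first + p1);      /* assumption: keep zero padding */
    for (n = 0; n <= hi - lo; n++) {
        int w = snprintf(out[n], RANGE_NAME_LEN, "%.*s%0*ld",
                         p1, first, width, lo + n);
        if (w < 0 || w >= RANGE_NAME_LEN)
            return -1;                    /* generated name too long */
    }
    return n;
}
```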

     A character set mapping definition line must exist for all
     symbolic names and must define the coded character value
     that corresponds to the character glyph indicated in the
     portable character set table, or the coded character value
     that corresponds with the control character symbolic name.
     If the control characters commonly associated with the
     symbolic names are supported by the implementation, the
     symbolic name and the corresponding encoding value must be
     included in the file. Additional unique symbolic names may
     be included. A coded character value can be represented by
     more than one symbolic name.

     The encoding part is expressed as one (for single-byte
     character values) or more concatenated decimal, octal, or
     hexadecimal constants in the following formats:

               "%cd%d",<escape_char>,<decimal byte value>

               "%cx%x",<escape_char>,<hexadecimal byte value>

               "%c%o",<escape_char>,<octal byte value>

  Decimal, Hexadecimal, and Octal Constants
     Decimal constants must be represented by two or three
     decimal digits, preceded by the escape character and the
     lower-case letter d; for example, \d05, \d97, or \d143.
     Hexadecimal constants must be represented by two hexadecimal
     digits, preceded by the escape character and the lower-case
     letter x; for example, \x05, \x61, or \x8f. Octal constants
     must be represented by two or three octal digits, preceded
     by the escape character; for example, \05, \141, or \217. In
     a portable charmap file, each constant must represent an
     8-bit byte. Implementations supporting other byte sizes may
     allow constants to represent values larger than those that
     can be represented in 8-bit bytes, and may allow additional
     digits in constants. When constants are concatenated for
     multi-byte character values, they must be of the same type,
     and are interpreted in byte order from first to last, with
     the least significant byte of the multi-byte character
     specified by the last constant.
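
     The rules above can be checked with a small C parser, shown
     here as a sketch for a portable (8-bit) charmap. The function
     names parse_encoding and digit_value are hypothetical. The
     sketch stores bytes in the order written, so the last
     constant ends up as the least significant byte, and it
     rejects mixed constant types, wrong digit counts, and values
     above 255.

```c
#include <assert.h>
#include <stddef.h>

/* Value of c as a digit in the given base, or -1. */
static int digit_value(char c, int base)
{
    int v;

    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        return -1;
    return v < base ? v : -1;
}

/* Parse an encoding such as "\d129\d254" into out[], first byte first.
 * Returns the byte count, or -1 if the encoding is malformed. */
static int parse_encoding(const char *s, char esc,
                          unsigned char *out, size_t outsz)
{
    int type = 0;                         /* 'd', 'x' or 'o' once known */
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*s != '\0') {
        int t, base, mindig, maxdig, ndig = 0, d;
        unsigned value = 0;

        if (*s++ != esc)
            return -1;                    /* every constant starts with esc */
        if (*s == 'd') {
            t = 'd'; base = 10; mindig = 2; maxdig = 3; s++;
        } else if (*s == 'x') {
            t = 'x'; base = 16; mindig = 2; maxdig = 2; s++;
        } else {
            t = 'o'; base = 8; mindig = 2; maxdig = 3;
        }
        if (type != 0 && t != type)
            return -1;                    /* constants must share one type */
        type = t;
        while (ndig < maxdig && (d = digit_value(*s, base)) >= 0) {
            value = value * (unsigned)base + (unsigned)d;
            s++;
            ndig++;
        }
        if (ndig < mindig || value > 255 || n >= outsz)
            return -1;                    /* bad digit count or not 8-bit */
        out[n++] = (unsigned char)value;
    }
    return n > 0 ? (int)n : -1;
}
```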

  Ranges of Symbolic Names
     In lines defining ranges of symbolic names, the encoded
     value is the value for the first symbolic name in the range
     (the symbolic name preceding the ellipsis). Subsequent
     symbolic names defined by the range will have encoding
     values in increasing order. For example, the line

     <j0101>...<j0104>     \d129\d254

     will be interpreted as:

     <j0101>                \d129\d254
     <j0102>                \d129\d255
     <j0103>                \d130\d0
     <j0104>                \d130\d1

     Note that this line is interpreted as shown in the example
     even on systems with bytes larger than 8 bits. The
     <comments> field of a mapping line is optional.
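
     The increasing encoding values in the example behave like a
     multi-byte counter whose last byte is least significant. A
     minimal C sketch of that step follows, assuming 8-bit bytes;
     next_encoding is a hypothetical name. Starting from 129 254,
     three calls produce 129 255, 130 0, and 130 1, matching the
     expanded lines above.

```c
#include <assert.h>
#include <stddef.h>

/* Advance a multi-byte encoding to the next value. The last byte is the
 * least significant; a byte that wraps from 0xff to 0 carries into the
 * byte before it. Returns 0 if every byte wrapped (value out of range). */
static int next_encoding(unsigned char *bytes, size_t n)
{
    assert(bytes != NULL || n == 0);
    while (n > 0) {
        n--;
        if (bytes[n] != 0xff) {
            bytes[n]++;
            return 1;
        }
        bytes[n] = 0;          /* wrap this byte and carry leftward */
    }
    return 0;
}
```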


SEE ALSO

     locale(1), localedef(1), nl_langinfo(3C), extensions(5),
     locale(5)

