
The Linux Console Tools

Yann Dirson

   August 19th, 1999
     _________________________________________________________________
   
Status of this document

   This is an introduction to the Linux Console Tools package. You should
   refer to the manpages for more details.
     _________________________________________________________________
   
Other documents

   The Linux Console Tools [1]WWW site may contain additional
   information, latest news, and such.
   
   Files in the doc/contrib/ directory are unsupported, and may be
   obsolete, but are provided just in case someone needs them.
   
   README.{acm,sfm,keytables} give some info on the respective included
   data files.
   
   kbd.FAQ.* is the Console and Keyboard HOWTO by Andries Brouwer, as
   included in kbd 0.97. It would need some corrections and updates,
   though. I'm thinking about taking relevant info from it and adding it
   here.
     _________________________________________________________________
   
What the Linux Console Tools are

   The Linux Console Tools are a set of programs allowing the user to
   set up and customize the console (in the restricted sense: text-mode
   screen + keyboard only). The package is derived from version 0.94 of
   the kbd package, and has benefited from most features introduced in
   kbd up to version 0.97.
   
   The Linux Console Tools are still under development, but using them
   just as a replacement for kbd should be quite safe, as they fix many
   bugs kbd has. Furthermore, while new features are introduced in the
   current development version, a stable version is kept updated with
   bug fixes.
     _________________________________________________________________
   
Understanding the big picture of the console

   As of Linux 2.0, the console driver is made of 2 sub-drivers: the
   keyboard driver, and the screen driver. Basically, the keyboard driver
   sends characters to your application, then the application does its
   own job, and sends to the screen driver the characters to be
   displayed.
     _________________________________________________________________
   
What is Unicode

   Traditionally, character encodings use 8 bits, and thus are limited to
   256 characters. This causes problems because:
   
    1. it's not enough for some languages;
    2. people speaking languages using different encodings have to choose
       one encoding to use, and have to switch the system's state when
       changing the language, which makes it difficult to mix several
       languages in the same file;
    3. etc...
       
   Thus the UCS (Universal Character Set), also known as Unicode, was
   created to handle and mix all of our world's scripts. This is a 32-bit
   (4 bytes) encoding, otherwise known as UCS4 because of the size of its
   characters, which is standardised by ISO as the 10646-1 standard. The
   most widely used characters from UCS are contained in the UCS2 16-bit
   subset of UCS; this is the subset used by the Linux console.
   
   For convenience, the UTF encoding was designed as a variable-length
   encoding with ASCII backward-compatibility; all characters that have a
   UCS4 encoding can be expressed as a UTF sequence, and vice-versa.
   
   [2]The Unicode consortium defines additional properties for UCS2
   characters, also known as Unicode characters.
   
   See: unicode(7), utf-8(7).
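   
   To make the variable-length idea concrete, here is a minimal sketch
   of how a code from the UCS2 subset is turned into a UTF8 sequence.
   The function name is made up for this illustration, and the sketch
   deliberately stops at 16-bit codes, since that is the subset the
   console uses.

```python
def utf8_encode(code):
    """Encode one UCS2 code (0..0xFFFF) as a UTF8 byte sequence."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch only handles the 16-bit UCS2 subset")
    if code < 0x80:
        # 7 bits: one byte, identical to ASCII
        return bytes([code])
    if code < 0x800:
        # up to 11 bits: two bytes
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # up to 16 bits: three bytes
    return bytes([0xE0 | (code >> 12),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])
```

   ASCII characters keep their one-byte form, which is what makes the
   encoding backward-compatible; everything else in UCS2 takes two or
   three bytes.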
     _________________________________________________________________
   
Understanding and setting up the keyboard driver

How it works

   The keyboard driver is made up of several levels:
   
     * the keyboard hardware, which turns the user's finger moves into
       so-called scancodes (Disclaimer: this is not really part of the
       software driver itself; no support is provided for bugs in this
       domain ;-). An event (key pressed or released) generates from 1 to
       6 scancodes.
     * a mechanism turning scancodes into keycodes using a
       translation-table which you can access with the getkeycodes(8) and
       setkeycodes(8) utilities. You will only need to look at that if
       you have some sort of non-standard (or programmable ?) keys on
       your keyboard. AFAIK, these keycodes are the same among a set of
       keyboards sharing the same hardware, but differing in the symbols
       drawn on the keys.
     * a mechanism turning keycodes into characters using a keymap. You
       can access this keymap using the loadkeys(1) and dumpkeys(1)
       utilities.
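       
   The three levels above can be pictured with a toy model. This is only
   a sketch: the tables are tiny and hypothetical (the real ones live in
   the kernel), the scancode and keycode values are the ones I would
   expect for the A key and left Shift on a PC keyboard, and the
   assumption that the high bit of a single-byte scancode marks a key
   release does not cover multi-byte scancode sequences.

```python
# Hypothetical, minimal tables standing in for the kernel's own ones.
SCANCODE_TO_KEYCODE = {0x1E: 30, 0x2A: 42}   # assumed: 'A' key, left Shift
KEYMAP = {30: ("a", "A")}                    # keycode -> (plain, shifted)
SHIFT_KEYCODE = 42


def translate(scancodes):
    """Turn single-byte scancodes into the characters a keymap would yield."""
    shift = False
    out = []
    for sc in scancodes:
        released = bool(sc & 0x80)           # assumed: high bit = release
        keycode = SCANCODE_TO_KEYCODE.get(sc & 0x7F)
        if keycode is None:
            continue                         # unknown key: ignored here
        if keycode == SHIFT_KEYCODE:
            shift = not released             # track the modifier state
            continue
        entry = KEYMAP.get(keycode)
        if entry is not None and not released:
            out.append(entry[1] if shift else entry[0])
    return "".join(out)
```

   For instance, translate([0x2A, 0x1E, 0x9E, 0xAA]) models pressing
   Shift, pressing and releasing A, then releasing Shift.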
       
   The keyboard driver can be in one of 4 modes (which you can access
   using kbd_mode(1)), which will influence what type of data
   applications will get as keyboard input:
   
     * the scancode (K_RAW) mode, in which the application gets scancodes
       for input. It is used by applications that implement their own
       keyboard driver. For example, X11 does that.
     * the keycode (K_MEDIUMRAW) mode, in which the application gets
       information on which keys (identified by their keycodes) get
       pressed and released. AFAIK, no real-life application uses this
       mode, but it is useful to helper programs like showkey(1) to
       assist keymap designers.
     * the ASCII (K_XLATE) mode, in which the application effectively
       gets the characters as defined by the keymap, using an 8-bit
       encoding. In this mode, the Ascii_0 to Ascii_9 keymap symbols
       allow characters to be composed by giving their decimal 8-bit
       code, and Hex_0 to Hex_F do the same with (2-digit) hexadecimal
       codes.
     * the Unicode (K_UNICODE) mode, which at this time only differs from
       the ASCII mode by allowing the user to compose UTF8 Unicode
       characters by their decimal value, using Ascii_0 to Ascii_9 (who
       needs that?), or their hexadecimal (4-digit) value, using Hex_0
       to Hex_F. A keymap can be set up to produce UTF8 sequences (with a
       U+XXXX pseudo-symbol, where each X is a hexadecimal digit), but
       be warned that these UTF8 sequences will also be produced even in
       ASCII mode. I think this is a bug in the kernel.
       
   BE WARNED that putting the keyboard in RAW or MEDIUMRAW mode will make
   it unusable for most applications. Use showkey(1) to get a demo of
   these special modes, or to find out what scancodes/keycodes are
   produced by a specific key.
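   
   As a rough illustration of what kbd_mode(1) reports, the sketch below
   queries the mode of a console with the KDGKBMODE ioctl. The numeric
   values are the ones I know from <linux/kd.h>; the function names are
   made up for this example, and the call only succeeds when the file
   descriptor really refers to a Linux virtual console.

```python
import array
import fcntl
import os

KDGKBMODE = 0x4B44   # ioctl request: get keyboard mode (from <linux/kd.h>)
KBD_MODES = {0: "K_RAW", 1: "K_XLATE", 2: "K_MEDIUMRAW", 3: "K_UNICODE"}


def mode_name(value):
    """Map the integer returned by KDGKBMODE to its symbolic name."""
    return KBD_MODES.get(value, "unknown (%d)" % value)


def get_kbd_mode(tty_path="/dev/tty"):
    """Return the keyboard mode of the console behind tty_path.

    Raises OSError when tty_path is not a Linux virtual console.
    """
    buf = array.array("i", [0])              # the kernel stores an int here
    fd = os.open(tty_path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, KDGKBMODE, buf, True)
    finally:
        os.close(fd)
    return buf[0]
```

   On a text console, mode_name(get_kbd_mode()) would normally give
   K_XLATE or K_UNICODE; under X11 the same query on the X server's VT
   is where one would expect to see K_RAW.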
     _________________________________________________________________
   
See also

   keytables(5), setleds(1), setmetamode(1).
     _________________________________________________________________
   
Understanding and setting up the screen driver

Unicode is everywhere

Screen Font Maps

   In recent kernels (at least since 2.0.x), the screen driver is based
   on 16-bit unicode (UCS2) encoding, which means that every console-font
   loaded should be defined using a unicode Screen Font Map (SFM for
   short), which tells, for each character in the font, the list of UCS2
   characters it will render. [3][1]
     _________________________________________________________________
   
SFM Fallback tables

   Starting with release 1997.11.13 of the Linux Console Tools,
   consolechars(8) understands SFM fallback tables. Before that, an SFM
   had to contain both the Unicode codes of the characters the font was
   primarily meant to render and any approximations the user wanted.
   Fallback tables allow the SFM provided with the font file to hold only
   the primary mappings, and a separate list to say ``if no glyph for
   that character is available in the current font, then try to display
   it with the glyph for this one, or else the one for that one, or
   ...''. This keeps all possible fallbacks in a single place, and lets
   everyone choose which fallback tables (s)he wants. Have a look at
   data/consoletrans/*.fallback for examples.
   
   A fallback-table file is made of fallback entries, each entry being on
   its own line. Empty lines, and lines beginning with the # comment
   character are ignored.
   
   A fallback entry is a series of 2 or more UCS2 codes. The first one is
   the character for which we want a glyph; the following ones are those
   whose glyph we want to use when no glyph designed specially for our
   character is available. The order of the codes defines a priority
   order (own glyph if available, then second char's, then the third's,
   etc.)
   
   If an SFM is to be loaded, fallback mappings are added to this map
   before it is loaded. If not (i.e. a font without an SFM was loaded
   and no --sfm option was given to consolechars, or the --force-no-sfm
   option was given), then the current SFM is requested from the kernel,
   the fallback mappings are added, and the resulting SFM is loaded back
   into the kernel.
   
   Note that each fallback entry is checked against the original SFM, not
   against the SFM we get by adding former fallback entries to the
   original SFM (the one read from a file, or given by the kernel); this
   applies even to entries in different files, and thus the order of -k
   options has no effect. If you want some entries to be influenced by
   previous ones, you will have to use different fallback files, and to
   load them with several consecutive invocations of consolechars -k.
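   
   The rules above can be sketched in a few lines. This is not the code
   used by consolechars: the function names are invented, an SFM is
   modelled as a plain dictionary from UCS2 code to font position, and I
   assume that codes are written as U+XXXX tokens separated by blanks.

```python
def parse_fallback_table(text):
    """Parse fallback entries: one per line, each a list of 2+ UCS2 codes."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                          # empty line or comment
        codes = []
        for tok in line.split():
            if not tok.upper().startswith("U+"):
                raise ValueError("expected a U+XXXX token, got %r" % tok)
            codes.append(int(tok[2:], 16))    # assumed U+XXXX notation
        if len(codes) < 2:
            raise ValueError("entry needs at least 2 codes: %r" % line)
        entries.append(codes)
    return entries


def apply_fallbacks(sfm, entries):
    """Return a copy of sfm (UCS2 code -> font position) with fallbacks added.

    Every entry is checked against the ORIGINAL sfm only, so an entry
    never benefits from mappings added by an earlier entry.
    """
    result = dict(sfm)
    for target, *candidates in entries:
        if target in sfm:
            continue                          # the font has its own glyph
        for cand in candidates:
            if cand in sfm:
                result[target] = sfm[cand]    # borrow the first usable glyph
                break
    return result
```

   With an SFM that only knows U+0027, the entry ``U+2018 U+0060
   U+0027'' makes U+2018 use the apostrophe's glyph, while a following
   entry ``U+2019 U+2018'' adds nothing, because U+2018 was not in the
   original map.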
     _________________________________________________________________
   
The unicode screen-mode

   There are basically 2 screen-modes (byte mode and UTF mode). The
   simpler to explain is the UTF mode, in which the bytes received from
   the application (i.e. written to the console screen) are interpreted
   as UTF8 sequences, which are converted into UCS2 characters (see
   [4]the section called What is Unicode), and then looked up in the SFM
   to determine the glyphs used to display each character.
   
   Switching to and from UTF mode is done by sending to the screen the
   escape sequences <ESC>%G and <ESC>%@ respectively. You may use the
   unicode_start(1) and unicode_stop(1) scripts instead, as they also
   change the keyboard mode, and let you optionally change the
   screen-font.
   
   Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.
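   
   For illustration, here is a small sketch that emits the two escape
   sequences named above. The function names are hypothetical; unlike
   unicode_start(1) and unicode_stop(1), this touches neither the
   keyboard mode nor the font.

```python
import sys

UTF_MODE_ON = "\033%G"    # <ESC>%G : switch the screen to UTF mode
UTF_MODE_OFF = "\033%@"   # <ESC>%@ : switch the screen back to byte mode


def utf_mode_sequence(enabled):
    """Return the escape sequence selecting UTF mode (True) or byte mode."""
    return UTF_MODE_ON if enabled else UTF_MODE_OFF


def set_utf_mode(enabled, stream=None):
    """Write the sequence to stream (default: standard output)."""
    if stream is None:
        stream = sys.stdout
    stream.write(utf_mode_sequence(enabled))
    stream.flush()
```

   Calling set_utf_mode(True) from a program running on a VT should have
   the same effect on the screen as printing <ESC>%G by hand.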
     _________________________________________________________________
   
The byte screen-mode

   The byte mode is a bit more complicated, as it uses an additional map
   to transform the byte-characters sent by the application into UCS2
   characters, which are then handled as described above. I call this
   map the Application Charset Map (ACM), because it defines the encoding
   the application uses, but it used to be called a ``screen map'', or
   ``console map'' (this comes from the time when the screen driver
   didn't use Unicode, and there was only one map down there).
   
   Although there is only one ACM active at a given time, there are 4 of
   them at any time in the kernel; 3 of them are built-in and never
   change, and they define the IBM codepage 437 (the i386's default, and
   thus the kernel's default even on other archs), the DEC VT100 charset,
   and the ISO latin1 charset; the 4th is user-definable, and defaults on
   boot to the ``straight to font'' mapping, described below under
   ``Special UCS2 codes''.
   
   The consolechars(8) command can be used to change the ACM, as well as
   the font and its associated SFM.
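   
   An ACM can be thought of as a 256-entry table from byte values to
   UCS2 codes. The sketch below builds such tables from Python's own
   codecs, which is only an approximation of the kernel's built-in maps
   (control characters in particular may differ). The straight to font
   table assumes that byte b maps to U+F000 + b, which is my reading of
   the ``Special UCS2 codes'' section; all names are invented for this
   example.

```python
def acm_from_codec(codec):
    """Build a 256-entry ACM (byte value -> UCS2 code) from a Python codec."""
    return [ord(bytes([b]).decode(codec)) for b in range(256)]


# Assumed layout of the default user-defined ACM: byte b -> U+F000 + b,
# i.e. "use font position b directly".
STRAIGHT_TO_FONT = [0xF000 + b for b in range(256)]


def bytes_to_ucs2(data, acm):
    """Translate application bytes into UCS2 codes through an ACM."""
    return [acm[b] for b in data]
```

   The same byte then gives different characters depending on the ACM:
   0x82 is U+00E9 under cp437, a control code under latin1, and font
   position 0x82 under the straight to font table.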
     _________________________________________________________________
   
Charset slots

   The Linux Console Driver has 2 slots for charsets, labeled G0 and G1.
   Each of these slots contains a reference to one of the 4 kernel ACMs,
   3 of which are predefined to provide the cp437, iso01, and vt100
   graphics charsets. The 4th one is user-definable; this is the one you
   can set with consolechars --acm and get with consolechars --old-acm.
   
   Versions of the Linux Console Tools prior to 1998.08.11, as well as
   all versions of kbd at least until 0.96a, were always assuming you
   wanted to use the G0 slot, pointing to the user-defined ACM. You can
   now use the charset utility to tune your charset slots.
   
   You will note that, although each VT has its own slot settings, there
   is only one user-defined ACM for use by all the VTs. That is, whereas
   you can have tty1 using G0=cp437 and G1=vt100, at the same time as
   tty2 using G0=iso01 and G1=iso02 (user-defined), you cannot have at
   the same time tty1 using iso02 and tty2 using iso03. This is a
   limitation of the Linux kernel.
   
   Note that you can emulate such a setting using the filterm utility,
   with your console in UTF8-mode, by telling filterm to translate screen
   output on-the-fly to UTF8.
   
   You'll find filterm in the konwert package, by Marcin Kowalczyk, which
   is available from [5]his WWW site.
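   
   For reference, the slots can also be driven directly with escape
   sequences. The sketch below is based on my understanding of the
   console_codes(4) manpage, not on the source of the charset utility:
   <ESC>( designates an ACM for G0 and <ESC>) for G1, with the final
   letter choosing the ACM, while SI and SO make G0 or G1 the active
   slot. The dictionary and function names are hypothetical.

```python
DESIGNATE = {"G0": "\033(", "G1": "\033)"}
# Final character selecting one of the 4 kernel ACMs (per console_codes(4)):
ACM_CODES = {"iso01": "B", "vt100": "0", "cp437": "U", "user": "K"}
ACTIVATE = {"G0": "\x0f", "G1": "\x0e"}      # SI selects G0, SO selects G1


def designate(slot, acm):
    """Escape sequence making slot (G0 or G1) refer to the given ACM."""
    return DESIGNATE[slot] + ACM_CODES[acm]


def activate(slot):
    """Control character making slot the active charset."""
    return ACTIVATE[slot]
```

   Printing designate("G1", "vt100") followed by activate("G1") would
   thus switch the current VT to the vt100 graphics charset, and
   activate("G0") would switch it back.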
     _________________________________________________________________
   
Special UCS2 codes

   There are special UCS2 values you should care about, but the present
   list is probably not exhaustive:
   
     * codes C from U+F000 to U+F1FF are not looked up in the SFM, and
       directly access the character at font-position C & 0x01FF (yes,
       a font can have 512 chars on many hardware platforms, like VGA).
       This is referred to as the straight to font zone.
     * code U+FFFD is the replacement character, usually at font-position
       0 in a font. It is displayed by the kernel each time the
       application requests a Unicode character that is not present in
       the SFM. This not only keeps the driver safe in Unicode mode, but
       also prevents displaying invalid characters when the ACM on a
       particular VT contains characters not in the current font!
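       
   Put together, the lookup of a glyph for a UCS2 code can be sketched
   as follows. This only models the two rules listed above; the function
   name is invented, the SFM is a plain dictionary from UCS2 code to
   font position, and falling back to position 0 when the SFM has no
   entry for U+FFFD is an assumption of mine.

```python
REPLACEMENT = 0xFFFD


def font_position(code, sfm):
    """Return the font position used to display a UCS2 code."""
    if 0xF000 <= code <= 0xF1FF:
        return code & 0x01FF             # straight to font zone: no lookup
    if code in sfm:
        return sfm[code]                 # normal case: SFM lookup
    # Not in the SFM: show the replacement character instead.
    return sfm.get(REPLACEMENT, 0)       # assumed default: position 0
```

   With sfm = {0x41: 65, 0xFFFD: 0}, the code U+0041 gives position 65,
   U+F141 gives position 0x141 without any lookup, and an unmapped code
   such as U+20AC gives position 0.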
     _________________________________________________________________
   
About the old 8-bit ``screen maps''

   There was a time when the kernel didn't know anything about Unicode.
   In this ancient time, Application Charset Maps did not exist. Instead
   we had Font-charset maps (what they called ``screen maps''), which
   just mapped the application's characters into font positions. The file
   format used for these 8-bit FCMs is still supported for backward
   compatibility, but should not be used any more.
   
   The FCM mechanism didn't know about Unicode, so the FCM had to depend
   not only on the charset, but also on the current font. Now, as each VT
   chooses its own ACM (from the 4 in the kernel at a given time), and as
   the console-font is common to all VTs, we can use a charset even if
   the font can't display all of its characters; it will then display the
   replacement character (U+FFFD).
   
   Note that in Linux 2.2.x using framebuffer devices, you can even load
   a font per VT.
     _________________________________________________________________
   
See also

   psfaddtable(1), psfgettable(1), psfstriptable(1), showcfont(1).
     _________________________________________________________________
   
Font files

The formats

   The primary font file format for the Linux Console Tools, as of
   version 0.2.x, is the PSF format, which is also used by kbd. 0.3.x
   will introduce the XPSF format, which will be able to replace all
   existing file formats.
   
   Raw fonts can be converted into PSF files with the font2psf(1) tool
   (written by Martin Lohner, SuSE GmbH).
   
   Versions 0.2.x do not support the CP format; support comes back in
   the 0.3.x development branch.
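   
   To give an idea of what a PSF file looks like, here is a sketch that
   decodes the 4-byte header of the original PSF format as I understand
   it: two magic bytes (0x36 0x04), a mode byte whose low bits say
   whether the font has 512 glyphs and whether an SFM is embedded, and
   the glyph height in bytes. The function name and the returned keys
   are made up for this example; libcfont is not used here.

```python
import struct

PSF1_MAGIC = b"\x36\x04"
PSF1_MODE_512 = 0x01        # font has 512 glyphs instead of 256
PSF1_MODE_HAS_SFM = 0x02    # a Unicode table (SFM) follows the glyphs


def parse_psf1_header(data):
    """Decode the 4-byte header at the start of a PSF font file."""
    if len(data) < 4 or data[:2] != PSF1_MAGIC:
        raise ValueError("not a PSF font (bad magic number)")
    mode, charsize = struct.unpack_from("BB", data, 2)
    return {
        "glyph_count": 512 if mode & PSF1_MODE_512 else 256,
        "has_sfm": bool(mode & PSF1_MODE_HAS_SFM),
        "height": charsize,   # glyphs are 8 pixels wide: one byte per row
    }
```

   The header is followed by glyph_count * height bytes of bitmaps, and
   then by the SFM when has_sfm is set; that embedded table is what
   psfaddtable(1), psfgettable(1) and psfstriptable(1) operate on.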
     _________________________________________________________________
   
Tools

Font-files manipulation tools

   The psfaddtable(1), psfgettable(1), and psfstriptable(1) tools are
   provided by the Linux Console Tools for manipulation of the SFM
   embedded in PSF files. These are the only font-file manipulation tools
   provided by the Linux Console Tools as of version 0.2.x. The
   font2psf(1) tool is available in the contrib directory to convert old
   raw fonts into PSF fonts.
   
   There are plans for a more generic font-conversion tool based on
   libcfont. It will be mostly trivial to write once work on libcfont
   is advanced enough.
   
   The only way provided by the Linux Console Tools to display a font's
   contents is to load it, and then to display it using showcfont(1).
     _________________________________________________________________
   
Font editors

   I do not currently know of a good font-editor suitable for editing
   console fonts. I tried fonter, but this one has a bad design flaw: you
   can only properly edit cp437 fonts (or maybe ASCII-based fonts if you
   like unreadable screens) because it works on the console and loads the
   font you are editing. I was told about cse, which I have not tried
   yet. Marcin Kowalczyk is working on a tool in his [6]fonty package
   (which I have not checked yet either), which will help font designers,
   but is not AFAIK a real editor. Robert de Bath works on [7]his own
   tools, which handle a variety of file formats and table formats.
     _________________________________________________________________
   
The libraries

   There are several shared libraries installed by the Linux Console
   Tools. They were at first meant just to share code between the various
   utilities (kbd has lots of duplicated code), but they could be used as
   a base to build new tools.
   
   However, they are not yet ready for production use (hence the version
   number 0.0.0), and are not complete yet. They are however more
   coherent today than they used to be not so long ago.
   
   Stability of the libraries' structure and APIs is a major requirement
   for the upcoming 1.0 release.
   
   Figure 1. Theoretical stacking of the libs
+----------------------------+
|           Tools            |
|      +------------+        |
|      | libctutils |        |
+------+---+    +---+--------+
| libcfont |    | libconsole |
+----------+----+------------+
|       libctgeneric         |
+----------------------------+

     _________________________________________________________________
   
libctgeneric

   This library contains various generic facilities used throughout the
   other libs and tools. They are of general interest and I hope most of
   this stuff will one day make its way into an existing general-purpose
   utility library. Any offers welcome.
   
   These facilities include:
   
     * File-finding function using path and suffixes, featuring
       auto-decompression using multiple compressor programs,
       magic-number-based filetype identification.
     * Unicode data type and utility functions.
     * Safety wrappers around standard C functions.
     * A stdio wrapper emulating fseek() on non-seekable IO streams (e.g.
       pipes) when meaningful.
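   
   The first facility can be pictured with the sketch below. It is not
   the libctgeneric API: the function names, the parameters and the
   short list of magic numbers are all invented for this illustration,
   which only shows the general idea of trying each directory, suffix
   and compression extension in turn, then recognising the file type
   from its first bytes.

```python
import os

# A few well-known magic numbers (hypothetical, non-exhaustive table).
MAGIC = [(b"\x1f\x8b", "gzip"), (b"BZh", "bzip2"), (b"\x36\x04", "psf")]


def find_file(name, dirs, suffixes=("",), compress_exts=("", ".gz"),
              exists=os.path.isfile):
    """Return the first existing dir/name+suffix+ext, or None."""
    for directory in dirs:
        for suffix in suffixes:
            for ext in compress_exts:
                candidate = os.path.join(directory, name + suffix + ext)
                if exists(candidate):
                    return candidate
    return None


def identify(head):
    """Guess a file type from the first bytes of its contents."""
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return "unknown"
```

   A caller would typically look a font up by its short name, read the
   first few bytes of the file found, and start the matching
   decompressor when identify() reports a compressed file.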
     _________________________________________________________________
   
libconsole

   This library is a collection of:
   
     * wrappers around the kernel-level functionalities, which should be
       as kernel-version-independent as reasonable;
     * higher-level interfaces to these functionalities.
       
   Currently, the following APIs are provided:
   
     * Font loading and retrieving (to/from data structures)
     * SFM loading and retrieving (to/from data structures)
     * ACM loading and retrieving (to/from files)
     * Console manipulation
     * Keysym manipulation (no influence on console itself; should be put
       somewhere else, libctutils seems a good candidate).
     _________________________________________________________________
   
libcfont

   This library is meant to provide a high-level interface to
   console-font file-handling. It also exports the lower-level functions
   used to construct higher-level ones.
   
   It defines two basic "font objects" that allow simple fonts (one set
   of glyphs) and font groups (several sets of glyphs, representing the
   same set of characters at different sizes) to be handled in a somewhat
   object-oriented way.
   
   For now, only font-file reading is implemented at such a high level.
   Functions for reading/writing most parts of the supported formats are
   provided, but the high-level writing glue is not there yet. Existing
   glue in consolechars(8) will probably be used as a code base.
     _________________________________________________________________
   
libctutils

     * Tools offering a glue between the two libs (findfile-lct.c)
     * SFM utilities (sfm-*.c)
     _________________________________________________________________
   
The future of the console driver and of the Linux Console Tools

   The Linux Console Tools were derived from kbd. It is not a good thing
   to have two distinct distributions for these tools, so we once hoped
   we'd manage to finally merge the two packages back, together with
   Andries Brouwer, who still maintains kbd. However, due to the lack of
   technical cooperation from kbd's maintainer, and to the growing gap
   with kbd, this project is now on hold. Now that major distributions
   like RedHat 6.0 and Debian 2.2 (not released yet) have moved from kbd
   to the Linux Console Tools, the latter seem to have finally obsoleted
   their famous ancestor.
   
   The driver in 2.2.x kernels has been reworked a lot, and it seems it
   will continue to evolve in 2.3.x. There are already some new features,
   such as fonts with width != 8, which will be supported in the future
   (Note: support for arbitrary width is implemented in the Linux Console
   Tools but mostly untested at this date).
   
   There is an ongoing project, known as GGI (for General Graphical
   Interface), which is in the process of, among other things,
   revolutionizing the way the console is handled. Have a look at
   [8]their WWW site for details.
   
   As far as possible, I will try to keep the Linux Console Tools in sync
   with what is developed for the kernel, and with what gets added to new
   releases of kbd, but I have to look more closely at the current state
   of the GGI project before I give any more info.
   
  Notes
  
   [9][1]
   
   SFM's were formerly called ``Unicode Map'', or ``unimap'' for short,
   but this term should be dropped, as what they called ``screen maps''
   now uses Unicode as well: it probably confused many, many people.

References

   1. http://www.multimania.com/ydirson/en/lct/
   2. http://unicode.org/
   3. file://localhost/tmp/@6136.2#FTN.AEN130
   4. file://localhost/tmp/@6136.2#SEC-UNICODE
   5. http://kki.net.pl/qrczak/programy/linux/konwert/
   6. http://kki.net.pl/qrczak/programy/linux/fonty/
   7. http://www.cix.co.uk/~mayday/font.tgz
   8. http://www.ggi-project.org/
   9. file://localhost/tmp/@6136.2#AEN130
