Determine the Character Set for Encoding

The appropriate character set for decoding command output is determined at runtime. The multilingual solution is based on the following facts and assumptions:

  1. It is possible to determine the OS language in a locale-independent way, for example, by running the chcp command on Windows or the locale command on Linux.

  2. The relation between language and encoding is well known and can be defined statically. For example, the two most popular encodings for the Russian language are Cp866 and Windows-1251.

  3. Each language has one preferred character set. For example, the preferred character set for the Russian language is Cp866. This means that most commands produce output in this encoding.

  4. The encoding of the next command output cannot be predicted, but it is always one of the possible encodings for the given language. For example, on a Windows machine with a Russian locale, the system provides the ver command output in Cp866, but the ipconfig command output in Windows-1251.

  5. A known command produces known key words in its output. For example, the ipconfig command output contains the translated form of the IP-Address string: IP-Address for the English OS, the Russian translation of that string for the Russian OS, IP-Adresse for the German OS, and so on.
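As an illustration of fact 1, the following minimal sketch reads the code page number from raw chcp output. It assumes that only the label text of chcp is localized while the number itself is printed as ASCII digits, so the number can be extracted before any encoding is known. The function name language_from_chcp and the CODE_PAGE_TO_LANGUAGE table are hypothetical and deliberately incomplete; they are not part of any adapter API.

```python
import re

# Hypothetical, incomplete mapping from code page number to OS language.
# A real table would have to cover every supported locale.
CODE_PAGE_TO_LANGUAGE = {
    437: "english",
    866: "russian",
}


def language_from_chcp(raw_output):
    """Return the OS language for raw chcp output, or None if it is unknown.

    Non-ASCII bytes (the localized label) are dropped, which leaves the
    ASCII digits of the code page number intact.
    """
    text = raw_output.decode("ascii", "ignore")
    match = re.search(r"(\d+)", text)
    if match is None:
        return None
    return CODE_PAGE_TO_LANGUAGE.get(int(match.group(1)))
```

For example, language_from_chcp(b"Active code page: 437") yields "english" under this table, and an unlisted code page yields None so that the caller can fall back to a default.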

Once the language of the command output is known (fact 1), the possible character sets are limited to one or two (fact 2). Furthermore, the key words contained in this output are known (fact 5).
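The static definitions behind facts 2, 3, and 5 could be kept in simple lookup tables, as in the sketch below. The table names, the helper functions, and the language identifiers are hypothetical. Only the Russian encodings and the English and German key words come from the text above; a real table would need an entry for every supported language and command.

```python
# Possible encodings per language, preferred encoding first (facts 2 and 3).
LANGUAGE_ENCODINGS = {
    "russian": ["cp866", "windows-1251"],
}

# Known key words per command and language (fact 5).
COMMAND_KEY_WORDS = {
    "ipconfig": {
        "english": u"IP-Address",
        "german": u"IP-Adresse",
    },
}


def candidate_encodings(language):
    """Return the possible encodings for a language, preferred one first."""
    return list(LANGUAGE_ENCODINGS.get(language, []))


def key_word_for(command, language):
    """Return the key word expected in the command output, or None."""
    return COMMAND_KEY_WORDS.get(command, {}).get(language)
```

Listing the preferred encoding first means that a caller iterating over the candidates tries the most likely character set before the alternative.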

The solution, therefore, is to decode the command output with each of the possible encodings in turn and to search for a key word in the result. If the key word is found, the encoding that was used is considered the correct one.
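The decode-and-search step described above can be sketched as follows. This is a minimal illustration, not the adapter implementation: the function name decode_with_key_word is hypothetical, and the Russian key word in the usage note is an assumed example value written with Unicode escapes.

```python
def decode_with_key_word(raw_output, encodings, key_word):
    """Decode raw_output with each candidate encoding in order.

    Returns (text, encoding) for the first encoding whose decoded text
    contains key_word, or (None, None) if no candidate matches.
    """
    for encoding in encodings:
        try:
            text = raw_output.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
        if key_word in text:
            return text, encoding
    return None, None


# Usage: bytes produced in Windows-1251 are recognized even though the
# preferred encoding, Cp866, is tried first. The key word is an assumed
# Russian word for "address".
key_word = u"\u0430\u0434\u0440\u0435\u0441"
sample = (u"IP-" + key_word + u": 10.0.0.1").encode("windows-1251")
text, encoding = decode_with_key_word(sample, ["cp866", "windows-1251"], key_word)
```

Note that the check only discriminates between encodings when the key word contains non-ASCII characters. A purely ASCII key word decodes identically in Cp866 and Windows-1251, so it would match under the first candidate regardless of the actual encoding.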

Parent topic: Support Localization in Jython Adapters