ltguessencoding — Try to determine the encoding of a file


ltguessencoding [ filename ]


ltguessencoding attempts to determine the encoding of a file. The name of the file may be given as an argument; otherwise standard input is used. The program prints one of the following to standard output:


if the file contains no bytes with values greater than 127.


if the file contains bytes with values greater than 127, and all of them occur in legal UTF-8 sequences. This is quite a reliable indication that the encoding really is UTF-8.


if the file contains null bytes, and the number of bytes with values greater than 127 is sufficiently high.


if none of the above apply, and there are bytes which would be C1 controls (values 0x80 to 0x9F) if the encoding were one of the ISO Latin encodings. (Several Windows encodings use these values for non-control characters.) No attempt is made to distinguish the various Windows encodings.


otherwise. No attempt is made to distinguish the other ISO Latin encodings.


The detection algorithm is very simplistic. In practice it is only useful for distinguishing UTF-8 from ISO Latin-1.
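
The checks listed above can be sketched in a few lines of Python. This is only an illustration of the described order of tests, not the program's actual source: the function name guess_encoding, the label strings it returns, and the HIGH_BYTE_FRACTION cut-off standing in for "sufficiently high" are all assumptions made for the example.

```python
# Hypothetical labels; the strings the real program prints are not reproduced here.
ASCII, UTF8, UTF16, WINDOWS, LATIN = "ascii", "utf-8", "utf-16", "windows", "iso-latin"

# Assumed cut-off for "sufficiently high"; the real threshold is not documented above.
HIGH_BYTE_FRACTION = 0.05


def guess_encoding(data: bytes) -> str:
    """Apply the checks in the order in which the description lists them."""
    # Count bytes with values greater than 127.
    high = sum(1 for b in data if b > 127)
    if high == 0:
        return ASCII

    # High bytes are present: are they all part of legal UTF-8 sequences?
    try:
        data.decode("utf-8")  # strict decoding rejects any illegal sequence
        return UTF8
    except UnicodeDecodeError:
        pass

    # Null bytes together with many high bytes.
    if 0 in data and high / len(data) >= HIGH_BYTE_FRACTION:
        return UTF16

    # Bytes in the C1 control range (0x80 to 0x9F) of the ISO Latin encodings.
    if any(0x80 <= b <= 0x9F for b in data):
        return WINDOWS

    return LATIN
```

To mimic the command's input handling, the bytes could come from open(filename, "rb").read() when a file name is given, or from sys.stdin.buffer.read() otherwise.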