Re: term/encoding problem
Group: gnu.emacs.help
Author: Andreas Politz Date: Sep 19, 2008 12:30
Peter Dyballa wrote:
>
> Am 18.09.2008 um 20:16 schrieb Andreas Politz:
>
>> Note that I get a 'Invalid character' message, when I try to
>> insert it via quoted-insert and it's octal value
>> ( C-q 22622 ).
>
> Ahh! So you're with GNU Emacs 22.x? I can reproduce it in 22.2. Once I
> check this character in Kermit's utf8.txt file it's described as:
>
> character: ▒ (299218, #o1110322, #x490d2, U+2592)
> charset: mule-unicode-2500-33ff
> (Unicode characters of the range U+2500..U+33FF.)
>
> In UTF-8 presentation this character is encoded with these three bytes:
> E2 96 92. These are in "ASCII" (rather an 8-bit "ASCII"): ‚ ñ í. Using
> C-q 1 1 1 0 3 2 2 I can insert HALF SHADE. Could be
> this non-Unicode Emacs has to use some extras to handle this ...
>
> If no-one on this list has an explanation I'd write a bug report (see
> Help menu), also mentioning the 'Invalid character' message. Although it
> looks as if GNU Emacs 22.x seems to recommend to use 1110322 instead of
> 22622 ...
>
> --
From what I have learned since my first mail, emacs22 uses its own distinct
internal encoding for its buffers (mule), which explains the different byte codes.
But I think I have found the problem. term uses `binary' as its input coding.
After it has examined the input, it inserts the relevant/visible parts
of it into the buffer. Only at this point does it decode the bytes with
the appropriate coding (variable: locale-coding-system).
At some point it splits the input string to make it fit the
width of the `terminal'. The problem is that it measures bytes, not
characters. So the 3-byte character in question in aptitude, which is mostly
in the last column, gets split into two strings of 1 and 2 bytes. These two
strings, when decoded and inserted independently, result in
what was described as the problem.
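
The effect of splitting by bytes can be reproduced outside Emacs. What follows is only a minimal Python sketch of the general mechanism, not term's actual code: the helper names (split_by_bytes, split_by_chars, decode_chunks) and the 3-column "terminal" are made up for illustration, and Python marks undecodable bytes with U+FFFD, whereas emacs22 would show different garbage.

```python
# U+2592, the shade character discussed above; its UTF-8 form is E2 96 92.
SHADE = "\u2592"


def split_by_bytes(data: bytes, width: int) -> list:
    """Cut the raw input every `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def split_by_chars(text: str, width: int) -> list:
    """Cut already-decoded text every `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def decode_chunks(chunks: list) -> list:
    """Decode each chunk on its own; incomplete sequences become U+FFFD."""
    return [chunk.decode("utf-8", errors="replace") for chunk in chunks]


# A line that is 3 columns wide but 5 bytes long, shade in the last column.
line = "ab" + SHADE
raw = line.encode("utf-8")

# Hypothetical terminal width of 3, measured in bytes: the shade character
# is cut into a 1-byte piece and a 2-byte piece, and neither decodes.
broken = decode_chunks(split_by_bytes(raw, 3))

# Decoding first and then splitting by characters keeps it intact.
intact = split_by_chars(raw.decode("utf-8"), 3)

print(split_by_bytes(raw, 3))
print(broken)
print(intact)
```

Measured in characters the line fits the 3 columns exactly, so nothing needs to be split inside the character; measured in bytes it is cut after the first byte of the 3-byte sequence.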
I filed a bug report.
Thanks anyway.
-ap