decode-coding-string question
gnu.emacs.help

Author: Ted Zlatanov
Date: Aug 14, 2008 14:02

This should decode to нуль but doesn't (I get the same string instead):

(decode-coding-string "íîëü" 'cp1251)

Am I missing something obvious? Do I need to encode the string to
something else?

Ted
Re: decode-coding-string question         


Author: David Golden
Date: Aug 14, 2008 15:20

Ted Zlatanov wrote:
> This should decode to нуль but doesn't (I get the same string
> instead):
>
> (decode-coding-string "íîëü" 'cp1251)
>
> Am I missing something obvious? Do I need to encode the string to
> something else?
>
Guessing you're using a new multibyte/unicode emacs, and noting that I
do not currently fully understand emacs encoding handling, but...
probably - try:

(decode-coding-string (encode-coding-string "íîëü" 'iso-8859-1) 'cp1251)
Re: decode-coding-string question         


Author: Dmitry Dzhus
Date: Aug 14, 2008 15:19

Ted Zlatanov wrote:
> This should decode to нуль but doesn't (I get the same string instead):
>
> (decode-coding-string "íîëü" 'cp1251)
>
> Am I missing something obvious? Do I need to encode the string to
> something else?

0. «íóëü», not «íîëü»

1. (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)
--
Happy Hacking.

http://sphinx.net.ru
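
For anyone reading along, here is a minimal Emacs Lisp sketch of why the original call returns its input unchanged and what the two suggestions above have in common. It assumes a Unicode-based Emacs (23 or later) where cp1251 is available as a coding system; the \u escapes just spell out íóëü and íîëü so the snippet does not depend on how it gets pasted. The literal is a multibyte string of Latin-1 characters, while decode-coding-string expects raw bytes, so the characters have to be turned back into bytes first.

```elisp
;; "\u00ed\u00f3\u00eb\u00fc" is íóëü as a multibyte string of Latin-1 characters.
(let* ((text "\u00ed\u00f3\u00eb\u00fc")
       ;; Encoding as Latin-1 maps each character back to one byte
       ;; (#xED #xF3 #xEB #xFC) and returns a unibyte string.
       (bytes (encode-coding-string text 'iso-8859-1)))
  (list (multibyte-string-p text)               ; t
        (multibyte-string-p bytes)              ; nil
        (decode-coding-string bytes 'cp1251)))  ; third element is "нуль"

;; The string from the original post, íîëü, holds bytes #xED #xEE #xEB #xFC,
;; which cp1251 maps to "ноль" rather than "нуль" (point 0 above).
(decode-coding-string
 (encode-coding-string "\u00ed\u00ee\u00eb\u00fc" 'iso-8859-1)
 'cp1251)
```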
Re: decode-coding-string question         


Author: Eli Zaretskii
Date: Aug 15, 2008 00:37

> From: Dmitry Dzhus <...@sphinx.net.ru>
> Date: Fri, 15 Aug 2008 02:19:11 +0400
>
> Ted Zlatanov wrote:
>
>> This should decode to нуль but doesn't (I get the same string instead):
>>
>> (decode-coding-string "íîëü" 'cp1251)
>>
>> Am I missing something obvious? Do I need to encode the string to
>> something else?
>
> 0. «íóëü», not «íîëü»

That's not important, the original problem remains, even with a
different spelling of the word.

> 1. (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)
[...]
Re: decode-coding-string question         


Author: Ted Zlatanov
Date: Aug 15, 2008 08:54

On Fri, 15 Aug 2008 02:19:11 +0400 Dmitry Dzhus <...@sphinx.net.ru> wrote:
DD> (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)

Thanks.

There should probably be a specific function for this:

(decode-coding-string-as-unibyte "íóëü" 'cp1251)

ditto for decode-coding-region. Should I add it or is that not
generally useful? A flag is not as good because both functions have
several flags already.

Ted
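
A rough sketch of what such a wrapper could look like, in case it helps the discussion. The name my-decode-coding-string-as-unibyte is made up here and is not part of Emacs. It uses encode-coding-string with iso-8859-1 to get back to bytes instead of string-make-unibyte, which I believe is marked obsolete in later Emacs releases. Characters outside Latin-1 have no single-byte form, so this is only meant for strings like the ones in this thread.

```elisp
(defun my-decode-coding-string-as-unibyte (string coding-system)
  "Decode STRING with CODING-SYSTEM, converting it to unibyte first.
Hypothetical helper, not part of Emacs.  A multibyte STRING is assumed
to hold only Latin-1 characters that stand for raw bytes."
  (decode-coding-string
   (if (multibyte-string-p string)
       ;; Map each Latin-1 character back to the byte it stands for.
       (encode-coding-string string 'iso-8859-1)
     ;; Already unibyte: decode the bytes as they are.
     string)
   coding-system))

;; Example call with íóëü written as \u escapes:
(my-decode-coding-string-as-unibyte "\u00ed\u00f3\u00eb\u00fc" 'cp1251)
```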
Re: decode-coding-string question         


Author: Dmitry Dzhus
Date: Aug 15, 2008 09:06

Eli Zaretskii wrote:
>> From: Dmitry Dzhus <...@sphinx.net.ru>
>> Date: Fri, 15 Aug 2008 02:19:11 +0400
>>
>> Ted Zlatanov wrote:
>>
>>> This should decode to нуль but doesn't (I get the same string instead):
>>>
>>> (decode-coding-string "íîëü" 'cp1251)
>>>
>>> Am I missing something obvious? Do I need to encode the string to
>>> something else?
>>
>> 0. «íóëü», not «íîëü»
>
> That's not important, the original problem remains, even with a
> different spelling of the word.
[...]
Re: decode-coding-string question         


Author: Eli Zaretskii
Date: Aug 15, 2008 10:04

> From: Ted Zlatanov <...@lifelogs.com>
> Date: Fri, 15 Aug 2008 10:54:20 -0500
>
> There should probably be a specific function for this:
>
> (decode-coding-string-as-unibyte "íóëü" 'cp1251)
>
> ditto for decode-coding-region. Should I add it or is that not
> generally useful?

Personally, I think it's not useful, since decode-coding-region and
decode-coding-string are used only on unibyte text. But feel free to
raise this on emacs-devel.
Re: decode-coding-string question         


Author: Ted Zlatanov
Date: Aug 18, 2008 06:58

On Fri, 15 Aug 2008 20:04:42 +0300 Eli Zaretskii <...@gnu.org> wrote:
>> From: Ted Zlatanov <...@lifelogs.com>
>> Date: Fri, 15 Aug 2008 10:54:20 -0500
>>
>> There should probably be a specific function for this:
>>
>> (decode-coding-string-as-unibyte "íóëü" 'cp1251)
>>
>> ditto for decode-coding-region. Should I add it or is that not
>> generally useful?

EZ> Personally, I think it's not useful, since decode-coding-region and
EZ> decode-coding-string are used only on unibyte text. But feel free to
EZ> raise this on emacs-devel.

How would you recommend decoding text from particular encodings? Given
text like the one shown above in a buffer, only decode-coding-region
seems to DTRT, and it's not interactive.

Context: I have a file full of CP1251 data and don't want to use Perl's
Encode module because I'm stubborn and think Emacs should handle it :)

On Fri, 15 Aug 2008 20:06:59 +0400 Dmitry Dzhus <...@sphinx.net.ru> wrote:
[...]
Re: decode-coding-string question         


Author: David Golden
Date: Aug 18, 2008 10:45

Ted Zlatanov wrote:
> Context: I have a file full of CP1251 data and don't want to use
> Perl's Encode module because I'm stubborn and think Emacs should
> handle it :)

Just in case: If you have a file full of cp1251, and you know it's
cp1251, it's usually best to just open it as cp1251 in the first
place!

C-x RET c cp1251 RET C-x C-f myfile.txt RET

It's typically only if you've got a file full of fragments in
different encodings (horrible mail spool formats and the like) that
you want to decode and reencode particular subregions of whole files.

If you've already opened a file and its encoding is misdetected,
you can also hit
C-x RET r cp1251 RET
to "revert" the buffer to the file reopened in the specified encoding.
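
The same thing can be scripted from Lisp. The sketch below is only an illustration: my-recode-cp1251-file and its arguments are made-up names, and it assumes the whole file really is cp1251. It let-binds coding-system-for-read and coding-system-for-write, the standard variables for overriding the coding system of a single read or write, to load the file as cp1251 and save a UTF-8 copy. Interactively, C-x RET r runs revert-buffer-with-coding-system.

```elisp
(defun my-recode-cp1251-file (in out)
  "Read file IN as cp1251 and write its text to file OUT as UTF-8.
Hypothetical helper, not part of Emacs."
  (with-temp-buffer
    ;; Force the decoding instead of letting Emacs guess the encoding.
    (let ((coding-system-for-read 'cp1251))
      (insert-file-contents in))
    ;; Force the encoding used when the text is written back out.
    (let ((coding-system-for-write 'utf-8))
      (write-region (point-min) (point-max) out))))
```

A call would look like (my-recode-cp1251-file "myfile.txt" "myfile-utf8.txt"), where the second file name is again just an example.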
Re: decode-coding-string question         


Author: Eli Zaretskii
Date: Aug 18, 2008 12:11

> From: Ted Zlatanov <...@lifelogs.com>
> Date: Mon, 18 Aug 2008 08:58:55 -0500
>
> How would you recommend decoding text from particular encodings? Given
> text like the one shown above in a buffer, only decode-coding-region
> seems to DTRT, and it's not interactive.

If you mean interactively, i.e. you visited a buffer and then
discovered that it was decoded incorrectly, and the actual encoding is
different, then "C-x RET c cp1251 RET M-x revert-buffer RET" should do
what you want, I think.

> Context: I have a file full of CP1251 data and don't want to use Perl's
> Encode module because I'm stubborn and think Emacs should handle it :)

What about the rest of the file? Is it encoded in some other encoding?
If not, then the above recipe should do. If it doesn't, please tell
more details.
[...]