Author: Ted Zlatanov Date: Aug 14, 2008 14:02
This should decode to нуль but doesn't (I get the same string instead):
(decode-coding-string "íîëü" 'cp1251)
Am I missing something obvious? Do I need to encode the string to
something else?
Ted

Author: David Golden Date: Aug 14, 2008 15:20
Ted Zlatanov wrote:
> This should decode to нуль but doesn't (I get the same string
> instead):
>
> (decode-coding-string "íîëü" 'cp1251)
>
> Am I missing something obvious? Do I need to encode the string to
> something else?
>
Guessing you're using a new multibyte/Unicode Emacs, and noting that I
do not currently fully understand Emacs encoding handling, but...
probably - try:
(decode-coding-string (encode-coding-string "íîëü" 'iso-8859-1) 'cp1251)
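To make the mechanics explicit, here is a minimal sketch of David's two-step call. It assumes the garbled text is a multibyte string whose characters all lie in Latin-1 (U+0000..U+00FF). The variable names `my-mojibake` and `my-raw-bytes` are hypothetical, and `\u` escapes are used so the example does not depend on how the file holding it is encoded.

```elisp
;; The text as it arrived: a multibyte string of Latin-1 *characters*
;; (U+00ED U+00F3 U+00EB U+00FC), which displays as "íóëü".
(defconst my-mojibake "\u00ed\u00f3\u00eb\u00fc")

(multibyte-string-p my-mojibake)            ; => t

;; Step 1: map each Latin-1 character back to the single byte it
;; stands for.  ISO-8859-1 covers U+0000..U+00FF one-to-one, so this
;; recovers the original bytes #xED #xF3 #xEB #xFC.
(defconst my-raw-bytes (encode-coding-string my-mojibake 'iso-8859-1))

(multibyte-string-p my-raw-bytes)           ; => nil

;; Step 2: decode those bytes as Windows-1251 Cyrillic.
(decode-coding-string my-raw-bytes 'cp1251) ; => "нуль"
```

The point is that `decode-coding-string` wants bytes, and a multibyte string of Latin-1 characters is not a string of bytes until it has been encoded.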

Author: Dmitry Dzhus Date: Aug 14, 2008 15:19
Ted Zlatanov wrote:
> This should decode to нуль but doesn't (I get the same string instead):
>
> (decode-coding-string "íîëü" 'cp1251)
>
> Am I missing something obvious? Do I need to encode the string to
> something else?
0. «íóëü», not «íîëü»
1. (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)
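For text that really is Latin-1, Dmitry's `string-make-unibyte` variant produces the same bytes as the `encode-coding-string` route, as the small check below suggests. Its docstring says characters above 255 are reduced to their low 8 bits, so it will not flag input that was never Latin-1 to begin with. His point 0 is about spelling: byte #xEE is «î» in Latin-1 but «о» in CP1251, so once decoding works, «íîëü» comes out as «ноль» and «íóëü» as «нуль». The `my-` variable names are hypothetical.

```elisp
;; Garbled text, displayed as "íóëü".
(defconst my-latin1-text "\u00ed\u00f3\u00eb\u00fc")

;; Two ways to turn Latin-1 characters back into raw bytes.
(defconst my-via-make-unibyte (string-make-unibyte my-latin1-text))
(defconst my-via-encode (encode-coding-string my-latin1-text 'iso-8859-1))

(equal my-via-make-unibyte my-via-encode)            ; => t
(decode-coding-string my-via-make-unibyte 'cp1251)   ; => "нуль"

;; The spelling from the original post: #xEE is "о" in CP1251.
(decode-coding-string
 (encode-coding-string "\u00ed\u00ee\u00eb\u00fc" 'iso-8859-1)
 'cp1251)                                            ; => "ноль"
```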

Author: Eli Zaretskii Date: Aug 15, 2008 00:37
> From: Dmitry Dzhus sphinx.net.ru>
> Date: Fri, 15 Aug 2008 02:19:11 +0400
>
> Ted Zlatanov wrote:
>
>> This should decode to нуль but doesn't (I get the same string instead):
>>
>> (decode-coding-string "íîëü" 'cp1251)
>>
>> Am I missing something obvious? Do I need to encode the string to
>> something else?
>
> 0. «íóëü», not «íîëü»
That's not important, the original problem remains, even with a
different spelling of the word.
> 1. (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)

Author: Ted Zlatanov Date: Aug 15, 2008 08:54
On Fri, 15 Aug 2008 02:19:11 +0400 Dmitry Dzhus sphinx.net.ru> wrote:
DD> (decode-coding-string (string-make-unibyte "íóëü") 'cp1251)
Thanks.
There should probably be a specific function for this:
(decode-coding-string-as-unibyte "íóëü" 'cp1251)
ditto for decode-coding-region. Should I add it or is that not
generally useful? A flag is not as good because both functions have
several flags already.
Ted
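The function Ted proposes does not exist in Emacs; what follows is a hypothetical sketch of what it might look like, under the same assumption that a multibyte argument holds Latin-1 characters standing in for bytes. The name `my-decode-coding-string-as-unibyte` is made up, and signalling an error on characters above 255 (instead of truncating them) is a design choice of this sketch, not established behaviour.

```elisp
(defun my-decode-coding-string-as-unibyte (string coding-system)
  "Decode STRING with CODING-SYSTEM, treating its characters as raw bytes.
If STRING is multibyte, every character must be in the range 0..255
\(e.g. text that was wrongly decoded as Latin-1); otherwise signal an
error.  Hypothetical helper, not part of Emacs."
  (when (multibyte-string-p string)
    ;; Refuse characters that cannot stand for a single byte, instead
    ;; of silently truncating them.
    (mapc (lambda (c)
            (when (> c 255)
              (error "Character %c (#x%x) is not a single byte" c c)))
          string)
    ;; ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0..255.
    (setq string (encode-coding-string string 'iso-8859-1)))
  ;; STRING is now unibyte, which is what decoding expects.
  (decode-coding-string string coding-system))
```

For example, `(my-decode-coding-string-as-unibyte "íóëü" 'cp1251)` returns "нуль"; a string that is already unibyte is passed straight to `decode-coding-string`.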

Author: Dmitry Dzhus Date: Aug 15, 2008 09:06
Eli Zaretskii wrote:
>> From: Dmitry Dzhus sphinx.net.ru>
>> Date: Fri, 15 Aug 2008 02:19:11 +0400
>>
>> Ted Zlatanov wrote:
>>
>>> This should decode to нуль but doesn't (I get the same string instead):
>>>
>>> (decode-coding-string "íîëü" 'cp1251)
>>>
>>> Am I missing something obvious? Do I need to encode the string to
>>> something else?
>>
>> 0. «íóëü», not «íîëü»
>
> That's not important, the original problem remains, even with a
> different spelling of the word.

Author: Eli Zaretskii Date: Aug 15, 2008 10:04
> From: Ted Zlatanov lifelogs.com>
> Date: Fri, 15 Aug 2008 10:54:20 -0500
>
> There should probably be a specific function for this:
>
> (decode-coding-string-as-unibyte "íóëü" 'cp1251)
>
> ditto for decode-coding-region. Should I add it or is that not
> generally useful?
Personally, I think it's not useful, since decode-coding-region and
decode-coding-string are used only on unibyte text. But feel free to
raise this on emacs-devel.

Author: Ted Zlatanov Date: Aug 18, 2008 06:58
On Fri, 15 Aug 2008 20:04:42 +0300 Eli Zaretskii gnu.org> wrote:
>> From: Ted Zlatanov lifelogs.com>
>> Date: Fri, 15 Aug 2008 10:54:20 -0500
>>
>> There should probably be a specific function for this:
>>
>> (decode-coding-string-as-unibyte "íóëü" 'cp1251)
>>
>> ditto for decode-coding-region. Should I add it or is that not
>> generally useful?
EZ> Personally, I think it's not useful, since decode-coding-region and
EZ> decode-coding-string are used only on unibyte text. But feel free to
EZ> raise this on emacs-devel.
How would you recommend decoding text from particular encodings? Given
text like the one shown above in a buffer, only decode-coding-region
seems to DTRT, and it's not interactive.
Context: I have a file full of CP1251 data and don't want to use Perl's
Encode module because I'm stubborn and think Emacs should handle it :)
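For buffer text, a small interactive wrapper around the same encode-then-decode step is one way to get what Ted describes. The command name `my-redecode-region` is hypothetical, and it assumes the region was wrongly decoded as Latin-1. Emacs also ships a `recode-region` command (in mule.el) that prompts for the real and the assumed coding systems; check `C-h f recode-region` in your version before rolling your own.

```elisp
(defun my-redecode-region (start end coding-system)
  "Re-decode the text between START and END as CODING-SYSTEM.
Assumes the region was wrongly decoded as Latin-1, so that each
character stands for one byte.  Hypothetical command, not part of Emacs."
  ;; "r" supplies point and mark, "z" reads a coding system name.
  (interactive "r\nzDecode region as: ")
  (let* ((text (buffer-substring-no-properties start end))
         ;; Recover the original bytes, then decode them properly.
         (decoded (decode-coding-string
                   (encode-coding-string text 'iso-8859-1)
                   coding-system)))
    (save-excursion
      (delete-region start end)
      (goto-char start)
      (insert decoded))))
```

Usage: select the garbled text and run `M-x my-redecode-region RET cp1251 RET`.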
On Fri, 15 Aug 2008 20:06:59 +0400 Dmitry Dzhus sphinx.net.ru> wrote:

Author: David Golden Date: Aug 18, 2008 10:45
Ted Zlatanov wrote:
> Context: I have a file full of CP1251 data and don't want to use
> Perl's Encode module because I'm stubborn and think Emacs should
> handle it :)
Just in case: if you have a file full of cp1251, and you know it's
cp1251, it's usually best to just open it as cp1251 in the first
place!
C-x RET c cp1251 RET C-x C-f myfile.txt RET
It's typically only if you've got a file full of fragments in
different encodings (horrible mail spool formats and the like) that
you want to decode and re-encode particular subregions of whole files.
If you've already opened a file and its encoding is misdetected,
you can also hit
C-x RET r cp1251 RET
to "revert" the buffer to the file reopened in the specified encoding.
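The Lisp counterparts of these key sequences bind `coding-system-for-read` around the file-reading call. This is a minimal sketch: `my-file-contents-as` is a hypothetical helper, while `coding-system-for-read`, `insert-file-contents`, and `revert-buffer-with-coding-system` (the command on C-x RET r) are standard Emacs.

```elisp
;; Visiting a file with a forced coding system (roughly what
;; "C-x RET c cp1251 RET C-x C-f myfile.txt RET" does interactively):
;;   (let ((coding-system-for-read 'cp1251))
;;     (find-file "myfile.txt"))
;;
;; Re-reading an already visited buffer (C-x RET r cp1251 RET):
;;   (revert-buffer-with-coding-system 'cp1251)

(defun my-file-contents-as (file coding-system)
  "Return the contents of FILE decoded with CODING-SYSTEM.
Hypothetical helper, not part of Emacs."
  (with-temp-buffer
    ;; Binding this variable overrides automatic coding detection
    ;; for the read below.
    (let ((coding-system-for-read coding-system))
      (insert-file-contents file))
    (buffer-string)))
```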

Author: Eli Zaretskii Date: Aug 18, 2008 12:11
> From: Ted Zlatanov lifelogs.com>
> Date: Mon, 18 Aug 2008 08:58:55 -0500
>
> How would you recommend decoding text from particular encodings? Given
> text like the one shown above in a buffer, only decode-coding-region
> seems to DTRT, and it's not interactive.
If you mean interactively, i.e. you visited a buffer and then
discovered that it was decoded incorrectly, and the actual encoding is
different, then "C-x RET c cp1251 RET M-x revert-buffer RET" should do
what you want, I think.
> Context: I have a file full of CP1251 data and don't want to use Perl's
> Encode module because I'm stubborn and think Emacs should handle it :)
What about the rest of the file? Is it encoded in some other encoding?
If not, then the above recipe should do. If it doesn't, please tell
more details.