I wrote a small Java program to read filenames from stdin (produced by
"find" on Linux), and then to divide those files up into groups of like
files. Actually, it was originally a Python program, but I've been wanting
to expand my horizons a little, so I rewrote it in Perl, and now I'm trying
to redo it in Java to celebrate Java going open source, and I'll likely
rewrite it in Haskell and/or Objective Caml after the Java version.
The Java version of the program seems to work pretty well, and I have a
feeling it's going to prove faster than the Python or Perl versions
(which are at
http://stromberg.dnsalias.org/~strombrg/equivalence-classes.html - and I
hope to put the Java version there too after it's working a little
better).
However, to my disappointment, the Java version of the program can't seem
to deal with filenames that have umlauts in them. Filenames using only
characters in the English alphabet seem fine.
I suspect the problem is that the file name, as it appears in a Linux
ext3 filesystem, has an 8-bit-per-character representation, but Java
converts the string I read from stdin to a 16-bit-per-character
representation, and then doesn't reverse the conversion when I go to open
the file by its name.
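
To illustrate the kind of conversion I suspect, here is a minimal sketch. It is not taken from my actual program: the class name FilenameRoundTrip, the helper roundTrips, and the sample bytes are made up for illustration, and it assumes Java 7 or later for StandardCharsets. It decodes a raw 8-bit filename into a Java String and encodes it back, then checks whether the original bytes survive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FilenameRoundTrip {
    // Decode raw bytes into a (16-bit char) String using the given charset,
    // encode the String back to bytes, and report whether the bytes survived.
    static boolean roundTrips(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        byte[] encoded = decoded.getBytes(charset);
        return Arrays.equals(raw, encoded);
    }

    public static void main(String[] args) {
        // Hypothetical filename "f<u-umlaut>r" stored as ISO-8859-1 bytes:
        // 0xFC is u-umlaut in ISO-8859-1, but is not valid UTF-8 on its own.
        byte[] latin1Name = { 'f', (byte) 0xFC, 'r' };

        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        System.out.println("ISO-8859-1 round trip: "
                + roundTrips(latin1Name, StandardCharsets.ISO_8859_1));

        // Decoding as UTF-8 replaces the invalid byte with U+FFFD, so the
        // re-encoded name no longer matches the bytes on disk.
        System.out.println("UTF-8 round trip: "
                + roundTrips(latin1Name, StandardCharsets.UTF_8));
    }
}
```

If something like the second case is what happens between reading stdin and opening the file, then the name Java hands back to the operating system would differ from the name on disk, which would fit the symptom. Whether this is really the cause depends on the locale and default charset the JVM picks up, which I have not confirmed.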