Re: aus and chars (was: CMOVE wrong?)
Group: comp.lang.forth
Author: Bruce McFarling Date: Mar 4, 2007 11:50
On Mar 4, 12:52 pm, Albert van der Horst
wrote:
> I'm thinking in this direction because I hate to have
> 4 levels: address units, then bytes, then chars, then
> the XCHARS or wide chars.
There are never four levels associated with CHARS:
* address units then CHARS
* address units then WCHARS (if 1 CHARS = 1 is added)
* address units then CHARS then XCHARS for UTF-8
* if 1 CHARS = 1, au's then WCHARS then XCHARS for UTF-16
* if 1 CHARS = 2, au's then CHARS then XCHARS for UTF-16
Bytes as such exist in Forth-94 in the sense that there is
an ENVIRONMENT? query for the bits per AU (ADDRESS-UNIT-BITS),
and applying CHARS and CELLS to that result tells you the bits
per char and the bits per cell, so you can work out from there
how to handle bytes in that implementation.
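As a minimal sketch of that calculation: the word names BITS/AU,
BITS/CHAR, BITS/CELL and BYTES/CELL are made up for illustration,
and the fallback to 8 when the query is not recognised is an
assumption, not something the standard promises.

```forth
\ Sketch: derive bit widths from the Forth-94 ADDRESS-UNIT-BITS query.
\ Assumption: if the system does not recognise the query, fall back to 8.
: BITS/AU ( -- n )
  S" ADDRESS-UNIT-BITS" ENVIRONMENT? 0= IF 8 THEN ;

\ CHARS and CELLS scale a count into address units, so applying them
\ to bits-per-au gives bits per character and bits per cell.
: BITS/CHAR ( -- n )  BITS/AU CHARS ;
: BITS/CELL ( -- n )  BITS/AU CELLS ;

\ 8-bit bytes that fit in one cell (hypothetical helper);
\ only exact when BITS/CELL is a multiple of 8.
: BYTES/CELL ( -- n )  BITS/CELL 8 / ;
```

On a byte-addressed system with one-au characters and 32-bit
cells, these should report 8, 8, 32 and 4 respectively.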
For many people, that would be:
"hey, I assume 1 byte = 1 char = 1 au, you better go through
this code with a fine-tooth comb to make sure this will work
on your system/implementation".
The thing is, since address units can at present be less than,
equal to, or greater than bytes, a portable toolkit for handling
byte data that applies across the full range of systems in scope
would perhaps best be defined in terms of working with strings
of bytes, with a single byte being a one-long string.
That is, working a byte at a time is not likely to be normal
practice on a wider-au processor ... rather, bytes are more
often packed into address units, and worked with in chunks,
with individual byte-oriented operations only applying when
a chunk has a ragged beginning and/or end.
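To make the packed case concrete, here is a rough sketch of the
individual byte operations one would fall back on at a ragged
edge. BYTE-OF and BYTE-INTO are hypothetical names, and the
packing order (byte 0 in the least significant 8 bits) is an
assumption; a real wide-au system may pack the other way round.

```forth
\ Sketch: pull one 8-bit byte out of a value holding packed bytes.
\ Assumption: byte 0 is the least significant 8 bits.
: BYTE-OF ( x k -- b )  8 * RSHIFT 255 AND ;

\ Replace byte k of x with b, under the same packing assumption.
: BYTE-INTO ( x b k -- x' )
  8 * >R                          \ keep the shift count on the return stack
  255 AND R@ LSHIFT               \ ( x b<<s )
  SWAP 255 R> LSHIFT INVERT AND   \ ( b<<s x-with-byte-k-cleared )
  OR ;
```

For example, `HEX 2211 1 BYTE-OF . DECIMAL` prints 22, and
`HEX 2211 33 1 BYTE-INTO . DECIMAL` prints 3311. The bulk of a
packed string would still be moved a whole address unit at a
time; these words only cover the leftover bytes.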
So portable code would more likely handle those as distinct
cases at the low level ... which means that the ENVIRONMENT?
query to work out what is the case at hand may well be the
appropriate level of support for bytes in the standard.
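A sketch of how code might sort itself into those cases from the
query alone. AU-CLASS is a hypothetical name, and treating an
unrecognised query as "8 bits" is an assumption made only so the
word always returns something.

```forth
\ Sketch: classify the address unit relative to an 8-bit byte.
\ Returns -1 if an au is narrower than a byte, 0 if it is exactly
\ one byte, 1 if it is wider.
\ Assumption: an unrecognised query is treated as 8 bits per au.
: AU-CLASS ( -- n )
  S" ADDRESS-UNIT-BITS" ENVIRONMENT? 0= IF 8 THEN
  DUP 8 < IF DROP -1 EXIT THEN
  8 > IF 1 ELSE 0 THEN ;
```

Low-level byte-handling words could then be chosen once, at load
time, according to the value AU-CLASS returns, rather than
testing the width on every access.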
In the Niclos vaporware, I've just reserved the subfolder
names WIDEAU, WIDECHAR and MIXEDFLT to hold files that
cope with the situation of address units wider than one
byte, characters wider than one address unit, and a mixed
floating point stack, and would plan on issuing a warning
if any of those situations apply and the subfolders do
not exist. The corresponding BYTEAU, NARRCHAR and SPLITFLT
are also reserved, in case it is more convenient to
consolidate differences in alternate files, but no warning
is issued if those cases apply and they are not present.
So a wordset collection that assumes split stacks, byte-wide
au's, and 1 CHARS = 1 can simply load, and someone who
wishes to port that collection to the alternate cases has
the structure and built-in support for doing so.
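The detection side of that scheme can be sketched with standard
queries. This is not the Niclos loader: the word names are
hypothetical, the check for whether the subfolders exist is left
out, and treating an unrecognised query as "does not apply" is
an assumption.

```forth
\ Sketch: flags for the three situations named above.
\ Assumption: an unrecognised query counts as "does not apply".
: WIDE-AU? ( -- flag )
  S" ADDRESS-UNIT-BITS" ENVIRONMENT? IF 8 > ELSE 0 THEN ;

: WIDE-CHAR? ( -- flag )  1 CHARS 1 > ;

\ FLOATING-STACK reports 0 when floats share the data stack.
: MIXED-FLOAT? ( -- flag )
  S" FLOATING-STACK" ENVIRONMENT? IF 0= ELSE 0 THEN ;

\ Print a warning for each situation that applies.
: PORT-WARNINGS ( -- )
  WIDE-AU?     IF ." Warning: address units wider than one byte" CR THEN
  WIDE-CHAR?   IF ." Warning: characters wider than one address unit" CR THEN
  MIXED-FLOAT? IF ." Warning: floats share the data stack" CR THEN ;
```

A loader along these lines would call PORT-WARNINGS only after
finding that the matching subfolder is absent.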