Re: C Interoperability - Strings
Group: comp.lang.fortran
Author: Gary Scott
Date: Sep 3, 2008 18:42
glen herrmannsfeldt wrote:
> Gary Scott wrote:
> (snip)
>
>> I think that I would have done it differently. I would have (if I
>> were sticking with this method of delimiting) required a leading
>> character in addition to a trailing character. The leading and
>> trailing character must match and must be a character not present in
>> the literal string itself. This allows you to change the delimiter,
>> sometimes useful for hardware devices that expect null characters for
>> other purposes (e.g. timing, noops). You could also add a function to
>> set the string delimiter I guess, but it might be messy to make that
>> globally available.
>
>
> That method is used by some systems for delimiting strings as
> command input. I believe some DEC editors used it for string
> search commands, and maybe some unix editors, too. I don't
> know any that use it for strings in storage, though, but it
> does seem a good idea, and is only slightly harder to process.
>
> There are some algorithms that use the constant terminator
> to advantage, with pointers to the middle of a string,
> running until the next null terminator. One that I
> know about is the suffix array algorithm.
>
> Though I believe my choice is still to store the current
> length at the beginning.
That would be my preference. I'd probably also include a single byte at
the beginning to identify the size/kind of the integer that follows. That
way, you could handle virtually any string length while staying
space efficient for shorter strings (a one-byte length plus the one-byte
type byte). The type byte could specify, maybe in bits, a 64- or 128-bit
integer for very long strings (128-bit seems unlikely to be necessary,
but it keeps things flexible). This scheme also allows a round-trip
write/read operation for variable-length strings or derived-type (DT)
components without foreknowledge of the string length. You could also
include a rudimentary checksum for the length integer within the type
byte (unless you wanted to specify the size in bits and use the full
range available).
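
To make the layout concrete, here is a rough sketch in C of one way it could look. Everything in it is an assumption for illustration: the names lp_width, lp_encode and lp_decode are made up, the type byte simply holds the width of the length field in bytes (1, 2, 4 or 8), the length is stored little-endian, and neither the 128-bit case nor the checksum idea is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: [width byte][length, little-endian][payload].
   The width byte holds the size of the length field in bytes (1, 2, 4 or 8). */

/* Smallest length-field width, in bytes, that can represent len. */
unsigned lp_width(uint64_t len)
{
    if (len <= UINT8_MAX)  return 1;
    if (len <= UINT16_MAX) return 2;
    if (len <= UINT32_MAX) return 4;
    return 8;
}

/* Write header plus len bytes of src into out.
   Returns the number of bytes written, or 0 if out (capacity cap) is too small. */
size_t lp_encode(const void *src, uint64_t len, unsigned char *out, size_t cap)
{
    unsigned w = lp_width(len);
    size_t header = 1 + (size_t)w;

    if (cap < header || len > cap - header)
        return 0;

    out[0] = (unsigned char)w;
    for (unsigned i = 0; i < w; i++)
        out[1 + i] = (unsigned char)(len >> (8 * i));
    memcpy(out + header, src, (size_t)len);
    return header + (size_t)len;
}

/* Read the header from buf (n bytes available). On success, store a pointer
   to the payload and its length, and return 1. Return 0 on a malformed or
   truncated record. */
int lp_decode(const unsigned char *buf, size_t n,
              const unsigned char **payload, uint64_t *len)
{
    assert(buf != NULL && payload != NULL && len != NULL);

    if (n < 1)
        return 0;
    unsigned w = buf[0];
    if (w != 1 && w != 2 && w != 4 && w != 8)
        return 0;
    if (n < 1 + (size_t)w)
        return 0;

    uint64_t v = 0;
    for (unsigned i = 0; i < w; i++)
        v |= (uint64_t)buf[1 + i] << (8 * i);
    if (v > n - 1 - w)
        return 0;

    *payload = buf + 1 + w;
    *len = v;
    return 1;
}
```

With this layout a 5-character string costs 7 bytes in total (type byte, one length byte, payload), while a 300-character string moves up to a two-byte length field. A reader only needs the first byte to know how much more header to fetch, which is what makes the round trip possible without knowing the length in advance.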
>
> -- glen
>