C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 3:25 am
by pini
Both models have advantages and drawbacks:
Pascal strings can only be 255 bytes long if you're using a single byte to store the size, but C strings don't have size limits.

But string manipulation functions are easier (and quicker?) to deal with when you're using Pascal strings: strlen is the most obvious, but functions like strcmp can be achieved simply with a

repne cmpsb
or something like that.
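pini's idea can be sketched in C as an illustration (not code from anyone's OS): a Pascal-style string with one length byte, and a strcmp-like comparison that needs no terminator search because both lengths are known up front. It reduces to one `memcmp` over the shorter length plus a length tie-break. In x86 assembly that block compare is usually written with the repeat-while-equal prefix, `repe cmpsb`, with ECX preloaded with the shorter length. The names `pstr`, `pstr_make` and `pstr_cmp` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Pascal-style string: one length byte, so at most 255 bytes of data. */
typedef struct {
    uint8_t len;
    unsigned char data[255];
} pstr;

/* Build a pstr from a C string, truncating anything past 255 bytes. */
static pstr pstr_make(const char *s)
{
    pstr p = {0};
    size_t n = strlen(s);
    if (n > 255)
        n = 255;
    p.len = (uint8_t)n;
    memcpy(p.data, s, n);
    return p;
}

/* strcmp-like ordering (<0, 0, >0). No scan for a terminator is needed,
 * because both lengths are known before the comparison starts. */
static int pstr_cmp(const pstr *a, const pstr *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    int r = memcmp(a->data, b->data, n);
    if (r != 0)
        return r;
    /* Common prefix is equal: the shorter string sorts first. */
    return (int)a->len - (int)b->len;
}
```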

I just wanted to know which model you have chosen in your OS.

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 3:32 am
by Solar
C++ is my language, so I use C-style strings. Simple as that. And while I haven't done much in the way of actual coding, I don't expect much string handling to occur in the kernel anyway.

And once you're in user space, it's the user's decision, isn't it?

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 4:02 am
by Pype.Clicker
AFAIK, only the asm-OS developers will actually have the choice. If you're using a high-level language, you're bound to however the compiler emits constant strings...
Solar wrote: I don't expect much string handling to occur in the kernel anyway.
Filenames? Any naming services (giving applications the opportunity to use long names rather than magic numbers)? DNS?

Most of the time, when you receive a string from user mode, you expect the caller to tell you its size explicitly anyway (so that you can quickly check that you won't cross protection barriers of any kind before using its contents).
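The benefit of an explicit size can be sketched in C: with a pointer and a length, the kernel can validate the whole buffer against the user address range with a single comparison before touching any byte, instead of walking memory in search of a terminator that may lie past a protection boundary. This is a minimal sketch assuming a flat address space where user memory is `[0, USER_SPACE_END)`; `USER_SPACE_END` and `user_range_ok` are hypothetical names, and a real kernel would still need fault handling during the subsequent copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: user memory occupies [0, USER_SPACE_END). */
#define USER_SPACE_END ((uintptr_t)0xC0000000u)

/* With an explicit length, the whole range [uaddr, uaddr + len) can be
 * checked up front. The test is arranged so uaddr + len never overflows. */
static bool user_range_ok(uintptr_t uaddr, size_t len)
{
    if (uaddr > USER_SPACE_END)
        return false;
    return len <= USER_SPACE_END - uaddr;
}
```

A NUL-terminated name, by contrast, would have to be probed byte by byte, re-checking permissions each time the scan crosses into a new page.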

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 4:12 am
by Candy
DNS in the kernel? That's odd... Why not make it a full user-level process, dnscd?

Filenames etc. can be treated as byte sequences terminated by a single zero byte (an 8-bit-aligned run of 8 zero bits). This also works for UTF-8, but not for UTF-16 (nor for UTF-32), because those encodings contain zero bytes inside ordinary characters. Still, UTF-8 seems nice enough, although not very compact.
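The UTF-16 problem is easy to demonstrate in C: no UTF-8 sequence for a character other than U+0000 contains a zero byte, but UTF-16 stores an ASCII letter with a zero in one byte of its 16-bit unit, so a byte-oriented terminator scan like `strlen` stops early. The byte arrays below are written out by hand (UTF-16LE assumed), and `byte_scan_len` and `utf16le_units` are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* "Hé" in UTF-8: 'H' = 0x48, 'é' (U+00E9) = 0xC3 0xA9, then the terminator. */
static const unsigned char hello_utf8[] = { 0x48, 0xC3, 0xA9, 0x00 };

/* "Hé" in UTF-16LE: one 16-bit unit per character, low byte first,
 * followed by a 16-bit zero terminator. */
static const unsigned char hello_utf16le[] = { 0x48, 0x00, 0xE9, 0x00, 0x00, 0x00 };

/* What a byte-oriented, NUL-terminated scan (strlen) reports. */
static size_t byte_scan_len(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Correct terminator scan for UTF-16LE: stop at a zero 16-bit unit,
 * i.e. two zero bytes at an even offset, not at the first zero byte. */
static size_t utf16le_units(const unsigned char *s)
{
    size_t n = 0;
    while (s[2 * n] != 0 || s[2 * n + 1] != 0)
        n++;
    return n;
}
```

Here `byte_scan_len(hello_utf8)` gives the full 3 bytes, while `byte_scan_len(hello_utf16le)` stops after 1 byte even though the string holds 2 UTF-16 units.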

Just about everything in the kernel should treat strings as binary data, and keep all calls etc. identified by magic numbers under strict supervision. If possible, use a mapping system. Strings in a kernel are misplaced and slow things down unnecessarily.

PS for pini: repne cmpsb is slower than a normal loop on some modern processors, because the cmpsb instruction isn't optimized, let alone with the rep prefix...

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 4:25 am
by Solar
You have the option of using Pascal strings internally and requiring a conversion every time some non-Pascal language wants to use your API, or the other way round (C strings internally, conversion for Pascal strings).

Now, most popular languages use C style strings. You'd force every C, C++, Perl, ... programmer to do some alien string conversion before calling your system API... I don't think this would increase overall system performance.
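For an API that takes one-byte-length Pascal strings, the "alien string conversion" Solar describes might look roughly like this on the C side: a full `strlen` pass, a copy, and a decision about strings longer than 255 bytes (this sketch rejects them). `cstr_to_pascal` and `pascal_to_cstr` are hypothetical names; the point is that every call across the API boundary pays for both passes.

```c
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated C string into a Pascal string in out[]:
 * out[0] holds the length, out[1..len] the bytes, with no terminator.
 * Returns 0 on success, -1 if the string does not fit a one-byte length. */
static int cstr_to_pascal(const char *src, unsigned char out[256])
{
    size_t n = strlen(src);          /* O(n) scan the caller pays for */
    if (n > 255)
        return -1;
    out[0] = (unsigned char)n;
    memcpy(out + 1, src, n);         /* plus an O(n) copy */
    return 0;
}

/* Reverse direction: out must have room for in[0] + 1 bytes. */
static void pascal_to_cstr(const unsigned char *in, char *out)
{
    memcpy(out, in + 1, in[0]);
    out[in[0]] = '\0';
}
```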

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 4:31 am
by distantvoices
Look at the low-level Windows API, especially if you want to retrieve the text typed into a text edit control: wooo, that's Pascal-to-C string conversion. BSTR and sorta are welcome friends, because the Windows operating system seems to be written in Pascal, or at least the GUI subsystem.

For proof, just look at the declarations in the key Visual C++ header files.

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 4:37 am
by Solar
beyond infinity wrote: BSTR and sorta are welcome friends, because the windows operating system seems to be written in pascal...
BSTR stands for "B string". B was the predecessor of C, and used word addressing instead of byte addressing. Makes for really ugly legacy API's. Hasn't much to do with Pascal, though.

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 4:40 am
by distantvoices
Oops, didn't know that one.

But look at the declarations:

The low-level function declarations look similar to this:

int PASCAL do_x(...);

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 5:25 am
by Schol-R-LEA
Solar wrote: BSTR stands for "B string". B was the predecessor of C, and used word addressing instead of byte addressing. Makes for really ugly legacy API's. Hasn't much to do with Pascal, though.
Solar, you don't make mistakes often, so I'm not going to begrudge you this, but Windows BSTRs have nothing to do with the B language (which was never used outside of Bell Labs, AFAIK), or even (more plausibly) BCPL; rather, the name stands for Basic String, because it's the underlying string type used in Visual Basic. Right idea, wrong language.

BTW, BSTR characters were not always word-width; in older versions of VB they were single-byte ASCII chars. When VB went 32-bit (version 4, IIRC), they were extended to wide chars for Unicode support. See here for details.

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 6:23 am
by Solar
Schol-R-LEA wrote: Solar, you don't make mistakes often, so I'm not going to begrudge you this...
I take this as high praise. *bows*
...but Windows BSTRs have nothing to do with the B language (which was never used outside of Bell Labs, AFAIK), or even (more plausibly) BCPL...
Hm... I never encountered either in the wild, but the O'Reilly Language Poster lists the lineage as CPL -> BCPL -> B -> C. I know AmigaOS dos.library was BCPL (supplied by Metacomco in the form of their "Tripos" DOS subsystem), and I was willing to grant that Microsoft had used a later revision of that language family. Didn't know about B being an exclusive Bell Labs plant. Ah, well.
...rather, it stands for Basic String, because it's the underlying string type used in Visual Basic. Right idea, wrong language.
8)

The reason is simple. As I said, AmigaOS had some "alien" BCPL code interspersed in its mainly C/ASM-written code base, and "BSTR" was a typedef for "BCPL string" when talking to dos.library.

And I was lucky in never having to deal with Windows API, so my memories deceived me with "certain" knowledge. ;-)


PS: Now everyone should know why the successor of C++ won't be labelled "P", or even "D", but "L"... ;-)

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 6:29 am
by distantvoices
So, see what the story tells: one never stops learning :-)

I have to deal with the Windows API at work, fiddling with this COM thing to make it work together with OpenUTM, and badly needing some input/output possibility besides the console (remote control of some applications).

Now I'm going back to the GUI service implementation: the title bar and close button are en train d'être (in the process of being) implemented. *What a mix*

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 11:28 am
by mystran
Traditional Pascal strings have this stupid limitation of 255 characters, but in real life that does not apply, because you can just as well use 4 bytes (32 bits) for the length, which is hardly a limitation. If you need longer strings, just use 64 bits instead. Duh.
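mystran's wider length field can be sketched with a C99 flexible array member: a `uint32_t` length allows strings of up to 4 GiB minus one byte, the data may contain zero bytes because nothing relies on a terminator, and appending only updates the stored length instead of rescanning. `lstr`, `lstr_new` and `lstr_append` are hypothetical names, and error handling is reduced to returning NULL.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string with a 32-bit length field. The data is not
 * NUL-terminated, so it may contain zero bytes. */
typedef struct {
    uint32_t len;
    char data[];                 /* C99 flexible array member */
} lstr;

static lstr *lstr_new(const char *bytes, uint32_t len)
{
    lstr *s = malloc(sizeof *s + len);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->data, bytes, len);
    return s;
}

/* Append len bytes. Returns the (possibly moved) string, or NULL on
 * failure, in which case the original s is left untouched. */
static lstr *lstr_append(lstr *s, const char *bytes, uint32_t len)
{
    if (len > UINT32_MAX - s->len)
        return NULL;             /* would overflow the 32-bit length */
    lstr *t = realloc(s, sizeof *t + s->len + len);
    if (t == NULL)
        return NULL;
    memcpy(t->data + t->len, bytes, len);
    t->len += len;
    return t;
}
```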

As for memory requirements, in many cases it's a good idea to use wchar_t instead of char anyway if one needs random access (UTF-8 random access is O(n)), and wchar_t tends to be 32 bits.
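The O(n) figure follows from UTF-8 being variable-width: to reach the n-th code point you have to walk the bytes from the start, counting only lead bytes and skipping continuation bytes (those of the form `10xxxxxx`). A minimal C sketch, assuming well-formed UTF-8 input; `utf8_offset` is a hypothetical helper. (One caveat on fixed-width `wchar_t`: it is 32 bits with glibc, but 16 bits on Windows, where it holds UTF-16.)

```c
#include <stddef.h>

/* Return the byte offset of the idx-th code point (0-based) within the
 * first len bytes of s, or len if there are fewer code points.
 * Assumes well-formed UTF-8. Runs in O(len), not O(1). */
static size_t utf8_offset(const unsigned char *s, size_t len, size_t idx)
{
    size_t i = 0;
    while (i < len) {
        if ((s[i] & 0xC0) != 0x80) {     /* lead byte: a code point starts here */
            if (idx == 0)
                return i;
            idx--;
        }
        i++;
    }
    return len;
}
```

For "aé€b" (bytes 61 C3 A9 E2 82 AC 62), the code points start at byte offsets 0, 1, 3 and 6; with a 32-bit fixed-width encoding the same lookup would simply be `s[idx]`.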

Stock C strings (char*) are only good for 8-bit charsets or UTF-8, although wide C strings (wchar_t*) are of course still usable. In any case, I'd rather pack strings into nice objects which handle random access, concatenation and substring in something close to O(1), and forget about primitive string representations altogether. As for IPC APIs and such, I'd use UTF-32/UTF-8 with a separate (at least 32-bit) length.
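One way to get the near-O(1) substring described above is a "view": just a pointer plus a length into shared storage, so taking a substring copies nothing and the length travels with the data, as it would across an IPC boundary. `strview`, `sv_from_cstr`, `sv_sub` and `sv_eq` are hypothetical names; this shows only the byte-level idea, and a real string object would also need ownership or reference counting for the shared buffer.

```c
#include <stddef.h>
#include <string.h>

/* Non-owning view: points into someone else's buffer and carries its length. */
typedef struct {
    const char *ptr;
    size_t len;
} strview;

static strview sv_from_cstr(const char *s)
{
    strview v = { s, strlen(s) };
    return v;
}

/* O(1) substring: clamps start and count to the view and copies no bytes. */
static strview sv_sub(strview v, size_t start, size_t count)
{
    if (start > v.len)
        start = v.len;
    if (count > v.len - start)
        count = v.len - start;
    strview r = { v.ptr + start, count };
    return r;
}

/* Equality needs no terminator: compare lengths, then bytes. */
static int sv_eq(strview a, strview b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```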

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 11:44 am
by Schol-R-LEA
Solar wrote: Hm... I never encountered either in the wild, but the O'Reilly Language Poster lists the lineage as CPL -> BCPL -> B -> C. I know AmigaOS dos.library was BCPL (supplied by Metacomco in the form of their "Tripos" DOS subsystem), and I was willing to grant that Microsoft had used a later revision of that language family. Didn't know about B being an exclusive Bell Labs plant. Ah, well.

[[...clip...]]

The reason is simple. As I said, AmigaOS had some "alien" BCPL code interspersed in its mainly C/ASM-written code base, and "BSTR" was a typedef for "BCPL string" when talking to dos.library.

And I was lucky in never having to deal with Windows API, so my memories deceived me with "certain" knowledge. ;-)
I thought that might be the case; I learned a little Amiga programming at one point, so I knew that Tripos was originally written in BCPL (Basic Combined Programming Language, a simplified version of CPL, which was the local Algol-60 variant).

As for B, well, AFAIK it was only an experiment by Thompson and Ritchie, and it was dropped once they started working on C. It's not much more than a historical footnote, though apparently it did have some impact on early Unix. The Bell Labs Unix history pages talk about it a bit, as do most pages about the history of C itself. I also found an old manual for it on Ritchie's home page at the Labs.

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 11:49 am
by Curufir
I've been thinking about a system that stores a few bytes of information at the head of the string.

I'm looking at something along the lines of:

[String Length] [Character Set] [Character Width] [Flags] [Data....Data]

There are a couple of operations this helps with (notably anything that involves strlen), but it mostly has to do with my obsession with having as much data as possible available without computation. If it can be precomputed, then in my opinion it should be precomputed.
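Curufir's layout might be written as a C struct like the one below. The field widths, the charset enum values, the flag bit and the `hstr_*` names are all assumptions for illustration, since the post does not specify them. The point is that a `strlen`-style query becomes a field read, while anything that changes the data must keep the header in sync.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encodings for the [Character Set] field. */
enum hstr_charset { CS_ASCII = 0, CS_UTF8 = 1, CS_UTF16 = 2, CS_UTF32 = 3 };

/* Hypothetical [Flags] bit. */
#define HSTR_READONLY 0x01u

/* [String Length] [Character Set] [Character Width] [Flags] [Data....Data] */
typedef struct {
    uint32_t length;       /* number of code units, precomputed */
    uint8_t  charset;      /* one of enum hstr_charset */
    uint8_t  char_width;   /* bytes per code unit: 1, 2 or 4 */
    uint16_t flags;        /* HSTR_* bits */
    unsigned char data[];  /* length * char_width bytes */
} hstr;

static hstr *hstr_new(const void *units, uint32_t length,
                      uint8_t charset, uint8_t char_width)
{
    size_t bytes = (size_t)length * char_width;
    hstr *s = malloc(sizeof *s + bytes);
    if (s == NULL)
        return NULL;
    s->length = length;
    s->charset = charset;
    s->char_width = char_width;
    s->flags = 0;
    memcpy(s->data, units, bytes);
    return s;
}

/* strlen() equivalent: O(1), no scan of the data. */
static uint32_t hstr_len(const hstr *s)
{
    return s->length;
}
```

With these assumed widths the header takes 8 bytes per string on typical ABIs, which is part of the space cost Solar raises in his reply.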

Re:C Strings Vs Pascal Strings

Posted: Thu Aug 19, 2004 11:55 am
by Solar
...which means you have the data ready at hand for "querying" functions (strlen), but have to do additional work in "modifying" functions (strcat), and still gain nothing for another set of "querying" functions (strstr, strpbrk).

You also significantly blow up space requirements, and require the user to do some serious up-front setup for even the simplest fopen() derivative...

Not that it's a bad idea outright, but I feel the drawbacks are significant.