OS APIs without error codes

rdos · Post by **rdos** » Thu Apr 19, 2012 1:07 am

This is a more suitable thread for discussing OS APIs, and what they contain.

Traditional APIs (like DOS, Windows and *nix) are cluttered with error-codes, but is this really necessary? Aren't error-codes just baggage from non-object-oriented designs?

First, let's take file-oriented functions. These would traditionally return a large list of error-codes that typically are system (and version) dependent. But this is not really necessary.

Typical error-codes for OpenFile and their alternative, object-oriented solutions:
1. File does not exist. Use a directory function to check if the file exists
2. Security issues. Check security settings in some way
3. Filesystem errors. These are typically unrecoverable

Typical error-codes for CreateFile and their alternative, object-oriented solutions:
1. File already exists and cannot be overwritten. Use a directory function to check if the file exists
2. No disk space. Check free disk space
3. Part of path doesn't exist. Check with directory function
4. Security issues. Check security settings in some way
5. Filesystem errors. These are typically unrecoverable

Typical issues with ReadFile:
1. Invalid handle. If you checked the return of OpenFile/CreateFile, you would know if the handle is valid or not.
2. End of file. If you use GetCurrentPosition and GetFileSize you know if this occurred
3. Filesystem errors. These are typically unrecoverable

Typical issues with WriteFile:
1. Invalid handle. If you checked the return of OpenFile/CreateFile, you would know if the handle is valid or not.
2. Disk full. Check free disk space
3. Filesystem errors. These are typically unrecoverable

Then, let's look at some other class, like a printer class. Here is a possible set of APIs that replaces the error-code interface (from the RDOS API):
RdosIsPrinterJammed
RdosIsPrinterPaperLow
RdosIsPrinterPaperEnd
RdosIsPrinterOk
RdosIsPrintHeadLifted
RdosHasPrinterPaperInPresenter

These APIs easily replace any inconsistent error-code based API, and the functions have a clear meaning.

Griwes · Post by **Griwes** » Thu Apr 19, 2012 1:34 am

You still have to return the error code from kernel space somehow; if you don't do it at the time when you return from syscall, you will be forced to do another syscall just to recover the error code. Those printer functions you mentioned can still be implemented, when the syscall returns error code (and that is the API; user space library may provide convenient wrappers around simple, numerical error code returned by kernel).

The problem here may be the definition of what we are talking about; if you are saying that a syscall (as in "a procedure called in kernel space", I'm not counting user space wrapping code as a syscall) shouldn't return an error code and you need another syscall just to retrieve it, you are just wrong. But if you are saying that the syscall itself should return an error code, but the user space library function should return a boolean and an additional userspace-only call should be required to retrieve information about the error, you are right.

Another point here is whether it should be set of functions, like in your example, or a simple GetStatus() function returning some enum values. I would rather provide single call returning every possible status code than call tons of different functions to find out what exactly went wrong.
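A minimal sketch of that single-call idea, in C. Everything in it is hypothetical: `printer_get_status`, the `PRINTER_*` flags and the fake status word are made up for illustration and are not the RDOS API. It only tries to show that one call can return the whole status, and that per-condition predicates can then be plain user-space wrappers around the returned value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits: one bit per condition, so conditions can combine. */
enum {
    PRINTER_JAMMED      = 1u << 0,
    PRINTER_PAPER_LOW   = 1u << 1,
    PRINTER_PAPER_END   = 1u << 2,
    PRINTER_HEAD_LIFTED = 1u << 3,
    PRINTER_ALL_FLAGS   = 0xFu
};

/* Stand-in for the device state the kernel would know about. */
static uint32_t fake_device_status = PRINTER_PAPER_LOW | PRINTER_HEAD_LIFTED;

/* The single "syscall": returns every condition in one round trip. */
uint32_t printer_get_status(void)
{
    return fake_device_status;
}

/* User-space wrappers: they only inspect the value already returned. */
bool printer_is_jammed(uint32_t s)    { return (s & PRINTER_JAMMED) != 0; }
bool printer_is_paper_low(uint32_t s) { return (s & PRINTER_PAPER_LOW) != 0; }

bool printer_is_ok(uint32_t s)
{
    assert((s & ~(uint32_t)PRINTER_ALL_FLAGS) == 0); /* no unknown bits expected */
    return s == 0;
}
```

A caller would fetch the status once with `printer_get_status()` and pass the result to whichever predicates it cares about, so the number of kernel round trips stays at one however many conditions are checked.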

iansjack · Post by **iansjack** » Thu Apr 19, 2012 2:19 am

You seem to be saying that the onus for error-checking should rest entirely with the user program rather than the OS. I have to disagree. As far as I am concerned the OS should do as much work as possible (and appropriate) on behalf of the user.

Take the file creation example. There are many reasons why a file creation operation may not succeed, ranging from prior existence, through not enough file system space, to access denied (with a number of errors in between that I haven't thought of). It seems crazy to ask, and rely on, each user program to check every possible error. Far better to let the Operating System, which presumably is written to take account of all errors - even ones that the application programmer might not think of - check all possible errors and return a value that says "the create succeeded" or "the create failed" and then an error code, which may also be in the return value, specifying the reason for the failure.

This way the application programmer just needs to check if the operation failed. If so, he may make further checks based on the error code, or he may just abort the operation. But we don't rely upon the programmer knowing, and thinking about, all possible causes of failure for all possible operations when he writes a program. He just needs to know that an operation can fail. And, who knows, in a later version of the OS there may be new ways for an operation to fail.

To expect the application programmer to be aware of all possible failure modes is just laziness on the part of the OS developer.

gravaera · Post by **gravaera** » Thu Apr 19, 2012 2:31 am

Yo:

What about this situation?

Code:

VMM_getPages(1, SOME | RELEVANT | FLAGS);

In this case, to satisfy a virtual memory allocation, the kernel also needs to add to a list of metadata which is used to track current allocation count, and size of each allocation. Naturally, if there is 1 page left in the address space, and you call something like "VMM_checkPagesRemaining(1)", you'll get a non error result. But what if this next allocation will require a resize of the metadata structure such that a new page is needed to expand it? In reality 2 pages are needed. The allocation will fail, and your API will say nothing.

The example isn't that solid, but I hope you can see where I'm coming from?
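To make the scenario concrete, here is a toy model in C. It is only a sketch under made-up assumptions: `struct toy_vmm`, `vmm_check_pages_remaining`, `vmm_get_pages` and the figure of 64 metadata entries per page are invented for illustration and do not describe any real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy address space; all names and numbers are made up for illustration. */
struct toy_vmm {
    size_t free_pages;       /* pages left in the address space */
    size_t meta_slots_free;  /* free entries in the allocation-tracking list */
};

enum vmm_error { VMM_OK = 0, VMM_ERR_NO_PAGES, VMM_ERR_NO_METADATA };

/* Naive pre-check: only looks at the free page count. */
bool vmm_check_pages_remaining(const struct toy_vmm *vmm, size_t n)
{
    assert(vmm != NULL);
    return vmm->free_pages >= n;
}

/* Allocation: may need one extra page to grow the metadata list. */
enum vmm_error vmm_get_pages(struct toy_vmm *vmm, size_t n)
{
    assert(vmm != NULL);
    size_t extra = (vmm->meta_slots_free == 0) ? 1 : 0;

    if (vmm->free_pages < n)
        return VMM_ERR_NO_PAGES;
    if (vmm->free_pages < n + extra)
        return VMM_ERR_NO_METADATA;  /* the cause the pre-check cannot see */

    vmm->free_pages -= n + extra;
    if (extra)
        vmm->meta_slots_free = 64;   /* arbitrary: a new page holds 64 entries */
    vmm->meta_slots_free--;
    return VMM_OK;
}
```

With one free page and a full metadata list, `vmm_check_pages_remaining(&v, 1)` says yes while `vmm_get_pages(&v, 1)` fails. With a return code the caller at least learns it was the metadata; with a bare boolean it learns nothing.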

EDIT: There are also timing based considerations to take into account: what if the application calls to ask for the availability of a resource, gets a positive result, then just before it makes the call to acquire the resource, it gets pre-empted? When it comes back and finally executes the call, what guarantee is there that it will have gained the resource? Will you then implement a "post call check" to accompany the "pre-call check"?
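The same timing problem can be sketched with ordinary POSIX calls (this assumes a POSIX system; the two function names are hypothetical helpers, while `access`, `open` and `errno` are the standard ones). The first helper checks and then acts; the second acts and then reads the reason that belongs to that exact attempt.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Check-then-act: the answer from access() can be stale by the time open() runs. */
int open_with_precheck(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) != 0)
        return -1;                /* "does not exist", as of a moment ago */
    return open(path, O_RDONLY);  /* can still fail, and the pre-check cannot say why */
}

/* Act-then-inspect: one call, and the reason is tied to this attempt. */
int open_with_error_code(const char *path, int *err)
{
    assert(path != NULL && err != NULL);
    int fd = open(path, O_RDONLY);
    *err = (fd < 0) ? errno : 0;
    return fd;
}
```

If another process deletes or locks the file between `access()` and `open()` in the first helper, the caller is left with a failure that the earlier check said could not happen.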

--Peace out
gravaera

Combuster · Post by **Combuster** » Thu Apr 19, 2012 2:43 am

Printer statuses are no different from error reasons, and iteratively determining the error cause by checking each option produces the same logical output as an error code. That demonstrates that even RDOS' API obviously has them implemented at some level.

At any rate, the service already knows why something went wrong. The service also already passes a status code back to the user to inform him of the difference between success and failure, so the additional cost of passing the reason instead of just "fail" is often zero. Checking each possible failure cause is definitely a performance hit, and not knowing the cause because your checklist is nonexhaustive is typically impacting your economy because you will have no clue as to how to fix it.

Solar · Post by **Solar** » Thu Apr 19, 2012 3:17 am

Error-code-based:

"Do X."
"X failed. Reason: C."
"Thanks."

Error-code-free:

"Do X."
"X failed."
"Because of A?"
"No."
"Because of B?"
"No."
"Because of C?"
"Yes."
"Thanks."

Disadvantages of the latter:

Redundant code (since every app doing "X" has to do the A-B-C tapdance).
Replacing "many error codes" with "many error-checking functions", which are likely to include additional syscalls (performance!).
Consistency. What happens if "C" has been resolved by the time you get around asking for it? Or if suddenly condition "B" holds true, which it didn't when you actually tried to do "X"? You'd be sending the application quite confusing information.

Then there's the point where certain APIs have a massive amount of potential error causes. An Oracle database has thousands of error codes. Getting an error code in an integer is easy, but polling thousands of error conditions?

rdos · Post by **rdos** » Thu Apr 19, 2012 4:00 am

Griwes wrote:You still have to return the error code from kernel space somehow; if you don't do it at the time when you return from syscall, you will be forced to do another syscall just to recover the error code. Those printer functions you mentioned can still be implemented, when the syscall returns error code (and that is the API; user space library may provide convenient wrappers around simple, numerical error code returned by kernel).

I have no internal error codes, so there is no function to "return last error". Additionally, no syscall returns an error-code, and there is no list of error-codes in the API since it doesn't support that. Most functions return success / failure with CY flag (would be converted to a boolean return value in the C/C++ wrapper).

In the case of the printer, the various error-check procedures could be implemented by converting a particular printer's error codes, but the device-error codes themselves are unavailable.

Griwes wrote:Another point here is whether it should be set of functions, like in your example, or a simple GetStatus() function returning some enum values. I would rather provide single call returning every possible status code than call tons of different functions to find out what exactly went wrong.

I wouldn't. You see, if you have 4 different procedures, you need 2^4 different error codes to cover all the possibilities. The number of error-codes grows exponentially with the number of failure conditions. And if you don't do it like that, then you can only recover the major error, not the minor ones.

rdos · Post by **rdos** » Thu Apr 19, 2012 4:07 am

Solar wrote:Error-code-based:

"Do X."
"X failed. Reason: C."
"Thanks."

Corrected:

Code:

    error_code = Do X
    switch (error_code)
    {
        y: dothis
        z: dothis
....

      default:
          ok;
    }

    error_code = Do Y
    switch (error_code)
    {
        y: dothis
        z: dothis
....

      default:
          ok;
    }

....

Solar wrote:Error-code-free:

"Do X."
"X failed."
"Because of A?"
"No."
"Because of B?"
"No."
"Because of C?"
"Yes."
"Thanks."

Corrected:

Code:

    ok = DoX
    if (ok)
        ok = DoY

CheckStatus:
    if (HasA)
       printf("Has A");

    if (HasB)
       printf("Has B");

    if (HasC)
       printf("Has C");

rdos · Post by **rdos** » Thu Apr 19, 2012 4:10 am

Combuster wrote:Printer statuses are no different from error reasons, and iteratively determining the error cause by checking each option produces the same logical output as an error code. That demonstrates that even RDOS' API obviously has them implemented at some level.

That's not how it is done. The code that shows the "receipt unavailable" button uses RdosIsPrinterOk. The code that prints a receipt to a customer will use the returned status to indicate if the customer got the receipt or not, and is not interested in why it failed.

Combuster · Post by **Combuster** » Thu Apr 19, 2012 5:36 am

rdos wrote:The number of error-codes grows exponentially with the number of failure conditions.

Which implies the false and irrelevant assumption that every state (not error) can occur simultaneously with all other states (not errors). Prove that again in a Vulcan-friendly manner, please. I'm 100% sure you can't.

Solar · Post by **Solar** » Thu Apr 19, 2012 6:33 am

rdos wrote:Corrected:

If you "correct" perfectly-good pseudocode into (pseudo-) source, don't selectively disable your brain (*), and at least use a somewhat-correct syntax, will you?

Code:

    int rc = do( X );
    if ( rc == 0 )
        rc = do( Y );

    switch ( rc )
    {
        case 0:
            break;
        case a:
            handleError_A();
            break;
        case b:
            handleError_B();
            break;
        default:
            unexpectedError( rc );
            break;
    }

vs.

Code:

    bool rc = do( X );
    if ( rc )
        rc = do( Y );

    if ( ! rc )
    {
        if ( isErrorA() )
            handleError_A();
        else if ( isErrorB() )
            handleError_B();
        else
            unexpectedError();
    }

Note how the second snippet has a larger number of function (system?) calls, increasing with the number of possible error causes, whereas the first snippet is a simple table lookup. The second snippet also can't tell you anything in case of an unexpected error condition, whereas the first snippet can at least tell you what the error was (quite verbosely, even, if you add something like perror() to the equation). Indeed, if you intend to handle all failures via "log and exit" (which is fully appropriate for many small tools and even some bigger applications), the former snippet degrades to:

Code:

    int rc = do( X );
    if ( rc == 0 )
        rc = do( Y );

    if ( rc != 0 )
    {
        puts( errorText( rc ) );
        exit( EXIT_FAILURE );
    }

Try that with a boolean.
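For a version of "log and exit" that actually compiles, here is a small sketch in standard C. `try_open` and `report_failure` are hypothetical helper names; `fopen`, `errno` and `strerror` are the real library facilities. It assumes, as POSIX specifies but ISO C does not strictly promise, that `fopen` sets `errno` on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open a file; on failure return the errno value instead of a bare "false". */
int try_open(const char *path, FILE **out)
{
    assert(path != NULL && out != NULL);
    errno = 0;
    *out = fopen(path, "r");
    return (*out != NULL) ? 0 : errno;
}

/* One generic reporter covers every cause, including ones the caller never anticipated. */
void report_failure(const char *what, int rc)
{
    fprintf(stderr, "%s: %s\n", what, strerror(rc));
}
```

A small tool can call `try_open`, pass any non-zero result to `report_failure` and exit, without listing a single failure cause by name.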

rdos wrote:The number of error-codes grows exponentially with the number of failure conditions.

That's complete BS. We're talking error codes, not error flags.
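A small C sketch of the distinction, with made-up names (`enum open_error`, `struct open_facts` and `toy_open` are illustrative only). An operation that fails reports the one cause that stopped it, so four causes need four codes plus "ok", not 2^4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Four failure causes, four codes (plus "ok"). Names are illustrative. */
enum open_error {
    OPEN_OK = 0,
    OPEN_ERR_NOT_FOUND,
    OPEN_ERR_LOCKED,
    OPEN_ERR_NO_PERMISSION,
    OPEN_ERR_BAD_PATH,
    OPEN_ERR_COUNT          /* number of codes, including OPEN_OK */
};

/* Hypothetical view of the facts the kernel has at hand while opening a file. */
struct open_facts {
    bool path_valid, exists, permitted, locked;
};

/* The operation stops at the first check that fails, so it reports exactly one cause. */
enum open_error toy_open(const struct open_facts *f)
{
    assert(f != NULL);
    if (!f->path_valid) return OPEN_ERR_BAD_PATH;
    if (!f->exists)     return OPEN_ERR_NOT_FOUND;
    if (!f->permitted)  return OPEN_ERR_NO_PERMISSION;
    if (f->locked)      return OPEN_ERR_LOCKED;
    return OPEN_OK;
}
```

Conditions that genuinely hold at the same time are a matter for status flags, where each extra condition costs one more bit rather than doubling the number of codes.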

----

(*): That is the reason why you should never believe those "language X vs. all others" benchmarks written by advocates, or long-time-X-only users, of language X: Selective intelligence.

Griwes · Post by **Griwes** » Thu Apr 19, 2012 7:05 am

rdos wrote:
Griwes wrote:You still have to return the error code from kernel space somehow; if you don't do it at the time when you return from syscall, you will be forced to do another syscall just to recover the error code. Those printer functions you mentioned can still be implemented, when the syscall returns error code (and that is the API; user space library may provide convenient wrappers around simple, numerical error code returned by kernel).
I have no internal error codes, so there is no function to "return last error". Additionally, no syscall returns an error-code, and there is no list of error-codes in the API since it doesn't support that. Most functions return success / failure with CY flag (would be converted to a boolean return value in the C/C++ wrapper).

This implies that every time you want to know whether something is wrong, you have to do another syscall. One additional syscall for every possible broken thing. That is just wrong; why don't you start having an internal error message, that you can present to the user without having him poll every possible thing that could go wrong?

rdos wrote:
Griwes wrote:Another point here is whether it should be set of functions, like in your example, or a simple GetStatus() function returning some enum values. I would rather provide single call returning every possible status code than call tons of different functions to find out what exactly went wrong.
I wouldn't. You see, if you have 4 different procedures, you need 2^4 different error codes to cover all the possibilities. The number of error-codes grows exponentially with the number of failure conditions. And if you don't do it like that, then you can only recover the major error, not the minor ones.

Wrong. For example, given the function File::Open(), you would need only a few error codes - file not found, file is locked, you don't have permissions to open the file, filepath is not a valid VFS path and so on. Each of them is one of 256 (or 2^16, ^32 or ^64, depending on which integer type you want to return) possible values.

Now, imagine that you just return true/false.

Code:

File file;
if (!file.Open("/some/path/in/filesystem"))
{
    if (file.DoesTheFileExist())
        //...
    else if (file.IsTheFileLocked())
        //...
    else if (file.DoIHavePermissionsToAccessThisFile())
        //...
}

else
{
    //...
}

Now, error-code version.

Code:

File file("/some/path/in/filesystem");

if (file) // nice operator bool() overload in beloved C++
{
    //...
}

else
{
    switch (file.Error())
    {
        case FileNotFound: //...
        case FileLocked: //...
        case FileAccessNotPermitted: //...
    }
}

Now, even if you think both snippets are similar, notice the insane number of syscalls in the first one (the user-space library only knows whether a syscall was successful, so let's ask the kernel many times "what is wrong now?"; note the difference between "what is wrong now?" and "what went wrong during that syscall?").

rdos wrote:
Solar wrote: Error-code-free:

"Do X."
"X failed."
"Because of A?"
"No."
"Because of B?"
"No."
"Because of C?"
"Yes."
"Thanks."
Corrected:
Code:
    ok = DoX
    if (ok)
        ok = DoY

CheckStatus:
    if (HasA)
       printf("Has A");

    if (HasB)
       printf("Has B");

    if (HasC)
       printf("Has C");

Here you have the same problem that I (and others) have mentioned a few times already (besides "do a syscall for every possible source of problem"): you only know what the status is right now, not what it was back then. Let's say you tried to open a locked file. The syscall returns only "false". You run every possible check now, but in the meantime the scheduler has taken your process's time and given it to a process/thread that unlocked the file. When control is returned to the app that tried to open that formerly-locked file, it won't be able to determine what happened, and will probably keep trying to open the file without knowing what the reason was (and, again, some other process can acquire the file lock after we check it, but before we retry opening the file), instead of just calling "Thread::Current::WaitForFile(file);".
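The "right now" versus "back then" difference can be played through deterministically with a toy model in C. All names here (`struct toy_file`, `toy_open_rc`, `toy_open_bool`, `toy_is_locked`, `toy_other_thread_unlocks`) are invented for the sketch; the last one merely stands in for another thread running while we are pre-empted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum file_error { FILE_OK = 0, FILE_ERR_LOCKED };

/* Toy "kernel" state: is the file currently locked by someone else? */
struct toy_file { bool locked; };

/* Error-code style: the reason is captured at the moment of the attempt. */
enum file_error toy_open_rc(const struct toy_file *f)
{
    assert(f != NULL);
    return f->locked ? FILE_ERR_LOCKED : FILE_OK;
}

/* Boolean style: success or failure only. */
bool toy_open_bool(const struct toy_file *f) { return !f->locked; }

/* Separate status query: answers for "now", not for the failed attempt. */
bool toy_is_locked(const struct toy_file *f) { return f->locked; }

/* Stand-in for another thread releasing the lock while we were pre-empted. */
void toy_other_thread_unlocks(struct toy_file *f) { f->locked = false; }
```

Open a locked file both ways, then call `toy_other_thread_unlocks` before asking `toy_is_locked`: the return code still says `FILE_ERR_LOCKED`, while the boolean caller sees a failed open followed by "not locked" and has no explanation left.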

bluemoon · Post by **bluemoon** » Thu Apr 19, 2012 7:18 am

Then, what if the error-polling function itself fails? Since it's a system call, it can fail at some point.

Solar · Post by **Solar** » Thu Apr 19, 2012 7:32 am

The whole discussion is ridiculous. The system knows what has failed. Not telling the caller what it was is "data hiding" bass-ackwards.

Note how "HasA()", "HasB()" etc. are either polling an error code held by the system (why not return that in the first place?), or each and every one of them has a non-zero cost of its own in addition to the syscall itself (stat if file present, stat if permissions set, stat if file non-empty, ...).

iansjack · Post by **iansjack** » Thu Apr 19, 2012 8:25 am

Solar wrote:The whole discussion is ridiculous. The system knows what has failed. Not telling the caller what it was is "data hiding" bass-ackwards.

Exactly. A little analogy (I like analogies):

I go into my local BMW dealer, have a look round the showroom, then go up to the guy in the sharp suit. "I'd like to buy a new 3 Series". "Sorry Sir, I can't sell you one."

"Why not - you are a BMW dealer aren't you?" "Oh, yes sir - the only one in the district."

"You have the 3 Series in stock?" "We certainly do, Sir. Lots of them."

"Is there something wrong with the 3 Series?" "No, Sir. It's a very fine car. We sell half a dozen each week."

"Well, is my credit no good?" "No, Sir. I checked your credit whilst you were browsing. You're good for $1,000,000."

And so on, and so on, until, stumped, I give up and wander into the Audi dealer across the road. Meanwhile back in the BMW dealer the manager has come out of the office and is talking to my friend. "That was Mr Jack; he's one of our best customers. Buys a new car every six months. What did he want?"

"He wanted me to sell him a new 3 Series. I told him I couldn't do that."

"Of course you couldn't. You're the chief accountant, not a salesman. Why didn't you tell him that?"

"That's not up to me. It's the customer's responsibility to find out why I can't sell him a car."

If I owned a car dealership, that guy would get the sack. If I owned an OS, any API that acted like that would get the sack too. It's a crazy way to behave.

OSDev.org
