Funny (NOT!) piece of C strangeness.

Programming, for all ages and all languages.
User avatar
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Funny (NOT!) piece of C strangeness.

Post by Solar »

Kevin wrote:
Solar wrote:(Edit: Not sure with the value, -1 could be right as well)
Matching failure means no value is assigned.
Every good solution is obvious once you've found it.
Kevin
Member
Member
Posts: 1071
Joined: Sun Feb 01, 2009 6:11 am
Location: Germany
Contact:

Post by Kevin »

Solar wrote:
Kevin wrote:(Edit: Not sure with the value, -1 could be right as well)
Matching failure means no value is assigned.
Hm, I still haven't found the place where this is defined.

But otherwise you agree? So 1 of 2 for MSVC (assuming that you are right), glibc and BSD?
Developer of tyndur - community OS of Lowlevel (German)

Post by Solar »

Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...

I'd say no, it is not allowable (as it leaves the input in an undefined state).

But since all we can do here is guesswork, I mailed a former co-worker of mine who sits on the C++ standards committee. Some of the people there also sit on the C committee, so I might get an authoritative response from there.
jal
Member
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Post by jal »

Solar wrote:Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...
If it is clearly stated that fscanf can push back at most one character, and if a prefix can be more than one character (which it can), we can be certain that consuming non-matching input is allowable.


JAL

Post by Solar »

Can we? Or does it mean that "0x" has to be considered matching?

I wouldn't really bother if it weren't for my work on PDCLib being currently stuck at precisely this point. I simply don't want to work on assumptions.

PS: During the course of this discussion, I came up with a couple more corner cases, which I will summarize in an extended test program. I will post that in a couple of hours.

Post by Solar »

OK, updated test program including results.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main( void )
{
    int i, count, rc;
    unsigned u;
    char * endptr = NULL;
    char culprit[] = "+0xz";

    /* File I/O to assert fscanf == sscanf */
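    FILE * fh = fopen( "testfile", "w+" );
    if ( fh == NULL )
    {
        /* Without the file, the fscanf half of the test is meaningless */
        perror( "testfile" );
        return EXIT_FAILURE;
    }
    fprintf( fh, "%s", culprit );
    rewind( fh );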

    /* sscanf base 16 */
    u = -1; count = -1;
    rc = sscanf( culprit, "%x%n", &u, &count );
    printf( "sscanf:  Returned %d, result %2d, consumed %d\n", rc, (int)u, count );

    /* fscanf base 16 */
    u = -1; count = -1;
    rc = fscanf( fh, "%x%n", &u, &count );
    printf( "fscanf:  Returned %d, result %2d, consumed %d\n", rc, (int)u, count );
    rewind( fh );

    /* strtoul base 16 */
    u = strtoul( culprit, &endptr, 16 );
    /* endptr - culprit is a ptrdiff_t; cast it to match %d */
    printf( "strtoul:             result %2d, consumed %d\n", (int)u, (int)( endptr - culprit ) );

    puts( "" );

    /* sscanf base 0 */
    i = -1; count = -1;
    rc = sscanf( culprit, "%i%n", &i, &count );
    printf( "sscanf:  Returned %d, result %2d, consumed %d\n", rc, i, count );

    /* fscanf base 0 */
    i = -1; count = -1;
    rc = fscanf( fh, "%i%n", &i, &count );
    printf( "fscanf:  Returned %d, result %2d, consumed %d\n", rc, i, count );
    rewind( fh );

    /* strtol base 0 */
    i = strtol( culprit, &endptr, 0 );
    /* endptr - culprit is a ptrdiff_t; cast it to match %d */
    printf( "strtoul:             result %2d, consumed %d\n", i, (int)( endptr - culprit ) );

    fclose( fh );
    return 0;
}

/* newlib 1.14

sscanf:  Returned 1, result  0, consumed 2
fscanf:  Returned 1, result  0, consumed 2
strtoul:             result  0, consumed 0

sscanf:  Returned 1, result  0, consumed 2
fscanf:  Returned 1, result  0, consumed 2
strtoul:             result  0, consumed 0
*/

/* glibc-2.8

sscanf:  Returned 1, result  0, consumed 3
fscanf:  Returned 1, result  0, consumed 3
strtoul:             result  0, consumed 2

sscanf:  Returned 1, result  0, consumed 3
fscanf:  Returned 1, result  0, consumed 3
strtoul:             result  0, consumed 2
*/

/* Microsoft MSVC
sscanf:  Returned 0, result -1, consumed -1
fscanf:  Returned 0, result -1, consumed -1
strtoul:             result  0, consumed 0

sscanf:  Returned 0, result  0, consumed -1
fscanf:  Returned 0, result  0, consumed -1
strtoul:             result  0, consumed 0
*/

/* IBM AIX
sscanf:  Returned 0, result -1, consumed -1
fscanf:  Returned 0, result -1, consumed -1
strtoul:             result  0, consumed 2

sscanf:  Returned 0, result  0, consumed -1
fscanf:  Returned 0, result  0, consumed -1
strtoul:             result  0, consumed 2
*/

Post by Solar »

Note that my question on StackOverflow regarding "what is considered standard-compliant behaviour" now has a +150 reputation bonus attached to it...

Post by Kevin »

Most answers there are pretty useless, as they are based on a feeling of what could be right rather than on the standard.

However, there is one really good pointer in there, and this is the reference to the Red Hat bugzilla. Parsing "100ergs of energy" as a float is pretty much the same as parsing "0xz" as a hex number. Example 3 in 7.19.6.2 says that this results in a return value of 0, which supports the theory of a matching error.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Post by Combuster »

Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f)

"work" with disputable meaning, of course :wink:
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]

Post by Solar »

I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading. Hell, even something as trivial as "-hello" would leave the input stream in an undefined state when parsed with %i: you read "-", which is quite valid; you read "h", which isn't; and you can only push back the "h", with no way of properly reporting that you consumed the "-"...

Post by jal »

Solar wrote:I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading.
It does, but then again, scanf was meant to parse whitespace-delimited input, not any arbitrary input string.


JAL

Post by Solar »

Whether whitespace-delimited or not. You cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.

So scanf would be unable to scan back the output of %+d or %#x?

Post by Kevin »

Solar wrote:I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading.
Well, I thought we were discussing what the standard says, not how to improve it? If I were to define the semantics of scanf, I think I would define the FreeBSD way as the correct one. Unfortunately, we already have a standard, and my opinion doesn't matter.
Combuster wrote:Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f)

"work" with disputable meaning, of course :wink:
Why need? It works with pushing back only one character - for possibly even more disputable values of "work", but I think these are the values the standard uses. It doesn't say anything about pushing back characters on a matching failure, but it says "The first character, if any, after the input item remains unread." I read this as "z" being the next character to be read in our "0xz" example, or "f" for "100e+f(x)".

Post by jal »

Solar wrote:Whether whitespace-delimited or not. You cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.
Absolutely true, but scanf was never meant to read arbitrary input, I reckon, just well-formed input strings (and if the input is not well-formed, returning an error is enough).


JAL

Re: Funny (NOT!) piece of C strangeness. [SOLVED]

Post by Solar »

Communication with Fred J. Tydeman, Vice-chair of PL22.11 (ANSI "C"), on comp.std.c shed some light on this:

fscanf

"An input item is defined as the longest sequence of input characters [...] which is, or is a prefix of, a matching input sequence." (7.19.6.2 P9)
This makes "0x" the longest sequence that is a prefix of a matching input sequence. (Even with `%i` conversion, as the hex "0x" is a longer sequence than the decimal "0".)
"The first character, if any, after the input item remains unread." (7.19.6.2 P9)
This makes `fscanf` read the "z" and put it back as not matching (honoring the one-character pushback limit of footnote 251).
"If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure." (7.19.6.2 P10)
This makes "0x" fail to match, i.e. `fscanf` should assign no value, return zero (if the %x or %i was the first conversion specifier), and leave "z" as the first unread character in the input stream.

strtol

The definition of `strtol` (and `strtoul`) differs in one crucial point:
"The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that *is of the expected form*." (7.20.1.4 P4, emphasis mine)
Which means that strtol should look for the longest valid sequence, in this case the "0". It should point endptr to the "x", and return zero as result.