
Re: Funny (NOT!) piece of C strangeness.

Posted: Wed Sep 16, 2009 3:54 am
by Solar
Kevin wrote:
Solar wrote:[...]
(Edit: Not sure about the value, -1 could be right as well)
Matching failure means no value is assigned.

Re: Funny (NOT!) piece of C strangeness.

Posted: Wed Sep 16, 2009 4:06 am
by Kevin
Solar wrote:
Kevin wrote:(Edit: Not sure about the value, -1 could be right as well)
Matching failure means no value is assigned.
Hm, I still haven't found the place where this is defined.

But otherwise you agree? So 1 of 2 for MSVC (assuming that you are right), glibc and BSD?

Re: Funny (NOT!) piece of C strangeness.

Posted: Wed Sep 16, 2009 5:38 am
by Solar
Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...

I'd say no, it is not allowable (as it leaves the input in an undefined state).

But because all we can do here is guesswork, I mailed a former co-worker of mine who sits on the C++ standards committee. Some of the people there also sit on the C committee, so I might get an authoritative response from there.

Re: Funny (NOT!) piece of C strangeness.

Posted: Wed Sep 16, 2009 5:59 am
by jal
Solar wrote:Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...
If it is clearly stated that fscanf can push back at most one character, and if a prefix can be more than one character (which it can), we can be certain that consuming non-matching input is allowable.


JAL

Re: Funny (NOT!) piece of C strangeness.

Posted: Wed Sep 16, 2009 6:42 am
by Solar
Can we? Or does it mean that "0x" has to be considered matching?

I wouldn't really bother if my work on PDCLib weren't currently stuck at precisely this point. I simply don't want to work on assumptions.

PS: During the course of this discussion, I came up with a couple more border cases, which I will summarize in an extended test program. Will post that in a couple of hours.

Re: Funny (NOT!) piece of C strangeness.

Posted: Wed Sep 16, 2009 7:38 am
by Solar
OK, updated test program including results.


#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    int i, count, rc;
    unsigned u;
    char * endptr = NULL;
    char culprit[] = "+0xz";

    /* Write the same input to a file, so fscanf() can be compared with sscanf() */
    FILE * fh = fopen( "testfile", "w+" );
    if ( fh == NULL )
    {
        perror( "testfile" );
        return EXIT_FAILURE;
    }
    fprintf( fh, "%s", culprit );
    rewind( fh );

    /* printf() casts: u is cast to int so the untouched marker (unsigned)-1
       prints as -1, as in the results below; endptr - culprit is a ptrdiff_t
       and is cast to int to match %d. */

    /* sscanf base 16 */
    u = -1; count = -1;
    rc = sscanf( culprit, "%x%n", &u, &count );
    printf( "sscanf:  Returned %d, result %2d, consumed %d\n", rc, (int)u, count );

    /* fscanf base 16 */
    u = -1; count = -1;
    rc = fscanf( fh, "%x%n", &u, &count );
    printf( "fscanf:  Returned %d, result %2d, consumed %d\n", rc, (int)u, count );
    rewind( fh );

    /* strtoul base 16 */
    u = strtoul( culprit, &endptr, 16 );
    printf( "strtoul:             result %2d, consumed %d\n", (int)u, (int)( endptr - culprit ) );

    puts( "" );

    /* sscanf base 0 */
    i = -1; count = -1;
    rc = sscanf( culprit, "%i%n", &i, &count );
    printf( "sscanf:  Returned %d, result %2d, consumed %d\n", rc, i, count );

    /* fscanf base 0 */
    i = -1; count = -1;
    rc = fscanf( fh, "%i%n", &i, &count );
    printf( "fscanf:  Returned %d, result %2d, consumed %d\n", rc, i, count );
    rewind( fh );

    /* strtol base 0 */
    i = strtol( culprit, &endptr, 0 );
    printf( "strtoul:             result %2d, consumed %d\n", i, (int)( endptr - culprit ) );

    fclose( fh );
    return 0;
}

/* newlib 1.14

sscanf:  Returned 1, result  0, consumed 2
fscanf:  Returned 1, result  0, consumed 2
strtoul:             result  0, consumed 0

sscanf:  Returned 1, result  0, consumed 2
fscanf:  Returned 1, result  0, consumed 2
strtoul:             result  0, consumed 0
*/

/* glibc-2.8

sscanf:  Returned 1, result  0, consumed 3
fscanf:  Returned 1, result  0, consumed 3
strtoul:             result  0, consumed 2

sscanf:  Returned 1, result  0, consumed 3
fscanf:  Returned 1, result  0, consumed 3
strtoul:             result  0, consumed 2
*/

/* Microsoft MSVC
sscanf:  Returned 0, result -1, consumed -1
fscanf:  Returned 0, result -1, consumed -1
strtoul:             result  0, consumed 0

sscanf:  Returned 0, result  0, consumed -1
fscanf:  Returned 0, result  0, consumed -1
strtoul:             result  0, consumed 0
*/

/* IBM AIX
sscanf:  Returned 0, result -1, consumed -1
fscanf:  Returned 0, result -1, consumed -1
strtoul:             result  0, consumed 2

sscanf:  Returned 0, result  0, consumed -1
fscanf:  Returned 0, result  0, consumed -1
strtoul:             result  0, consumed 2
*/

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 3:35 am
by Solar
Note that my question on StackOverflow regarding "what is considered standard-compliant behaviour" now has a +150 reputation bounty attached to it...

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 6:23 am
by Kevin
Most answers there are pretty useless, as they are based on a feeling of what could be right rather than on the standard.

However, there is one really good pointer in there: the reference to the Red Hat Bugzilla. Parsing "100ergs of energy" as a float is pretty much the same as parsing "0xz" as a hex number. Example 3 in 7.19.6.2 says that this results in a return value of 0, which supports the theory of a matching failure.

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 7:07 am
by Combuster
Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f)

"work" with disputable meaning, of course :wink:

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 7:59 am
by Solar
I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading. Hell, even something as trivial as "-hello" would leave the input in an undefined state when parsed with %i: you read "-", which is quite valid; you read "h", which isn't; and you can only push back the "h", with no way of properly reporting that you consumed the "-"...
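The one-character pushback problem can be reproduced with nothing but ungetc(), which the standard only guarantees for a single character. A minimal sketch, assuming a scanner limited to exactly that guarantee; the helper `rest_after_one_pushback` is made up for this illustration and uses tmpfile() so it works on a real stream. It reads the "-" and the "h" of "-hello", pushes back only the "h", and returns whatever is still unread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for illustration only: puts `text` into a temporary
   stream, reads two characters (say '-' and 'h'), pushes back only the
   second one with ungetc() (the single character of pushback the standard
   guarantees), and copies whatever is still unread into buf.
   Returns 0 on success, -1 on an I/O error. */
int rest_after_one_pushback( const char * text, char * buf, size_t size )
{
    int c;
    FILE * fh;

    assert( text != NULL && buf != NULL && size > 1 );
    buf[0] = '\0';
    if ( ( fh = tmpfile() ) == NULL ) return -1;
    if ( fputs( text, fh ) == EOF ) { fclose( fh ); return -1; }
    rewind( fh );

    (void)fgetc( fh );            /* first character, e.g. '-': consumed for good */
    c = fgetc( fh );              /* second character, e.g. 'h': does not match   */
    if ( c != EOF && ungetc( c, fh ) == EOF ) { fclose( fh ); return -1; }

    /* Everything still unread; the first character is not part of it. */
    if ( fgets( buf, (int)size, fh ) == NULL ) buf[0] = '\0';
    fclose( fh );
    return 0;
}
```

Called with "-hello", the unread rest is "hello": the "-" is gone, and nothing in the stream remembers that it was there.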

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 8:55 am
by jal
Solar wrote:I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading.
It does, but then again, scanf was meant to parse whitespace-delimited input, not any arbitrary input string.


JAL

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 9:15 am
by Solar
Whitespace-delimited or not, you cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.

So scanf would be unable to scan back the output of %+d or %#x ??
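For input that really is the output of %#x or %+d, the lookahead never overshoots: every character read belongs to the final matching sequence, and only the terminator has to be pushed back. A quick round-trip sketch with sscanf(); the helper names `roundtrip_hex` and `roundtrip_signed` are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical round-trip helpers: format value the way printf() does,
   then scan it back. *consumed receives the %n count (characters read);
   the sscanf() return value (number of assignments) is returned. */
int roundtrip_hex( unsigned value, unsigned * back, int * consumed )
{
    char buf[32];

    assert( back != NULL && consumed != NULL );
    snprintf( buf, sizeof buf, "%#x", value );     /* 255 -> "0xff", 0 -> "0" */
    *consumed = -1;
    return sscanf( buf, "%x%n", back, consumed );
}

int roundtrip_signed( int value, int * back, int * consumed )
{
    char buf[32];

    assert( back != NULL && consumed != NULL );
    snprintf( buf, sizeof buf, "%+d", value );     /* 42 -> "+42", -7 -> "-7" */
    *consumed = -1;
    return sscanf( buf, "%d%n", back, consumed );
}
```

Note that %#x prints zero as a plain "0" without the prefix, so that case never produces a dangling "0x" either.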

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 9:23 am
by Kevin
Solar wrote:I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading.
Well, I thought we were discussing what the standard says, not how to improve it? If I were to define the semantics of scanf, I think I would define the FreeBSD way as the correct one. Unfortunately, however, we already have a standard, and my opinion doesn't matter.
Combuster wrote:Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f)

"work" with disputable meaning, of course :wink:
Why need? It works with pushing back only one character - for possibly even more disputable values of "work", but I think these are the values the standard uses. It doesn't say anything about pushing back characters on a matching failure, but it says "The first character, if any, after the input item remains unread." I read this as "z" being the next character to be read in our "0xz" example, or "f" for "100e+f(x)".
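To make the character counting concrete, here is a rough model of the "longest prefix" rule of 7.19.6.2 P9 applied to a simplified decimal float (optional sign, digits, optional fraction, optional exponent; hex floats, INF/NAN and locale decimal points are deliberately left out). The function name `float_input_item` is made up for this sketch, and it works on a string rather than a stream: it only reports how long the input item is and whether that item is a complete matching sequence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of 7.19.6.2 P9 for a simplified decimal float:
   [sign] digits [. digits] [e|E [sign] digits]
   Returns the length of the input item, i.e. the longest prefix of s that
   is, or is a prefix of, a matching sequence. *complete is set to 1 if the
   item is itself a matching sequence, 0 if it is only a prefix of one. */
size_t float_input_item( const char * s, int * complete )
{
    size_t n = 0;
    int mantissa_digits = 0;
    int exponent_digits = 0;

    assert( s != NULL && complete != NULL );

    if ( s[n] == '+' || s[n] == '-' ) ++n;                  /* optional sign */
    while ( isdigit( (unsigned char)s[n] ) ) { ++n; ++mantissa_digits; }
    if ( s[n] == '.' )                                      /* optional fraction */
    {
        ++n;
        while ( isdigit( (unsigned char)s[n] ) ) { ++n; ++mantissa_digits; }
    }
    if ( mantissa_digits == 0 )
    {
        /* "", "-", "." or "+.": at best a prefix, never a match */
        *complete = 0;
        return n;
    }
    if ( s[n] == 'e' || s[n] == 'E' )                       /* optional exponent */
    {
        ++n;
        if ( s[n] == '+' || s[n] == '-' ) ++n;
        while ( isdigit( (unsigned char)s[n] ) ) { ++n; ++exponent_digits; }
        *complete = ( exponent_digits > 0 );  /* "100e" or "100e+" alone do not match */
        return n;
    }
    *complete = 1;
    return n;
}
```

For "100e+f(x)" this yields the five-character item "100e+", which is not a complete match: "e" and "+" are consumed, and "f" is the first unread character, in line with the reading above. "100ergs of energy" gives the item "100e", consistent with the count of 0 in Example 3.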

Re: Funny (NOT!) piece of C strangeness.

Posted: Thu Sep 17, 2009 1:18 pm
by jal
Solar wrote:Whitespace-delimited or not, you cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.
Absolutely true, but scanf was never meant to read arbitrary input, I reckon, just well-formed input strings (and if the input is not well-formed, returning an error is enough).


JAL

Re: Funny (NOT!) piece of C strangeness. [SOLVED]

Posted: Fri Sep 18, 2009 11:37 pm
by Solar
Communication with Fred J. Tydeman, Vice-chair of PL22.11 (ANSI "C"), on comp.std.c shed some light on this:

fscanf
An input item is defined as the longest sequence of input characters [...] which is, or is a prefix of, a matching input sequence. (7.19.6.2 P9)
This makes "0x" the longest sequence that is a prefix of a matching input sequence. (Even with `%i` conversion, as the hex "0x" is a longer sequence than the decimal "0".)
The first character, if any, after the input item remains unread. (7.19.6.2 P9)
This makes `fscanf` read the "z" and push it back as not matching (honoring the one-character pushback limit of footnote 251).
If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. (7.19.6.2 P10)
This makes "0x" fail to match, i.e. `fscanf` should assign no value, return zero (if the %x or %i was the first conversion specifier), and leave "z" as the first unread character in the input stream.
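The three rules above can be sketched for %x on a plain string. This is a hypothetical model of the reading above, not any libc's implementation: `scan_hex_item` is a made-up name, and leading white space, field widths, overflow and the end-of-input (input failure) case are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of 7.19.6.2 P9/P10 for a %x directive on a string.
   The input item is the longest prefix of s that is, or is a prefix of,
   a matching sequence of the form [sign] [0x|0X] hexdigits.
   *consumed receives its length; the character after it remains unread.
   Returns 1 and stores *value if the item is a matching sequence, or 0
   (matching failure, *value untouched) if it is only a prefix of one. */
int scan_hex_item( const char * s, unsigned long * value, size_t * consumed )
{
    size_t n = 0;
    size_t digits_start;
    int negative = 0;
    unsigned long v = 0;

    assert( s != NULL && value != NULL && consumed != NULL );

    if ( s[n] == '+' || s[n] == '-' )
    {
        negative = ( s[n] == '-' );
        ++n;
    }
    /* "0x" is a longer prefix of a matching sequence than the lone "0",
       so it always belongs to the input item when present. */
    if ( s[n] == '0' && ( s[n + 1] == 'x' || s[n + 1] == 'X' ) ) n += 2;

    digits_start = n;
    while ( isxdigit( (unsigned char)s[n] ) )
    {
        int c = tolower( (unsigned char)s[n] );
        v = v * 16 + (unsigned long)( isdigit( c ) ? c - '0' : c - 'a' + 10 );
        ++n;
    }

    *consumed = n;
    if ( n == digits_start ) return 0;  /* e.g. "+0x" before "z": matching failure */
    *value = negative ? -v : v;         /* negated in unsigned arithmetic, like strtoul() */
    return 1;
}
```

With "+0xz" the model consumes the three characters "+0x", reports a matching failure and leaves the target untouched, with "z" as the next unread character.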

strtol

The definition of `strtol` (and `strtoul`) differs in one crucial point:
The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. (7.20.1.4 P4, emphasis mine)
This means that strtol should take the longest valid sequence, in this case the "0", set endptr to point at the "x", and return zero as the result.
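On a library that follows this wording, the difference from fscanf is easy to observe. A small sketch (the helper `strtol_consumed` is made up for this illustration) that reports how far strtol() advanced endptr. Going by the test results earlier in the thread, newlib 1.14 and MSVC reported 0 consumed characters here, so the values below only hold on an implementation that behaves as described, like glibc and AIX in those results.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: converts s with strtol() in the given base and
   returns how many characters were accepted as the subject sequence,
   i.e. how far endptr was advanced. */
long strtol_consumed( const char * s, int base, long * result )
{
    char * end = NULL;

    assert( s != NULL && result != NULL );
    *result = strtol( s, &end, base );
    return (long)( end - s );
}
```

For "+0xz" the subject sequence is "+0", so endptr lands on the "x" two characters in, and the result is zero, for base 0 as well as base 16.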