Kevin wrote: (Edit: Not sure with the value, -1 could be right as well)
Matching failure means no value is assigned.
Funny (NOT!) piece of C strangeness.
Re: Funny (NOT!) piece of C strangeness.
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
Kevin wrote: (Edit: Not sure with the value, -1 could be right as well)
Solar wrote: Matching failure means no value is assigned.
Hm, I still haven't found the place where this is defined.
But otherwise you agree? So 1 of 2 for MSVC (assuming that you are right), glibc and BSD?
Re: Funny (NOT!) piece of C strangeness.
Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...
I'd say no, it is not allowable (as it leaves the input in an undefined state).
But because all we can do here is mere guesswork, I mailed a former co-worker of mine who's sitting in the C++ standards committee. Some of the people there also sit in the C committee. I might get an authoritative response from there.
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
Solar wrote: Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...
If it is clearly stated that fscanf can push back at most one character, and if a prefix can be more than one character (which it can), we can be certain that consuming non-matching input is allowable.
JAL
Re: Funny (NOT!) piece of C strangeness.
Can we? Or does it mean that "0x" has to be considered matching?
I wouldn't really bother if it weren't for my work on PDCLib being currently stuck at precisely this point. I simply don't want to work on assumptions.
PS: During the course of this discussion, I came up with a couple more border cases, which I will summarize in an extended test program. Will post that in a couple of hours.
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
OK, updated test program including results.
```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main( void )
{
    int i, count, rc;
    unsigned u;
    char * endptr = NULL;
    char culprit[] = "+0xz";

    /* File I/O to assert fscanf == sscanf */
    FILE * fh = fopen( "testfile", "w+" );
    assert( fh != NULL );
    fprintf( fh, "%s", culprit );
    rewind( fh );

    /* sscanf base 16 (u is printed as int so the -1 sentinel stays visible) */
    u = -1; count = -1;
    rc = sscanf( culprit, "%x%n", &u, &count );
    printf( "sscanf: Returned %d, result %2d, consumed %d\n", rc, (int)u, count );

    /* fscanf base 16 */
    u = -1; count = -1;
    rc = fscanf( fh, "%x%n", &u, &count );
    printf( "fscanf: Returned %d, result %2d, consumed %d\n", rc, (int)u, count );
    rewind( fh );

    /* strtoul base 16 */
    u = strtoul( culprit, &endptr, 16 );
    printf( "strtoul: result %2d, consumed %d\n", (int)u, (int)( endptr - culprit ) );
    puts( "" );

    /* sscanf base 0 */
    i = -1; count = -1;
    rc = sscanf( culprit, "%i%n", &i, &count );
    printf( "sscanf: Returned %d, result %2d, consumed %d\n", rc, i, count );

    /* fscanf base 0 */
    i = -1; count = -1;
    rc = fscanf( fh, "%i%n", &i, &count );
    printf( "fscanf: Returned %d, result %2d, consumed %d\n", rc, i, count );
    rewind( fh );

    /* strtol base 0 (the output label says "strtoul", as in the results below) */
    i = strtol( culprit, &endptr, 0 );
    printf( "strtoul: result %2d, consumed %d\n", i, (int)( endptr - culprit ) );

    fclose( fh );
    return 0;
}

/* newlib 1.14
sscanf: Returned 1, result 0, consumed 2
fscanf: Returned 1, result 0, consumed 2
strtoul: result 0, consumed 0
sscanf: Returned 1, result 0, consumed 2
fscanf: Returned 1, result 0, consumed 2
strtoul: result 0, consumed 0
*/
/* glibc-2.8
sscanf: Returned 1, result 0, consumed 3
fscanf: Returned 1, result 0, consumed 3
strtoul: result 0, consumed 2
sscanf: Returned 1, result 0, consumed 3
fscanf: Returned 1, result 0, consumed 3
strtoul: result 0, consumed 2
*/
/* Microsoft MSVC
sscanf: Returned 0, result -1, consumed -1
fscanf: Returned 0, result -1, consumed -1
strtoul: result 0, consumed 0
sscanf: Returned 0, result 0, consumed -1
fscanf: Returned 0, result 0, consumed -1
strtoul: result 0, consumed 0
*/
/* IBM AIX
sscanf: Returned 0, result -1, consumed -1
fscanf: Returned 0, result -1, consumed -1
strtoul: result 0, consumed 2
sscanf: Returned 0, result 0, consumed -1
fscanf: Returned 0, result 0, consumed -1
strtoul: result 0, consumed 2
*/
```
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
Note that my question on StackOverflow regarding "what is considered standard-compliant behaviour" now has a +150 reputation bonus attached to it...
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
Most answers there are pretty useless, as they are based on a feeling of what could be right rather than on the standard.
However, there is one really good pointer in there, and this is the reference to the Red Hat bugzilla. Parsing "100ergs of energy" as a float is pretty much the same as parsing "0xz" as a hex number. Example 3 in 7.19.6.2 says that this results in a return value of 0, which supports the theory of a matching error.
- Combuster
Re: Funny (NOT!) piece of C strangeness.
Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f)
"work" with disputable meaning, of course
Re: Funny (NOT!) piece of C strangeness.
I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading. Hell, even something as trivial as "-hello" would leave the input in an undefined state when parsed with %i: you read "-", which is quite valid; you read "h", which isn't; and you can only push back the "h" but have no way of properly reporting that you consumed the "-"...
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
Solar wrote: I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading.
It does, but then again, scanf was meant to parse whitespace-delimited input, not any arbitrary input string.
JAL
Re: Funny (NOT!) piece of C strangeness.
Whether whitespace-delimited or not. You cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.
So scanf would be unable to scan back the output of %+d or %#x ??
Every good solution is obvious once you've found it.
Re: Funny (NOT!) piece of C strangeness.
Solar wrote: I, too, think that that fscanf() limitation, if taken literally, severely handicaps formatted reading.
Well, I thought we were discussing what the standard says, not how to improve it? If I were to define the semantics of scanf, I think I would define the FreeBSD way as the correct one. However, unfortunately we already have a standard and my opinion doesn't matter.
Combuster wrote: Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f) "work" with disputable meaning, of course
Why need? It works with pushing back only one character - for possibly even more disputable values of "work", but I think these are the values the standard uses. It doesn't say anything about pushing back characters on a matching failure, but it says "The first character, if any, after the input item remains unread." I read this as "z" being the next character to be read in our "0xz" example, or "f" for "100e+f(x)".
Re: Funny (NOT!) piece of C strangeness.
Solar wrote: Whether whitespace-delimited or not. You cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.
Absolutely true, but scanf was never meant to read arbitrary input, I reckon, just well-formed input strings (and if not well-formed, returning an error is enough).
JAL
Re: Funny (NOT!) piece of C strangeness. [SOLVED]
Communication with Fred J. Tydeman, Vice-chair of PL22.11 (ANSI "C"), on comp.std.c shed some light on this:
fscanf
"An input item is defined as the longest sequence of input characters [...] which is, or is a prefix of, a matching input sequence." (7.19.6.2 P9)
This makes "0x" the longest sequence that is a prefix of a matching input sequence. (Even with `%i` conversion, as the hex "0x" is a longer sequence than the decimal "0".)
"The first character, if any, after the input item remains unread." (7.19.6.2 P9)
This makes `fscanf` read the "z" and put it back as not matching (honoring the one-character pushback limit of footnote 251).
"If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure." (7.19.6.2 P10)
This makes "0x" fail to match, i.e. `fscanf` should assign no value, return zero (if the %x or %i was the first conversion specifier), and leave "z" as the first unread character in the input stream.
strtol
The definition of `strtol` (and `strtoul`) differs in one crucial point:
"The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form." (7.20.1.4 P4, emphasis mine)
Which means that `strtol` should look for the longest valid sequence, in this case the "0". It should point endptr to the "x", and return zero as result.
Every good solution is obvious once you've found it.