Re: Funny (NOT!) piece of C strangeness.
Posted: Wed Sep 16, 2009 3:54 am
Kevin wrote:
(Edit: Not sure about the value, -1 could be right as well)
Matching failure means no value is assigned.
The Place to Start for Operating System Developers
https://f.osdev.org/
Solar wrote:
Matching failure means no value is assigned.
Hm, I still haven't found the place where this is defined.
Solar wrote:
Basically, whether it is allowable to consume non-matching input is the core of the whole question. If it were clearly stated in the standard, we wouldn't have this discussion...
If it is clearly stated that fscanf can push back at most one character, and a prefix can be more than one character long (which it can), we can be certain that consuming non-matching input is allowable.
#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    int i, count, rc;
    unsigned u;
    char * endptr = NULL;
    char culprit[] = "+0xz";

    /* File I/O to check that fscanf behaves like sscanf */
    FILE * fh = fopen( "testfile", "w+" );
    if ( fh == NULL )
    {
        perror( "testfile" );
        return EXIT_FAILURE;
    }
    fprintf( fh, "%s", culprit );
    rewind( fh );

    /* sscanf base 16. (int)u is implementation-defined above INT_MAX;
       on common two's-complement targets an untouched u prints as -1,
       as in the listings below. */
    u = -1; count = -1;
    rc = sscanf( culprit, "%x%n", &u, &count );
    printf( "sscanf: Returned %d, result %2d, consumed %d\n", rc, (int)u, count );

    /* fscanf base 16 */
    u = -1; count = -1;
    rc = fscanf( fh, "%x%n", &u, &count );
    printf( "fscanf: Returned %d, result %2d, consumed %d\n", rc, (int)u, count );
    rewind( fh );

    /* strtoul base 16 */
    u = strtoul( culprit, &endptr, 16 );
    printf( "strtoul: result %2d, consumed %d\n", (int)u, (int)( endptr - culprit ) );
    puts( "" );

    /* sscanf base 0 */
    i = -1; count = -1;
    rc = sscanf( culprit, "%i%n", &i, &count );
    printf( "sscanf: Returned %d, result %2d, consumed %d\n", rc, i, count );

    /* fscanf base 0 */
    i = -1; count = -1;
    rc = fscanf( fh, "%i%n", &i, &count );
    printf( "fscanf: Returned %d, result %2d, consumed %d\n", rc, i, count );
    rewind( fh );

    /* strtol base 0 (printed label kept as "strtoul" to match the recorded output below) */
    i = (int)strtol( culprit, &endptr, 0 );
    printf( "strtoul: result %2d, consumed %d\n", i, (int)( endptr - culprit ) );

    fclose( fh );
    return 0;
}
/* newlib 1.14
sscanf: Returned 1, result 0, consumed 2
fscanf: Returned 1, result 0, consumed 2
strtoul: result 0, consumed 0
sscanf: Returned 1, result 0, consumed 2
fscanf: Returned 1, result 0, consumed 2
strtoul: result 0, consumed 0
*/
/* glibc-2.8
sscanf: Returned 1, result 0, consumed 3
fscanf: Returned 1, result 0, consumed 3
strtoul: result 0, consumed 2
sscanf: Returned 1, result 0, consumed 3
fscanf: Returned 1, result 0, consumed 3
strtoul: result 0, consumed 2
*/
/* Microsoft MSVC
sscanf: Returned 0, result -1, consumed -1
fscanf: Returned 0, result -1, consumed -1
strtoul: result 0, consumed 0
sscanf: Returned 0, result 0, consumed -1
fscanf: Returned 0, result 0, consumed -1
strtoul: result 0, consumed 0
*/
/* IBM AIX
sscanf: Returned 0, result -1, consumed -1
fscanf: Returned 0, result -1, consumed -1
strtoul: result 0, consumed 2
sscanf: Returned 0, result 0, consumed -1
fscanf: Returned 0, result 0, consumed -1
strtoul: result 0, consumed 2
*/
Solar wrote:
I, too, think that the fscanf() limitation, if taken literally, severely handicaps formatted reading.
It does, but then again, scanf was meant to parse whitespace-delimited input, not arbitrary input strings.
Solar wrote:
I, too, think that the fscanf() limitation, if taken literally, severely handicaps formatted reading.
Well, I thought we were discussing what the standard says, not how to improve it. If I were to define the semantics of scanf, I would define the FreeBSD way as the correct one. Unfortunately, we already have a standard, and my opinion doesn't matter.
Combuster wrote:
Which leads to the observation that you need at least three characters of pushback to make fscanf work with things like "100e+f(x)" (which reads e, +, f)
"work" with disputable meaning, of course
Why "need"? It works with pushing back only one character, for possibly even more disputable values of "work", but I think these are the values the standard uses. It doesn't say anything about pushing back characters on a matching failure, but it does say "The first character, if any, after the input item remains unread." I read this as "z" being the next character to be read in our "0xz" example, or "f" for "100e+f(x)".
Solar wrote:
Whether whitespace-delimited or not. You cannot "safely" scan anything prefixed, be it with a sign (+, -) or "0x", because once you get to the sign / "x", you cannot read on for fear of not being able to put the sign / "x" back.
Absolutely true, but scanf was never meant to read arbitrary input, I reckon, just well-formed input strings (and if the input is not well-formed, returning an error is enough).
"An input item is defined as the longest sequence of input characters [...] which is, or is a prefix of, a matching input sequence." (7.19.6.2 P9)
This makes "0x" the longest sequence that is a prefix of a matching input sequence. (Even with `%i` conversion, as the hex "0x" is a longer sequence than the decimal "0".)
"The first character, if any, after the input item remains unread." (7.19.6.2 P9)
This makes `fscanf` read the "z" and push it back as non-matching (honoring the one-character pushback limit of footnote 251).
"If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure." (7.19.6.2 P10)
This makes "0x" fail to match, i.e. `fscanf` should assign no value, return zero (if the %x or %i was the first conversion specifier), and leave "z" as the first unread character in the input stream.
"The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form." (7.20.1.4 P4, emphasis mine)
This means that strtol should look for the longest valid sequence, in this case the "0". It should point endptr to the "x" and return zero as the result.