Reading strings in-between markers

Posted: Wed Oct 29, 2003 6:14 pm
by Guest
Hi.

If you want to read a string in between two markers into a buffer, e.g.:

[MARKER]this is the string i want[MARKER]

Is it possible to do this (in C) with a library function - sscanf() doesn't seem to work so well on strings such as this - or would I have to use pointers?

Thanks.

Re:Reading strings in-between markers

Posted: Wed Oct 29, 2003 6:34 pm
by Tim
I can't think of anything off-hand which will do this in one line of code.

I would do this:
1. Start in state 0 (int state = 0)
2. Scan through the string character by character
3. If in state 0:
   a. If you see a [, enter state 1
   b. Else, add the character to a buffer, or do whatever you want with it
4. If in state 1:
   a. If you see a ], enter state 0
   b. Else, ignore the character

This is a very simple state machine. The machine is in state 0 when it's looking in between two markers, or outside a marker (you can't tell the difference unless you differentiate between "open markers" and "close markers", like HTML/XML). The machine is in state 1 when it's looking inside a marker.
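
The steps above can be sketched in C roughly as follows. This is only an illustration: the function name strip_markers and its parameters are made up for the example, and it assumes markers are never nested and that output which does not fit in the buffer may simply be dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Copies every character of src that lies outside a [...] marker into out.
 * strip_markers and its parameters are hypothetical names for this sketch.
 * Text that does not fit in out is silently dropped.
 * Returns the number of characters written, not counting the terminator. */
size_t strip_markers(const char *src, char *out, size_t out_size)
{
    int state = 0;      /* 0: outside a marker, 1: inside a marker */
    size_t n = 0;

    assert(src != NULL && out != NULL && out_size > 0);

    for (; *src != '\0'; ++src) {
        if (state == 0) {
            if (*src == '[')
                state = 1;              /* a marker opens */
            else if (n + 1 < out_size)
                out[n++] = *src;        /* ordinary text: keep it */
        } else if (*src == ']') {
            state = 0;                  /* the marker closes */
        }
        /* any other character inside a marker is ignored */
    }
    out[n] = '\0';
    return n;
}
```

Called on "[MARKER]this is the string i want[MARKER]" with a large enough buffer, this leaves only the text between the markers in out. As noted above, it cannot tell text between two markers from text outside them.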

Re:Reading strings in-between markers

Posted: Wed Oct 29, 2003 6:47 pm
by Schol-R-LEA
You may be able to use strtok() to parse the string. For example, to parse a line from a semicolon-delimited database, you could write something like:

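#include <string.h>
// ...

char *db_record, *token;
char field[BUFSIZE];

// ...

// strtok() returns a pointer to the start of each token and overwrites
// the delimiter that ends it with '\0', so db_record must be writable.
token = strtok(db_record, ";");

while (NULL != token)
  {
         strncpy(field, token, BUFSIZE - 1);
         field[BUFSIZE - 1] = '\0';   // strncpy() may not terminate the copy
         // do whatever you're doing to the data in field[]
         // ...
         token = strtok(NULL, ";");
  }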
(This code has not been tested, and likely contains errors, but it should correctly reflect how it works. Comments and corrections welcome.)

The strtok() function keeps its last position in a static pointer, and if the str1 argument is NULL, it uses that pointer to continue parsing the same string.

Re:Reading strings in-between markers

Posted: Mon Nov 03, 2003 12:30 pm
by Pype.Clicker
I would strongly discourage the use of "strtok". It tends to behave in a very strange way and becomes completely useless as soon as you enter multithreaded programming.

Instead, I would advocate using "sscanf" (very powerful when used accurately).

if (sscanf(buffer,"[MARKER]%a[^[][MARKER] ",&target)==1) // you have your string in target.
This will do what you want, provided that you have no nested markers and that you know the marker in advance.
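
A note on portability: the "a" allocation modifier in that format is a glibc extension (C99 defines %a as a floating-point conversion, and POSIX.1-2008 spells the allocating form %m), and with it the target argument has to be the address of a char * that is later passed to free(). A more portable sketch, assuming a fixed 64-byte buffer is acceptable, could look like this. The name read_between_markers is hypothetical, and the %n check is an addition so that a missing closing marker is reported as a failure.

```c
#include <assert.h>
#include <stdio.h>

/* Copies the text between two literal "[MARKER]" tags into out.
 * read_between_markers is a hypothetical name; out must hold 64 bytes.
 * Returns 1 if both tags were found with text between them, else 0. */
int read_between_markers(const char *buffer, char out[64])
{
    int consumed = -1;  /* only set by %n if the closing tag matched too */
    int matched;

    assert(buffer != NULL && out != NULL);

    /* %63[^[] reads up to 63 characters that are not '[' */
    matched = sscanf(buffer, "[MARKER]%63[^[][MARKER]%n", out, &consumed);
    return matched == 1 && consumed >= 0;
}
```

As with the original one-liner, this only works when the text between the markers is non-empty and contains no '[' of its own.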

Re:Reading strings in-between markers

Posted: Mon Nov 03, 2003 7:07 pm
by ark
I'm not aware of strtok having tendencies toward strange behavior, although I believe the second argument to strtok is a delimiter set, not a delimiter string, so passing in something like "[MARKER]" would stop at every "[", "M", "A", "R", "K", "E", and "]". And I don't think the delimiters are returned so a string such as:

char str[] = "[MARKER]A Man Walked Off A Sidewalk[MARKER]" would tokenize with strtok(str, "[MARKER]") as:

" "
"an Walked Off "
" Sidewalk"
The results are strange, but I wouldn't describe that as strange behavior. It's behaving exactly as it is defined. It would actually be better to call it in a manner that would sort of automate what Tim was talking about:

strtok(str, "[");

followed by

strtok(NULL, "]");

and so on.
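
A minimal sketch of that alternating-delimiter idea, under a few assumptions: the string starts with the opening tag, the buffer is writable, and the text between the tags is non-empty. The function name between_tags is made up for the example. Note that the first call here uses "]" rather than "[", so that the first token is the opening tag itself and the second token is the text being looked for.

```c
#include <assert.h>
#include <string.h>

/* Returns a pointer to the text between the first "[...]" tag and the
 * next '[', or NULL if there is none. between_tags is a hypothetical name.
 * The buffer is modified in place: strtok() writes '\0' over delimiters. */
char *between_tags(char *str)
{
    assert(str != NULL);

    /* First token: the opening tag up to its ']', e.g. "[MARKER" */
    if (strtok(str, "]") == NULL)
        return NULL;

    /* Second token: everything up to the '[' of the closing tag */
    return strtok(NULL, "[");
}
```

For char str[] = "[MARKER]A Man Walked Off A Sidewalk[MARKER]", the call between_tags(str) returns a pointer to "A Man Walked Off A Sidewalk" inside str.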

A well-written version of the standard library should work fine with multi-threading, as well. Much more dangerous is using strtok in a loop that calls a function that uses strtok.

Re:Reading strings in-between markers

Posted: Mon Nov 03, 2003 7:50 pm
by sonneveld
I thought the problem was that strtok stored an internal pointer to the string to keep track of the tokens. If you have multiple threads that call strtok, then the internal pointer will be changed by the multiple threads.

The GNU library, among others, supports a strtok_r function that takes an extra char ** argument in which it stores its position between calls, instead of using a static pointer.
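
A small sketch of why that extra argument helps, assuming a POSIX system where strtok_r is available (it is not part of ISO C, and Microsoft's library does not provide it under that name). The function count_fields and the record/field layout are invented for the example. Each loop keeps its own save pointer, so the inner tokenizer does not disturb the outer one, which is exactly the case that goes wrong with plain strtok.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts comma-separated fields across all semicolon-separated records.
 * count_fields is a hypothetical name; text is modified in place. */
size_t count_fields(char *text)
{
    char *rec_state;    /* position of the outer (record) tokenizer */
    char *field_state;  /* position of the inner (field) tokenizer */
    size_t count = 0;

    assert(text != NULL);

    for (char *rec = strtok_r(text, ";", &rec_state); rec != NULL;
         rec = strtok_r(NULL, ";", &rec_state)) {
        for (char *f = strtok_r(rec, ",", &field_state); f != NULL;
             f = strtok_r(NULL, ",", &field_state))
            ++count;
    }
    return count;
}
```

With char s[] = "a,b;c,d,e;f", count_fields(s) visits three records and six fields in total.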

Joel's right tho. It wouldn't split the text based on tokens.

- Nick

Re:Reading strings in-between markers

Posted: Tue Nov 04, 2003 6:23 am
by ark
I think most compilers that support multi-threading come with a thread-safe version of the standard library, and that version should implement strtok in a thread-safe manner. Microsoft's multi-threaded library is supposed to make strtok thread-safe, at least.

Re:Reading strings in-between markers

Posted: Tue Nov 18, 2003 3:40 am
by Pype.Clicker
Another weirdness of strtok comes from the fact that it manipulates the input string and puts '\0' wherever it sees fit (still keeping a copy of the removed character), which can be very confusing from time to time.

Quoting "man strtok" :
BUGS
Never use these functions. If you do, note that:

These functions modify their first argument.

These functions cannot be used on constant strings.

The identity of the delimiting character is lost.

The strtok() function uses a static buffer while
parsing, so it's not thread safe. Use strtok_r() if
this matters to you.

Re:Reading strings in-between markers

Posted: Wed Nov 19, 2003 6:56 am
by ark
Again, strtok is defined to put '\0' wherever it finds a delimiter, because it returns a pointer to the token and not a copied version of it.

I'm not sure what the actual standard says about strtok. This is from MSDN:
Warning Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these function from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

Re:Reading strings in-between markers

Posted: Wed Nov 19, 2003 7:18 am
by Brian_Provinciano
Manual char scanning is INCREDIBLY fast, and would do the trick best, with no need for external routines. You wouldn't scan for the strings, but simply scan through for '[', then if found, handle the "MARKER]" and whatnot.

e.g.

char *pstr=<stringbuffer>;
while(*pstr) {
    if(*pstr++=='[') {
        // check for marker
    }
}

That will do it in no time. For more complex things, such as when you have many different '['-style cases, you could use a switch statement on *pstr++. For the most complex stuff, you can use a 256-entry jump (function) table (for ASCII; Unicode's larger) with an entry for each character. It will get through it in no time!
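
One way the "check for marker" step might be filled in, as a sketch only: it assumes the marker text is known in advance (the TAG macro below), and the name find_between is invented for the example. Unlike strtok, it leaves the input string untouched and reports the result as a pointer plus a length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG "[MARKER]"  /* assumed marker text, known in advance */

/* Finds the text between the first two TAG occurrences in s.
 * find_between is a hypothetical name; s is not modified.
 * On success stores the start in *start and the length in *len, returns 1. */
int find_between(const char *s, const char **start, size_t *len)
{
    const size_t tag_len = strlen(TAG);
    const char *first = NULL;   /* first character after the opening tag */

    assert(s != NULL && start != NULL && len != NULL);

    for (const char *p = s; *p != '\0'; ++p) {
        /* cheap test on '[' first, full comparison only when it matches */
        if (*p != '[' || strncmp(p, TAG, tag_len) != 0)
            continue;
        if (first == NULL) {
            first = p + tag_len;
            p += tag_len - 1;   /* skip the rest of the tag; the loop adds 1 */
        } else {
            *start = first;
            *len = (size_t)(p - first);
            return 1;
        }
    }
    return 0;                   /* fewer than two tags were found */
}
```

The caller can then copy len bytes from start into its own buffer and add the terminator, or just work on the span directly.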

Re:Reading strings in-between markers

Posted: Thu Nov 20, 2003 2:47 am
by sonneveld
I think some compilers will optimise switch statements to a jump table if they're dense enough. I noticed a few switch statements compiled this way in Sierra's AGI interpreter (in code that was obviously written with a compiler).

- Nick