Reading strings in-between markers


Reading strings in-between markers

Post by Guest »

Hi.

Suppose you want to read the string between two markers into a buffer, i.e.:

[MARKER]this is the string i want[MARKER]

Is it possible to do this in C with a library function (sscanf() doesn't seem to work so well on strings like this), or would I have to use pointers?

Thanks.

Re:Reading strings in-between markers

Post by Tim »

I can't think of anything off-hand which will do this in one line of code.

I would do this:
1. Start in state 0 (int state = 0)
2. Scan through the string character by character
3. If in state 0:
a. If you see a [, enter state 1
b. Else, add the character to a buffer, or do whatever you want with it
4. If in state 1:
a. If you see a ], enter state 0
b. Else, ignore the character

This is a very simple state machine. The machine is in state 0 when it's looking in between two markers, or outside a marker (you can't tell the difference unless you differentiate between "open markers" and "close markers", like HTML/XML). The machine is in state 1 when it's looking inside a marker.
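The four steps above translate almost directly into C. Here is a minimal sketch. It assumes markers are delimited by '[' and ']' and that the output buffer is at least as large as the input. The name strip_markers is hypothetical.

```c
#include <string.h>

/* Copies the characters that lie outside [...] markers from in to out,
   using the two states described above. out must have room for
   strlen(in) + 1 bytes. */
void strip_markers(const char *in, char *out)
{
    int state = 0;                /* 0: outside a marker, 1: inside [...] */

    for (; *in != '\0'; in++) {
        if (state == 0) {
            if (*in == '[')
                state = 1;        /* opening bracket: enter the marker */
            else
                *out++ = *in;     /* ordinary text: keep it */
        } else if (*in == ']') {
            state = 0;            /* closing bracket: leave the marker */
        }
        /* any other character inside a marker is ignored */
    }
    *out = '\0';
}
```

Called on "[MARKER]this is the string i want[MARKER]", it leaves "this is the string i want" in out. As described above, it cannot tell an opening marker from a closing one, so "a[x]b[y]c" simply yields "abc".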

Re:Reading strings in-between markers

Post by Schol-R-LEA »

You may be able to use strtok() to parse the string. For example, to parse a line from a semicolon-delimited database, you could write something like:

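Code:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char db_record[] = "alice;42;london";  // hypothetical example record; strtok needs a writable string
    char *field;

    // strtok() returns a pointer to each field in turn, already
    // '\0'-terminated in place, so no copying or pointer arithmetic is needed.
    field = strtok(db_record, ";");

    while (NULL != field)
    {
        // do whatever you're doing to the data in field
        printf("%s\n", field);
        field = strtok(NULL, ";");  // NULL means "continue with the same string"
    }
    return 0;
}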
(This code has not been tested, and likely contains errors, but it should correctly reflect how it works. Comments and corrections welcome.)

The strtok() function keeps its position in a static pointer; when its first argument (str1) is NULL, it uses that pointer to continue parsing the same string.

Re:Reading strings in-between markers

Post by Pype.Clicker »

I would strongly discourage the use of strtok. It tends to behave in surprising ways and becomes unusable as soon as you get into multithreaded programming.

Instead, I would advocate using sscanf (very powerful when used carefully).

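Code:

char *target; // %m (POSIX.1-2008; older glibc spelled it %a) makes sscanf malloc() the result
if (sscanf(buffer,"[MARKER]%m[^[][MARKER]",&target)==1) // you have your string in target; free() it when done.
will do what you want, provided that you have no nested markers, that you know what the marker is a priori, and that the string itself contains no '['.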
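A portable variant of the sscanf approach is sketched below. It uses a fixed-size buffer instead of the allocating modifier, and extract_marked is a hypothetical name. The field width 255 bounds the copy. Literal text after the last conversion does not change sscanf's return value, so the sketch appends %n, which records the number of characters consumed. The %n is only reached if the closing "[MARKER]" also matched. Like the original, it stops at the first '[' in the text. It also fails when nothing lies between the markers, because %[ must match at least one character.

```c
#include <stdio.h>
#include <string.h>

/* Copies the text between "[MARKER]" and the next "[MARKER]" into out,
   which must hold at least 256 bytes (field width 255 plus the '\0').
   Returns 1 on success and 0 otherwise. */
int extract_marked(const char *buf, char out[256])
{
    int end = -1;   /* written by %n only if the closing marker matched */

    /* Outside a conversion, "[MARKER]" is matched character by character.
       %255[^[] reads at most 255 characters that are not '['. */
    if (sscanf(buf, "[MARKER]%255[^[][MARKER]%n", out, &end) != 1)
        return 0;   /* no opening marker, or nothing between the markers */

    return end != -1;
}
```

With "[MARKER]no closing marker" the %[ conversion still succeeds, so sscanf returns 1. The end check is what reports the missing closing marker.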

Re:Reading strings in-between markers

Post by ark »

I'm not aware of strtok having tendencies toward strange behavior. Note, though, that the second argument to strtok is a set of delimiter characters, not a delimiter string, so passing in something like "[MARKER]" would stop at every "[", "M", "A", "R", "K", "E", and "]". The delimiters themselves are not returned, so a string such as:

char str[] = "[MARKER]A Man Walked Off A Sidewalk[MARKER]" would tokenize with strtok(str, "[MARKER]") as:

Code:

" "
"an Walked Off "
" Sidewalk"
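This tokenization can be checked with a small helper that collects every strtok() result. The name tokenize is hypothetical. Passing "[MARKER]" as the delimiter set should yield exactly the three tokens listed above.

```c
#include <stddef.h>
#include <string.h>

/* Splits str in place with strtok(), treating every character of delims
   as a separate delimiter. Stores up to max token pointers in toks and
   returns how many were stored. */
size_t tokenize(char *str, const char *delims, char **toks, size_t max)
{
    size_t n = 0;

    for (char *t = strtok(str, delims); t != NULL && n < max;
         t = strtok(NULL, delims))
        toks[n++] = t;

    return n;
}
```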
The results are strange, but I wouldn't describe that as strange behavior. It's behaving exactly as it is defined. It would actually be better to call it in a manner that would sort of automate what Tim was talking about:

strtok(str, "[");

followed by

strtok(NULL, "]");

and so on.
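There is one caveat with this call order. strtok() skips leading delimiters, so on a string that begins with '[' the first strtok(str, "[") call returns "MARKER]this is the string i want" rather than the text. The sequence works as written when the string starts with ordinary text (e.g. "prefix[MARKER]text" gives "prefix", then "MARKER", then "text"). For input that starts with a marker, a minimal sketch is to begin with "]" instead. The helper name text_between_markers is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer into str (which is modified) to the text between the
   first ']' and the following '[', or NULL if there is no such text. */
char *text_between_markers(char *str)
{
    /* Starting with "]" makes the first token the opening marker itself
       ("[MARKER"), because '[' is not a delimiter in this call. */
    if (strtok(str, "]") == NULL)
        return NULL;

    /* Resume just past that ']' and stop at the next '['. */
    return strtok(NULL, "[");
}
```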

A well-written version of the standard library should work fine with multi-threading, as well. Much more dangerous is using strtok in a loop that calls a function that uses strtok.

Re:Reading strings in-between markers

Post by sonneveld »

I thought the problem was that strtok stored an internal pointer to the string to keep track of the tokens. If you have multiple threads that call strtok, then the internal pointer will be changed by the multiple threads.

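The GNU library, among others, supports strtok_r (specified by POSIX), which takes an extra char ** argument in which the caller stores the parsing position instead of relying on a hidden static pointer.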
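Below is a sketch of the reentrant version in the nested-loop case that is dangerous with plain strtok. Each strtok_r() loop keeps its position in its own save pointer, so the inner tokenizer does not clobber the outer one. The function sum_values and the "key=value;..." record format are hypothetical. strtok_r is POSIX rather than ISO C, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Sums the values in a record such as "a=1;b=2;c=3" (modifies rec).
   The outer loop walks the ';'-separated pairs; the inner calls split
   each pair on '='. Each uses its own save pointer. */
long sum_values(char *rec)
{
    long sum = 0;
    char *outer_save, *inner_save;

    for (char *pair = strtok_r(rec, ";", &outer_save); pair != NULL;
         pair = strtok_r(NULL, ";", &outer_save)) {
        char *key = strtok_r(pair, "=", &inner_save);
        char *val = strtok_r(NULL, "=", &inner_save);
        if (key != NULL && val != NULL)
            sum += strtol(val, NULL, 10);
    }
    return sum;
}
```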

Joel's right, though: it wouldn't split the text on the whole marker string, only on the individual characters in it.

- Nick

Re:Reading strings in-between markers

Post by ark »

I think most compilers that support multithreading come with a thread-safe version of the standard library, and that version should implement strtok in a thread-safe manner. Microsoft's multithreaded library, at least, is supposed to make strtok thread-safe.

Re:Reading strings in-between markers

Post by Pype.Clicker »

Another weirdness of strtok is that it modifies the input string, writing '\0' over the delimiter that ends each token (the overwritten character is not kept anywhere), which can be very confusing at times.

Quoting "man strtok" :
BUGS
Never use these functions. If you do, note that:

These functions modify their first argument.

These functions cannot be used on constant strings.

The identity of the delimiting character is lost.

The strtok() function uses a static buffer while
parsing, so it's not thread safe. Use strtok_r() if
this matters to you.
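strtok() writes into its argument. One common workaround is to tokenize a private copy, which leaves the caller's string intact and also works for string literals. The sketch below does this with a hypothetical helper, count_tokens. Note that consecutive delimiters are collapsed, so "a,b,,c" yields three tokens rather than four with an empty one.

```c
#include <stdlib.h>
#include <string.h>

/* Counts the tokens in s without modifying it: strtok() runs on a heap
   copy, so s may be a constant string. Returns -1 if the copy cannot be
   allocated. */
int count_tokens(const char *s, const char *delims)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, len + 1);     /* include the terminating '\0' */

    int n = 0;
    for (char *t = strtok(copy, delims); t != NULL; t = strtok(NULL, delims))
        n++;

    free(copy);
    return n;
}
```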

Re:Reading strings in-between markers

Post by ark »

Again, strtok is defined to write '\0' over the delimiter that ends each token, because it returns a pointer into the original string rather than a copy of the token.

I'm not sure what the actual standard says about strtok. This is from MSDN:
Warning Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these function from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

Re:Reading strings in-between markers

Post by Brian_Provinciano »

Manual char scanning is incredibly fast, and would do the trick best, with no need for external routines. You wouldn't search for the whole marker string; you'd simply scan through for '[' and, if you find one, handle the "MARKER]" part and whatnot.

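e.g.:

#include <string.h>  // strncmp
// ...
const char *pstr = buffer;  // buffer: hypothetical name for the string being scanned
while (*pstr) {
    if (*pstr++ == '[') {
        // check for marker: pstr now points just past the '['
        if (strncmp(pstr, "MARKER]", 7) == 0)
            pstr += 7;  // skip "MARKER]"; pstr now points at the text after the marker
    }
}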

That will do it in no time. For more complex things, such as when there are many different cases like '[', you could use a switch statement on *pstr++. For the most complex stuff, you can use a 256-entry jump (function-pointer) table with one entry per 8-bit character value (Unicode would need a larger scheme). It will get through the input in no time!
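Here is a minimal sketch of the 256-entry function table. It assumes the same [..] markers and drops everything inside them. The handler names and signature are hypothetical. Each entry is a function pointer indexed by the current byte, so the main loop dispatches without an if/else chain. For simplicity the table is rebuilt on every call; a real scanner would build it once.

```c
#include <string.h>

/* Hypothetical handler type: processes input starting at in (the byte
   that selected the handler), may append to *out, and returns the new
   input position. */
typedef const char *(*char_handler)(const char *in, char **out);

static const char *copy_char(const char *in, char **out)
{
    *(*out)++ = *in;              /* ordinary character: copy it */
    return in + 1;
}

static const char *skip_marker(const char *in, char **out)
{
    (void)out;                    /* markers produce no output */
    while (*in != '\0' && *in != ']')
        in++;                     /* skip "[MARKER" */
    return *in == ']' ? in + 1 : in;  /* step past ']' if present */
}

/* Copies every character outside [...] markers from in to out, which
   must have room for strlen(in) + 1 bytes. */
void strip_markers_table(const char *in, char *out)
{
    char_handler table[256];
    for (int i = 0; i < 256; i++)
        table[i] = copy_char;     /* default entry for every byte */
    table['['] = skip_marker;     /* the one special case */

    while (*in != '\0')
        in = table[(unsigned char)*in](in, &out);
    *out = '\0';
}
```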

Re:Reading strings in-between markers

Post by sonneveld »

I think some compilers will optimise switch statements into a jump table if the cases are dense enough. I noticed a few switch statements compiled this way in Sierra's AGI interpreter (in code that had obviously been generated by a compiler rather than written by hand).

- Nick