Regular Expressions

Kon-Tiki · Post by **Kon-Tiki** » Thu Jan 26, 2006 3:11 am

I'm a bit stumped by this. I've made a test page for testing regexes, but it's giving weird results. This is my code (without the body, HTML, PHP, etc. tags):

Code: Select all

  $counter = 0;
  $string = "Best niet [b]bevet [b]Moddervet[/b] Niet vet";
  echo $string, "<br>";
  ereg("\[b\]", $string, $resultaat);
  echo var_dump($resultaat), "<br><br>";
  foreach ($resultaat as $value) {
    $counter++;
    echo $counter, "<br>";
    echo $value, "<br>";
  }

The result is this:

Code: Select all

Best niet [b]bevet [b]Moddervet[/b] Niet vet
array(1) { [0]=> string(3) "[b]" }

1
[b]

I don't see why it's only returning one value instead of two.

Edit: Oops, got it. Had to put the regex between brackets, apparently.

Kon-Tiki · Post by **Kon-Tiki** » Thu Jan 26, 2006 3:35 am

New problem. Got this as code:

Code: Select all

  $string = "Best[/b] niet [b]bev[/b] et [b]Moddervet[/b] Niet vet";
  echo $string, "<br>";

  eregi("(\[b\])", $string, $resultaat);
  $open_tags = count($resultaat);

  eregi("(\[/b\])", $string, $resultaat_closed);
  $closed_tags = count($resultaat_closed);

  echo "Open: ", $open_tags, "<br>Closed: ", $closed_tags;

That results in:

Code: Select all

Best[/b] niet [b]bev[/b] et [b]Moddervet[/b] Niet vet
Open: 2
Closed: 2

The same happens with only one tag (opening or closing, both do it), but when there are no closing tags, it says so. Does anybody know why it does this?
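A likely explanation, sketched here as an aside: ereg() and eregi() stop at the first match, and the result array holds the whole match followed by one entry per parenthesised group. With one pair of parentheses, count() is therefore 2 whenever the pattern matches at all, no matter how many tags the string contains. The ereg family was also removed in PHP 7. The sketch below swaps in preg_match_all(), which returns the number of matches; the `~` delimiters and the `$open_matches` / `$closed_matches` variable names are choices made for this example, not something from the original post.

```php
<?php
// Input string taken from the post above.
$string = "Best[/b] niet [b]bev[/b] et [b]Moddervet[/b] Niet vet";

// preg_match_all() returns how many times the full pattern matched.
// The "i" flag keeps the case-insensitive behaviour of eregi().
$open_tags   = preg_match_all('~\[b\]~i', $string, $open_matches);
$closed_tags = preg_match_all('~\[/b\]~i', $string, $closed_matches);

echo "Open: ", $open_tags, "<br>Closed: ", $closed_tags, "\n";
// prints: Open: 2<br>Closed: 3
```

For a fixed string like [b], substr_count($string, "[b]") would also do the job without a regex, though it is case-sensitive.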

Candy · Post by **Candy** » Wed Feb 01, 2006 1:19 am

Did you try substitution regexps?

s/\[b\](.*)\[\/b\]/\<b\>$1\<\/b\>/g

Something like that should work. Not sure how to avoid it skipping [ /b ] tags though; it might go for the longest match and only substitute that one. Also, it might simply ignore a loose [ b ] at the end or a loose [ /b ] at the beginning.

[edit] explaining how to use ubb tags on a forum that uses ubb tags is something you need to wake up for. [/edit]
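The "longest match" worry can be checked with a small PHP sketch. It assumes preg_replace() in place of the Perl-style s///g, and the sample text is made up for the demonstration. The greedy `.*` runs to the last closing tag, while the lazy `.*?` stops at the first one it can.

```php
<?php
// Hypothetical sample text with two separate tag pairs.
$text = "plain [b]one[/b] plain [b]two[/b] plain";

// Greedy: .* runs to the LAST [/b], so both pairs collapse into one match.
$greedy = preg_replace('~\[b\](.*)\[/b\]~', '<b>$1</b>', $text);

// Lazy: .*? stops at the FIRST [/b] it can reach, so each pair is replaced.
$lazy = preg_replace('~\[b\](.*?)\[/b\]~', '<b>$1</b>', $text);

echo $greedy, "\n"; // plain <b>one[/b] plain [b]two</b> plain
echo $lazy, "\n";   // plain <b>one</b> plain <b>two</b> plain
```

In both variants a loose opening or closing tag with no partner is simply left in the text, because the pattern needs both ends to match.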

Kon-Tiki · Post by **Kon-Tiki** » Wed Feb 01, 2006 1:54 am

I'll test it later today. I'll have enough time anyway (the end exercise runs until next Tuesday... and we'll be done by noon if we work hard, otherwise definitely by the end of the day).

Solar · Post by **Solar** » Wed Feb 01, 2006 2:56 am

Perl regexps know the "non-greedy operator", which is '?' placed after a quantifier (e.g. .*? instead of .*).

Taken from a related discussion on a different forum, the regexp below is more efficient, though:

Code: Select all

\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]

Candy · Post by **Candy** » Wed Feb 01, 2006 4:16 am

Solar wrote: Perl regexps know the "non-greedy operator", which is '?' placed after a quantifier (e.g. .*? instead of .*).

Taken from a related discussion on a different forum, the regexp below is more efficient, though:

Code: Select all

\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]

...

Can you explain that one? I'm puzzled.

Solar · Post by **Solar** » Wed Feb 01, 2006 6:19 am

I can try, but I was only a spectator in that thread...

Code: Select all

\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]

\[foo\](...)\[/foo\] should be clear - find something enclosed in BB tag "foo".

(?:regex) allows grouping a regex without actually capturing anything or creating backreferences.

(?!regex) is a negative lookahead.

x? means that x is optional.

[^\[] is any character that's not an opening bracket ([).

[^\[]*+ is a possessive (+) series of zero or more characters (*) that are not an opening bracket: it grabs as many as it can and never gives any back when backtracking.

(?!\[/?foo\]) means a negative lookahead for either [foo] or [/foo].

(?:(?!\[/?foo\]).)? is an optional single character that is not the opening bracket of [foo] or [/foo].

Hmm... at this point I start thinking that the two *+ in the term are somewhat redundant... you could probably do without the [^\[]*+... but then again, I am not a specialist on regexps.
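To see what the pattern actually does, here is a small PHP sketch that feeds it to preg_match_all(). The `~` delimiters and both test strings are assumptions made for the example. Other tags such as [b] are allowed inside the capture, and with nested [foo] tags only the innermost pair matches, because the captured text may not contain another [foo] or [/foo].

```php
<?php
// The pattern from the post, wrapped in ~ delimiters for PCRE.
$pattern = '~\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]~';

// Hypothetical test inputs: two flat pairs, and one nested pair.
$flat   = 'a [foo]x [b]y[/b] z[/foo] b [foo]second[/foo]';
$nested = '[foo]outer [foo]inner[/foo] tail[/foo]';

preg_match_all($pattern, $flat, $flat_matches);
preg_match_all($pattern, $nested, $nested_matches);

// Index 1 holds the text captured between the tags.
print_r($flat_matches[1]);   // "x [b]y[/b] z" and "second"
print_r($nested_matches[1]); // "inner" only
```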

Solar · Post by **Solar** » Wed Feb 01, 2006 6:28 am

A nice link I found: http://www.regular-expressions.info/

The reference parts are very nice.

Pype.Clicker · Post by **Pype.Clicker** » Wed Feb 01, 2006 6:34 am

If I remember my courses correctly, the main limitation of regular expressions is that they cannot count. As soon as your "automaton" needs something like a stack (e.g. to compare previously seen stuff with currently seen stuff), you can no longer implement it via a regular expression.

In other words, trying to get a <foo> ... </foo> will work as long as you don't have occurrences of <foo> or </foo> inside of ..., but you can't expect regular expressions to express something like "catch whatever is between <foo> and the matching </foo>, knowing that if a new <foo> is encountered, another </foo> should be skipped".

Then again, I too was only a spectator here, so I might be terribly off-topic.
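The counting can be done outside the regex. The sketch below is one possible way, not the only one: it uses the BB-style [foo] tags from earlier in the thread, lets a regex find the tags, and keeps a depth counter in plain PHP to pair the first [foo] with its matching [/foo]. The function name outer_foo_content() and the sample string are made up for this example.

```php
<?php
// Hypothetical helper: returns the text between the first [foo] and its
// matching [/foo], honouring nesting, or null if no balanced pair exists.
function outer_foo_content(string $text): ?string
{
    // The regex only tokenises; the counting happens in the loop below.
    preg_match_all('~\[(/?)foo\]~', $text, $tags, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $depth = 0;
    $start = 0;
    foreach ($tags as $tag) {
        [$token, $offset] = $tag[0];     // matched tag and its byte offset
        $isClosing = $tag[1][0] === '/'; // group 1 is "/" for closing tags

        if (!$isClosing) {
            if ($depth === 0) {
                $start = $offset + strlen($token); // content begins after [foo]
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                return substr($text, $start, $offset - $start);
            }
        }
        // A stray [/foo] before any [foo] is ignored.
    }
    return null; // no [foo] found, or it was never closed
}

echo outer_foo_content('a [foo]b [foo]c[/foo] d[/foo] e'), "\n";
// prints: b [foo]c[/foo] d
```

The depth counter plays the role of the stack mentioned above; since there is only one tag name, an integer is enough.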
