Regular Expressions

Programming, for all ages and all languages.
Kon-Tiki

Regular Expressions

Post by Kon-Tiki »

I'm a bit stumped by this. I've made a test page for trying out regexes, but it's giving weird results. This is my code (without the body, HTML, PHP, etc. tags):

Code: Select all

  $counter = 0;
  $string = "Best niet [b]bevet [b]Moddervet[/b] Niet vet";
  echo $string, "<br>";
  ereg("\[b\]", $string, $resultaat);
  echo var_dump($resultaat), "<br><br>";
  foreach ($resultaat as $value) {
    $counter++;
    echo $counter, "<br>";
    echo $value, "<br>";
  }
The result of that is this:

Code: Select all

Best niet [b]bevet [b]Moddervet[/b] Niet vet
array(1) { [0]=> string(3) "[b]" }

1
[b]
I don't see why it's only returning one value instead of two.

Edit: Oops, got it. I had to put the regex between parentheses, apparently.
Kon-Tiki

Re:Regular Expressions

Post by Kon-Tiki »

New problem. Got this as code:

Code: Select all

  $string = "Best[/b] niet [b]bev[/b] et [b]Moddervet[/b] Niet vet";
  echo $string, "<br>";

  eregi("(\[b\])", $string, $resultaat);
  $open_tags = count($resultaat);

  eregi("(\[/b\])", $string, $resultaat_closed);
  $closed_tags = count($resultaat_closed);

  echo "Open: ", $open_tags, "<br>Closed: ", $closed_tags;
That results in:

Code: Select all

Best[/b] niet [b]bev[/b] et [b]Moddervet[/b] Niet vet
Open: 2
Closed: 2
I get the same counts when the string contains only one tag of a kind (opening or closing, both behave that way), but when there are no closing tags at all, it reports that correctly. Does anybody know why it does this?
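A possible explanation, offered as a sketch rather than a definitive answer: ereg() and eregi() stop at the first match, and the array they fill holds the whole match at index 0 plus one entry per parenthesised group. With one group, count() on that array is 2 whenever the pattern matches at all, no matter how many tags the string contains. Those functions were also removed in PHP 7. The snippet below assumes the PCRE functions are available and uses preg_match_all(), which returns the number of full matches; the helper name count_tag is made up for illustration.

```php
<?php
// Hypothetical helper: count how often a literal BB tag occurs in $text.
function count_tag(string $tag, string $text): int
{
    // preg_quote() escapes the brackets and the '/' delimiter in the tag.
    $pattern = '/' . preg_quote($tag, '/') . '/i';
    // preg_match_all() returns the number of full pattern matches.
    return (int) preg_match_all($pattern, $text);
}

$string = "Best[/b] niet [b]bev[/b] et [b]Moddervet[/b] Niet vet";
echo "Open: ", count_tag('[b]', $string), "\n";    // Open: 2
echo "Closed: ", count_tag('[/b]', $string), "\n"; // Closed: 3
```

For plain literal tags like these, substr_count() would do the same job without any regex.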
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Regular Expressions

Post by Candy »

Did you try substitute regexps?

s/\[b\](.*)\[\/b\]/\<b\>\1\<\/b\>/g

Something like that should work. I'm not sure how to keep it from skipping over [/b] tags, though: it might go for the longest match and only substitute that one. It might also simply ignore a loose [b] at the end or a loose [/b] at the beginning.
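To check the "longest match" suspicion, here is a minimal sketch using PHP's preg_replace() with the same idea as the substitution above. The sample strings are made up for illustration, and '~' is used as delimiter so the slash in [/b] needs no escaping.

```php
<?php
$text = "a [b]one[/b] b [b]two[/b] c";

// Greedy (.*) runs on to the LAST [/b], so both pairs collapse into one match.
$greedy = preg_replace('~\[b\](.*)\[/b\]~', '<b>$1</b>', $text);
echo $greedy, "\n"; // a <b>one[/b] b [b]two</b> c

// A loose opening tag without a closing tag does not match and is left alone.
$loose = preg_replace('~\[b\](.*)\[/b\]~', '<b>$1</b>', "x [b]loose");
echo $loose, "\n"; // x [b]loose
```

So the greedy version does take the longest match, and unpaired tags pass through untouched.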

[edit] explaining how to use ubb tags on a forum that uses ubb tags is something you need to wake up for. [/edit]
Kon-Tiki

Re:Regular Expressions

Post by Kon-Tiki »

I'll test it later today. I'll have time enough anyway (the end exercise runs until next Tuesday... and we'll be done by noon if we work hard, otherwise definitely by the end of the day).
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:Regular Expressions

Post by Solar »

Perl regexps have a "non-greedy" modifier: a '?' appended to a quantifier, as in .*? instead of .*
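A small sketch of what the non-greedy form does, written for PHP's preg_replace() (PCRE follows Perl here). The sample string is made up for illustration.

```php
<?php
$text = "a [b]one[/b] b [b]two[/b] c";

// (.*?) is the lazy form: it stops at the first [/b] that lets the match succeed,
// so every [b]...[/b] pair is replaced on its own.
$lazy = preg_replace('~\[b\](.*?)\[/b\]~', '<b>$1</b>', $text);
echo $lazy, "\n"; // a <b>one</b> b <b>two</b> c
```

Note that '.' does not match a newline unless the s modifier is added, so a tag pair spanning several lines would need '~...~s'.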

Taking from a related discussion on a different forum, the regexp below is more efficient, though:

Code: Select all

\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]
Every good solution is obvious once you've found it.
Candy

Re:Regular Expressions

Post by Candy »

Solar wrote: Perl regexps have a "non-greedy" modifier: a '?' appended to a quantifier, as in .*? instead of .*

Taking from a related discussion on a different forum, the regexp below is more efficient, though:

Code: Select all

\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]
...
:o

can you explain that one? I'm puzzled.
Solar

Re:Regular Expressions

Post by Solar »

I can try, but I was only a spectator in that thread... :D

Code: Select all

\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]
\[foo\](...)\[/foo\] should be clear - find something enclosed in BB tag "foo".

(?:regex) allows grouping a regex without actually capturing anything or creating backreferences.

(?!regex) is a negative lookahead.

x? means that x is optional.

[^\[] is any character that's not an opening bracket ([).

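[^\[]*+ is a series (*) of characters that are not an opening bracket, matched possessively (the trailing +): it takes as many as it can and never gives any back through backtracking. It is not a minimal match; that would be *?.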

(?!\[/?foo\]) means a negative lookahead for either [foo] or [/foo].

(?:(?!\[/?foo\]).)? is an optional character that is not the opening bracket of [foo] or [/foo].

Hmm... at this point I start thinking that the two *+ in the term are somewhat redundant... you could probably do without the [^\[]*+... but then again, I am not a specialist on regexp's.
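To see the pattern in action, here is a minimal sketch that feeds it to PHP's preg_match_all(). It assumes a PCRE build with possessive quantifiers, which current PHP versions have. The pattern is taken unchanged from the post; the sample strings are made up for illustration.

```php
<?php
// The pattern from the thread; '~' is the delimiter because the pattern contains '/'.
$pattern = '~\[foo\]((?:[^\[]*+(?:(?!\[/?foo\]).)?)*+)\[/foo\]~';

// Other tags such as [bar] may appear inside; the lookahead only stops at [foo] or [/foo].
$text = "x [foo]one[/foo] y [foo]two [bar] three[/foo] z";
preg_match_all($pattern, $text, $matches);
print_r($matches[1]); // "one" and "two [bar] three"

// With nested [foo] tags only the innermost pair matches, because the
// content is not allowed to run across another [foo].
preg_match_all($pattern, "[foo]a[foo]b[/foo]c[/foo]", $nested);
print_r($nested[1]); // "b"
```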
Solar

Re:Regular Expressions

Post by Solar »

A nice link I found: http://www.regular-expressions.info/

The reference parts are very nice.
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Regular Expressions

Post by Pype.Clicker »

If I remember my courses correctly, the main limitation of regular expressions is that they cannot count. As soon as your "automaton" needs something like a stack (e.g. to compare previously seen stuff with currently seen stuff), you can no longer implement it as a regular expression.

In other words, trying to match <foo> ... </foo> will work as long as there are no occurrences of </foo> inside the ..., but you can't expect a regular expression to express something like "catch whatever is between <foo> and the matching </foo>, knowing that if a new <foo> is encountered, another </foo> should be skipped".

Then again, I, too, was only a spectator here, so I might be terribly off-topic.
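One way around the counting problem is to leave the counting to ordinary code and use the regex only to find the tags. The sketch below is an illustration under that assumption, not anything posted in the thread: the helper name top_level_blocks and the sample string are made up, and it uses the [foo] bracket syntax from the earlier posts.

```php
<?php
// Hypothetical helper: return the contents of every top-level [tag]...[/tag]
// pair, keeping nested pairs inside the returned text.
function top_level_blocks(string $text, string $tag = 'foo'): array
{
    $open = '[' . $tag . ']';
    $close = '[/' . $tag . ']';
    // Split into tags and the text between them, keeping the tags themselves.
    $pattern = '~(' . preg_quote($open, '~') . '|' . preg_quote($close, '~') . ')~';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $depth = 0;    // the counter a plain regular expression lacks
    $current = '';
    $blocks = [];
    foreach ($parts as $part) {
        if ($part === $open) {
            if ($depth > 0) {
                $current .= $part;   // a nested opener is part of the content
            }
            $depth++;
        } elseif ($part === $close && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                $blocks[] = $current; // the matching closer of the outer pair
                $current = '';
            } else {
                $current .= $part;    // closer of a nested pair
            }
        } elseif ($depth > 0) {
            $current .= $part;        // ordinary text inside a pair
        }
        // Text and stray closers outside any pair are ignored.
    }
    return $blocks;
}

$blocks = top_level_blocks("x [foo]a [foo]b[/foo] c[/foo] y [foo]d[/foo]");
print_r($blocks); // "a [foo]b[/foo] c" and "d"
```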