Regular Expressions

Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Regular Expressions

Post by Neo »

I want to match a text pattern that may or may not have a CR or LF in the middle of the string. It is of the form

<tag>.*<end of tag>

The .* here does not match text that contains a CR or LF (which may or may not be present). How can I do this?
Only Human
mystran

Re:Regular Expressions

Post by mystran »

Eat characters until you can match the starting tag, then eat characters until you hit CR/LF or the ending tag.

For more information, read about finite state machines.

Re:Regular Expressions

Post by Neo »

I was wondering how to do it with egrep, not from a programming language.
mystran

Re:Regular Expressions

Post by mystran »

Ah, OK. It seems I misread your original problem, as you seem to want to allow line terminators.

If I am not mistaken, you cannot have grep do that for you, since grep seems to always process its input line by line (somebody correct me if some variant of grep allows multiline matches). Depending on what you want to do, you might be able to use 'sed' though (or if everything else fails, perl is of course your friend ;))
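As a rough sketch of the sed route mentioned above: a sed address range can print every line from one containing the opening tag through the next line containing the closing tag. The sample input is made up for illustration. Note that the range prints whole lines, so any text before <tag> or after </tag> on those lines is included, and a pair that opens and closes on the same line does not end the range on that line.

```shell
# Hypothetical sample input: one tag pair split across two lines.
sample=$(mktemp)
printf 'before\n<tag>first half\nsecond half</tag>\nafter\n' > "$sample"

# -n suppresses sed's default output. The range /start/,/end/ selects lines
# from a line matching <tag> through the next line matching </tag>, and the
# p command prints each selected line.
range_out=$(sed -n '/<tag>/,/<\/tag>/p' "$sample")
printf '%s\n' "$range_out"

rm -f "$sample"
```

With this input, only the two lines holding the tag pair are printed.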

To match those patterns within one line, you can simply use:

egrep '<tag>.*</tag>' filename
or whatever...
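To illustrate the line-by-line behaviour described above, here is a small sketch with made-up input. It uses grep -E, which is the POSIX spelling of egrep. Only the pair that sits on a single line is reported, because each input line is tested on its own.

```shell
# Hypothetical sample input: one pair on a single line, one pair split in two.
sample=$(mktemp)
printf '<tag>on one line</tag>\n<tag>split\nacross lines</tag>\n' > "$sample"

# Each line is matched separately, so .* never gets to see the newline
# between "split" and "across lines".
line_out=$(grep -E '<tag>.*</tag>' "$sample")
printf '%s\n' "$line_out"

rm -f "$sample"
```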

Re:Regular Expressions

Post by Neo »

I think I should have a look at sed. I have never used it before, but it seems to be one way of doing it.
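One possible way to go about it with sed, offered only as a sketch: read the whole input into sed's pattern space, replace the embedded newlines with spaces, and then hand the single joined line to grep -E. The sample input is invented for illustration. This holds the entire file in memory and changes the text (newlines become spaces), so it suits small inputs where only the presence of a match matters.

```shell
# Hypothetical sample input: one tag pair split across two lines.
sample=$(mktemp)
printf 'before\n<tag>first half\nsecond half</tag>\nafter\n' > "$sample"

# :a defines a label; N appends the next input line to the pattern space;
# $!ba branches back to :a unless the last line has been read. Once the whole
# input is in the pattern space, s/\n/ /g turns every newline into a space.
joined_out=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$sample" | grep -E '<tag>.*</tag>')
printf '%s\n' "$joined_out"

rm -f "$sample"
```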