OSDev.org

Posted: **Thu Oct 14, 2004 11:27 am**

I want a single regular expression that matches both of these words in a file: "saxena" and "saksena". They differ only in the "ks" versus "x". I can use

Code: Select all

grep "sa[ksena|xena]" file.txt

but since the "ena" part is also common, I thought there must be some other way to match them. Also, which is the best site to learn about regular expressions?
Also, how do we grep for whitespace, and for strings that may themselves look like regular expressions, in a file?

Posted: **Thu Oct 14, 2004 11:37 am**

rich_m wrote: I want a single regular expression that matches both of these words in a file: "saxena" and "saksena". They differ only in the "ks" versus "x". I can use
Code: Select all
grep "sa[ksena|xena]" file.txt
but since the "ena" part is also common, I thought there must be some other way to match them. Also, which is the best site to learn about regular expressions?
Also, how do we grep for whitespace, and for strings that may themselves look like regular expressions, in a file?

first one:

Code: Select all

 grep "sa(ks|x)ena" file

The brackets define a set of characters (an alphabet), not alternatives. You defined a set containing the characters a, e, k, n, s, x and |, so your pattern matches "sa" followed by exactly one of those characters.
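A minimal sketch of that difference, assuming a grep that accepts -E (extended syntax). The word list is made up for illustration; "sane" is there only to show a false positive from the bracket version.

```shell
#!/bin/sh
# Hypothetical sample words; only the first two are the intended targets.
words() {
    printf '%s\n' saxena saksena sane sa-ena
}

# Bracket expression: "sa" followed by ONE character from the set a e k n s x |
bracket_matches() {
    words | LC_ALL=C grep -E 'sa[ksena|xena]'
}

# Group with alternation: "sa", then either "ks" or "x", then "ena"
group_matches() {
    words | LC_ALL=C grep -E 'sa(ks|x)ena'
}

echo "bracket version:"; bracket_matches
echo "group version:";   group_matches
```

The bracket version also prints "sane", because "n" is in the set; the group version prints only the two intended words.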

For the second one, you could look at the Unix program strings, which does something similar. The short idea:

Code: Select all

[ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9]+

-> gives all legible pieces of 4 chars or more. Note that it also treats spaces, tabs and newlines (all types) as legible.

Posted: **Fri Oct 15, 2004 6:54 am**

I tried that on SunOS.
It didn't work (the "sa(ks|x)ena" part, I mean).

Posted: **Fri Oct 15, 2004 12:10 pm**

It seems the special characters need to be escaped:

Code: Select all

grep 'sa\(ks\|x\)ena' file

That ^ I tried on Linux and it worked.

P.S. Tried on Sun:
with grep, it didn't work either way;
with egrep, it worked without escapes:

Code: Select all

egrep 'sa(ks|x)ena' file

which is probably what you need.

freaking inconsistency is what this is >:(

p.p.s. our university unix (Sun) environment is a mess. software is severely outdated, doesn't agree with manpages.. webspace is run by incompetent morons..

Posted: **Fri Oct 15, 2004 12:44 pm**

IIRC on Solaris, you should type 'sa$ks|x$ena' but I might remember wrong, and I'm too lazy to find a Solaris-box to test with.

Posted: **Fri Oct 15, 2004 9:52 pm**

does that work on Linux too?

Posted: **Fri Oct 15, 2004 10:19 pm**

On Linux, if you use egrep, it works the same way, without escapes:

Code: Select all

egrep 'sa(ks|x)ena' file

man egrep:

Egrep is the same as grep -E.

That's what my Sun manpages say too, but grep refuses -E, probably because it's so outdated.

bottom line:
egrep works as expected in all cases, so you probably want to use that.
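A small sketch of why the same pattern behaves differently, assuming a grep that supports the standard -E and -F options (the Sun grep above apparently did not take -E). In basic syntax, ( | ) are ordinary characters; the escaped \| form is a GNU extension rather than part of POSIX basic regular expressions, which may be why it failed on Sun. The -F option treats the pattern as a fixed string, which is one way to search for text that looks like a regular expression. The sample lines are made up.

```shell
#!/bin/sh
# Hypothetical sample: two names, one line that literally contains the pattern text.
sample() {
    printf '%s\n' saxena saksena 'sa(ks|x)ena' sabena
}

# Basic syntax (plain grep): parentheses and | are literal characters here.
basic_literal() {
    sample | LC_ALL=C grep 'sa(ks|x)ena'
}

# Extended syntax (grep -E): grouping and alternation.
extended_alt() {
    sample | LC_ALL=C grep -E 'sa(ks|x)ena'
}

# Fixed string (grep -F): no regular expression interpretation at all.
fixed_literal() {
    sample | LC_ALL=C grep -F 'sa(ks|x)ena'
}

echo "basic:";    basic_literal
echo "extended:"; extended_alt
echo "fixed:";    fixed_literal
```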

Posted: **Fri Oct 15, 2004 10:55 pm**

Yeah, only the egrep one works. Plain grep doesn't work in any form other than the one I specified at the beginning.

Posted: **Mon Oct 18, 2004 3:13 am**

Some more questions...
How do you delete lines containing whitespace from a file?
I tried using 'tr' in this way, but only lines containing spaces get deleted (not the ones with tabs):

Code: Select all

tr -d " \t"<file.txt

What is the right way?

Also, related to this, I have another question:
How do we count the lines having at least one space or tab?
The -v option again does not match tabs. Any idea why?
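One possible approach, sketched under the assumption that grep supports POSIX character classes. tr -d removes characters, not lines, so it is not the right tool for dropping whole lines. Also, grep does not turn \t in a pattern into a tab, which may be why tabs were not matched; [[:blank:]] stands for space or tab. The sample input below is made up.

```shell
#!/bin/sh
# Hypothetical sample: line 2 contains a space, line 3 contains a tab.
sample() {
    printf 'alpha\nbeta gamma\ndelta\tepsilon\nzeta\n'
}

# Drop every line that contains a space or a tab (-v inverts the match).
without_blank_lines() {
    sample | LC_ALL=C grep -v '[[:blank:]]'
}

# Count the lines that contain at least one space or tab (-c counts lines).
count_blank_lines() {
    sample | LC_ALL=C grep -c '[[:blank:]]'
}

without_blank_lines
count_blank_lines
```

For a real file, the same patterns would be applied as grep -v '[[:blank:]]' file.txt and grep -c '[[:blank:]]' file.txt.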

Posted: **Thu Jul 14, 2005 12:32 am**

Popping this thread up because I have a regexp-related question.
In my file (a.txt) I have this:

Code: Select all

0xAF3DBFR
0xAF1
ABCDE
BCDEhello
world
ABCD

When I use this command, this is what I get:

Code: Select all

bash-3.00$ egrep "[:xdigit:]{1,4}" a.txt
0xAF3DBFR
0xAF1
world

How is this happening?

Posted: **Thu Jul 14, 2005 4:06 pm**

Remember that grep prints lines with a match; that is, any line that contains a match somewhere. If you only want lines that match in full, you must use anchors for the beginning and end of the line. The beginning is ^ (which only makes sense at the start of the pattern, since we match one line at a time) and the end is $ (at the end of the pattern, of course).

What probably happens is that [:xdigit:]{1,4} matches the same as [0-9a-f]{1,4}. Since nothing has to match after it, the upper bound 4 is meaningless (other than for capture purposes in things like sed). With a lower bound of 1 and no meaningful upper bound, you could just as well use [:xdigit:] alone, so you get any line with at least one character from 0-9 or a-f.

Why :xdigit: doesn't seem to match A-F as well, I don't know, unless of course you are doing this on Solaris, where I've yet to understand the internal logic of :xdigit:. There it seems to match numbers, or hexadecimal digits prefixed by 0x, or something like that, which makes little sense to me.
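A short sketch of the anchoring point, using the a.txt contents posted above and the explicit [0-9a-f] set rather than [:xdigit:]. It assumes grep -E and the C locale, so that a-f covers lowercase letters only.

```shell
#!/bin/sh
# Contents of a.txt as posted above.
sample() {
    printf '%s\n' 0xAF3DBFR 0xAF1 ABCDE BCDEhello world ABCD
}

# Unanchored: a line matches if it contains at least one of 0-9 or a-f,
# so {1,4} selects the same lines as a single bracket expression.
unanchored_bounded() {
    sample | LC_ALL=C grep -E '[0-9a-f]{1,4}'
}
unanchored_single() {
    sample | LC_ALL=C grep -E '[0-9a-f]'
}

# Anchored: the whole line must consist of one to four hex digits (either case).
anchored() {
    sample | LC_ALL=C grep -E '^[0-9A-Fa-f]{1,4}$'
}

echo "unanchored:"; unanchored_bounded
echo "anchored:";   anchored
```

With anchors, only ABCD is printed; ABCDE is one character too long and the 0x lines contain an "x".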

Does that make sense?

Posted: **Fri Jul 15, 2005 12:19 am**

mystran wrote: Why :xdigit: doesn't seem to match A-F as well, I don't know, unless of course you are doing this on Solaris, where I've yet to understand the internal logic of :xdigit:. There it seems to match numbers, or hexadecimal digits prefixed by 0x, or something like that, which makes little sense to me.

Does that make sense?

I was using Cygwin,
and with the pattern you spoke of this is what I got:

Code: Select all

bash-3.00$ egrep "[0-9a-f]{1,4}" a.txt
0xAF3DBFR
0xAF1
BCDEhello
world

Posted: **Fri Jul 15, 2005 2:58 am**

Could it be that the support for

Code: Select all

[:xdigit:]

is broken / not enabled, and that it is grepping for any line containing the letters x, d, i, g, or t? That would explain the results just nicely, although I have no idea how it could come about.

Posted: **Fri Jul 15, 2005 4:27 am**

Considering that 'world' contains a valid hex digit ('d'), would that explain anything?

Posted: **Fri Jul 15, 2005 4:48 am**

I tried "egrep -o [:xdigit:] a.txt" and got:

x
x
d

So your assumption is probably correct.
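A sketch of what is likely going on, assuming standard POSIX bracket syntax: a class name such as [:xdigit:] only has its special meaning inside a bracket expression, so the working form is [[:xdigit:]]. Written with single brackets, it is just the set of characters : x d i g t, which fits the output above. The sketch spells that set out as [xdigt:] because some newer grep versions reject the single-bracket form with an error instead of matching.

```shell
#!/bin/sh
# Contents of a.txt as posted above.
sample() {
    printf '%s\n' 0xAF3DBFR 0xAF1 ABCDE BCDEhello world ABCD
}

# What the single-bracket form amounts to: any of the characters x d i g t :
single_bracket_equivalent() {
    sample | LC_ALL=C grep -E '[xdigt:]'
}

# The class inside a bracket expression: any hex digit, upper or lower case.
proper_class() {
    sample | LC_ALL=C grep -E '[[:xdigit:]]'
}

# Lines made up entirely of hex digits.
only_hex_lines() {
    sample | LC_ALL=C grep -E '^[[:xdigit:]]+$'
}

echo "single-bracket set:"; single_bracket_equivalent
echo "proper class:";       proper_class
echo "hex-only lines:";     only_hex_lines
```

The first function prints the same three lines as the original command; the second matches every line of a.txt, since each contains at least one hex digit.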

An expression
