An expression

Posted: Thu Oct 14, 2004 11:27 am
by rich_m
I'm looking for a single regular expression that matches both of these words in a file: "saxena" and "saksena". They differ only in the "ks" versus "x" part. I can use

Code:

grep "sa[ksena|xena]" file.txt
but since the "ena" part is also common, I thought there must be some other way to match them. Also, which is the best site for learning about regular expressions?
And how do we grep for whitespace, and for strings that may be regular expressions, in a file?

Re:An expression

Posted: Thu Oct 14, 2004 11:37 am
by Candy
First one:

Code:

grep "sa(ks|x)ena" file
Square brackets define a character class, not a group. With [ksena|xena] you defined a set containing the characters a, e, k, n, s, x and |, and the pattern matches "sa" followed by any one of those characters.
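
To make the point about the character class concrete, here is a minimal sketch. The word list is made up for illustration; only "saxena" and "saksena" come from the question. It shows that the bracket version also accepts words nobody asked for.

```shell
# Sample words: the two target names plus three that should not match.
words='saxena
saksena
sand
sake
salt'

# Bracket expression: "sa" followed by ONE character from the set a e k n s x |
bracket_hits=$(printf '%s\n' "$words" | grep 'sa[ksena|xena]')

# Prints saxena, saksena, sand and sake; only "salt" is rejected.
printf '%s\n' "$bracket_hits"
```

So the original pattern does find both names, but only because it accepts almost anything after "sa".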

For the second one, you could look at the Unix program strings, which does something similar. Short idea:

Code:

[ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9]+
This gives all legible runs of 4 characters or more. Note that it also matches spaces, tabs, and both kinds of newline character (\n and \r).
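
A rough way to try this idea with grep itself is sketched below. It is not the same expression: it uses POSIX character classes and a {4,} bound instead of repeating the bracket four times, and it cannot match across newlines because grep works one line at a time. The sample bytes are invented, and -a and -o are extensions rather than POSIX options, so this assumes a grep that has them (GNU grep does).

```shell
# Hypothetical sample: two control bytes (octal 001 and 002) around readable text.
# -a treats the input as text, -o prints only the matched parts.
readable=$(printf 'ab\001hello world\002xy\n' |
  grep -a -o -E '[[:alnum:][:blank:]]{4,}')

# "ab" and "xy" are too short, so only the middle run is printed.
printf '%s\n' "$readable"
```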

Re:An expression

Posted: Fri Oct 15, 2004 6:54 am
by Neo
I tried that on SunOS.
It didn't work (the "sa(ks|x)ena" part, I mean).

Re:An expression

Posted: Fri Oct 15, 2004 12:10 pm
by zloba
It seems the special characters need to be escaped:

Code:

grep 'sa\(ks\|x\)ena' file
I tried that on Linux and it worked.

P.S. Tried on Sun:
with grep, it didn't work either way.
with egrep, it worked without escapes:

Code:

egrep 'sa(ks|x)ena' file
which is probably what you need.

Freaking inconsistency is what this is >:(

P.P.S. Our university Unix (Sun) environment is a mess. The software is severely outdated and doesn't agree with the manpages, and the webspace is run by incompetent morons.

Re:An expression

Posted: Fri Oct 15, 2004 12:44 pm
by mystran
IIRC, on Solaris you should type 'sa\(ks|x\)ena', but I might be misremembering, and I'm too lazy to find a Solaris box to test with.

Re:An expression

Posted: Fri Oct 15, 2004 9:52 pm
by Neo
Does that work on Linux too?

Re:An expression

Posted: Fri Oct 15, 2004 10:19 pm
by zloba
On Linux, if you use egrep, it works the same way, without escapes:

Code:

egrep 'sa(ks|x)ena' file
From man egrep:
Egrep is the same as grep -E.
That's what my Sun manpages say too, but grep there refuses -E, probably because it's so outdated.

Bottom line:
egrep worked as expected in every case tried here, so you probably want to use that.
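
The difference comes down to basic versus extended regular expressions: in extended syntax (grep -E, or egrep) the parentheses and | are operators, while in basic syntax they are literal characters, and \| is a GNU extension rather than something POSIX requires. The sketch below shows the extended form next to a fallback that avoids alternation entirely by giving one -e per literal. The sample names other than "saxena" and "saksena" are made up, and whether the old Sun grep from this thread accepts multiple -e options has not been checked here.

```shell
names='saxena
saksena
sasena
saxsena'

# Extended syntax: grouping and alternation work without backslashes.
ere_hits=$(printf '%s\n' "$names" | grep -E 'sa(ks|x)ena')

# Fallback without alternation: a line matches if either literal pattern matches.
multi_hits=$(printf '%s\n' "$names" | grep -e 'saksena' -e 'saxena')

# Both print saxena and saksena.
printf '%s\n' "$ere_hits"
printf '%s\n' "$multi_hits"
```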

Re:An expression

Posted: Fri Oct 15, 2004 10:55 pm
by Neo
Yeah, only the egrep one works. Plain grep doesn't work in any form other than the one I specified at the beginning.

Re:An expression

Posted: Mon Oct 18, 2004 3:13 am
by Neo
Some more questions...
How do you delete lines containing whitespace from a file?
I tried using 'tr' like this, but only lines containing a space get deleted (not the ones containing tabs):

Code:

tr -d " \t" <file.txt
What is the right way?

Also, related to this, I have another question:
How do we count the lines that have at least one space or tab?
The -v option again does not match tabs. Any idea why?
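
Nobody in the thread answered this, so here is one possible approach, offered as a sketch rather than the definitive answer. tr -d removes individual characters, not whole lines, so it is the wrong tool for the first question. For grep, a likely reason tabs were missed is that POSIX regular expressions do not define \t as a tab; the [[:blank:]] class (space or tab) sidesteps that. The sample text is invented.

```shell
# Four sample lines: two clean, one containing a space, one containing a tab.
sample=$(printf 'alpha\nbeta gamma\ndelta\tepsilon\nzeta\n')

# -v keeps only the lines that do NOT contain a space or tab.
kept=$(printf '%s\n' "$sample" | grep -v '[[:blank:]]')

# -c counts the lines that DO contain at least one space or tab.
count=$(printf '%s\n' "$sample" | grep -c '[[:blank:]]')

printf '%s\n' "$kept"    # alpha and zeta
printf '%s\n' "$count"   # 2
```

To change a real file, redirect the grep -v output to a new file and then replace the original with it.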

Re:An expression

Posted: Thu Jul 14, 2005 12:32 am
by Neo
Popping this thread up because I have a regexp-related question.
In my file (a.txt) I have this:

Code:

0xAF3DBFR
0xAF1
ABCDE
BCDEhello
world
ABCD
When I use this command, this is what I get:

Code:

bash-3.00$ egrep "[:xdigit:]{1,4}" a.txt
0xAF3DBFR
0xAF1
world
How is this happening?

Re:An expression

Posted: Thu Jul 14, 2005 4:06 pm
by mystran
Remember that grep prints every line that contains a match. If you only want lines that match in full, you must use anchors for the beginning and end of the line. The beginning is ^ (which only makes sense at the start of the pattern, since we are matching one line at a time) and the end is $ (at the end of the pattern, of course).

What probably happens is that [:xdigit:]{1,4} matches the same as [0-9a-f]{1,4}. Since you aren't matching anything after that, the upper bound of 4 is meaningless (other than for capture purposes in things like sed), and since the lower bound is 1 with no meaningful upper bound, you could just as well use [:xdigit:] alone. So you get any line with at least one 0-9 or a-f.
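
The effect of the anchors and of the {1,4} bound can be checked with a small sketch against the same six lines as a.txt. It deliberately uses an explicit [0-9A-Fa-f] set rather than [:xdigit:], so that it illustrates only the anchoring point.

```shell
lines='0xAF3DBFR
0xAF1
ABCDE
BCDEhello
world
ABCD'

# Unanchored: any line containing at least one hex digit counts as a match,
# so the {1,4} bound changes nothing about which lines are selected.
with_bound=$(printf '%s\n' "$lines" | grep -c -E '[0-9A-Fa-f]{1,4}')
without_bound=$(printf '%s\n' "$lines" | grep -c -E '[0-9A-Fa-f]')

# Anchored: the whole line must consist of one to four hex digits.
anchored=$(printf '%s\n' "$lines" | grep -E '^[0-9A-Fa-f]{1,4}$')

printf '%s %s\n' "$with_bound" "$without_bound"   # 6 6
printf '%s\n' "$anchored"                         # ABCD
```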

Why :xdigit: doesn't seem to match A-F as well, I don't know, unless of course you are doing this on Solaris, where I have yet to understand the internal logic of :xdigit:. There it seems to match numbers, or hexadecimal digits prefixed by 0x, or something like that, which makes little sense to me.

Does that make sense?

Re:An expression

Posted: Fri Jul 15, 2005 12:19 am
by Neo
I was using Cygwin.
With the pattern you mentioned, this is what I got:

Code:

bash-3.00$ egrep "[0-9a-f]{1,4}" a.txt
0xAF3DBFR
0xAF1
BCDEhello
world

Re:An expression

Posted: Fri Jul 15, 2005 2:58 am
by Solar
Could it be that the support for

Code:

[:xdigit:]
is broken or not enabled, and that it is grepping for any line containing the letters x, d, i, g, or t? That would explain the results quite nicely, although I have no idea how it could come about.

Re:An expression

Posted: Fri Jul 15, 2005 4:27 am
by Neo
Considering that 'world' contains a valid hex digit ('d'), would that explain anything?

Re:An expression

Posted: Fri Jul 15, 2005 4:48 am
by AR
I tried "egrep -o [:xdigit:] a.txt" and got:
x
x
d
So your assumption is probably correct.
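
The results above are consistent with how POSIX bracket expressions are written: a class name such as [:xdigit:] only has its special meaning inside an enclosing bracket expression, i.e. [[:xdigit:]]. With single brackets it is an ordinary set of the characters :, x, d, i, g and t. The sketch below spells that set as [xdigt:] so that greps which reject the single-bracket spelling still run it, then compares it with the double-bracket form on the a.txt lines. -o is used as in the post above and is not a POSIX option.

```shell
lines='0xAF3DBFR
0xAF1
ABCDE
BCDEhello
world
ABCD'

# The single-bracket form, written out as the plain character set it really is.
plain_set=$(printf '%s\n' "$lines" | grep -o '[xdigt:]')

# The real class: count lines with at least one hex digit, then
# list lines made up entirely of hex digits (-x matches the whole line).
hex_lines=$(printf '%s\n' "$lines" | grep -c '[[:xdigit:]]')
all_hex=$(printf '%s\n' "$lines" | grep -x -E '[[:xdigit:]]+')

printf '%s\n' "$plain_set"   # x, x, d (one per line)
printf '%s\n' "$hex_lines"   # 6
printf '%s\n' "$all_hex"     # ABCDE and ABCD
```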