An expression

Post by rich_m »

I want a single regular expression that matches both of these words in a file: "saxena" and "saksena". They differ only in the "ks" versus "x" part. I can use

Code: Select all

grep "sa[ksena|xena]" file.txt 
but I thought that, since the "ena" part is also common, there must be a better way to match them. Also, which is the best site to learn about regular expressions?
Also, how do we grep for whitespace, and for strings that may themselves be regular expressions, in a file?
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:An expression

Post by Candy »

first one:

Code: Select all

 grep "sa(ks|x)ena" file 
Square brackets define a character set: you defined a set containing the characters a, e, k, n, s, x and |. Your pattern matches "sa" followed by exactly one of those characters.
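To see the difference between the bracket set and the parenthesised group, here is a small sketch. The sample words and the temporary file are made up for illustration, and it assumes a grep that accepts -E (POSIX extended syntax).

```shell
# Sample input (hypothetical): the two target words plus two near misses.
words=$(mktemp)
printf '%s\n' saxena saksena saena sak > "$words"

# Bracket set: "sa" followed by ONE character out of a, e, k, n, s, x or |.
set_hits=$(grep 'sa[ksena|xena]' "$words")

# Group with alternation (extended syntax): "sa", then "ks" or "x", then "ena".
group_hits=$(grep -E 'sa(ks|x)ena' "$words")

printf 'set:\n%s\n\ngroup:\n%s\n' "$set_hits" "$group_hits"
rm -f "$words"
```

The set version lets the near misses through as well, while the group version keeps only the two intended spellings.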

For the second one, have a look at the Unix program strings, which does something similar. The short idea:

Code: Select all

[ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9][ \t\n\ra-zA-Z0-9]+
-> gives all legible pieces of 4 chars or more. Note that it also matches spaces, tabs and newlines (all types).
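As a rough sketch of that idea with grep itself: the four repeated sets can be collapsed into one set with a {4,} bound. This assumes a grep that supports -E and the non-POSIX -o option (print only the matched part). It uses the [:alnum:] and [:blank:] classes in place of the a-zA-Z0-9 and \t spellings, since grep does not treat \t inside a set as a tab, and it covers only spaces and tabs because grep works one line at a time. The sample bytes are invented.

```shell
# Hypothetical sample: readable text separated by control bytes (octal 001, 002).
blob=$(mktemp)
printf 'ab\001hello world\002xyz\n' > "$blob"

# Print every run of 4 or more letters, digits, spaces or tabs.
runs=$(grep -Eo '[[:alnum:][:blank:]]{4,}' "$blob")

printf '%s\n' "$runs"
rm -f "$blob"
```

The short fragments "ab" and "xyz" are dropped because they are under four characters.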
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

I tried that on SunOS.
It didn't work (the "sa(ks|x)ena" part, I mean).
Only Human
zloba

Re:An expression

Post by zloba »

it seems the special characters should be escaped:

Code: Select all

grep 'sa\(ks\|x\)ena' file
that ^ i tried on linux and it worked.

p.s. tried on Sun:
with grep, didn't work either way.
with egrep, worked without escapes:

Code: Select all

egrep 'sa(ks|x)ena' file
which is probably what you need.

freaking inconsistency is what this is >:(

p.p.s. our university unix (Sun) environment is a mess. software is severely outdated, doesn't agree with manpages.. webspace is run by incompetent morons..
mystran

Re:An expression

Post by mystran »

IIRC on Solaris, you should type 'sa\(ks|x\)ena' but I might remember wrong, and I'm too lazy to find a Solaris-box to test with.
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

does that work on Linux too?
Only Human
zloba

Re:An expression

Post by zloba »

on linux, if you use egrep, it works the same way, without escapes:

Code: Select all

egrep 'sa(ks|x)ena' file
man egrep:
Egrep is the same as grep -E.
that's what my Sun manpages say also, but grep refuses -E, probably because it's so outdated.

bottom line:
egrep works as expected in all cases, so you probably want to use that.
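To summarise the variants in this thread as a sketch: grep -E is the standard spelling of egrep and takes the pattern without backslashes. The backslashed form works with GNU grep because it supports \| in basic patterns as an extension; other greps may not, which would fit the Sun results above. The sample file is made up.

```shell
# Hypothetical sample file with both spellings and one non-match.
names=$(mktemp)
printf '%s\n' saxena saksena sharma > "$names"

# Extended syntax: ( ) and | are special without backslashes.
extended=$(grep -E 'sa(ks|x)ena' "$names")

# Basic syntax with escapes: \| is an extension (GNU grep has it),
# so tolerate failure on greps that lack it.
basic=$(grep 'sa\(ks\|x\)ena' "$names" || true)

printf 'grep -E:\n%s\n\nplain grep:\n%s\n' "$extended" "$basic"
rm -f "$names"
```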
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

Yeah, only the egrep one works; plain grep doesn't work in any form other than the one I specified at the beginning.
Only Human
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

Some more questions...
How do you delete lines containing whitespace from a file?
I tried using 'tr' in this way, but only lines containing a space get deleted (not the ones with tabs):

Code: Select all

tr -d " \t"<file.txt
What is the right way?

Related to this, I have another question:
How do we count the lines having at least one space or tab?
The -v option again does not match tabs. Any idea why?
Only Human
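One possible approach to the two whitespace questions, as a sketch: tr -d removes characters rather than whole lines, so grep is a better fit. It assumes a grep with POSIX character classes; [[:blank:]] stands for a space or a tab, which sidesteps the question of how \t is interpreted. The file contents below are invented.

```shell
# Hypothetical sample: two clean lines, one with a space, one with a tab.
sample=$(mktemp)
printf 'clean\nhas space\nhas\ttab\nalso_clean\n' > "$sample"

# Keep only the lines WITHOUT a space or tab (-v inverts the match).
kept=$(grep -v '[[:blank:]]' "$sample")

# Count the lines that contain at least one space or tab.
count=$(grep -c '[[:blank:]]' "$sample")

printf 'kept:\n%s\ncount: %s\n' "$kept" "$count"
rm -f "$sample"
```

Redirecting the output of the grep -v command to a new file gives a copy with the whitespace lines removed.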
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

Popping this thread up because I have a regexp-related question.
In my file (a.txt) I have this:

Code: Select all

0xAF3DBFR
0xAF1
ABCDE
BCDEhello
world
ABCD
When I use this command, this is what I get:

Code: Select all

bash-3.00$ egrep "[:xdigit:]{1,4}" a.txt
0xAF3DBFR
0xAF1
world
How is this happening?
Only Human
mystran

Re:An expression

Post by mystran »

Remember that grep prints lines with a match; that is, any line that contains a match. If you only want lines that match in full, you must use anchors to match the beginning and end of the line. The beginning is ^ (which only makes sense at the start of the pattern, since we are matching a line at a time) and the end is $ (at the end of the pattern, of course).

What probably happens is that [:xdigit:]{1,4} matches the same as [0-9a-f]{1,4}. Since you aren't matching anything after that, the upper bound 4 is meaningless (other than for capture purposes in things like sed), and since you have a lower bound of 1 and no meaningful upper bound, you could just as well use [:xdigit:] alone, so you get any line with at least one 0-9 or a-f.
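A sketch of the anchoring point, using the a.txt contents from the question. It assumes the bracket-in-bracket class spelling [[:xdigit:]] and a grep with -E. With ^ and $ the {1,4} bound becomes meaningful, because nothing else on the line may remain unmatched.

```shell
# Recreate a.txt from the question in a temporary file.
a=$(mktemp)
printf '%s\n' 0xAF3DBFR 0xAF1 ABCDE BCDEhello world ABCD > "$a"

# Unanchored: any line containing at least one hex digit counts as a match.
loose=$(grep -Ec '[[:xdigit:]]{1,4}' "$a")

# Anchored: the whole line must consist of 1 to 4 hex digits.
strict=$(grep -E '^[[:xdigit:]]{1,4}$' "$a")

printf 'unanchored: %s matching lines\nanchored: %s\n' "$loose" "$strict"
rm -f "$a"
```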

Why :xdigit: doesn't seem to match A-F as well, I don't know, unless of course you are doing this on Solaris, where I've yet to understand the internal logic of :xdigit:. There it seems to match numbers, or hexadecimal digits prefixed by 0x, or something like that, which makes little sense to me.

Does that make sense?
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

I was using Cygwin.
With the pattern you mentioned, this is what I got:

Code: Select all

bash-3.00$ egrep "[0-9a-f]{1,4}" a.txt
0xAF3DBFR
0xAF1
BCDEhello
world
Only Human
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:An expression

Post by Solar »

Could it be that the support for

Code: Select all

[:xdigit:]
is broken or not enabled, and that it is grepping for any line containing the letters x, d, i, g, or t? That would explain the results nicely, although I have no idea how it could come about.
Every good solution is obvious once you've found it.
Neo
Member
Posts: 842
Joined: Wed Oct 18, 2006 9:01 am

Re:An expression

Post by Neo »

Considering that 'world' contains a valid hex digit ('d'), would that explain anything?
Only Human
AR

Re:An expression

Post by AR »

I tried "egrep -o [:xdigit:] a.txt" and got:
x
x
d
So your assumption is probably correct
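This can be reproduced with a sketch that spells the accidental set out explicitly. Inside a single pair of brackets, :xdigit: is just the characters :, x, d, i, g and t; the class only takes effect when written as [[:xdigit:]]. The explicit set [:xdigt] is used below because some grep versions reject the single-bracket spelling with an error instead of matching. It assumes a grep with -E and the non-POSIX -o option.

```shell
# Recreate a.txt from earlier in the thread.
a=$(mktemp)
printf '%s\n' 0xAF3DBFR 0xAF1 ABCDE BCDEhello world ABCD > "$a"

# What "[:xdigit:]" meant to grep here: a set of six literal characters.
literal=$(grep -Eo '[:xdigt]' "$a")

# The intended class (hex digits 0-9, a-f, A-F), anchored to whole lines.
hex_lines=$(grep -E '^[[:xdigit:]]+$' "$a")

printf 'literal set matches:\n%s\n\npure hex lines:\n%s\n' "$literal" "$hex_lines"
rm -f "$a"
```

The literal set yields the same x, x, d as the egrep -o run above, while the doubled brackets pick out the lines made up entirely of hex digits.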