Source: http://gnosis.cx/publish/programming/regular_expressions.html

Matching Patterns in Text: The Basics

1. Character literals

/a/


Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

/Mary/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
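To try these literal patterns yourself, here is a minimal sketch using Python 3's `re` module (an assumption; the slash notation above comes from Perl/sed-style tools). `re.findall` returns every non-overlapping match, which stands in for the highlighted matches.

```python
import re

# The sample rhyme used throughout the basic examples.
rhyme = "\n".join([
    "Mary had a little lamb.",
    "And everywhere that Mary",
    "went, the lamb was sure",
    "to go.",
])

# Literal patterns match exactly those characters, case-sensitively,
# so the capital "A" in "And" is not counted.
a_matches = re.findall(r"a", rhyme)
mary_matches = re.findall(r"Mary", rhyme)

print(len(a_matches))  # 8
print(mary_matches)    # ['Mary', 'Mary']
```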

2. "Escaped" character literals

/.*/


Special characters must be escaped.*

/\.\*/
Special characters must be escaped.*
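A small Python 3 sketch of the same contrast: unescaped, `.*` matches the whole line; escaped, it matches only the two literal characters. `re.escape` is the standard-library helper that adds the backslashes for arbitrary text.

```python
import re

line = "Special characters must be escaped.*"

# Unescaped: "." is any character and "*" repeats it, so the whole line matches.
whole = re.search(r".*", line).group()

# Escaped: only the literal ".*" at the end of the line matches.
literal = re.search(r"\.\*", line).group()

# re.escape inserts the backslashes for you.
escaped = re.escape(".*")

print(whole)    # Special characters must be escaped.*
print(literal)  # .*
print(escaped)  # \.\*
```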

3. Positional special characters

/^Mary/


Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

/Mary$/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
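In Python 3 (assumed here), whether `^` and `$` match at every line or only at the ends of the whole string depends on the `re.MULTILINE` flag; the line-by-line behaviour shown above corresponds to multiline mode. A minimal sketch:

```python
import re

rhyme = "\n".join([
    "Mary had a little lamb.",
    "And everywhere that Mary",
    "went, the lamb was sure",
    "to go.",
])

# Default: ^ and $ anchor only at the start and end of the whole string.
start_default = re.findall(r"^Mary", rhyme)  # the string starts with "Mary"
end_default = re.findall(r"Mary$", rhyme)    # the string ends with "to go."

# re.MULTILINE: ^ and $ also anchor at each line boundary.
end_multiline = re.findall(r"Mary$", rhyme, re.MULTILINE)

print(start_default, end_default, end_multiline)  # ['Mary'] [] ['Mary']
```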

4. The "wildcard" character

/.a/


Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

5. Grouping regular expressions

/(Mary)( )(had)/


Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
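A Python 3 sketch of what the parentheses buy you: besides the overall match, each grouped subexpression can be read back by number.

```python
import re

m = re.search(r"(Mary)( )(had)", "Mary had a little lamb.")

print(m.group(0))  # Mary had  (the whole match)
print(m.groups())  # ('Mary', ' ', 'had')
print(m.group(3))  # had
```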

6. Character classes

/[a-z]a/


Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

7. Complement operator

/[^a-z]a/


Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
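Running the wildcard, character-class, and complement patterns side by side on the rhyme makes the differences visible. This is a Python 3 sketch; note that by default `.` does not match a newline, while `[^a-z]` does.

```python
import re

rhyme = "\n".join([
    "Mary had a little lamb.",
    "And everywhere that Mary",
    "went, the lamb was sure",
    "to go.",
])

# ".a":      any character except newline, then "a"
# "[a-z]a":  one lowercase letter, then "a"
# "[^a-z]a": any character that is NOT a lowercase letter, then "a"
results = {p: re.findall(p, rhyme) for p in (r".a", r"[a-z]a", r"[^a-z]a")}

for pattern, found in results.items():
    print(pattern, found)
# .a ['Ma', 'ha', ' a', 'la', 'ha', 'Ma', 'la', 'wa']
# [a-z]a ['ha', 'la', 'ha', 'la', 'wa']
# [^a-z]a ['Ma', ' a', 'Ma']
```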

8. Alternation of patterns

/cat|dog|bird/

The pet store sold cats, dogs, and birds.

/=first|second=/

=first first= # =second second= # =first= # =second=

/(=)(first)|(second)(=)/

=first first= # =second second= # =first= # =second=

/=(first|second)=/

=first first= # =second second= # =first= # =second=
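A Python 3 sketch of the alternation examples. `|` binds more loosely than anything else, so `=first|second=` means "`=first` or `second=`"; a group keeps the alternation local. The sketch uses a non-capturing group `(?:...)` so that `re.findall` returns whole matches instead of just the group.

```python
import re

pets = "The pet store sold cats, dogs, and birds."
print(re.findall(r"cat|dog|bird", pets))  # ['cat', 'dog', 'bird']

marks = "=first first= # =second second= # =first= # =second="

# Without a group: "=first" OR "second=".
loose = re.findall(r"=first|second=", marks)
# With a group: "=" then (first or second) then "=".
grouped = re.findall(r"=(?:first|second)=", marks)

print(loose)    # ['=first', 'second=', '=first', 'second=']
print(grouped)  # ['=first=', '=second=']
```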

9. The basic abstract quantifier

/@(=\+=)*@/


Match with zero in the middle: @@
Subexpression occurs, but...: @=+=ABC@
Lots of occurrences: @=+==+==+==+==+=@
Must repeat entire pattern: @=+==+=+==+=@
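A Python 3 sketch of the `*` quantifier applied to a whole group. In real regex syntax `+` is itself a quantifier, so the literal `=+=` unit is written `=\+=` here; `(?:...)` groups it without capturing.

```python
import re

samples = [
    "Match with zero in the middle: @@",
    "Subexpression occurs, but...: @=+=ABC@",
    "Lots of occurrences: @=+==+==+==+==+=@",
    "Must repeat entire pattern: @=+==+=+==+=@",
]

# "@", then zero or more complete "=+=" units, then "@".
pattern = re.compile(r"@(?:=\+=)*@")

found = []
for s in samples:
    m = pattern.search(s)
    found.append(m.group() if m else None)

print(found)  # ['@@', None, '@=+==+==+==+==+=@', None]
```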


Matching Patterns in Text: Intermediate


1. More abstract quantifiers

/A+B*C?D/


AAAD
ABBBBCD
BBBCD
ABCCD
AAABBBC
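Checking each sample line with `re.search` in Python 3 (a sketch) shows which lines contain a match.

```python
import re

lines = ["AAAD", "ABBBBCD", "BBBCD", "ABCCD", "AAABBBC"]

# A+ : one or more A;  B* : zero or more B;  C? : zero or one C;  then D.
matching = [s for s in lines if re.search(r"A+B*C?D", s)]
print(matching)  # ['AAAD', 'ABBBBCD']
```

BBBCD has no A, ABCCD has two Cs where `C?` allows at most one, and AAABBBC never reaches a D.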

2. Numeric quantifiers

/a{5} b{,6} c{4,8}/


aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc

/a+ b{3,} c?/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc

/a{5} b{6,} c{4,8}/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc
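The same line-by-line check for the numeric quantifiers, again as a Python 3 sketch. Python reads `b{,6}` as zero to six `b`s; some engines treat `{,6}` as literal characters, so `b{0,6}` is the more portable spelling.

```python
import re

lines = [
    "aaaaa bbbbb ccccc",
    "aaa bbb ccc",
    "aaaaa bbbbbbbbbbbbbb ccccc",
]
patterns = [r"a{5} b{,6} c{4,8}", r"a+ b{3,} c?", r"a{5} b{6,} c{4,8}"]

# For each pattern, keep the sample lines that contain at least one match.
matches = {p: [s for s in lines if re.search(p, s)] for p in patterns}

for p, found in matches.items():
    print(p, "->", found)
```

Only the first line satisfies `a{5} b{,6} c{4,8}`, every line satisfies `a+ b{3,} c?`, and only the third line satisfies `a{5} b{6,} c{4,8}`.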


3. Backreferences

/(abc|xyz) \1/


jkl abc xyz
jkl xyz abc
jkl abc abc
jkl xyz xyz

/(abc|xyz) (abc|xyz)/

jkl abc xyz
jkl xyz abc
jkl abc abc
jkl xyz xyz
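A Python 3 sketch contrasting a backreference with a second, independent group: `\1` must repeat the exact text group 1 captured, not just anything group 1 could have matched.

```python
import re

lines = ["jkl abc xyz", "jkl xyz abc", "jkl abc abc", "jkl xyz xyz"]

# \1 repeats exactly the text captured by group 1.
repeated = [s for s in lines if re.search(r"(abc|xyz) \1", s)]
# Two independent groups: each may match either word.
either = [s for s in lines if re.search(r"(abc|xyz) (abc|xyz)", s)]

print(repeated)  # ['jkl abc abc', 'jkl xyz xyz']
print(either)    # all four lines
```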

4. Don't match more than you want to

/th.*s/


-- I want to match the words that start
-- with 'th' and end with 's'.
this
thus
thistle
this line matches too much

5. Tricks for restraining matches

/th[^s]*./


-- I want to match the words that start
-- with 'th' and end with 's'.
this
thus
thistle
this line matches too much
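A Python 3 sketch of the difference on the last sample line: the greedy `.*` runs to the last "s" it can find, while `[^s]*` cannot pass an "s" at all.

```python
import re

line = "this line matches too much"

# Greedy: .* backtracks only as far as the LAST "s" on the line.
greedy = re.search(r"th.*s", line).group()

# Restrained: [^s]* stops before the first "s", then "." takes that "s".
restrained = re.search(r"th[^s]*.", line).group()
words = [re.search(r"th[^s]*.", w).group() for w in ("this", "thus", "thistle")]

print(greedy)      # this line matches
print(restrained)  # this
print(words)       # ['this', 'thus', 'this']
```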


A literal-string modification example

s/cat/dog/g

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had wild dogs, bobdogs, lions, and other wild dogs.

A pattern-match modification example

s/cat|dog/snake/g

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had wild snakes, bobsnakes, lions, and other wild snakes.

s/[a-z]+i[a-z]*/nice/g

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had nice dogs, bobcats, nice, and other nice cats.
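In Python 3, `re.sub` plays the role of `s///`. It replaces every match unless a `count` argument is given, so it already behaves like the `g` flag. A sketch of the three substitutions:

```python
import re

zoo = "The zoo had wild dogs, bobcats, lions, and other wild cats."

literal = re.sub(r"cat", "dog", zoo)
alternation = re.sub(r"cat|dog", "snake", zoo)
contains_i = re.sub(r"[a-z]+i[a-z]*", "nice", zoo)  # any lowercase word containing "i"

print(literal)      # The zoo had wild dogs, bobdogs, lions, and other wild dogs.
print(alternation)  # The zoo had wild snakes, bobsnakes, lions, and other wild snakes.
print(contains_i)   # The zoo had nice dogs, bobcats, nice, and other nice cats.
```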


Modification using backreferences

s/([A-Z])([0-9]{2,4}) /\2:\1 /g

< A37 B4 C107 D54112 E1103 XXX
> 37:A B4 107:C D54112 1103:E XXX
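Python's `re.sub` accepts `\1`, `\2` in the replacement the same way; writing the replacement as a raw string keeps the backslashes intact. A sketch:

```python
import re

codes = "A37 B4 C107 D54112 E1103 XXX"

# Group 1 is the capital letter, group 2 the 2 to 4 digits before a space.
swapped = re.sub(r"([A-Z])([0-9]{2,4}) ", r"\2:\1 ", codes)
print(swapped)  # 37:A B4 107:C D54112 1103:E XXX
```

B4 has only one digit and D54112 has five, so neither is rewritten.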
   


Advanced Regular Expression Extensions

Non-greedy quantifiers

/th.*s/

-- I want to match the words that start
-- with 'th' and end with 's'.
this line matches just right
this # thus # thistle

/th.*?s/

-- I want to match the words that start
-- with 'th' and end with 's'.
this # thus # thistle
this line matches just right

/th.*?s /

-- I want to match the words that start
-- with 'th' and end with 's'. (FINALLY!)
this # thus # thistle
this line matches just right
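A Python 3 sketch of the three patterns applied to the two sample lines. Since `.` does not cross a newline by default, each match stays within one line.

```python
import re

text = "this # thus # thistle\nthis line matches just right"

greedy = re.findall(r"th.*s", text)        # runs to the last "s" on its line
lazy = re.findall(r"th.*?s", text)         # stops at the first "s"
lazy_space = re.findall(r"th.*?s ", text)  # first "s" that is followed by a space

print(greedy)      # ['this # thus # this', 'this line matches jus']
print(lazy)        # ['this', 'thus', 'this', 'this']
print(lazy_space)  # ['this ', 'thus ', 'this ']
```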
   

Pattern-match modifiers

/M.*[ise] /

MAINE # Massachusetts # Colorado #
mississippi # Missouri # Minnesota #

/M.*[ise] /i

MAINE # Massachusetts # Colorado #
mississippi # Missouri # Minnesota #

/M.*[ise] /gis

MAINE # Massachusetts # Colorado #
mississippi # Missouri
# Minnesota #
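Python 3 spells these modifiers as flags: `re.IGNORECASE` (`re.I`) for `i` and `re.DOTALL` (`re.S`) for `s`, while `re.findall` covers what `g` does. A sketch using the two sample texts above:

```python
import re

one_line_each = ("MAINE # Massachusetts # Colorado #\n"
                 "mississippi # Missouri # Minnesota #")
split_text = ("MAINE # Massachusetts # Colorado #\n"
              "mississippi # Missouri\n"
              "# Minnesota #")
pattern = r"M.*[ise] "

plain = re.findall(pattern, one_line_each)
ignore_case = re.findall(pattern, one_line_each, re.IGNORECASE)
# With DOTALL, "." also matches newlines, so one greedy match spans lines.
ignore_case_dotall = re.findall(pattern, split_text, re.IGNORECASE | re.DOTALL)

print(plain)               # ['MAINE # Massachusetts ', 'Missouri ']
print(ignore_case)         # ['MAINE # Massachusetts ', 'mississippi # Missouri ']
print(ignore_case_dotall)  # ['MAINE # Massachusetts # Colorado #\nmississippi ']
```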
   

Changing backreference behavior

s/([A-Z])(?:-[a-z]{3}-)([0-9]*)/\1\2/g

< A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93
> A37 # B:abcd:142 # C66 # D93
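A Python 3 sketch showing that `(?:...)` groups without capturing: only two groups get numbers, so `\1\2` in the replacement refers to the letter and the digits.

```python
import re

codes = "A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93"
pattern = r"([A-Z])(?:-[a-z]{3}-)([0-9]*)"

first = re.search(pattern, codes)
print(first.groups())  # ('A', '37')

# B:abcd:142 uses colons, not hyphens, so it is left unchanged.
result = re.sub(pattern, r"\1\2", codes)
print(result)  # A37 # B:abcd:142 # C66 # D93
```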
   

Naming backreferences

import re

txt = "A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93"
# Named groups let the replacement refer to pieces by name instead of number.
print(re.sub(r"(?P<prefix>[A-Z])(-[a-z]{3}-)(?P<id>[0-9]*)",
             r"\g<prefix>\g<id>", txt))


A37 # B:abcd:142 # C66 # D93

Lookahead assertions

s/([A-Z]-)(?=[a-z]{3})([A-Za-z0-9]*)/\2\1/g

< A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93
> xyz37A- # B-ab6142 # C-Wxy66 # qrs93D-

s/([A-Z]-)(?![a-z]{3})([A-Za-z0-9]*)/\2\1/g

< A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93
> A-xyz37 # ab6142B- # Wxy66C- # D-qrs93
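A Python 3 sketch of both lookahead substitutions. It assumes the text after the hyphen is captured with `[A-Za-z0-9]*` (letters of either case plus digits, stopping at the space), which reproduces the before/after lines shown. The lookahead only checks the next three characters; it does not consume them, so group 2 still captures them.

```python
import re

codes = "A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93"

# Positive lookahead: move "X-" only when three lowercase letters follow.
positive = re.sub(r"([A-Z]-)(?=[a-z]{3})([A-Za-z0-9]*)", r"\2\1", codes)

# Negative lookahead: move "X-" only when three lowercase letters do NOT follow.
negative = re.sub(r"([A-Z]-)(?![a-z]{3})([A-Za-z0-9]*)", r"\2\1", codes)

print(positive)  # xyz37A- # B-ab6142 # C-Wxy66 # qrs93D-
print(negative)  # A-xyz37 # ab6142B- # Wxy66C- # D-qrs93
```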

Making regular expressions more readable

/                   # identify URLs within a text file
              [^="] # do not match URLs in IMG tags like:
                    # <img src="http://mysite.com/mypic.png">
(?:http|ftp|gopher) # make sure we find a resource type
              :\/\/ # ...needs to be followed by colon-slash-slash
        [^ \t\n\r]+ # stuff other than space, tab, newline, CR is in URL
        (?=[\s\.,]) # assert: followed by whitespace/period/comma
/x

The URL for my site is: http://mysite.com/mydoc.html.  You
might also enjoy ftp://yoursite.com/index.html for a good
place to download files.
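A Python 3 sketch of a commented URL pattern like the one above, with `re.VERBOSE` playing the role of Perl's `/x`. The scheme alternatives are grouped with `(?:...)` so that `|` does not split the whole pattern, and a named group `url` (a name chosen for this sketch) keeps the leading `[^="]` character out of the result.

```python
import re

URL_RE = re.compile(r"""
    [^="]                    # skip URLs that directly follow = or a double quote
    (?P<url>
        (?:http|ftp|gopher)  # a resource type (grouped so | stays local)
        ://                  # ...followed by colon-slash-slash
        [^ \t\n\r]+          # anything but space, tab, newline, carriage return
    )
    (?=[\s.,])               # assert: followed by whitespace/period/comma
""", re.VERBOSE)

text = ("The URL for my site is: http://mysite.com/mydoc.html.  You\n"
        "might also enjoy ftp://yoursite.com/index.html for a good\n"
        "place to download files.")

urls = [m.group("url") for m in URL_RE.finditer(text)]
print(urls)  # ['http://mysite.com/mydoc.html.', 'ftp://yoursite.com/index.html']
```

The greedy `[^ \t\n\r]+` keeps the sentence-ending period on the first URL; trimming each result with `url.rstrip(".,")` is one simple fix. Because `[^="]` must consume a character, a URL at the very start of the text is not found.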
Posted by 백성용 헬로우보이