python的正则（二）：一些高级用法

以下部分不学也问题不大，不用焦虑。

特殊字符

(?aiLmsux) 等价于re.A, re.I, re.L, re.M, re.S, re.U, re.X
(?:...) 不分组
(?aiLmsux-imsx:...) 去除少量模式
(?P<name>...) 给分组起个名字
(?P=name) 按名字引用
(?#...) 只是个注释
(?=...) 后面包含
(?!...) 后面不包含
(?<=...) 前面包含
(?<!...) 前面不包含
(?(id/name)yes-pattern|no-pattern) yes pattern不满足时，再匹配no pattern

一些例子：

#1. 不区分大小写
re.findall(r'(?i)abc', 'abcABCabc')  #['abc', 'ABC', 'abc']
等同于：
re.findall(r'abc', 'abcABCabc'， re.I)

#2. 给分组起个名字，看起来比\1\2更容易阅读
m = re.search(r'\.(?P<port>\w+)\s*\((?=port)\)', '.abc (abc),')
print(m.group('port') # abc
print(m.group(0)) # .abc (abc)

#3. 向前看
m = re.search(r'(?<=input\swire\s)(?P<input>\w+)', 'input wire abc')
print(m.group('input') # abc

#4. 正则不满足时，尝试匹配另外一个正则
m = re.search(r'(output)?\s+(?(1)reg|wire)\s+(\w+)', 'output reg abc')
print(m.group(2)) # abc
m = re.search(r'(output)?\s+(?(1)reg|wire)\s+(\w+)', 'input wire abc')
print(m.group(2)) # abc

re的几个函数

re.escape

re.escape(pattern)，自动把特殊字符转义。

print(re.escape('[7:0]')) # \[7:0\]

re.compile

re.compile(pattern, flags=0)，正则编译，一次编译可以多处使用，加快正则执行速度。

re.finditer

re.finditer(pattern, string, flags=0)，返回iterator，就是可以用for循环依次处理的数据类型，还可以获得每个匹配字符串的开始start()和结束end()位置。例如，

s = """
    input wire a,
    input wire b,
    output wire c,
    output wire d
"""
m = re.finditer(r'(?:input|output)\s+wire\s+(\w+)', s)
for i in m:
    print("start=", i.start(), "end=", i.end(), "match=", i.group(1))

# start= 1 end= 13 match= a
# start= 15 end= 27 match= b
# start= 29 end= 42 match= c
# start= 44 end= 57 match= d

re.sub处理复杂的查找替换

re.sub(pattern, repl, string, count=0, flags=0) 的repl不仅仅可以是字符串，也可以是一个函数。例如，下面是一个改变端口顺序的正则例子，

s = """
module test
  (a, b, c, d, e);

  //...
  
endmodule
"""

def rep(m):
    # 获取待处理的字符串
    s1 = m.group(0)
    # 一大堆复杂的处理
    port = re.search(r'\((.*)\)', s1).group(1)
    port = port.strip()
    port = re.sub('\s+', "", port)
    port_list = port.split(',')
    port_list = port_list[::-1];
    s2 = re.sub(r'(?<=\().*(?=\))', ','.join(port_list), s1)
    # 返回处理完的字符串，用于替换
    return s2

s3 = re.sub(r'module.*?;', rep, 0, re.S)
print(s3)
               
#module test   
#  (e,d,c,b,a);
#              
#  //...       
#              
#endmodule