Python re module
Charmersix

Getting started with re

findall

findall: returns a list of every substring that matches the regular expression.

```python
import re
# findall returns a plain list of all matched strings
numbers = re.findall(r"\d+", "my phone number is : 10086 ; my girlfriend's phonenumber is : 10010")
print(numbers)  # ['10086', '10010']
```
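One detail worth knowing: if the pattern contains capturing groups, findall returns only what the groups captured (a tuple per match when there are several groups), not the whole match. A small sketch on a made-up sample string:

```python
import re

text = "Alice: 10086, Bob: 10010"  # hypothetical sample data

# No groups: each item is the whole match
whole = re.findall(r"\w+: \d+", text)
print(whole)  # ['Alice: 10086', 'Bob: 10010']

# One group: each item is only the text captured by that group
numbers = re.findall(r"\w+: (\d+)", text)
print(numbers)  # ['10086', '10010']

# Several groups: each item is a tuple of the captured groups
pairs = re.findall(r"(\w+): (\d+)", text)
print(pairs)  # [('Alice', '10086'), ('Bob', '10010')]

# No match at all gives an empty list, not None
print(re.findall(r"\d+", "no digits here"))  # []
```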

finditer

finditer: finds every match too, but returns an iterator of Match objects instead of a list; call .group() on each item to get the matched text.

```python
import re
it = re.finditer(r"\d+", "my phone number is : 10086 ; my girlfriend's phonenumber is : 10010")
# each item is a Match object; .group() extracts the matched text
for m in it:
    print(m.group())
```
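Because finditer yields Match objects rather than strings, each item also knows where it was found. The sketch below uses `span()` to get the (start, end) slice of each match and shows that, like any iterator, the result can only be consumed once:

```python
import re

text = "my phone number is : 10086 ; my girlfriend's phonenumber is : 10010"

spans = []
for m in re.finditer(r"\d+", text):
    # group() is the matched text; span() is its (start, end) slice in text
    spans.append((m.group(), m.span()))
    print(m.group(), m.span())

# Slicing the original string with the span gives the match back
for matched, (start, end) in spans:
    assert text[start:end] == matched

# An iterator is exhausted after one pass
it = re.finditer(r"\d+", text)
print(len(list(it)))  # 2
print(len(list(it)))  # 0
```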

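search

search: scans the string and stops at the first match, returning it as a Match object (or None if nothing matches); call .group() to get the text.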

```python
import re
# only the first match is returned, even though the string contains two numbers
s = re.search(r"\d+", "my phone number is : 10086 ; my girlfriend's phonenumber is : 10010")
print(s.group())  # 10086
```
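Since search returns None when nothing matches, calling .group() directly would raise an AttributeError in that case. A minimal sketch of the usual guard, wrapped in a hypothetical helper `first_number`:

```python
import re

def first_number(text):
    # hypothetical helper: return the first run of digits, or None
    m = re.search(r"\d+", text)  # Match object or None
    if m is None:
        return None
    return m.group()

print(first_number("my phone number is : 10086 ; my girlfriend's phonenumber is : 10010"))  # 10086
print(first_number("no digits here"))  # None
```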

match

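match: only tries to match at the very beginning of the string; if the start of the string does not fit the pattern, it returns None.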

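```python
import re
# The string starts with "my", not a digit, so match returns None here.
# Calling m.group() on None would raise AttributeError, so check first.
m = re.match(r"\d+", "my phone number is : 10086 ; my girlfriend's phonenumber is : 10010")
if m:
    print(m.group())
else:
    print("no match")  # this branch runs
```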
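To see the difference from search, here is a side-by-side sketch on a shorter string. It also shows that match anchors only the start, not the end; `re.fullmatch` is the related function that requires the whole string to match:

```python
import re

text = "my phone number is : 10086"

# search scans the whole string; match only tries position 0
print(re.search(r"\d+", text).group())  # 10086
print(re.match(r"\d+", text))           # None

# match anchors the start only; trailing text after the match is allowed
print(re.match(r"\d+", "10086abc").group())  # 10086

# fullmatch requires the pattern to consume the entire string
print(re.fullmatch(r"\d+", "10086abc"))        # None
print(re.fullmatch(r"\d+", "10086").group())   # 10086
```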

Precompiling a regular expression

Compile the pattern once with re.compile(), then reuse the resulting Pattern object to do the matching.

```python
import re
# compile the pattern once into a reusable Pattern object
obj = re.compile(r"\d+")

ret = obj.finditer("my phone number is : 10086 ; my girlfriend's phonenumber is : 10010")
for it in ret:
    print(it.group())
```
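The compiled object offers the same methods as the module-level functions (search, findall, finditer, match, ...), so one pattern can be applied to many strings. A sketch, with made-up input strings:

```python
import re

# Compile once; reuse the Pattern object for every input
phone = re.compile(r"\d+")

texts = ["call 10086", "or 10010", "no number"]  # hypothetical inputs
results = {}
for t in texts:
    m = phone.search(t)  # same behavior as re.search(r"\d+", t)
    results[t] = m.group() if m else None
    print(t, "->", results[t])

# The Pattern object remembers its source string
print(phone.pattern)                      # \d+
print(phone.findall("10086 and 10010"))   # ['10086', '10010']
```

The re module also caches recently used patterns internally, so compiling mainly helps readability and avoids repeating the same pattern string throughout the code.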

A clearer precompilation example

```python
import re
a = """
<div class= 'jay'><span id= '1'>郭麒麟</span></div>
<div class= 'jj'><span id= '2'>郭德纲</span></div>
<div class= 'mike'><span id= '3'>于谦</span></div>
<div class= 'jack'><span id= '4'>岳云鹏</span></div>
"""
# (?P<group_name>regex) names a group, so that part of each match can be extracted on its own
obj = re.compile(r"<div class= '(?P<english>.*?)'><span id= '(?P<id>\d+)'>(?P<德云社>.*?)</span></div>", re.S)  # re.S lets . also match newline characters
result = obj.finditer(a)
for it in result:
    print(it.group("english"))
    print(it.group("id"))
    print(it.group("德云社"))
```
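When a pattern has several named groups, `Match.groupdict()` collects all of them into one dict per match, which is handy for turning each row into a record. The sketch below reuses two of the rows above; the group name `name` is a hypothetical rename of `德云社`:

```python
import re

html = """
<div class= 'jay'><span id= '1'>郭麒麟</span></div>
<div class= 'jj'><span id= '2'>郭德纲</span></div>
"""

# same structure as the pattern above; 'name' is a hypothetical group name
obj = re.compile(r"<div class= '(?P<english>.*?)'><span id= '(?P<id>\d+)'>(?P<name>.*?)</span></div>")

# groupdict() maps each group name to the text it captured in that match
rows = [m.groupdict() for m in obj.finditer(html)]
for row in rows:
    print(row)
```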