Python Regular Expressions: String Matching and Groups
Import the regular expression module
import re
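Basic matching
Use re.match(), re.search(), and re.findall() for basic string matching.
re.match() matches only at the start of the string. re.search() scans the string and returns the first match at any position. re.findall() returns a list of all non-overlapping matches; if the pattern contains capturing groups, it returns the groups rather than the full matches.
Example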
pattern = r'\d+'  # match one or more digits
text = "There are 123 apples and 456 oranges."
# Match at the start of the string; text starts with "There", so this returns None
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match at the beginning of the string.")
# Match anywhere in the string; returns the first match
search = re.search(pattern, text)
if search:
    print("Search found:", search.group())
# Return all matching substrings
findall = re.findall(pattern, text)
print("Findall found:", findall)
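As a supplementary sketch (not part of the original example), the snippet below shows how the return value of re.findall() changes once the pattern contains capturing groups. The second pattern, r'(\d+) (\w+)', is an assumption chosen only for illustration.

```python
import re

text = "There are 123 apples and 456 oranges."

# No capturing group: findall returns the full matches as strings.
whole = re.findall(r'\d+', text)

# Two capturing groups: findall returns one tuple of groups per match.
pairs = re.findall(r'(\d+) (\w+)', text)

print(whole)  # ['123', '456']
print(pairs)  # [('123', 'apples'), ('456', 'oranges')]
```

If the full match is needed together with its groups, re.finditer() is usually the more convenient choice, since it yields match objects.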
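Group matching
Use parentheses () in a regular expression to define groups. Each group captures the substring matched by its part of the pattern. Groups are numbered from 1 by the position of their opening parenthesis, and group(0) is the whole match.
Example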
pattern = r'(\d+) apples and (\d+) oranges'
text = "There are 123 apples and 456 oranges."
match = re.search(pattern, text)
if match:
    print("Group 1:", match.group(1))
    print("Group 2:", match.group(2))
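A minimal sketch of wrapping the same pattern in a function may help here. The helper name extract_counts is hypothetical; it uses match.groups() to fetch all captured groups at once and returns None when the text does not match.

```python
import re

# Same pattern as the example above, compiled once for reuse.
PATTERN = re.compile(r'(\d+) apples and (\d+) oranges')


def extract_counts(text):
    """Hypothetical helper: return (apples, oranges) as ints, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # groups() returns every captured group as a tuple of strings.
    apples, oranges = match.groups()
    return int(apples), int(oranges)


print(extract_counts("There are 123 apples and 456 oranges."))  # (123, 456)
print(extract_counts("No fruit here."))                         # None
```

Checking for None before calling group() or groups() avoids an AttributeError on text that does not match.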
Named groups
Use the (?P<name>...) syntax to give a group a name. A named group can still be accessed by its number.
Example
pattern = r'(?P<apples>\d+) apples and (?P<oranges>\d+) oranges'
text = "There are 123 apples and 456 oranges."
match = re.search(pattern, text)
if match:
    print("Apples:", match.group('apples'))
    print("Oranges:", match.group('oranges'))
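When several named groups are needed together, match.groupdict() collects them into a dictionary. The sketch below reuses the pattern from the example above; the variable name counts is only illustrative.

```python
import re

pattern = r'(?P<apples>\d+) apples and (?P<oranges>\d+) oranges'
match = re.search(pattern, "There are 123 apples and 456 oranges.")

# groupdict() maps each group name to its captured string.
counts = match.groupdict() if match else {}
print(counts)  # {'apples': '123', 'oranges': '456'}

# Named groups are still numbered, so both forms return the same text.
same = match is not None and match.group(1) == match.group('apples')
print(same)  # True
```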
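Capturing every match with groups
re.finditer() returns an iterator of match objects, one for each non-overlapping match, so the groups of every match can be accessed.
Example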
pattern = r'(\d+)'
text = "There are 123 apples and 456 oranges."
matches = re.finditer(pattern, text)
for match in matches:
    print("Match found:", match.group())
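Because each item yielded by re.finditer() is a full match object, its groups and position are both available. The following sketch assumes a two-group pattern, r'(\d+) (apples|oranges)', picked for illustration, and records the fruit, the quantity, and the start offset of every match.

```python
import re

pattern = r'(\d+) (apples|oranges)'
text = "There are 123 apples and 456 oranges."

# Collect (fruit, quantity, start offset) for every match.
found = [
    (m.group(2), int(m.group(1)), m.start())
    for m in re.finditer(pattern, text)
]
print(found)  # [('apples', 123, 10), ('oranges', 456, 25)]
```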
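Replacing matched substrings
Use re.sub() to replace matched substrings. In the replacement string, \1 refers to the text captured by group 1.
Example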
pattern = r'(\d+) apples'
text = "There are 123 apples and 456 apples."
# Replace 'apples' with 'pears' in every match, keeping the captured number
result = re.sub(pattern, r'\1 pears', text)
print("Substitution result:", result)
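Two related options are sketched below as a supplement: the count argument of re.sub() limits how many replacements are made, and re.subn() also reports how many were made. The named group qty is an assumption introduced for this sketch; \g<qty> is the replacement-string syntax for referring to a named group.

```python
import re

text = "There are 123 apples and 456 apples."

# \g<qty> refers to the named group; count=1 replaces only the first match.
first_only = re.sub(r'(?P<qty>\d+) apples', r'\g<qty> pears', text, count=1)
print(first_only)  # There are 123 pears and 456 apples.

# subn() returns the new string and the number of replacements made.
result, n = re.subn(r'(\d+) apples', r'\1 pears', text)
print(result, n)  # There are 123 pears and 456 pears. 2
```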
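A more complex example
Combine groups and replacement to handle more complex text. When the replacement argument is a function, re.sub() calls it with each match object and uses the returned string as the replacement.
Example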
pattern = r'(\d+)\s+(apples|oranges)'
text = "There are 123 apples and 456 oranges."
def repl(match):
    # Double the captured quantity and keep the fruit name
    quantity = int(match.group(1))
    fruit = match.group(2)
    return f'{quantity * 2} {fruit}'
result = re.sub(pattern, repl, text)
print("Complex substitution result:", result)
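One possible variation on the example above, under the assumption that the multiplier should be configurable: the hypothetical function scale_quantities wraps the replacement callback in a closure and uses named groups (qty and fruit, both introduced here) instead of numbered ones.

```python
import re

# Named-group version of the pattern used above.
PATTERN = re.compile(r'(?P<qty>\d+)\s+(?P<fruit>apples|oranges)')


def scale_quantities(text, factor):
    """Hypothetical helper: multiply every fruit quantity in text by factor."""
    def repl(match):
        # Called once per match; the returned string replaces the match.
        quantity = int(match.group('qty')) * factor
        return f"{quantity} {match.group('fruit')}"
    return PATTERN.sub(repl, text)


print(scale_quantities("There are 123 apples and 456 oranges.", 2))
# There are 246 apples and 912 oranges.
```

Defining repl inside the function lets it read factor directly, which avoids a global variable.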