How to create a list of tuples from multiple regular expressions



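So I'm currently working on an assignment that requires us to extract phone numbers, emails and websites from a text document. The instructor wants the output as a list of tuples, each containing the start index, the length and the match. For example: [(1, 10, '9099000008'), (35, 16, 'contact@viva.com')]. Since there are three different things to extract, how do I put them all into one list of tuples? I have come up with three regular expressions, but I can't get their matches into a single list. Should I write a new expression that covers all three cases? Thanks for your help.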

import re

result = []
# `string` holds the contents of the text document
# Match with RE
email_pattern = r'[\w.-]+@[\w.-]+(?:\.[\w]+)+'
email = re.findall(email_pattern, string)
for match in re.finditer(email_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())
phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
phone = re.findall(phone_pattern, string)
for match in re.finditer(phone_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())
website_pattern = r'(https?://(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?://(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})'
web = re.findall(website_pattern, string)
for match in re.finditer(website_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

My output:

# Text document
should we use regex more often? let me know at 012345678@student.eng or bbx@gmail.com. To further notice, contact Khoi at 0957507468 or accessing
https://web.de or maybe www.google.com, or Mr.Q at 0912299922.
# Output
47 21 012345678@student.eng
72 13 bbx@gmail.com
122 10 0957507468
197 10 0912299922
146 14 https://web.de
170 15 www.google.com,

Instead of printing each match, append it to the result list and print that list at the end. That is, change

print(match.start(), match.end() - match.start(), match.group())

to

result.append((match.start(), match.end() - match.start(), match.group()))

Do the same in the other two loops, and then at the end:

print(result)
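Putting the pieces together, here is a minimal sketch that runs the three patterns from the question (with their backslashes restored) one after another and collects every match into a single list of (start, length, match) tuples. The function name extract_matches and the final sort by start index are illustrative additions, not something the assignment specifies; drop the sort if grouping by pattern is acceptable.

```python
import re

# The three patterns from the question, with backslashes restored.
EMAIL_PATTERN = r'[\w.-]+@[\w.-]+(?:\.[\w]+)+'
PHONE_PATTERN = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
WEBSITE_PATTERN = (
    r'(https?://(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|https?://(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9]+\.[^\s]{2,})'
)


def extract_matches(text):
    """Return (start, length, match) tuples for every email, phone number and URL."""
    result = []
    for pattern in (EMAIL_PATTERN, PHONE_PATTERN, WEBSITE_PATTERN):
        for match in re.finditer(pattern, text):
            result.append((match.start(), match.end() - match.start(), match.group()))
    # Hypothetical extra step: order the tuples by position in the text
    # instead of by which pattern found them.
    result.sort()
    return result


text = ("should we use regex more often? let me know at 012345678@student.eng "
        "or bbx@gmail.com. To further notice, contact Khoi at 0957507468 or accessing\n"
        "https://web.de or maybe www.google.com, or Mr.Q at 0912299922.")
print(extract_matches(text))
```

For the sample text this yields the same six matches as in the question's output, now in one list and ordered by start index. A single combined expression is not required: running each pattern separately and merging the tuples produces the requested format. Note that www.google.com, keeps its trailing comma because [^\s]{2,} only stops at whitespace.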
