How to extract an email address string

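My Python script currently returns an email address as a list, but I only need the text part, in this example golfshop@3lakesgolf.com. I tried using the text attribute (gc_email.text), but that didn't work.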

gc_email=web.select('a[href^=mailto]')
print(gc_email)

Output:

[<a href="mailto:golfshop@3lakesgolf.com">golfshop@3lakesgolf.com</a>]

Help! How do I extract only the mailto address?

You can use a regular expression capture group to extract this string:

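import re

# Use a name other than str so the built-in type is not shadowed
html = '<a href="mailto:golfshop@3lakesgolf.com">golfshop@3lakesgolf.com</a>'
# Group 1 captures everything between "mailto:" and the closing quote
regex = r'<a href="mailto:(.*?)".*'
# re.match returns None when the pattern does not match
found = re.match(regex, html)
email = found.group(1) if found else None
if email is not None:
    print(email)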

Output:

golfshop@3lakesgolf.com

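Assuming every line follows the format you provided, you can call .split() on a sequence of delimiter characters and then pick the appropriate item from the returned list.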

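line = '<a href="mailto:golfshop@3lakesgolf.com">golfshop@3lakesgolf.com</a>]'
# Everything after the first ':' starts with the address
sections1 = line.split(':')
# Cut at the first '.com' and put the suffix back
email = sections1[1].split('.com')[0] + '.com'
print(email)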

Output:

golfshop@3lakesgolf.com
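
Note that splitting on '.com' only works for addresses that end in .com. Here is a minimal sketch of the same splitting idea that does not depend on the domain suffix. It assumes the address always sits between "mailto:" and the closing double quote of the href attribute. The function name extract_mailto and the example.org address in the comments are hypothetical, used only for illustration.

```python
def extract_mailto(line):
    """Return the text between 'mailto:' and the next double quote, or None."""
    # partition returns (before, separator, after); separator is '' if not found
    _, found, rest = line.partition('mailto:')
    if not found:
        return None
    # The address ends at the closing quote of the href attribute,
    # so this also handles domains such as info@example.org
    return rest.split('"', 1)[0]


line = '<a href="mailto:golfshop@3lakesgolf.com">golfshop@3lakesgolf.com</a>'
print(extract_mailto(line))
```

Lines without a mailto link return None instead of raising an IndexError, which is easier to handle when looping over many lines.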

If the format varies and is not the same every time, I would use a regular expression instead.
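
For the varying-format case, a regular expression could look roughly like the sketch below. It assumes the input is raw HTML text, that the href value is wrapped in single or double quotes, and that anything after a "?" (such as a subject) should be dropped. The names MAILTO_RE and find_mailto_addresses and the example.org address are made up for illustration.

```python
import re

# Matches href="mailto:..." or href='mailto:...'. The capture stops at the
# closing quote or at '?', so query parts like ?subject=... are excluded.
MAILTO_RE = re.compile(r'href\s*=\s*["\']mailto:([^"\'?]+)', re.IGNORECASE)


def find_mailto_addresses(html):
    """Return every mailto address found in the given HTML text, in order."""
    return MAILTO_RE.findall(html)


html = (
    '<a href="mailto:golfshop@3lakesgolf.com">golfshop@3lakesgolf.com</a> '
    "<a class='c' href='mailto:info@example.org?subject=Hi'>Contact</a>"
)
print(find_mailto_addresses(html))
```

Because the pattern anchors on the href attribute rather than on the start of the tag, it tolerates extra attributes and different quoting styles. For heavily malformed HTML, a real parser is likely the safer choice.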
