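How can I parse the following email addresses into the expected output? They are not in a DataFrame; they are separate strings, and I have a loop that iterates over each one.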
Sample input:
Louis.Stevens@hotmail.com
Louis.a.Stevens@hotmail.com
Louis.Stevens@stackoverflow.com
Louis.Stevens2@hotmail.com
Mike.Williams2@hotmail.com
Lebron.A.James@hotmail.com
Expected output:
Louis Stevens
Louis Stevens
Louis Stevens
Louis Stevens
Mike Williams
Lebron James
Thanks.
Use the regex `@.*` to remove the `@` and everything after it:
```python
import pandas as pd

s = pd.Series("""Louis.Stevens@hotmail.com
Louis.a.Stevens@hotmail.com
Louis.Stevens@stackoverflow.com
Louis.Stevens2@hotmail.com""".splitlines())

# Drop the "@" and the whole domain
s.str.replace('@.*', '', regex=True)
# 0      Louis.Stevens
# 1    Louis.a.Stevens
# 2      Louis.Stevens
# 3     Louis.Stevens2
# dtype: object
```
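The `@.*` replacement only strips the domain; the dots, middle initials and trailing digits are still there. For plain strings processed one at a time, as in the question, a minimal regex-free sketch could finish the job. It assumes (the question does not say so explicitly) that the first dot-separated token is the first name, the last token is the surname, and digits only need removing from the surname; `email_to_name` is a hypothetical helper name.

```python
def email_to_name(email: str) -> str:
    # Keep only the local part (everything before the first "@").
    local, _, _ = email.partition('@')
    # First dot-separated token = first name, last token = surname;
    # anything in between (e.g. a middle initial) is dropped.
    parts = local.split('.')
    first, last = parts[0], parts[-1]
    # Remove digits from the surname, e.g. "Stevens2" -> "Stevens".
    last = ''.join(ch for ch in last if not ch.isdigit())
    return f'{first} {last}'


emails = [
    'Louis.Stevens@hotmail.com',
    'Louis.a.Stevens@hotmail.com',
    'Louis.Stevens@stackoverflow.com',
    'Louis.Stevens2@hotmail.com',
    'Mike.Williams2@hotmail.com',
    'Lebron.A.James@hotmail.com',
]

for email in emails:
    print(email_to_name(email))
```

A local part without any dot (e.g. `louis@hotmail.com`) would return the same token twice, so that case needs separate handling if it can occur.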
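Use `str.findall` with a regex to extract the word characters at the start of the string and the word characters immediately before the `@`, join the matches with a space, and then remove the digits. For a DataFrame `df` with an `email` column: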
```
                             email
0        Louis.Stevens@hotmail.com
1      Louis.a.Stevens@hotmail.com
2  Louis.Stevens@stackoverflow.com
3       Louis.Stevens2@hotmail.com
4       Mike.Williams2@hotmail.com
5       Lebron.A.James@hotmail.com
```

```python
# ^\w+     : word characters at the very start (first name)
# \w+(?=@) : word characters directly before "@" (surname, possibly with digits)
df = df.assign(
    email_new=df['email'].str.findall(r'^\w+|\w+(?=@)')
                         .str.join(' ')
                         .str.replace(r'\d', '', regex=True)
)
```

```
                             email      email_new
0        Louis.Stevens@hotmail.com  Louis Stevens
1      Louis.a.Stevens@hotmail.com  Louis Stevens
2  Louis.Stevens@stackoverflow.com  Louis Stevens
3       Louis.Stevens2@hotmail.com  Louis Stevens
4       Mike.Williams2@hotmail.com  Mike Williams
5       Lebron.A.James@hotmail.com   Lebron James
```
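The same `findall` idea also works without pandas, which fits the question's loop over individual strings. A minimal sketch using only the standard `re` module, with the pattern from this answer precompiled; `name_from_email` is a hypothetical helper name.

```python
import re

# First run of word characters at the start, and the run directly before "@".
TOKENS = re.compile(r'^\w+|\w+(?=@)')
DIGITS = re.compile(r'\d')


def name_from_email(email: str) -> str:
    # e.g. 'Louis.a.Stevens2@hotmail.com' -> ['Louis', 'Stevens2']
    tokens = TOKENS.findall(email)
    # Join with a space, then drop every digit.
    return DIGITS.sub('', ' '.join(tokens))


for email in ['Louis.a.Stevens2@hotmail.com', 'Lebron.A.James@hotmail.com']:
    print(name_from_email(email))
```

Note that `\w` also matches underscores, so an address such as `louis_stevens@...` would be kept as a single token.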
Assuming `s` is the input Series, use `str.replace`:
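```python
import re

# Group 1: first name; (?:.\.)? optionally skips a one-character middle initial;
# group 2: surname letters; .* discards trailing digits and the domain.
s.str.replace(r'^([a-z]+)\.(?:.\.)?([a-z]+).*', r'\1 \2', regex=True, flags=re.I)
```

Output:

```
0    Louis Stevens
1    Louis Stevens
2    Louis Stevens
3    Louis Stevens
4    Mike Williams
5     Lebron James
dtype: object
```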
For a single string:

```python
import re

s = 'Louis.a.Stevens2@hotmail.com'
out = re.sub(r'^([a-z]+)\.(?:.\.)?([a-z]+).*', r'\1 \2', s, flags=re.I)
# 'Louis Stevens'
```
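Because the question processes the strings one at a time in a loop, a small sketch that compiles this answer's pattern once and applies it to each string may be convenient. The `emails` and `names` variables are illustrative; the list is the sample input from the question.

```python
import re

# Compile once, outside the loop; re.I lets [a-z] also match uppercase letters.
# Group 1 = first name, (?:.\.)? skips a one-character middle initial,
# group 2 = surname letters; trailing digits and the domain fall into .*
PATTERN = re.compile(r'^([a-z]+)\.(?:.\.)?([a-z]+).*', flags=re.I)

emails = [
    'Louis.Stevens@hotmail.com',
    'Louis.a.Stevens@hotmail.com',
    'Louis.Stevens@stackoverflow.com',
    'Louis.Stevens2@hotmail.com',
    'Mike.Williams2@hotmail.com',
    'Lebron.A.James@hotmail.com',
]

names = []
for email in emails:
    # The pattern matches the whole string, so sub() replaces it entirely.
    names.append(PATTERN.sub(r'\1 \2', email))

print(names)
```

The optional `(?:.\.)?` group only skips a one-character middle initial: `Louis.Andrew.Stevens@hotmail.com` would come out as `Louis Andrew`, so longer middle names need a different pattern.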