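I want to extract the word "User" plus the number that immediately follows it from each row of a pandas Series. Everything else can be discarded. How would you do this? Thanks.
Here is an example of the Series: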
0 1 - Unassigned, 2 - User 397335
1 1 - Unassigned, 2 - User 525767, 3 - Unassigned
2 1 - Unassigned
3 1 - Unassigned
4 1 - Unassigned
...
163678 1 - Unassigned
163679 1 - Unassigned, 2 - User 347991, 3 - Unassigned
163680 1 - Unassigned
163681 1 - Unassigned
163682 1 - Unassigned, 2 - User 663455, 3 - Unassigned
Use str.findall:
>>> df['A'].str.findall(r'User \d+').str[-1]
0 User 397335
1 User 525767
2 NaN
3 NaN
4 NaN
163678 NaN
163679 User 347991
163680 NaN
163681 NaN
163682 User 663455
Name: A, dtype: object
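
For reference, here is a minimal self-contained sketch of the same idea. The small DataFrame is hypothetical sample data modeled on the rows in the question, and the column name A follows the answer above. It also shows str.extract as a possible alternative: findall followed by .str[-1] keeps the last match in each row, while extract keeps the first, so the two only differ if a row contains more than one "User <number>" entry.

```python
import pandas as pd

# Hypothetical sample data mirroring a few rows from the question.
df = pd.DataFrame({
    "A": [
        "1 - Unassigned, 2 - User 397335",
        "1 - Unassigned, 2 - User 525767, 3 - Unassigned",
        "1 - Unassigned",
    ]
})

# findall returns a list of all matches per row;
# .str[-1] picks the last one and gives NaN when the list is empty.
last_match = df["A"].str.findall(r"User \d+").str[-1]

# extract returns the first match of the capture group;
# expand=False yields a Series instead of a one-column DataFrame.
first_match = df["A"].str.extract(r"(User \d+)", expand=False)

print(last_match)
print(first_match)
```

If only the number is needed, moving the capture group to cover just the digits, as in r"User (\d+)", should work with str.extract.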