I have an Excel file with three columns, as shown below:
project_id  Name                      Date
755         war                       2019-04-08
755         Wabern                    2020-06-16
755         Wabern (FTTH geplan)      2020-07-24
755         Wabern FTTH DTAG          2020-08-15
755         Wabern (FTTH DTAG gg)     2021-03-05
755         Wabern                    2021-09-13
134         Lerbeck                   2019-04-18
134         Bad oyehausen FttH (st)   2020-06-26
134         Werre Park                2020-07-14
134         Werre Park FTTH (ssd)     2020-08-25
134         Werre Park (FTTH)         2021-03-15
134         Bad oyehausen             2021-09-23
584         kitern                    2019-04-08
584         Lausen ftth (los)         2020-06-16
584         kitener (FTTH geplan)     2020-07-24
584         Lausern                   2020-08-15
584         Lausern (FTTH DTAG gg)    2021-03-05
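Loop over the unique ids and, for each id, fetch all of its names. Keep only the first and the last name, convert them to a str, and check whether that string contains the word 'FTTH'.
Code: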
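import numpy as np
# Column names follow the table above (project_id, Name); the check is case-sensitive.
[i for i in set(df['project_id'].values) if 'FTTH' not in str(np.array(df[df['project_id'] == i]['Name'])[[0, -1]])]
# [755, 134]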
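The one-liner above packs several steps into one expression, so here is a minimal sketch of the same loop spelled out. It assumes the rows of each project are already in date order, uses a small subset of the question's rows as sample data, and compares case-insensitively (unlike the one-liner). The helper name `ids_without_pattern` is hypothetical and only there for illustration.

```python
import pandas as pd

# Sample rows copied from the question's table (a subset, for brevity).
df = pd.DataFrame(
    {
        "project_id": [755, 755, 755, 584, 584, 584],
        "Name": [
            "war",
            "Wabern FTTH DTAG",
            "Wabern",
            "kitern",
            "Lausern",
            "Lausern (FTTH DTAG gg)",
        ],
        "Date": [
            "2019-04-08", "2020-08-15", "2021-09-13",
            "2019-04-08", "2020-08-15", "2021-03-05",
        ],
    }
)


def ids_without_pattern(frame, pattern="FTTH"):
    """Hypothetical helper: ids whose first and last Name both lack `pattern`."""
    needle = pattern.upper()
    result = []
    # unique() keeps ids in order of first appearance, unlike set().
    for project_id in frame["project_id"].unique():
        names = frame.loc[frame["project_id"] == project_id, "Name"]
        first, last = names.iloc[0], names.iloc[-1]
        # Case-insensitive check on the first and last entries only.
        if needle not in first.upper() and needle not in last.upper():
            result.append(int(project_id))
    return result


print(ids_without_pattern(df))
```

Project 584 is excluded here because its last name still contains "FTTH", while 755 starts and ends without it.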
Using pandas:
def custom_function(series, pattern='FTTH'):
    """Return True if neither the first nor the last item contains the pattern."""
    first = pattern not in series.iat[0].upper()
    last = pattern not in series.iat[-1].upper()
    return first and last

# One boolean per project_id
df.groupby('project_id')['Name'].apply(custom_function)
Output:
project_id
134     True
584    False
755     True
Name: Name, dtype: bool
A different approach with pandas:
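# take([0, -1]) picks the first and last rows of each group by position.
# `not` is used instead of `~` so the result is a proper bool even if any() returns a Python bool.
res = df.groupby('project_id').apply(lambda x: not x.Name.take([0, -1]).str.contains('ftth', case=False).any())
# Keep only the ids flagged True and drop the boolean column (named 0 after reset_index).
res[res].reset_index().drop(0, axis=1)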
>>>
   project_id
0         134
1         755
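All of the answers above depend on the rows of each project being in chronological order. As a rough sketch of one way to make that explicit, the frame can be sorted by date first and the first and last names pulled out with `agg`. This is a swapped-in variant rather than any of the methods above, the sample data is a subset of the question's rows, and it assumes the dates are ISO-formatted strings (so they sort correctly as text).

```python
import pandas as pd

# Subset of the question's rows, deliberately out of date order.
df = pd.DataFrame(
    {
        "project_id": [755, 584, 755, 584, 755, 584],
        "Name": [
            "Wabern",
            "Lausern (FTTH DTAG gg)",
            "war",
            "kitern",
            "Wabern FTTH DTAG",
            "Lausern",
        ],
        "Date": [
            "2021-09-13", "2021-03-05", "2019-04-08",
            "2019-04-08", "2020-08-15", "2020-08-15",
        ],
    }
)

# Sort by date so that "first" and "last" mean earliest and latest per project.
ordered = df.sort_values("Date", kind="stable")
edges = ordered.groupby("project_id")["Name"].agg(["first", "last"])

# True where the earliest or latest name mentions FTTH (case-insensitive).
has_ftth = edges.apply(lambda col: col.str.contains("ftth", case=False, regex=False))

# Keep projects where neither edge name mentions FTTH.
keep = ~has_ftth.any(axis=1)
ids = keep[keep].index.tolist()
print(ids)
```

Note that the `first` and `last` aggregations skip missing values, so a project whose earliest or latest name is NaN would be judged on its nearest non-missing name instead.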