DATE PARTICULARS DR CR
0 04-03-2017 UPI PA 10000 0.0
1 04-03-2017 UPI PA 10000 0.0
2 04-03-2017 POS BPCL COCO PU 1000 0.0
3 04-03-2017 BRN PYMT CARD 2536 0.0
4 04-03-2017 UPI PA 10050 0.0
5 04-03-2017 POS BPCL COCO PU 1000 0.0
6 04-04-2017 INB IFT PARTY TRANSFER 60000 0.0
I have a dataframe like the one above.
I have three lists:
fuel = ['hpcl', 'bpcl']
cc = ['BRN', 'DEL', 'RQ']
ref = ['transfer', 'tx']
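I want to create a 'Label' column, df['Label'], so that if any word from one of the lists is found in the PARTICULARS column, the name of that list is written to the column, as shown below: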
DATE PARTICULARS DR CR Label
0 04-03-2017 UPI PA 10000 0.0 Nan
1 04-03-2017 UPI PA 10000 0.0 Nan
2 04-03-2017 POS BPCL COCO PU 1000 0.0 fuel
3 04-03-2017 BRN PYMT CARD 2536 0.0 cc
4 04-03-2017 UPI PA 10050 0.0 Nan
5 04-03-2017 POS BPCL COCO PU 1000 0.0 fuel
6 04-04-2017 INB IFT PARTY TRANSFER 60000 0.0 ref
I tried the following, but it only fills my 'Label' column with NaN.
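def add_label(lst):
    for x in lst:
        # `in` on a Series checks the index labels, not the values, so this is never True
        if x in df.PARTICULARS.str.split():
            # `==` compares instead of assigning, and a list has no __name__ attribute
            df.loc[x, 'Label'] == lst.__name__
    return df
# the list is named `fuel`, not `Fuel`; df.apply also calls the lambda once per column
df['Label'] = df.apply(lambda x: add_label(Fuel))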
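Since a Python list carries no name, one way to keep the loop idea from the attempt is to key the keyword lists by their label in a dict and apply a function row by row. This is a minimal sketch, not the answer's method: `labels` and `label_for` are hypothetical names, and it assumes whole-word matching (each PARTICULARS string is split on whitespace), which is stricter than the substring matching done by str.contains.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "DATE": ["04-03-2017"] * 6 + ["04-04-2017"],
    "PARTICULARS": ["UPI PA", "UPI PA", "POS BPCL COCO PU", "BRN PYMT CARD",
                    "UPI PA", "POS BPCL COCO PU", "INB IFT PARTY TRANSFER"],
    "DR": [10000, 10000, 1000, 2536, 10050, 1000, 60000],
    "CR": [0.0] * 7,
})

# Hypothetical mapping: lists have no name, so store the label text as the dict key
labels = {
    "fuel": ["hpcl", "bpcl"],
    "cc": ["BRN", "DEL", "RQ"],
    "ref": ["transfer", "tx"],
}

def label_for(particulars):
    """Return the first label whose keyword appears as a whole word, else NaN."""
    words = particulars.lower().split()
    for name, keywords in labels.items():  # dicts keep insertion order (Python 3.7+)
        if any(k.lower() in words for k in keywords):
            return name
    return np.nan

df["Label"] = df["PARTICULARS"].apply(label_for)
print(df)
```

Because the dict is checked in insertion order, a row matching several lists gets the first matching label, the same precedence np.select gives its conditions.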
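You can use np.select together with pd.Series.str.contains; case=False is needed because the keywords are partly lowercase while PARTICULARS is uppercase: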
import numpy as np

df["Label"] = np.select([df["PARTICULARS"].str.contains("|".join(i), case=False) for i in (fuel, cc, ref)],
                        ["fuel", "cc", "ref"], "Nan")
# or str.contains(r"\b(?:" + "|".join(i) + r")\b", ...) if you want to add a word boundary
print(df)
DATE PARTICULARS DR CR Label
0 04-03-2017 UPI PA 10000 0.0 Nan
1 04-03-2017 UPI PA 10000 0.0 Nan
2 04-03-2017 POS BPCL COCO PU 1000 0.0 fuel
3 04-03-2017 BRN PYMT CARD 2536 0.0 cc
4 04-03-2017 UPI PA 10050 0.0 Nan
5 04-03-2017 POS BPCL COCO PU 1000 0.0 fuel
6 04-04-2017 INB IFT PARTY TRANSFER 60000 0.0 ref
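The word-boundary variant mentioned in the comment matters once a short keyword can occur inside a longer word. The sketch below compares both patterns on a few hypothetical PARTICULARS strings (not from the question); `label_particulars` is a hypothetical helper. Wrapping the alternation in a non-capturing group (?:...) makes \b apply to every keyword rather than only the first and last, without triggering pandas' warning about match groups.

```python
import numpy as np
import pandas as pd

fuel = ['hpcl', 'bpcl']
cc = ['BRN', 'DEL', 'RQ']
ref = ['transfer', 'tx']

# Hypothetical descriptions: "MODEL" contains "del" but is not the word DEL
s = pd.Series(["POS BPCL COCO PU", "MODEL SHOP", "DEL MERCHANT"])

def label_particulars(series, bounded):
    conds = []
    for words in (fuel, cc, ref):
        alt = "|".join(words)
        # \b on both sides of the group so every alternative must be a whole word
        pattern = r"\b(?:" + alt + r")\b" if bounded else alt
        conds.append(series.str.contains(pattern, case=False, regex=True))
    # np.select returns the choice for the first True condition, else the default
    return np.select(conds, ["fuel", "cc", "ref"], "Nan")

print(label_particulars(s, bounded=False))  # ['fuel' 'cc' 'cc']
print(label_particulars(s, bounded=True))   # ['fuel' 'Nan' 'cc']
```

Without the boundary, "MODEL SHOP" is labeled cc because "del" matches inside "MODEL"; with it, only the standalone word DEL matches.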