Date Transaction Id ClientId Services Class
01-10-2021 1234 1 ['Lip Threading' , 'Eye brow threading'] Threading
02-10-2021 1235 2 ['Full face Threading', 'Eye Brow threading'] Threading
03-10-2021 2346 3 ['Eyebrow Threading' , 'Facial' , 'waxing'] Thread and oth
04-10-2021 5432 4 ['Hair cut' , 'Facial'] Other
05-10-2021 6578 5 ['Eye brow threading' , 'Haircut', 'facial'] Thread and oth
06-10-2021 3425 6 ['Head Massage', ' hair cut'] Other
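I have a dataframe with the data above. It has a column called Services that holds the different services as a list. Based on this list, I want to fill in the Class column. My main goal is to sort transactions into three groups: threading only, threading together with other services, and other services with no threading.
Use apply to derive the Class column from Services: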
def classify(lst):
    '''
    Classify Type of list
    '''
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"

# Derive Class column from Services column
df['Class'] = df.Services.apply(classify)
Transaction Id ClientId Services Class
0 01-10-2021 1234 1 ['Lip Threading' , 'Eye brow threading'] Threading
1 02-10-2021 1235 2 ['Full face Threading', 'Eye Brow threading'] Threading
2 03-10-2021 2346 3 ['Eyebrow Threading' , 'Facial' , 'waxing'] Threading and Other
3 04-10-2021 5432 4 ['Hair cut' , 'Facial'] Other
4 05-10-2021 6578 5 ['Eye brow threading' , 'Haircut', 'facial'] Threading and Other
5 06-10-2021 3425 6 ['Head Massage', ' hair cut'] Other
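As a quick sanity check, the same logic can be run on plain Python lists without a dataframe. This is a minimal sketch: the function body mirrors the answer's classify, only the variable names (services, has_threading, has_other) are changed for readability, and the empty-list case is an extra input that the question does not cover.

```python
def classify(services):
    """Label a list of service names by whether it involves threading."""
    has_threading = any("threading" in s.lower() for s in services)
    has_other = any("threading" not in s.lower() for s in services)
    if has_threading and has_other:
        return "Threading and Other"
    return "Threading" if has_threading else "Other"


print(classify(["Lip Threading", "Eye brow threading"]))    # Threading
print(classify(["Eyebrow Threading", "Facial", "waxing"]))  # Threading and Other
print(classify(["Hair cut", "Facial"]))                     # Other
print(classify([]))                                         # Other (nothing matches, so it falls through)
```

Note that an empty list is labelled "Other" because neither any() call finds a match; if empty rows can occur, they may need their own label.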
Full code:
from io import StringIO
import pandas as pd

def classify(lst):
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"

# Derive Dataframe
s = '''Transaction,Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
02-10-2021,1235,2,"['Full face Threading', 'Eye Brow threading']"
03-10-2021,2346,3,"['Eyebrow Threading' , 'Facial' , 'waxing']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"
05-10-2021,6578,5,"['Eye brow threading' , 'Haircut', 'facial']"
06-10-2021,3425,6,"['Head Massage', ' hair cut']"'''
df = pd.read_csv(StringIO(s), sep=",", quotechar='"')

# Convert Services column to lists (strip the surrounding brackets, then split on commas)
df['Services'] = df.Services.apply(lambda x: x[1:-1].split(','))

# Derive Class column
df['Class'] = df.Services.apply(classify)
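The slice-and-split conversion above leaves the quote characters and stray spaces inside each element. That is harmless for the substring test in classify, but it can surprise later code. One possible alternative, shown here only as a sketch and not as part of the original answer, is to parse each string with ast.literal_eval from the standard library. It assumes every Services value is a well-formed Python list literal; a malformed value would raise an exception.

```python
import ast
from io import StringIO

import pandas as pd

# Two rows from the sample data, enough to show the parsing step
s = '''Transaction,Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"'''

df = pd.read_csv(StringIO(s))

# Parse each string as a Python literal, giving real lists of clean strings
df["Services"] = df["Services"].apply(ast.literal_eval)

print(df["Services"].tolist())
```

After this step each cell holds a real list such as ['Hair cut', 'Facial'], with no leftover quotes or padding.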
You can do this with np.select and apply.
Example dataframe:
colA colB
0 1 [A, B, C]
1 2 [A]
2 3 [B, C]
3 4 [A, C]
4 5 [B]
Conditions:
import numpy as np

conditions = [
    df["colB"].apply(lambda x: ("A" in x) and len(x) == 1),
    df["colB"].apply(lambda x: ("A" in x) and len(x) != 1)
]
Create the column:
df["Result"] = np.select(conditions, ["A", "A and others"], default="Others")
Final dataframe:
colA colB Result
0 1 [A, B, C] A and others
1 2 [A] A
2 3 [B, C] Others
3 4 [A, C] A and others
4 5 [B] Others
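The same np.select pattern can be carried over to the Services data from the question. The following is a sketch under a few assumptions: the three-row dataframe is a cut-down copy of the question's data, and the helper is_threading and the Series names all_threading and any_threading are made up for illustration. Because np.select takes the first condition that matches, the stricter "every service is threading" test is listed first.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's data (illustrative only)
df = pd.DataFrame({
    "ClientId": [1, 3, 4],
    "Services": [
        ["Lip Threading", "Eye brow threading"],
        ["Eyebrow Threading", "Facial", "waxing"],
        ["Hair cut", "Facial"],
    ],
})


def is_threading(name):
    """Hypothetical helper: case-insensitive check for a threading service."""
    return "threading" in name.lower()


# True when the list is non-empty and every service is threading
all_threading = df["Services"].apply(
    lambda lst: len(lst) > 0 and all(is_threading(s) for s in lst)
)
# True when at least one service is threading
any_threading = df["Services"].apply(
    lambda lst: any(is_threading(s) for s in lst)
)

# First matching condition wins; rows matching neither get the default
df["Class"] = np.select(
    [all_threading, any_threading],
    ["Threading", "Threading and Other"],
    default="Other",
)

print(df)
```

With this sample the three rows come out as Threading, Threading and Other, and Other respectively.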