Is there a way to iterate over lists in a DataFrame column and classify each row based on the values in its list?


Date    Transaction Id  ClientId    Services                                        Class
01-10-2021  1234          1       ['Lip Threading' , 'Eye brow threading']          Threading
02-10-2021  1235          2       ['Full face Threading', 'Eye Brow threading']     Threading
03-10-2021  2346          3       ['Eyebrow Threading' , 'Facial' , 'waxing']       Thread and oth
04-10-2021  5432          4       ['Hair cut' , 'Facial']                           Other
05-10-2021  6578          5       ['Eye brow threading' , 'Haircut', 'facial']      Thread and oth
06-10-2021  3425          6       ['Head Massage', ' hair cut']                     Other

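I have a DataFrame with the data above. It has a column called Services that holds the different services as a list. Based on that list, I want to fill in the Class column. My main goal is to sort transactions into three groups: threading only, threading together with other services, and other services without any threading.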

Use apply to derive the Class column from Services:

def classify(lst):
    '''
    Classify Type of list
    '''
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""

    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"

# Derive Class column from Services column
df['Class'] = df.Services.apply(classify)

Transaction Id  ClientId    Services    Class
0   01-10-2021  1234    1   ['Lip Threading' , 'Eye brow threading']    Threading
1   02-10-2021  1235    2   ['Full face Threading', 'Eye Brow threading']   Threading
2   03-10-2021  2346    3   ['Eyebrow Threading' , 'Facial' , 'waxing'] Threading and Other
3   04-10-2021  5432    4   ['Hair cut' , 'Facial'] Other
4   05-10-2021  6578    5   ['Eye brow threading' , 'Haircut', 'facial'] Threading and Other
5   06-10-2021  3425    6   ['Head Massage', ' hair cut']   Other
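
To see how the three branches behave without pandas, here is a minimal sketch that calls the same logic on plain Python lists. The names has_threading, has_other, samples and results are illustrative only and not part of the original answer. The empty-list case is included to show that a row with no services falls through to "Other".

```python
def classify(services):
    """Return 'Threading', 'Threading and Other', or 'Other' for a list of service names."""
    has_threading = any("threading" in s.lower() for s in services)
    has_other = any("threading" not in s.lower() for s in services)
    if has_threading and has_other:
        return "Threading and Other"
    if has_threading:
        return "Threading"
    return "Other"


# Hypothetical sample inputs covering each branch
samples = [
    ["Lip Threading", "Eye brow threading"],    # every item is threading
    ["Eyebrow Threading", "Facial", "waxing"],  # mixed
    ["Hair cut", "Facial"],                     # no threading
    [],                                         # empty list: nothing matches, falls through
]
results = [classify(s) for s in samples]
print(results)
```

The match is a case-insensitive substring test, so "Eyebrow Threading" and "Eye brow threading" are both counted as threading.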

Full code:

from io import StringIO

import pandas as pd


def classify(lst):
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""

    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"


# Build the DataFrame from CSV text
s = '''Transaction,Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
02-10-2021,1235,2,"['Full face Threading', 'Eye Brow threading']"
03-10-2021,2346,3,"['Eyebrow Threading' , 'Facial' , 'waxing']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"
05-10-2021,6578,5,"['Eye brow threading' , 'Haircut', 'facial']"
06-10-2021,3425,6,"['Head Massage', ' hair cut']"'''
df = pd.read_csv(StringIO(s), sep=",", quotechar='"')

# Convert the Services column from strings to lists:
# drop the surrounding brackets, then split on commas
df['Services'] = df.Services.apply(lambda x: x[1:-1].split(','))

# Derive the Class column
df['Class'] = df.Services.apply(classify)
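
The x[1:-1].split(',') conversion above leaves the quote characters and stray spaces inside each item. That does not affect the substring test in classify, but it may matter if the items are reused elsewhere. As a possible alternative (an assumption, not part of the original answer), the standard library's ast.literal_eval can parse the string as a real Python list. The sketch below uses a shortened copy of the data, and csv_text is a name made up for the example.

```python
import ast
from io import StringIO

import pandas as pd

# Shortened copy of the sample data, same column names as the code above
csv_text = '''Transaction,Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"
06-10-2021,3425,6,"['Head Massage', ' hair cut']"'''

df = pd.read_csv(StringIO(csv_text))

# Parse each string as a Python list literal, then trim whitespace from every item
df["Services"] = df["Services"].apply(
    lambda text: [item.strip() for item in ast.literal_eval(text)]
)
print(df["Services"].tolist())
```

Note that ast.literal_eval raises an error on malformed input such as a missing bracket, so this variant only fits data that is consistently formatted.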

You can do this with np.select and apply.

Example DataFrame:

colA    colB
0   1       [A, B, C]
1   2       [A]
2   3       [B, C]
3   4       [A, C]
4   5       [B]

Conditions:

import numpy as np

conditions = [
    df["colB"].apply(lambda x: ("A" in x) and len(x) == 1),
    df["colB"].apply(lambda x: ("A" in x) and len(x) != 1),
]

Create the column:

df["Result"] = np.select(conditions, ["A", "A and others"], default="Others")

Final DataFrame:

colA    colB        Result
0   1       [A, B, C]   A and others
1   2       [A]         A
2   3       [B, C]      Others
3   4       [A, C]      A and others
4   5       [B]         Others
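
The same np.select pattern can be adapted to the Services column from the question. This is only a sketch under a few assumptions: the helper count_threading and the intermediate names n_threading and n_total are invented for the example, and "threading" is detected with the same case-insensitive substring test used in the other answer.

```python
import numpy as np
import pandas as pd

# Three rows taken from the question's sample data
df = pd.DataFrame({
    "ClientId": [1, 3, 4],
    "Services": [
        ["Lip Threading", "Eye brow threading"],
        ["Eyebrow Threading", "Facial", "waxing"],
        ["Hair cut", "Facial"],
    ],
})


def count_threading(services):
    """Count the items whose name contains 'threading' (case-insensitive)."""
    return sum("threading" in s.lower() for s in services)


n_threading = df["Services"].apply(count_threading)
n_total = df["Services"].apply(len)

# Conditions are checked in order; rows matching neither get the default
conditions = [
    (n_threading > 0) & (n_threading == n_total),  # every item is threading
    (n_threading > 0) & (n_threading < n_total),   # threading plus something else
]
df["Class"] = np.select(conditions, ["Threading", "Threading and Other"], default="Other")
print(df)
```

Counting once per row keeps the two conditions as plain boolean Series, which avoids running a separate lambda over the lists for each condition.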
