I have a DataFrame named a. I want to get the two most frequent elements in each row.
Input:
import pandas as pd
a=pd.DataFrame({'A1':['food','movie','sport'],'A2':['game','traffic','health'],
'A3':['food','health','education'],'A4':['game','travel','other'],
'A5':['social','other','sport']})
Output:
      A1       A2         A3      A4      A5
0   food     game       food    game  social
1  movie  traffic     health  travel   other
2  sport   health  education   other   sport
Expected:
top1 top2
0 food game
1 health movie
2 sport education
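As you can see, some elements in a row may occur with the same frequency. In that case I just pick among them arbitrarily: for example, every element in row 1 occurs once, so I simply pick any two of them for the ranking.
Any help is appreciated, thanks!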
Counter
from collections import Counter
tops = [
[*zip(*Counter(r).most_common(2))][0]
for r in zip(*map(a.get, a))
]
pd.DataFrame(tops, a.index, ['top1', 'top2'])
top1 top2
0 food game
1 movie traffic
2 sport health
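In case the one-liner is hard to read, here is a rough unpacked sketch of the same Counter idea. The helper name top_two is made up for illustration, and it iterates rows with itertuples instead of the zip/map trick, so treat it as an equivalent reading rather than the answer's exact code. Counter.most_common keeps first-seen order for values with equal counts, which is why row 1 comes out as movie, traffic.

```python
from collections import Counter

import pandas as pd

a = pd.DataFrame({
    'A1': ['food', 'movie', 'sport'],
    'A2': ['game', 'traffic', 'health'],
    'A3': ['food', 'health', 'education'],
    'A4': ['game', 'travel', 'other'],
    'A5': ['social', 'other', 'sport'],
})


def top_two(row):
    """Return the two most common values in a row (hypothetical helper)."""
    # most_common(2) yields (value, count) pairs, highest count first;
    # values with equal counts keep the order in which they first appear.
    pairs = Counter(row).most_common(2)
    return [value for value, _count in pairs]


# itertuples(index=False, name=None) walks the frame row by row as plain tuples.
rows = a.itertuples(index=False, name=None)
result = pd.DataFrame([top_two(r) for r in rows],
                      index=a.index, columns=['top1', 'top2'])
print(result)
```

This assumes every row has at least two distinct values; a row with only one distinct value would produce a one-element list and the DataFrame constructor would complain about the column count.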
You can try using value_counts and assign the most frequent word to top1, the next one to top2, and so on:
pd.DataFrame({'top1':a.apply(lambda x: x.value_counts().index[0],1 ).values,
'top2':a.apply(lambda x: x.value_counts().index[1],1 ).values})
Out:
top1 top2
0 game food
1 traffic movie
2 sport other
Use:
a.apply(lambda x: pd.Series(x.value_counts().nlargest(2).index.tolist(),
index=['top1','top2']),
axis=1)
Output:
top1 top2
0 game food
1 traffic other
2 sport other
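One caveat with the two value_counts answers above: they both assume each row has at least two distinct values. If a row holds a single repeated value, index[1] raises an IndexError and the pd.Series constructor gets one value for two labels. Below is a minimal sketch of one way around that, assuming a missing second value should simply be None. The frame df and the helper top_two_padded are invented for this example and are not part of the question.

```python
import pandas as pd

# Hypothetical data: row 0 contains only one distinct value.
df = pd.DataFrame({
    'A1': ['x', 'a'],
    'A2': ['x', 'a'],
    'A3': ['x', 'b'],
})


def top_two_padded(row):
    """Two most frequent values of a row, padded with None (hypothetical helper)."""
    # value_counts sorts by count in descending order by default.
    top = row.value_counts().index[:2].tolist()
    # Pad so the result always has exactly two entries.
    top += [None] * (2 - len(top))
    return pd.Series(top, index=['top1', 'top2'])


result = df.apply(top_two_padded, axis=1)
print(result)
```

Note that the order value_counts gives to values with equal counts is not something I would rely on, which is probably why the tied rows differ between the answers here.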
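You need Counter and the apply function:
from collections import Counter
# Count the values in each row and keep the two most frequent ones.
top2 = a.apply(lambda r: [k for k, _ in Counter(r).most_common(2)], axis=1)
out_df = pd.DataFrame(top2.tolist(), columns=['top1', 'top2'])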