Suppose we agree on the following order for the hierarchy:
Baby -> Child -> Teenager -> Adult
I have this dataset:
Name Stage Highest_Stage_Reached
0 Adam Child
1 Barry Child
2 Ben Adult
3 Adam Teenager
4 Barry Adult
5 Ben Baby
How would I populate the Highest_Stage_Reached field so that the dataset looks like this?
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
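You can use:
# Map each stage to a numeric rank, take the per-name maximum, then map back to labels.
d = {'Baby': 0, 'Child': 1, 'Teenager': 2, 'Adult': 3}
df['rank'] = df.Stage.map(d)
df['Highest_Stage_Reached'] = (
    df.groupby('Name')['rank'].transform('max')
      .map({v: k for k, v in d.items()})
)
print(df.drop(columns='rank'))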
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
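For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame construction is an assumption reconstructed from the table in the question, and the variable names `inverse` and `rank` are only illustrative. It skips the temporary `rank` column by grouping the rank Series directly.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    'Name': ['Adam', 'Barry', 'Ben', 'Adam', 'Barry', 'Ben'],
    'Stage': ['Child', 'Child', 'Adult', 'Teenager', 'Adult', 'Baby'],
})

# Stage label -> numeric rank, and the inverse lookup.
d = {'Baby': 0, 'Child': 1, 'Teenager': 2, 'Adult': 3}
inverse = {v: k for k, v in d.items()}

# Rank every row, broadcast the per-name maximum, then map back to labels.
rank = df['Stage'].map(d)
df['Highest_Stage_Reached'] = rank.groupby(df['Name']).transform('max').map(inverse)

print(df)
```

Grouping the Series by `df['Name']` relies on both sharing the same index, which holds here because `rank` is derived from `df`.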
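Convert the column to a categorical using the ordered parameter. The stages can then be sorted and compared in hierarchy order. This also supports a variable number of stages.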
df['Stage'] = pd.Categorical(df['Stage'], ordered=True, categories=['Baby', 'Child','Teenager','Adult'])
df['Highest_Stage_Reached'] = df.groupby('Name').Stage.transform('max')
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
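To illustrate what the ordered categorical buys you, here is a small sketch, assuming the same sample data as in the question. The names `stages`, `is_minor`, `sorted_stages`, `codes` and `max_codes` are illustrative only. The last part computes the per-name maximum through the integer category codes, which is an alternative route to the same result rather than the method used above.

```python
import pandas as pd

stages = ['Baby', 'Child', 'Teenager', 'Adult']
df = pd.DataFrame({
    'Name': ['Adam', 'Barry', 'Ben', 'Adam', 'Barry', 'Ben'],
    'Stage': ['Child', 'Child', 'Adult', 'Teenager', 'Adult', 'Baby'],
})
df['Stage'] = pd.Categorical(df['Stage'], categories=stages, ordered=True)

# Comparisons follow the declared category order, not alphabetical order.
is_minor = (df['Stage'] < 'Adult').tolist()

# Sorting also follows the declared order.
sorted_stages = df['Stage'].sort_values().tolist()

# Alternative: take the per-name maximum of the integer codes, then rebuild
# a categorical with the same categories.
codes = df['Stage'].cat.codes
max_codes = codes.groupby(df['Name']).transform('max')
df['Highest_Stage_Reached'] = pd.Categorical.from_codes(
    max_codes, categories=stages, ordered=True
)

print(df)
```

Adding or removing a stage only requires editing the `stages` list, since the codes are derived from its order.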
Put your hierarchy in a list and use the list's indices.
import numpy as np
import pandas as pd

l = ['Baby', 'Child', 'Teenager', 'Adult']
df = pd.DataFrame({'Name': ['Adam', 'Barry', 'Ben', 'Adam', 'Barry', 'Ben'], 'Stage': ['Child', 'Child', 'Adult', 'Teenager', 'Adult', 'Baby']})
# One boolean condition per stage, in hierarchy order; np.select assigns the matching index.
cond = [df['Stage'] == stage for stage in l]
df['Highest_Stage_Reached'] = np.select(cond, [0, 1, 2, 3])
Name Stage Highest_Stage_Reached
0 Adam Child 1
1 Barry Child 1
2 Ben Adult 3
3 Adam Teenager 2
4 Barry Adult 3
5 Ben Baby 0
# Replace each row's index with the highest index seen for that name.
df['Highest_Stage_Reached'] = df.groupby('Name')['Highest_Stage_Reached'].transform('max')
Name Stage Highest_Stage_Reached
0 Adam Child 2
1 Barry Child 3
2 Ben Adult 3
3 Adam Teenager 2
4 Barry Adult 3
5 Ben Baby 3
df['Highest_Stage_Reached'] = df['Highest_Stage_Reached'].apply(lambda x: l[x])
print(df)
Output:
Name Stage Highest_Stage_Reached
0 Adam Child Teenager
1 Barry Child Adult
2 Ben Adult Adult
3 Adam Teenager Teenager
4 Barry Adult Adult
5 Ben Baby Adult
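One caveat with the np.select step: its `default` is 0, so a stage label that is not in the list would be ranked as index 0 and reported as 'Baby' without any warning. A possible variant, sketched below, derives the ranks from the list with `enumerate` and raises on unknown labels. The helper name `highest_stage` is hypothetical and not part of the answer above.

```python
import pandas as pd

stages = ['Baby', 'Child', 'Teenager', 'Adult']
df = pd.DataFrame({
    'Name': ['Adam', 'Barry', 'Ben', 'Adam', 'Barry', 'Ben'],
    'Stage': ['Child', 'Child', 'Adult', 'Teenager', 'Adult', 'Baby'],
})

def highest_stage(frame, order):
    """Return, per row, the highest stage reached by that row's Name.

    Hypothetical helper: `order` lists the stages from lowest to highest.
    """
    rank_of = {stage: i for i, stage in enumerate(order)}
    ranks = frame['Stage'].map(rank_of)

    # Labels missing from `order` map to NaN; fail loudly instead of guessing.
    unknown = frame.loc[ranks.isna(), 'Stage'].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unknown stage(s): {list(unknown)}")

    top = ranks.groupby(frame['Name']).transform('max').astype(int)
    return top.map(lambda i: order[i])

df['Highest_Stage_Reached'] = highest_stage(df, stages)
print(df)
```

Because the ranks come from the list itself, reordering or extending `stages` needs no other changes.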