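I have a dataset, df, that contains several groups. I'd like to set a threshold for each group: if a value is above or below a certain limit, a specific text should be displayed.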
group start end diff percent date
A 2019-04-01 2019-05-01 -160 -11 04-01-2019 to 05-01-2019
A 2019-05-01 2019-06-01 136 8 05-01-2019 to 06-01-2019
B 2020-06-01 2020-07-01 202 5 06-01-2020 to 07-01-2020
B 2020-07-01 2020-08-01 283 7 07-01-2020 to 08-01-2020
I want the upper threshold to be any value > 250 and the lower threshold to be any value < 0.
Desired result:
group start end diff percent date result
A 2019-04-01 2019-05-01 -160 -11 04-01-2019 to 05-01-2019 unacceptable
A 2019-05-01 2019-06-01 136 8 05-01-2019 to 06-01-2019 acceptable
B 2020-06-01 2020-07-01 202 5 06-01-2020 to 07-01-2020 acceptable
B 2020-07-01 2020-08-01 283 7 07-01-2020 to 08-01-2020 unacceptable
This is what I am doing:
df['result'] = df.where(df['percent']> 250,'unacceptable')
This does not work, and I am still researching it. Any suggestions are welcome.
Let's try pd.cut, binning on the diff column:
df['result'] = pd.cut(df['diff'], [-np.inf, 0, 250, np.inf], labels=['unacceptablelow', 'acceptable', 'unacceptablehigh'])
group start end diff percent date result
A 2019-04-01 2019-05-01 -160 -11 04-01-2019 to 05-01-2019 unacceptablelow
A 2019-05-01 2019-06-01 136 8 05-01-2019 to 06-01-2019 acceptable
B 2020-06-01 2020-07-01 202 5 06-01-2020 to 07-01-2020 acceptable
B 2020-07-01 2020-08-01 283 7 07-01-2020 to 08-01-2020 unacceptablehigh
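For anyone who wants to run this end to end, here is a minimal sketch. It assumes the thresholds apply to the diff column (that is what the desired output in the question implies) and rebuilds only the columns needed from the sample data. The final replace step is my own addition, not part of the answer above; it collapses the two out-of-range labels into the single "unacceptable" label the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns needed here).
df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "diff": [-160, 136, 202, 283],
    "percent": [-11, 8, 5, 7],
})

# pd.cut bins are right-closed by default: (-inf, 0], (0, 250], (250, inf).
bins = [-np.inf, 0, 250, np.inf]
labels = ["unacceptablelow", "acceptable", "unacceptablehigh"]
df["result"] = pd.cut(df["diff"], bins=bins, labels=labels)

# Assumption: the question wants one shared label for both out-of-range cases.
df["result"] = df["result"].astype(str).replace(
    {"unacceptablelow": "unacceptable", "unacceptablehigh": "unacceptable"}
)
print(df)
```

Note that with the default right-closed bins a diff of exactly 0 lands in the low bin, which is slightly stricter than the "< 0" rule in the question.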
Why not use df.loc?
df['result'] = 'acceptable'
df.loc[(df['diff'] < 0) | (df['diff'] > 250), 'result'] = 'unacceptable'
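A related option, sketched here under the same assumption that the thresholds apply to diff, is np.where with Series.between. As far as I can tell, the original attempt fails because DataFrame.where keeps the existing values wherever the condition is True and substitutes the replacement elsewhere, so it returns a whole frame rather than a column of labels. np.where instead picks one of two values per row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns needed here).
df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "diff": [-160, 136, 202, 283],
    "percent": [-11, 8, 5, 7],
})

# between() includes both endpoints by default, so 0 and 250 count as acceptable;
# anything < 0 or > 250 is flagged, matching the thresholds in the question.
in_range = df["diff"].between(0, 250)
df["result"] = np.where(in_range, "acceptable", "unacceptable")
print(df)
```

This gives a two-valued result column directly, without an intermediate categorical.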