I have the following dataframe:
id date outcome
3 03/05/2019 no
3 29/05/2019 no
3 04/09/2019 no
3 30/10/2019 yes
3 03/05/2020 no
5 03/12/2019 no
5 26/12/2019 no
5 27/01/2020 yes
5 03/06/2020 yes
6 04/05/2019 no
6 27/10/2019 no
6 26/11/2019 yes
6 28/11/2019 yes
6 29/11/2019 yes
6 20/12/2019 yes
6 27/12/2019 yes
6 29/12/2019 yes
6 03/01/2020 yes
6 14/01/2020 yes
6 11/02/2020 yes
6 13/02/2020 yes
6 18/02/2020 yes
6 13/04/2020 yes
6 14/04/2020 yes
6 24/04/2020 yes
6 30/04/2020 yes
6 05/05/2020 no
The rows are grouped by id, and within each id the dates are in ascending order.
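Every approach in this thread relies on that row order, so it helps to make it explicit. The sketch below assumes the table is pasted as whitespace-separated text (the `raw` string is a hypothetical stand-in holding only ids 3 and 5). It parses the day-first dates with an explicit format, since values such as 29/05/2019 only make sense as dd/mm/yyyy, and then sorts by id and date so that "the next row" means the next date within the same id.

```python
import io

import pandas as pd

# Hypothetical excerpt of the question's table (ids 3 and 5 only)
raw = """id date outcome
3 03/05/2019 no
3 29/05/2019 no
3 04/09/2019 no
3 30/10/2019 yes
3 03/05/2020 no
5 03/12/2019 no
5 26/12/2019 no
5 27/01/2020 yes
5 03/06/2020 yes
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
# Dates are day-first (dd/mm/yyyy); an explicit format avoids day/month ambiguity
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
# Sort by id, then date, and renumber the index from 0
df = df.sort_values(["id", "date"]).reset_index(drop=True)
print(df)
```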
I want to delete a row if the row immediately after it has the same outcome. This is the desired result for id 3:
id date outcome
3 04/09/2019 no
3 30/10/2019 yes
3 03/05/2020 no
So far I have created a mask like this:
m1 = (df['outcome'] == df['outcome'].shift(-1))
but I'm not sure what to do next to apply the condition and drop those rows. I assume I'll need a groupby on the id column…
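A minimal sketch of how the mask from the question could be finished, assuming the frame is already sorted by id and date. Doing the shift inside `groupby("id")` means the last row of each id is compared with NaN instead of with the first row of the next id, so it is always kept; the rows where the mask is True are then dropped. The data below is a hypothetical excerpt (ids 3 and 5), chosen because id 3 ends with "no" and id 5 starts with "no", which a plain `shift(-1)` would wrongly treat as a repeat.

```python
import pandas as pd

# Hypothetical excerpt of the question's data (ids 3 and 5 only)
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 5, 5, 5, 5],
    "date": ["03/05/2019", "29/05/2019", "04/09/2019", "30/10/2019",
             "03/05/2020", "03/12/2019", "26/12/2019", "27/01/2020",
             "03/06/2020"],
    "outcome": ["no", "no", "no", "yes", "no", "no", "no", "yes", "yes"],
})

# Outcome of the next row within the same id (NaN for the last row of each id)
next_outcome = df.groupby("id")["outcome"].shift(-1)
# True where the following row in the same id repeats this row's outcome
m1 = df["outcome"] == next_outcome
# Drop those rows, keeping the last row of every run
out = df[~m1]
print(out)
```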
Let us do
m1 = (df['outcome'] != df['outcome'].shift()).cumsum()
out = df.groupby([df['id'],m1]).head(1)
id date outcome
0 3 03/05/2019 no
3 3 30/10/2019 yes
4 3 03/05/2020 no
5 5 03/12/2019 no
7 5 27/01/2020 yes
9 6 04/05/2019 no
11 6 26/11/2019 yes
26 6 05/05/2020 no
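`head(1)` keeps the first row of each run of identical outcomes, whereas the desired output in the question keeps the last one (04/09/2019 rather than 03/05/2019 for id 3). A sketch of the same run-key idea with `tail(1)` instead, on a hypothetical excerpt of the data (ids 3 and 5):

```python
import pandas as pd

# Hypothetical excerpt of the question's data (ids 3 and 5 only)
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 5, 5, 5, 5],
    "date": ["03/05/2019", "29/05/2019", "04/09/2019", "30/10/2019",
             "03/05/2020", "03/12/2019", "26/12/2019", "27/01/2020",
             "03/06/2020"],
    "outcome": ["no", "no", "no", "yes", "no", "no", "no", "yes", "yes"],
})

# Run key: increments every time the outcome changes from the previous row
m1 = (df["outcome"] != df["outcome"].shift()).cumsum()
# Group by (id, run) and keep the last row of each group; original order is preserved
out = df.groupby([df["id"], m1]).tail(1)
print(out)
```

The run key is computed over the whole frame, so a run can continue across an id boundary (here the "no" at the end of id 3 and the "no" at the start of id 5 share a key); including `id` in the groupby is what splits them again.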
Does this solve your problem?
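import pandas as pd

# Keeps the first row of each run of identical outcomes within an id
rows = []
current = None  # (id, outcome) of the previous kept run; None so the first row is always kept
for i, row in df.iterrows():
    key = (row['id'], row['outcome'])
    if key != current:
        current = key
        rows.append(row)
new_df = pd.DataFrame(rows)  # DataFrame.append was removed in pandas 2.0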