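I am trying to plot the data frame below, but each column contains several repeated entries. I want to drop the initial run of repeated values in each column, keeping only the last entry of that run, so that the earlier ones do not appear in the plot (repeats in the middle or at the end of a column should be left alone).
Can anyone help me with this?
Here is the code I tried; it only removes rows when the whole row is duplicated: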
df = df.drop_duplicates(subset=df.columns[1:], keep='last')
df = df.groupby((df.shift() != df).cumsum()).filter(lambda x: len(x) < 5)
Input:
Date Build1 Build2 Build3 Build4 Build5 Build6
2022-11-26 00:00:00 30 30 30 30 30 30
2022-11-27 00:00:00 30 30 30 30 30 30
2022-11-28 00:00:00 30 30 30 30 30 30
2022-11-29 00:00:00 30 30 30 30 30 30
2022-11-30 00:00:00 30 30 30 30 30 30
2022-12-01 00:00:00 28 30 30 30 30 30
2022-12-02 00:00:00 25 30 30 30 30 30
2022-12-03 00:00:00 25 30 30 30 30 30
2022-12-04 00:00:00 22 28 30 30 30 30
2022-12-05 00:00:00 22 26 30 30 30 30
2022-12-06 00:00:00 22 23 30 30 30 30
2022-12-07 00:00:00 22 22 30 30 30 30
2022-12-08 00:00:00 22 20 30 30 30 30
2022-12-09 00:00:00 22 20 25 30 30 30
2022-12-10 00:00:00 22 20 23 30 30 30
2022-12-11 00:00:00 22 20 23 30 30 30
2022-12-12 00:00:00 22 20 18 30 30 30
2022-12-13 00:00:00 22 20 14 30 30 30
2022-12-14 00:00:00 22 20 11 30 30 30
2022-12-15 00:00:00 22 20 10 27 30 30
2022-12-16 00:00:00 22 20 10 20 30 30
2022-12-17 00:00:00 22 20 10 20 30 30
2022-12-18 00:00:00 22 20 10 20 30 30
2022-12-19 00:00:00 22 20 10 13 30 30
2022-12-20 00:00:00 22 20 10 2 30 30
2022-12-21 00:00:00 22 20 10 2 19 30
2022-12-22 00:00:00 22 20 10 2 11 30
2022-12-23 00:00:00 22 20 10 2 4 30
2022-12-24 00:00:00 22 20 10 2 0 30
2022-12-25 00:00:00 22 20 10 2 0 22
2022-12-26 00:00:00 22 20 10 2 0 15
2022-12-27 00:00:00 22 20 10 2 0 15
2022-12-28 00:00:00 22 20 10 2 0 9
Expected output:
Date Build1 Build2 Build3 Build4 Build5 Build6
2022-11-26 00:00:00
2022-11-27 00:00:00
2022-11-28 00:00:00
2022-11-29 00:00:00
2022-11-30 00:00:00 30
2022-12-01 00:00:00 28
2022-12-02 00:00:00 25
2022-12-03 00:00:00 25 30
2022-12-04 00:00:00 22 28
2022-12-05 00:00:00 22 26
2022-12-06 00:00:00 22 23
2022-12-07 00:00:00 22 22
2022-12-08 00:00:00 22 20 30
2022-12-09 00:00:00 22 20 25
2022-12-10 00:00:00 22 20 23
2022-12-11 00:00:00 22 20 23
2022-12-12 00:00:00 22 20 18
2022-12-13 00:00:00 22 20 14
2022-12-14 00:00:00 22 20 11 30
2022-12-15 00:00:00 22 20 10 27
2022-12-16 00:00:00 22 20 10 20
2022-12-17 00:00:00 22 20 10 20
2022-12-18 00:00:00 22 20 10 20
2022-12-19 00:00:00 22 20 10 13
2022-12-20 00:00:00 22 20 10 2 30
2022-12-21 00:00:00 22 20 10 2 19
2022-12-22 00:00:00 22 20 10 2 11
2022-12-23 00:00:00 22 20 10 2 4
2022-12-24 00:00:00 22 20 10 2 0 30
2022-12-25 00:00:00 22 20 10 2 0 22
2022-12-26 00:00:00 22 20 10 2 0 15
2022-12-27 00:00:00 22 20 10 2 0 15
2022-12-28 00:00:00 22 20 10 2 0 9
You can do this directly:
import numpy as np
import pandas as pd
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
df.where(~is_duplicate, np.nan)
which gives:
Date Build1 Build2 Build3 Build4
0 2022-11-26 00:00:00 30 30 NaN NaN NaN
1 2022-11-27 00:00:00 30 30 NaN NaN NaN
2 2022-11-28 00:00:00 30 30 NaN NaN NaN
3 2022-11-29 00:00:00 30 30 NaN NaN NaN
4 2022-11-30 00:00:00 30 30 NaN NaN NaN
5 2022-12-01 00:00:00 28 30 NaN NaN NaN
6 2022-12-02 00:00:00 25 30 NaN NaN NaN
7 2022-12-03 00:00:00 25 30 NaN NaN NaN
8 2022-12-04 00:00:00 22 30 NaN NaN NaN
9 2022-12-05 00:00:00 22 30 NaN NaN NaN
10 2022-12-06 00:00:00 22 30 NaN NaN NaN
11 2022-12-07 00:00:00 22 30 NaN NaN NaN
12 2022-12-08 00:00:00 22 30 NaN NaN NaN
13 2022-12-09 00:00:00 22 25 30.0 NaN NaN
14 2022-12-10 00:00:00 22 23 30.0 NaN NaN
15 2022-12-11 00:00:00 22 23 30.0 NaN NaN
16 2022-12-12 00:00:00 22 18 30.0 NaN NaN
17 2022-12-13 00:00:00 22 14 30.0 NaN NaN
18 2022-12-14 00:00:00 22 11 30.0 NaN NaN
19 2022-12-15 00:00:00 22 10 27.0 30.0 NaN
20 2022-12-16 00:00:00 22 10 20.0 30.0 NaN
21 2022-12-17 00:00:00 22 10 20.0 30.0 NaN
22 2022-12-18 00:00:00 22 10 20.0 30.0 NaN
23 2022-12-19 00:00:00 22 10 13.0 30.0 NaN
24 2022-12-20 00:00:00 22 10 2.0 30.0 NaN
25 2022-12-21 00:00:00 22 10 2.0 19.0 30.0
26 2022-12-22 00:00:00 22 10 2.0 11.0 30.0
27 2022-12-23 00:00:00 22 10 2.0 4.0 30.0
28 2022-12-24 00:00:00 22 10 2.0 0.0 30.0
29 2022-12-25 00:00:00 22 10 2.0 0.0 22.0
30 2022-12-26 00:00:00 22 10 2.0 0.0 15.0
31 2022-12-27 00:00:00 22 10 2.0 0.0 15.0
32 2022-12-28 00:00:00 22 10 2.0 0.0 9.0
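To make it easier to see what the mask marks, here is a minimal sketch on a tiny made-up frame (the name `demo` and its values are hypothetical, not the asker's data). With `axis=1`, `pd.Series.duplicated` is applied to each row, so a value is flagged when it repeats an earlier value in the same row, not an earlier value further up the same column.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the mask.
demo = pd.DataFrame({"A": [30, 28, 22], "B": [30, 30, 22], "C": [30, 30, 20]})

# Each row is passed to Series.duplicated; True marks a repeat
# of a value that already appeared earlier in that same row.
is_duplicate = demo.apply(pd.Series.duplicated, axis=1)

# Flagged cells become NaN; everything else is kept.
masked = demo.where(~is_duplicate)
print(is_duplicate)
print(masked)
```

Whether this row-wise behaviour is what you want depends on how the data is laid out, so it is worth checking the mask on a few rows before plotting.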
Or:
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, ''))
which gives:
Date Build1 Build2 Build3 Build4
0 2022-11-26 00:00:00 30 30
1 2022-11-27 00:00:00 30 30
2 2022-11-28 00:00:00 30 30
3 2022-11-29 00:00:00 30 30
4 2022-11-30 00:00:00 30 30
5 2022-12-01 00:00:00 28 30
6 2022-12-02 00:00:00 25 30
7 2022-12-03 00:00:00 25 30
8 2022-12-04 00:00:00 22 30
9 2022-12-05 00:00:00 22 30
10 2022-12-06 00:00:00 22 30
11 2022-12-07 00:00:00 22 30
12 2022-12-08 00:00:00 22 30
13 2022-12-09 00:00:00 22 25 30
14 2022-12-10 00:00:00 22 23 30
15 2022-12-11 00:00:00 22 23 30
16 2022-12-12 00:00:00 22 18 30
17 2022-12-13 00:00:00 22 14 30
18 2022-12-14 00:00:00 22 11 30
19 2022-12-15 00:00:00 22 10 27 30
20 2022-12-16 00:00:00 22 10 20 30
21 2022-12-17 00:00:00 22 10 20 30
22 2022-12-18 00:00:00 22 10 20 30
23 2022-12-19 00:00:00 22 10 13 30
24 2022-12-20 00:00:00 22 10 2 30
25 2022-12-21 00:00:00 22 10 2 19 30
26 2022-12-22 00:00:00 22 10 2 11 30
27 2022-12-23 00:00:00 22 10 2 4 30
28 2022-12-24 00:00:00 22 10 2 0 30
29 2022-12-25 00:00:00 22 10 2 0 22
30 2022-12-26 00:00:00 22 10 2 0 15
31 2022-12-27 00:00:00 22 10 2 0 15
32 2022-12-28 00:00:00 22 10 2 0 9
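If the goal is the column-wise behaviour described in the question (blank out the leading run of each column's starting value, keeping only its last entry), one possible sketch is shown here. This is an alternative technique, not the snippet above: it compares every column with its first value and uses `cummax` to find where the first change happens. The sample frame and the names `value_cols`, `changed`, `keep`, and `result` are assumptions made for illustration.

```python
import pandas as pd

# Small hypothetical sample shaped like the question's data.
df = pd.DataFrame({
    "Date": pd.date_range("2022-11-26", periods=6, freq="D"),
    "Build1": [30, 30, 30, 28, 25, 25],
    "Build2": [30, 30, 30, 30, 30, 28],
})

value_cols = df.columns[1:]   # every column except Date
values = df[value_cols]

# True from the first row where a column differs from its starting value.
changed = values.ne(values.iloc[0]).cummax()

# Also keep the row just before the first change, i.e. the last entry of
# the initial run. fill_value=True keeps the final row, so a column that
# never changes still retains its last entry.
keep = changed | changed.shift(-1, fill_value=True)

result = df.copy()
result[value_cols] = values.where(keep)   # dropped cells become NaN
print(result)
```

Repeats in the middle or at the end of a column are untouched, because only the run that starts at the first row is masked. The masked cells are NaN, which line plots normally leave as gaps.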