Remove duplicated DataFrame column entries only at the start, keeping the last entry



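I am trying to plot the DataFrame below, but each column contains several repeated entries. In each column I want to remove the initial run of repeated values, keeping only the last entry of that run, so that the repeats do not show up in the plot. Repeats in the middle or at the end of a column should be left alone.

Can anyone help me with this?

The code I tried, which only removes a row when the whole row is duplicated: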

df = df.drop_duplicates(subset=df.columns[1:], keep='last')
df = df.groupby((df.shift() != df).cumsum()).filter(lambda x: len(x) < 5)

Input:

Date    Build1  Build2  Build3  Build4  Build5  Build6
2022-11-26 00:00:00 30  30  30  30  30  30
2022-11-27 00:00:00 30  30  30  30  30  30
2022-11-28 00:00:00 30  30  30  30  30  30
2022-11-29 00:00:00 30  30  30  30  30  30
2022-11-30 00:00:00 30  30  30  30  30  30
2022-12-01 00:00:00 28  30  30  30  30  30
2022-12-02 00:00:00 25  30  30  30  30  30
2022-12-03 00:00:00 25  30  30  30  30  30
2022-12-04 00:00:00 22  28  30  30  30  30
2022-12-05 00:00:00 22  26  30  30  30  30
2022-12-06 00:00:00 22  23  30  30  30  30
2022-12-07 00:00:00 22  22  30  30  30  30
2022-12-08 00:00:00 22  20  30  30  30  30
2022-12-09 00:00:00 22  20  25  30  30  30
2022-12-10 00:00:00 22  20  23  30  30  30
2022-12-11 00:00:00 22  20  23  30  30  30
2022-12-12 00:00:00 22  20  18  30  30  30
2022-12-13 00:00:00 22  20  14  30  30  30
2022-12-14 00:00:00 22  20  11  30  30  30
2022-12-15 00:00:00 22  20  10  27  30  30
2022-12-16 00:00:00 22  20  10  20  30  30
2022-12-17 00:00:00 22  20  10  20  30  30
2022-12-18 00:00:00 22  20  10  20  30  30
2022-12-19 00:00:00 22  20  10  13  30  30
2022-12-20 00:00:00 22  20  10  2   30  30
2022-12-21 00:00:00 22  20  10  2   19  30
2022-12-22 00:00:00 22  20  10  2   11  30
2022-12-23 00:00:00 22  20  10  2   4   30
2022-12-24 00:00:00 22  20  10  2   0   30
2022-12-25 00:00:00 22  20  10  2   0   22
2022-12-26 00:00:00 22  20  10  2   0   15
2022-12-27 00:00:00 22  20  10  2   0   15
2022-12-28 00:00:00 22  20  10  2   0   9

Expected output:

Date    Build1  Build2  Build3  Build4  Build5  Build6
2022-11-26 00:00:00                     
2022-11-27 00:00:00                     
2022-11-28 00:00:00                     
2022-11-29 00:00:00                     
2022-11-30 00:00:00 30                  
2022-12-01 00:00:00 28                  
2022-12-02 00:00:00 25                  
2022-12-03 00:00:00 25  30              
2022-12-04 00:00:00 22  28              
2022-12-05 00:00:00 22  26              
2022-12-06 00:00:00 22  23              
2022-12-07 00:00:00 22  22              
2022-12-08 00:00:00 22  20  30          
2022-12-09 00:00:00 22  20  25          
2022-12-10 00:00:00 22  20  23          
2022-12-11 00:00:00 22  20  23          
2022-12-12 00:00:00 22  20  18          
2022-12-13 00:00:00 22  20  14          
2022-12-14 00:00:00 22  20  11  30      
2022-12-15 00:00:00 22  20  10  27      
2022-12-16 00:00:00 22  20  10  20      
2022-12-17 00:00:00 22  20  10  20      
2022-12-18 00:00:00 22  20  10  20      
2022-12-19 00:00:00 22  20  10  13      
2022-12-20 00:00:00 22  20  10  2   30  
2022-12-21 00:00:00 22  20  10  2   19  
2022-12-22 00:00:00 22  20  10  2   11  
2022-12-23 00:00:00 22  20  10  2   4   
2022-12-24 00:00:00 22  20  10  2   0   30
2022-12-25 00:00:00 22  20  10  2   0   22
2022-12-26 00:00:00 22  20  10  2   0   15
2022-12-27 00:00:00 22  20  10  2   0   15
2022-12-28 00:00:00 22  20  10  2   0   9

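You can mark the values that repeat within each row and mask them:

import numpy as np
import pandas as pd

is_duplicate = df.apply(pd.Series.duplicated, axis=1)
df.where(~is_duplicate, np.nan)

which gives: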

Date  Build1  Build2  Build3  Build4
0   2022-11-26 00:00:00 30      30     NaN     NaN     NaN
1   2022-11-27 00:00:00 30      30     NaN     NaN     NaN
2   2022-11-28 00:00:00 30      30     NaN     NaN     NaN
3   2022-11-29 00:00:00 30      30     NaN     NaN     NaN
4   2022-11-30 00:00:00 30      30     NaN     NaN     NaN
5   2022-12-01 00:00:00 28      30     NaN     NaN     NaN
6   2022-12-02 00:00:00 25      30     NaN     NaN     NaN
7   2022-12-03 00:00:00 25      30     NaN     NaN     NaN
8   2022-12-04 00:00:00 22      30     NaN     NaN     NaN
9   2022-12-05 00:00:00 22      30     NaN     NaN     NaN
10  2022-12-06 00:00:00 22      30     NaN     NaN     NaN
11  2022-12-07 00:00:00 22      30     NaN     NaN     NaN
12  2022-12-08 00:00:00 22      30     NaN     NaN     NaN
13  2022-12-09 00:00:00 22      25    30.0     NaN     NaN
14  2022-12-10 00:00:00 22      23    30.0     NaN     NaN
15  2022-12-11 00:00:00 22      23    30.0     NaN     NaN
16  2022-12-12 00:00:00 22      18    30.0     NaN     NaN
17  2022-12-13 00:00:00 22      14    30.0     NaN     NaN
18  2022-12-14 00:00:00 22      11    30.0     NaN     NaN
19  2022-12-15 00:00:00 22      10    27.0    30.0     NaN
20  2022-12-16 00:00:00 22      10    20.0    30.0     NaN
21  2022-12-17 00:00:00 22      10    20.0    30.0     NaN
22  2022-12-18 00:00:00 22      10    20.0    30.0     NaN
23  2022-12-19 00:00:00 22      10    13.0    30.0     NaN
24  2022-12-20 00:00:00 22      10     2.0    30.0     NaN
25  2022-12-21 00:00:00 22      10     2.0    19.0    30.0
26  2022-12-22 00:00:00 22      10     2.0    11.0    30.0
27  2022-12-23 00:00:00 22      10     2.0     4.0    30.0
28  2022-12-24 00:00:00 22      10     2.0     0.0    30.0
29  2022-12-25 00:00:00 22      10     2.0     0.0    22.0
30  2022-12-26 00:00:00 22      10     2.0     0.0    15.0
31  2022-12-27 00:00:00 22      10     2.0     0.0    15.0
32  2022-12-28 00:00:00 22      10     2.0     0.0     9.0

To show empty strings instead of NaN:

is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, ''))

which gives:

Date  Build1 Build2 Build3 Build4
0   2022-11-26 00:00:00 30      30                     
1   2022-11-27 00:00:00 30      30                     
2   2022-11-28 00:00:00 30      30                     
3   2022-11-29 00:00:00 30      30                     
4   2022-11-30 00:00:00 30      30                     
5   2022-12-01 00:00:00 28      30                     
6   2022-12-02 00:00:00 25      30                     
7   2022-12-03 00:00:00 25      30                     
8   2022-12-04 00:00:00 22      30                     
9   2022-12-05 00:00:00 22      30                     
10  2022-12-06 00:00:00 22      30                     
11  2022-12-07 00:00:00 22      30                     
12  2022-12-08 00:00:00 22      30                     
13  2022-12-09 00:00:00 22      25     30              
14  2022-12-10 00:00:00 22      23     30              
15  2022-12-11 00:00:00 22      23     30              
16  2022-12-12 00:00:00 22      18     30              
17  2022-12-13 00:00:00 22      14     30              
18  2022-12-14 00:00:00 22      11     30              
19  2022-12-15 00:00:00 22      10     27     30       
20  2022-12-16 00:00:00 22      10     20     30       
21  2022-12-17 00:00:00 22      10     20     30       
22  2022-12-18 00:00:00 22      10     20     30       
23  2022-12-19 00:00:00 22      10     13     30       
24  2022-12-20 00:00:00 22      10      2     30       
25  2022-12-21 00:00:00 22      10      2     19     30
26  2022-12-22 00:00:00 22      10      2     11     30
27  2022-12-23 00:00:00 22      10      2      4     30
28  2022-12-24 00:00:00 22      10      2      0     30
29  2022-12-25 00:00:00 22      10      2      0     22
30  2022-12-26 00:00:00 22      10      2      0     15
31  2022-12-27 00:00:00 22      10      2      0     15
32  2022-12-28 00:00:00 22      10      2      0      9
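
The expected output in the question blanks values down each column rather than across each row, so a column-wise mask may be closer to what was asked. Below is a minimal sketch, assuming the first column holds the dates, every other column is non-empty, and its first value is not NaN. The names `mask_leading_run` and `blank_leading_duplicates` are hypothetical helpers, not pandas functions, and the sample data is made up in the shape of the question's input.

```python
import pandas as pd


def mask_leading_run(s: pd.Series) -> pd.Series:
    """Blank the initial run of repeated values in s, keeping its last entry."""
    # True while the column still holds its very first value
    in_leading_run = s.ne(s.iloc[0]).cumsum().eq(0)
    # Look one row ahead: the last row of the run sees False and is kept
    drop = in_leading_run & in_leading_run.shift(-1, fill_value=False)
    # Masked cells become NaN (integer columns are upcast to float)
    return s.mask(drop)


def blank_leading_duplicates(df: pd.DataFrame, value_cols) -> pd.DataFrame:
    """Apply mask_leading_run to each column in value_cols on a copy of df."""
    out = df.copy()
    for col in value_cols:
        out[col] = mask_leading_run(df[col])
    return out


# Made-up sample (assumption): same layout as the question, fewer rows
df = pd.DataFrame({
    "Date": pd.date_range("2022-11-26", periods=6, freq="D"),
    "Build1": [30, 30, 30, 28, 25, 25],
    "Build2": [30, 30, 30, 30, 30, 28],
})

result = blank_leading_duplicates(df, df.columns[1:])
print(result)
```

Only the leading run is touched, so the repeated 25 in the middle of `Build1` stays. Because the blanked cells are NaN rather than empty strings, the columns stay numeric and line plots leave those points undrawn.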
