Python Pandas: Drop rows based on a recursive value from the previous row

I'm at my wits' end. Any help would be greatly appreciated!

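This is my current data. The product changes from A.1 to A.2. Once A.1 has changed to A.2, I want the remaining data to count cycle times for A.2 only, so any later A.1 rows should be dropped. The same goes for B.1 and B.3.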

This is the current data.

+---------+-----------+--+
| Product | CycleTime |  |
+---------+-----------+--+
| A.1     | 10        |  |
+---------+-----------+--+
| A.1     | 5         |  |
+---------+-----------+--+
| A.1     | 1         |  |
+---------+-----------+--+
| A.1     | 2         |  |
+---------+-----------+--+
| A.1     | 1         |  |
+---------+-----------+--+
| A.2     | 5         |  |
+---------+-----------+--+
| A.2     | 1         |  |
+---------+-----------+--+
| A.1     | 2         |  |
+---------+-----------+--+
| A.1     | 10        |  |
+---------+-----------+--+
| A.2     | 10        |  |
+---------+-----------+--+
| B.1     | 1         |  |
+---------+-----------+--+
| B.1     | 2         |  |
+---------+-----------+--+
| B.1     | 1         |  |
+---------+-----------+--+
| B.3     | 5         |  |
+---------+-----------+--+
| B.1     | 1         |  |
+---------+-----------+--+
| B.3     | 2         |  |
+---------+-----------+--+
| B.1     | 10        |  |
+---------+-----------+--+

This is what I want to produce.

+---------+-----------+--+
| Product | CycleTime |  |
+---------+-----------+--+
| A.1     | 10        |  |
+---------+-----------+--+
| A.1     | 5         |  |
+---------+-----------+--+
| A.1     | 1         |  |
+---------+-----------+--+
| A.1     | 2         |  |
+---------+-----------+--+
| A.1     | 1         |  |
+---------+-----------+--+
| A.2     | 5         |  |
+---------+-----------+--+
| A.2     | 1         |  |
+---------+-----------+--+
| A.2     | 10        |  |
+---------+-----------+--+
| B.1     | 1         |  |
+---------+-----------+--+
| B.1     | 2         |  |
+---------+-----------+--+
| B.1     | 1         |  |
+---------+-----------+--+
| B.3     | 5         |  |
+---------+-----------+--+
| B.3     | 2         |  |
+---------+-----------+--+

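If you drop duplicates and keep the first occurrence, the index of the next product tells you the position after which the current product should no longer appear: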

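# Within one product family, keep the first row of each product, then pair each
# product with the first index of the product that follows it.
find_pos = lambda x: (x.drop_duplicates('Product')['Product'].shift().dropna()
                       .rename_axis('not_valid_after').reset_index())
# The family is the part of the product name before the dot ("A", "B").
conds = (df.groupby(df['Product'].str.split('.').str[0])
           .apply(find_pos).reset_index(drop=True))
print(conds)
# Output:
   not_valid_after Product
0                5     A.1  # 5 is the first index of A.2
1               13     B.1  # 13 is the first index of B.3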

Now you can filter your DataFrame:

out = df.drop(conds.apply(lambda x: df.loc[df['Product'] == x['Product']]
                                      .loc[x['not_valid_after']:].index.tolist(),
                          axis=1).explode().dropna().tolist())

Output:

>>> out
   Product  CycleTime
0      A.1         10
1      A.1          5
2      A.1          1
3      A.1          2
4      A.1          1
5      A.2          5
6      A.2          1
9      A.2         10
10     B.1          1
11     B.1          2
12     B.1          1
13     B.3          5
15     B.3          2
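
The same idea can also be sketched without building a list of indices to drop. The snippet below is only an illustration under two assumptions: the product family is the part of the name before the dot, and a product counts as "current" from its first appearance until another product of the same family shows up. The names `family`, `pos`, `first_seen`, `latest` and `out_alt` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["A.1", "A.1", "A.1", "A.1", "A.1", "A.2", "A.2", "A.1", "A.1",
                "A.2", "B.1", "B.1", "B.1", "B.3", "B.1", "B.3", "B.1"],
    "CycleTime": [10, 5, 1, 2, 1, 5, 1, 2, 10, 10, 1, 2, 1, 5, 1, 2, 10],
})

# Assumption: the family is the part of the name before the dot ("A", "B").
family = df["Product"].str.split(".").str[0]

# Row position of each row, independent of the index labels.
pos = pd.Series(np.arange(len(df)), index=df.index)

# Position at which each row's product first appeared.
first_seen = pos.groupby(df["Product"]).transform("min")

# Latest first-appearance position reached so far within each family.
latest = first_seen.groupby(family).cummax()

# A row survives only if its product is the most recently introduced one.
out_alt = df[first_seen == latest]
print(out_alt.index.tolist())
```

On the sample data this should keep the same rows as `out` above. Because it compares positions rather than slicing by label, it does not depend on the index being a sorted range.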

Setup:

import pandas as pd

df = pd.DataFrame(
    {
        "ProductType": ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
        "Product": ["A.1", "A.1", "A.1", "A.1", "A.1", "A.2", "A.2", "A.1", "A.1", "A.2", "B.1", "B.1", "B.1", "B.3", "B.1", "B.3", "B.1"],
        "CycleTime": [10, 5, 1, 2, 1, 5, 1, 2, 10, 10, 1, 2, 1, 5, 1, 2, 10],
    }
)

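Define an ordering among the products. The order between different product types does not matter, only the order within each product type. For example, take several partial orderings and concatenate them into one total ordering. Depending on your problem, you may have to write this out by hand or you may be able to build it programmatically.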

order = ["A.1", "A.2", "B.1", "B.2", "B.3"]
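
If the product names follow a regular pattern, the ordering could be built from the data instead of typed by hand. The sketch below assumes every name has the form `<type>.<integer>` and that a higher number means a later product; both are assumptions about the data, and `product_key`, `products` and `order_from_data` are hypothetical names used only here.

```python
import pandas as pd

# Example product column; "A.10" is included to show why a plain string sort
# would not be enough.
products = pd.Series(["A.1", "A.2", "A.1", "B.1", "B.3", "B.1", "A.10"])

def product_key(name):
    """Split "A.10" into ("A", 10) so versions compare numerically."""
    product_type, _, version = name.partition(".")
    return product_type, int(version)

# Unique products, sorted by type and then by numeric version.
order_from_data = sorted(products.unique(), key=product_key)
print(order_from_data)
```

A list built this way only contains products that occur in the data, which is enough for the mapping step as long as every value in the column is covered.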

Create a mapping and its inverse that associate each product with its position in the ordering.

mapping = dict(enumerate(order))
inversemapping = {v: k for k, v in mapping.items()}

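The idea is, for each product type, to map the products to their positions in the ordering, compute the cumulative maximum of those positions, and then map back to products.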

df.groupby("ProductType").apply(lambda d: d["Product"].map(inversemapping).cummax().map(mapping)).values

You will get the following NumPy array:

array(['A.1', 'A.1', 'A.1', 'A.1', 'A.1', 'A.2', 'A.2', 'A.2', 'A.2',
       'A.2', 'B.1', 'B.1', 'B.1', 'B.3', 'B.3', 'B.3', 'B.3'],
      dtype=object)
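
The array gives, for every row, the latest product reached so far within its type; the rows still have to be filtered. One possible way to finish, shown here as a sketch rather than as part of the original answer, is to keep a row only when its own position in the ordering equals the running maximum. Since the `.values` array follows group order, this version works on index-aligned Series instead. The names `rank`, `running_max` and `filtered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ProductType": ["A"] * 10 + ["B"] * 7,
    "Product": ["A.1", "A.1", "A.1", "A.1", "A.1", "A.2", "A.2", "A.1", "A.1",
                "A.2", "B.1", "B.1", "B.1", "B.3", "B.1", "B.3", "B.1"],
    "CycleTime": [10, 5, 1, 2, 1, 5, 1, 2, 10, 10, 1, 2, 1, 5, 1, 2, 10],
})

order = ["A.1", "A.2", "B.1", "B.2", "B.3"]
inversemapping = {product: position for position, product in enumerate(order)}

# Position of each row's product in the ordering.
rank = df["Product"].map(inversemapping)

# Highest position seen so far within each product type.
running_max = rank.groupby(df["ProductType"]).cummax()

# Keep a row only if its product is the furthest one reached so far.
filtered = df[rank == running_max]
print(filtered)
```

On the sample data this should leave the rows with index 0 to 6, 9, 10 to 13 and 15, which is the table asked for in the question.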
