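I'm at my wit's end. Any help would be greatly appreciated!
The product changes from A.1 to A.2. Once A.1 has changed to A.2, I want the remaining data to count cycle times only for A.2, so any later A.1 rows should be dropped. The same applies to B.1 and B.3.
This is the current data: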
+---------+-----------+--+
| Product | CycleTime | |
+---------+-----------+--+
| A.1 | 10 | |
+---------+-----------+--+
| A.1 | 5 | |
+---------+-----------+--+
| A.1 | 1 | |
+---------+-----------+--+
| A.1 | 2 | |
+---------+-----------+--+
| A.1 | 1 | |
+---------+-----------+--+
| A.2 | 5 | |
+---------+-----------+--+
| A.2 | 1 | |
+---------+-----------+--+
| A.1 | 2 | |
+---------+-----------+--+
| A.1 | 10 | |
+---------+-----------+--+
| A.2 | 10 | |
+---------+-----------+--+
| B.1 | 1 | |
+---------+-----------+--+
| B.1 | 2 | |
+---------+-----------+--+
| B.1 | 1 | |
+---------+-----------+--+
| B.3 | 5 | |
+---------+-----------+--+
| B.1 | 1 | |
+---------+-----------+--+
| B.3 | 2 | |
+---------+-----------+--+
| B.1 | 10 | |
+---------+-----------+--+
This is what I want to produce:
+---------+-----------+--+
| Product | CycleTime | |
+---------+-----------+--+
| A.1 | 10 | |
+---------+-----------+--+
| A.1 | 5 | |
+---------+-----------+--+
| A.1 | 1 | |
+---------+-----------+--+
| A.1 | 2 | |
+---------+-----------+--+
| A.1 | 1 | |
+---------+-----------+--+
| A.2 | 5 | |
+---------+-----------+--+
| A.2 | 1 | |
+---------+-----------+--+
| A.2 | 10 | |
+---------+-----------+--+
| B.1 | 1 | |
+---------+-----------+--+
| B.1 | 2 | |
+---------+-----------+--+
| B.1 | 1 | |
+---------+-----------+--+
| B.3 | 5 | |
+---------+-----------+--+
| B.3 | 2 | |
+---------+-----------+--+
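If you drop duplicates and keep the first occurrence of each product, the index of the next product tells you the position from which the current product can no longer appear: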
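# Within one product family, keep the first row of each product, then shift so
# each earlier product is paired with the index where the next product starts.
find_pos = lambda x: (x.drop_duplicates('Product')['Product'].shift().dropna()
                       .rename_axis('not_valid_after').reset_index())
# Group by the family prefix ("A" for "A.1", "B" for "B.3").
conds = (df.groupby(df['Product'].str.split('.').str[0])
           .apply(find_pos).reset_index(drop=True))
print(conds)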
# Output:
not_valid_after Product
0 5 A.1 # 5 is the first index of A.2
1 13 B.1 # 13 is the first index of B.3
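To see why this works, it may help to run the same steps on the A family alone. The sketch below rebuilds only the A rows from the question (the names `group_a`, `firsts` and `shifted` are just for illustration) and prints the intermediate result of `drop_duplicates` followed by `shift`.

```python
import pandas as pd

# Only the A rows from the question; `group_a` is an illustrative name.
group_a = pd.DataFrame({
    "Product": ["A.1"] * 5 + ["A.2"] * 2 + ["A.1"] * 2 + ["A.2"],
    "CycleTime": [10, 5, 1, 2, 1, 5, 1, 2, 10, 10],
})

# First occurrence of each product: index 0 (A.1) and index 5 (A.2).
firsts = group_a.drop_duplicates("Product")["Product"]

# Shifting pairs each index with the *previous* product, so index 5 now
# carries "A.1": from row 5 onward, A.1 is no longer valid.
shifted = firsts.shift().dropna()
print(shifted)
```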
Now you can filter your DataFrame:
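# For each (product, cutoff) pair, collect the rows of that product at or
# after the cutoff index, then drop them all from df.
out = df.drop(
    conds.apply(lambda x: df.loc[df['Product'] == x['Product']]
                            .loc[x['not_valid_after']:].index.tolist(),
                axis=1).explode().dropna().tolist()
)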
Output:
>>> out
Product CycleTime
0 A.1 10
1 A.1 5
2 A.1 1
3 A.1 2
4 A.1 1
5 A.2 5
6 A.2 1
9 A.2 10
10 B.1 1
11 B.1 2
12 B.1 1
13 B.3 5
15 B.3 2
setup:
import pandas as pd

df = pd.DataFrame(
    {
        "ProductType": ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
        "Product": ["A.1", "A.1", "A.1", "A.1", "A.1", "A.2", "A.2", "A.1", "A.1", "A.2", "B.1", "B.1", "B.1", "B.3", "B.1", "B.3", "B.1"],
        "CycleTime": [10, 5, 1, 2, 1, 5, 1, 2, 10, 10, 1, 2, 1, 5, 1, 2, 10],
    }
)
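Create an ordering among the products. The order between different product types doesn't matter, only the order within each product type. For example, take several partial orderings and concatenate them to build one total ordering. You may need to do this by hand, or you may be able to do it programmatically, depending on your problem.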
order = ["A.1", "A.2", "B.1", "B.2", "B.3"]
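If the product names happen to sort naturally, the ordering could be derived from the data instead of typed by hand. This is only a sketch, and it assumes that plain string sorting matches the intended order within each type (true for A.1 before A.2 and B.1 before B.3, but not for names such as A.10 versus A.2). The `products` Series is a stand-in for `df["Product"]`.

```python
import pandas as pd

# Stand-in for df["Product"]; the values are taken from the question.
products = pd.Series(["A.1", "A.2", "A.1", "B.1", "B.3", "B.1"])

# Assumption: lexicographic order equals the intended order within each type.
order = sorted(products.unique())
print(order)  # ['A.1', 'A.2', 'B.1', 'B.3']
```

Unlike the hand-written list, this version omits B.2 because it never appears in the data. That does not change the result, since only the relative positions of the products matter.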
Create a mapping and its inverse that associate each product with its position in the ordering.
mapping = dict(enumerate(order))  # position -> product
inversemapping = {v: k for k, v in mapping.items()}  # product -> position
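The idea is, for each product type, to map the products to their order values, compute the cumulative maximum of those order values, and then map the result back to products.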
df.groupby("ProductType").apply(lambda d: d["Product"].map(inversemapping).cummax().map(mapping)).values
You get the following NumPy array:
array(['A.1', 'A.1', 'A.1', 'A.1', 'A.1', 'A.2', 'A.2', 'A.2', 'A.2',
'A.2', 'B.1', 'B.1', 'B.1', 'B.3', 'B.3', 'B.3', 'B.3'],
dtype=object)
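To get from this array to the filtered table in the question, one option is to keep only the rows whose product equals the running maximum for its type. The sketch below is one possible way to finish the job, not necessarily what the answer's author had in mind. It calls `cummax()` directly on a grouped Series so the result stays aligned with the original index, and the names `rank`, `running_max` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ProductType": ["A"] * 10 + ["B"] * 7,
    "Product": ["A.1"] * 5 + ["A.2"] * 2 + ["A.1"] * 2 + ["A.2"]
               + ["B.1"] * 3 + ["B.3", "B.1", "B.3", "B.1"],
    "CycleTime": [10, 5, 1, 2, 1, 5, 1, 2, 10, 10, 1, 2, 1, 5, 1, 2, 10],
})
order = ["A.1", "A.2", "B.1", "B.2", "B.3"]
inversemapping = {product: position for position, product in enumerate(order)}

# Position of each row's product in the ordering.
rank = df["Product"].map(inversemapping)
# Highest position seen so far within each product type, aligned to df's index.
running_max = rank.groupby(df["ProductType"]).cummax()
# A row survives only if its product is the latest one seen for its type.
out = df[rank == running_max]
print(out[["Product", "CycleTime"]])
```

The surviving rows keep their original index labels, so the result can be compared directly with the output of the first approach.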