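My dataset has Customer_Code, As_Of_Date and 24 product columns. The product values are 0 or 1. I sorted the dataset by Customer_Code and As_Of_Date. For the product columns, I want to subtract the previous row from the next row. The important part is that this has to be done separately for each customer, following their own As_Of_Date order.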
I tried
df2.set_index('Customer_Code').diff()
and
df2.set_index('As_Of_Date').diff()
and
for i in new["Customer_Code"].unique():
    df14 = df12.set_index('As_Of_Date').diff()
but the results are wrong. My code gives the right result for the first customer but not for the second one. What should I do?
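You didn't share any data, so I made up some that you can work with. Your expected output is also missing. For future reference, please don't share images of data. Suppose you have data like this: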
id date product
0 12 2008-01-01 1
1 12 2008-01-01 2
2 12 2008-01-01 1
3 12 2008-01-02 4
4 12 2008-01-02 5
5 34 2009-01-01 6
6 34 2009-01-01 7
7 34 2009-01-01 84
8 34 2009-01-02 4
9 34 2009-01-02 3
10 34 2009-01-02 3
11 34 2009-01-03 5
12 34 2009-01-03 6
13 34 2009-01-03 8
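As I understand it, you want to subtract the previous row's product value from the current one, within groups of id and date (adapt this if you have other grouping columns). Then you can do this:
import numpy as np
mask = df.duplicated(['id', 'date'])  # False on the first row of each (id, date) group
df['product_diff'] = np.where(mask, df['product'] - df['product'].shift(1), np.nan)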
Returns:
id date product product_diff
0 12 2008-01-01 1 NaN
1 12 2008-01-01 2 1.0
2 12 2008-01-01 1 -1.0
3 12 2008-01-02 4 NaN
4 12 2008-01-02 5 1.0
5 34 2009-01-01 6 NaN
6 34 2009-01-01 7 1.0
7 34 2009-01-01 84 77.0
8 34 2009-01-02 4 NaN
9 34 2009-01-02 3 -1.0
10 34 2009-01-02 3 0.0
11 34 2009-01-03 5 NaN
12 34 2009-01-03 6 1.0
13 34 2009-01-03 8 2.0
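As a cross-check, here is a minimal sketch that rebuilds the sample data above and compares the mask approach with pandas' groupby(...).diff(), which restarts the difference at the first row of every (id, date) group. The extra column name product_diff_gb is only an illustrative name. Unlike the shift(1) version, groupby().diff() does not rely on the rows of each group being next to each other.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "id": [12] * 5 + [34] * 9,
    "date": pd.to_datetime(["2008-01-01"] * 3 + ["2008-01-02"] * 2
                           + ["2009-01-01"] * 3 + ["2009-01-02"] * 3
                           + ["2009-01-03"] * 3),
    "product": [1, 2, 1, 4, 5, 6, 7, 84, 4, 3, 3, 5, 6, 8],
})

# Mask-based version from the answer: NaN on the first row of each (id, date) group
mask = df.duplicated(["id", "date"])
df["product_diff"] = np.where(mask, df["product"] - df["product"].shift(1), np.nan)

# groupby version: current minus previous row, computed inside each (id, date) group
df["product_diff_gb"] = df.groupby(["id", "date"])["product"].diff()

print(df)
```

On this sorted sample both columns contain the same values, including the NaN positions.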
Or, if you want it the other way round (current row minus the next row):
mask = df.duplicated(['id', 'date'])
df['product_diff'] = np.where(mask, df['product'] - df['product'].shift(-1), np.nan)
Gives:
id date product product_diff
0 12 2008-01-01 1 NaN
1 12 2008-01-01 2 1.0
2 12 2008-01-01 1 -3.0
3 12 2008-01-02 4 NaN
4 12 2008-01-02 5 -1.0
5 34 2009-01-01 6 NaN
6 34 2009-01-01 7 -77.0
7 34 2009-01-01 84 80.0
8 34 2009-01-02 4 NaN
9 34 2009-01-02 3 0.0
10 34 2009-01-02 3 -2.0
11 34 2009-01-03 5 NaN
12 34 2009-01-03 6 -2.0
13 34 2009-01-03 8 NaN
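For the layout described in the question (Customer_Code, As_Of_Date and many 0/1 product columns), the same idea can be sketched by sorting by date and grouping on Customer_Code alone. This is a sketch under assumptions: the toy data, the product_1/product_2 column names and the _diff suffix are hypothetical, and with the real data product_cols would list all 24 product columns. Passing -1 to diff gives the current row minus the next row within each customer. That also avoids a pitfall of the shift(-1) variant above: there the mask still blanks the first row of each group, so the last row of a group is compared with the first row of the following group (for example row 4, id 12, is compared with row 5, id 34).

```python
import pandas as pd

# Hypothetical toy data in the question's layout; the real data has 24 product columns
df2 = pd.DataFrame({
    "Customer_Code": ["B", "A", "A", "B", "A", "B"],
    "As_Of_Date": pd.to_datetime(["2021-02-01", "2021-01-01", "2021-02-01",
                                  "2021-01-01", "2021-03-01", "2021-03-01"]),
    "product_1": [1, 0, 1, 0, 1, 1],
    "product_2": [0, 1, 1, 1, 0, 0],
})

# Hypothetical naming rule: every column starting with "product_" is a product flag
product_cols = [c for c in df2.columns if c.startswith("product_")]

# Put each customer's rows in date order
df2 = df2.sort_values(["Customer_Code", "As_Of_Date"]).reset_index(drop=True)

# Current row minus previous row, restarting at each customer's first date
prev_diffs = df2.groupby("Customer_Code")[product_cols].diff()

# Current row minus next row, NaN on each customer's last date
next_diffs = df2.groupby("Customer_Code")[product_cols].diff(-1)

df2 = df2.join(prev_diffs.add_suffix("_diff"))
print(df2)
```

Because the groups are defined by Customer_Code only, the first date of every customer gets NaN instead of a difference taken from the previous customer's last row.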