如何按列计算并在一定条件下停止



我有下面的dataframe

import pandas as pd
data = pd.DataFrame({
'ID':  ['27459', '27459', '27459', '27459', '27459', '27459', '27459', '48002', '48002', '48002'],
'Invoice_Date': ['2020-06-26', '2020-06-29', '2020-06-30', '2020-07-14', '2020-07-25', 
'2020-07-30', '2020-08-02', '2020-05-13', '2020-06-20', '2020-06-28'],
'Payment_Term': [7,8,3,6,4,7,8,5,3,6],
'Payment_Date': ['2020-07-05', '2020-07-05','2020-07-03', '2020-07-21', '2020-07-31', 
'2020-08-15', '2020-08-22', '2020-06-16', '2020-06-23', '2020-07-05'],
})
df = pd.DataFrame(data, columns = ['ID', 'Invoice_Date', 'Payment_Term', 'Payment_Date'])
df['Invoice_Date'] = pd.to_datetime(df['Invoice_Date'].astype(str), format='%Y-%m-%d')
df['Payment_Date'] = pd.to_datetime(df['Payment_Date'].astype(str), format='%Y-%m-%d')
df['Due_Date'] = df['Invoice_Date'] + pd.to_timedelta(df['Payment_Term'], unit = 'd') 
df['Delay'] = df['Payment_Date'] - df['Due_Date']
df['Delay'] = df['Delay'].dt.days                                                
df['diff'] = df.groupby('ID')['Invoice_Date'].diff() / np.timedelta64(1, 'D')
df['diff'] = df['diff'].fillna(0)
df

我需要使days列在30处停止,并开始计算新列。为了使生活变得困难,应按ID分组计算days。我以前使用过:

df.loc[0, 'days'] = df.loc[0, 'diff']
for i in range(1, len(df)):
df.loc[i, 'days'] = df.loc[i-1, 'days'] + df.loc[i, 'diff']

print(df)

因此输出:

ID Invoice_Date  Payment_Term Payment_Date   Due_Date  Delay  diff  days
0  27459   2020-06-26             7   2020-07-05 2020-07-03      2   0.0   0.0
1  27459   2020-06-29             8   2020-07-05 2020-07-07     -2   3.0   3.0
2  27459   2020-06-30             3   2020-07-03 2020-07-03      0   1.0   4.0
3  27459   2020-07-14             6   2020-07-21 2020-07-20      1  14.0  18.0
4  27459   2020-07-25             4   2020-07-31 2020-07-29      2  11.0  29.0
5  27459   2020-07-30             7   2020-08-15 2020-08-06      9   5.0  34.0
6  27459   2020-08-02             8   2020-08-22 2020-08-10     12   3.0  37.0
7  48002   2020-05-13             5   2020-06-16 2020-05-18     29   0.0  37.0
8  48002   2020-06-20             3   2020-06-23 2020-06-23      0  38.0  75.0
9  48002   2020-06-28             6   2020-07-05 2020-07-04      1   8.0  83.0

我需要的结果是:

ID Invoice_Date  Payment_Term Payment_Date   Due_Date  Delay  diff  days
0  27459   2020-06-26             7   2020-07-05 2020-07-03      2   0.0   0.0
1  27459   2020-06-29             8   2020-07-05 2020-07-07     -2   3.0   3.0
2  27459   2020-06-30             3   2020-07-03 2020-07-03      0   1.0   4.0
3  27459   2020-07-14             6   2020-07-21 2020-07-20      1  14.0  18.0
4  27459   2020-07-25             4   2020-07-31 2020-07-29      2  11.0  29.0
5  27459   2020-07-30             7   2020-08-15 2020-08-06      9   5.0   0.0
6  27459   2020-08-02             8   2020-08-22 2020-08-10     12   3.0   8.0
7  48002   2020-05-13             5   2020-06-16 2020-05-18     29   0.0   0.0
8  48002   2020-06-20             3   2020-06-23 2020-06-23      0  38.0   0.0
9  48002   2020-06-28             6   2020-07-05 2020-07-04      1   8.0   0.0

差异是第4行中停止的天数的总和,因为当它添加到下一行时,它将大于30。因此,它在第5行开始新的计算。

它仍然无法捕捉到您想要的确切输出,因为即使没有总和(例如索引8(,一些相应的diff值也大于30,因此它们将保留在数据上。

def func(x):
x = x.values
values = [x[0]]
for i in range(1, len(x)):
value = values[i-1] + x[i]
values.append(value if value < 30 else x[i])
return values
df['days'] = df.groupby("ID")["diff"].transform(func)
print(df[['ID', 'diff', 'days']])

输出:

ID  diff  days
0  27459   0.0   0.0
1  27459   3.0   3.0
2  27459   1.0   4.0
3  27459  14.0  18.0
4  27459  11.0  29.0
5  27459   5.0   5.0
6  27459   3.0   8.0
7  48002   0.0   0.0
8  48002  38.0  38.0
9  48002   8.0   8.0

编辑:

更改功能定义应该有效:

def func(x):
x = x.values
values = [x[0]]
for i in range(1, len(x)):
value = values[i-1] + x[i]
if value < 30:
values.append(value)
elif x[i] >= 30:
values.append(x[i-1])
else:
values.append(x[i])
return values
df['days'] = df.groupby("ID")["diff"].transform(func)
print(df[['ID', 'diff', 'days']])
ID  diff  days
0  27459   0.0   0.0
1  27459   3.0   3.0
2  27459   1.0   4.0
3  27459  14.0  18.0
4  27459  11.0  29.0
5  27459   5.0   5.0
6  27459   3.0   8.0
7  48002   0.0   0.0
8  48002  38.0   0.0
9  48002   8.0   8.0

最新更新