查找具有相同ID和列值条件的前一行以分配权重



我有以下数据框架:

id      date        status
--------------------------------
0    2     2019-02-04   reserve    
1    2     2020-01-01   reserve    
2    2     2020-01-02   sold      
3    3     2020-01-03   reserve    
4    4     2020-01-03   booked     
5    3     2020-02-05   reserve    
6    4     2020-02-06   sold       
7    3     2020-02-07   sold

我需要用以下条件为每一行添加权重:

如果行有status == sold,那么weight就是1
对于其余部分,我必须做以下操作:对于df[i],找到具有相同idstatus的前一行,并根据日差应用权重,如(1 / days difference)。

dataframe是:

id      date        status    weight
----------------------------------------
0    2     2019-02-04   reserve     0 
1    2     2020-01-01   reserve   0.003 
2    2     2020-01-02   sold        1
3    3     2020-01-03   reserve     0 
4    4     2020-01-03   booked      0 
5    3     2020-02-05   reserve   0.030  
6    4     2020-02-06   sold        1 
7    3     2020-02-07   sold        1

这是我的尝试,现在我不能处理查找前一行w/condition:

df.date = pd.to_datetime(df.date)
df.sort_values(['date', 'id'], inplace=True)
df = df.reset_index(drop='index')
df['weight'] = np.where(df['status'] == 'sold', 1, 0)
df = df.reset_index(drop='index')
df1 = df[~(df.status == 'sold')]
retain_index = list(df1.index.values)
df1 = df1.reset_index(drop='index')
df1['diff_in_dates'] = df1.groupby('flat_id')['date'].diff().dt.days.fillna(0)
for index, row in df1.iterrows():
if index == 0:
pass
else:
if df1.loc[index, 'weight'] == 0:
if (df1.loc[index, 'status'] == df1.loc[index - 1, 'status']) & 
(df1.loc[index, 'flat_id'] == df1.loc[index - 1, 'flat_id']):
df1.loc[index, 'weight'] = 1 / df1.loc[index, 'diff_in_dates']
df1.weight.replace([np.inf, -np.inf], np.nan, inplace=True)
df1.weight = df1.weight.fillna(0)
#Here, in df1 I have weight = 0 for df[5] instead of 0.030
df = pd.DataFrame(df.append(df1, ignore_index = False)

您可以使用GroupBy.diff差值,提取天数并获得逆值,然后使用numpy.where在已售出/未售出状态之间进行选择,以了解如何填充默认值:

s = df.groupby(['id', 'status'])['date'].diff().dt.days.rdiv(1)
import numpy as np
df['weight'] = np.where(df['status'].eq('sold'), 1, s.fillna(0))

输出:

id       date   status    weight
0   2 2019-02-04  reserve  0.000000
1   2 2020-01-01  reserve  0.003021
2   2 2020-01-02     sold  1.000000
3   3 2020-01-03  reserve  0.000000
4   4 2020-01-03   booked  0.000000
5   3 2020-02-05  reserve  0.030303
6   4 2020-02-06     sold  1.000000
7   3 2020-02-07     sold  1.000000

最新更新