我有以下数据框架:
id date status
--------------------------------
0 2 2019-02-04 reserve
1 2 2020-01-01 reserve
2 2 2020-01-02 sold
3 3 2020-01-03 reserve
4 4 2020-01-03 booked
5 3 2020-02-05 reserve
6 4 2020-02-06 sold
7 3 2020-02-07 sold
我需要用以下条件为每一行添加权重:
如果行有status == sold
,那么weight
就是1
。
对于其余部分,我必须做以下操作:对于df[i]
,找到具有相同id
和status
的前一行,并根据日差应用权重,如(1 / days difference
)。
的dataframe是:
id date status weight
----------------------------------------
0 2 2019-02-04 reserve 0
1 2 2020-01-01 reserve 0.003
2 2 2020-01-02 sold 1
3 3 2020-01-03 reserve 0
4 4 2020-01-03 booked 0
5 3 2020-02-05 reserve 0.030
6 4 2020-02-06 sold 1
7 3 2020-02-07 sold 1
这是我的尝试,现在我不能处理查找前一行w/condition:
df.date = pd.to_datetime(df.date)
df.sort_values(['date', 'id'], inplace=True)
df = df.reset_index(drop='index')
df['weight'] = np.where(df['status'] == 'sold', 1, 0)
df = df.reset_index(drop='index')
df1 = df[~(df.status == 'sold')]
retain_index = list(df1.index.values)
df1 = df1.reset_index(drop='index')
df1['diff_in_dates'] = df1.groupby('flat_id')['date'].diff().dt.days.fillna(0)
for index, row in df1.iterrows():
if index == 0:
pass
else:
if df1.loc[index, 'weight'] == 0:
if (df1.loc[index, 'status'] == df1.loc[index - 1, 'status']) &
(df1.loc[index, 'flat_id'] == df1.loc[index - 1, 'flat_id']):
df1.loc[index, 'weight'] = 1 / df1.loc[index, 'diff_in_dates']
df1.weight.replace([np.inf, -np.inf], np.nan, inplace=True)
df1.weight = df1.weight.fillna(0)
#Here, in df1 I have weight = 0 for df[5] instead of 0.030
df = pd.DataFrame(df.append(df1, ignore_index = False)
您可以使用GroupBy.diff
差值,提取天数并获得逆值,然后使用numpy.where
在已售出/未售出状态之间进行选择,以了解如何填充默认值:
s = df.groupby(['id', 'status'])['date'].diff().dt.days.rdiv(1)
import numpy as np
df['weight'] = np.where(df['status'].eq('sold'), 1, s.fillna(0))
输出:
id date status weight
0 2 2019-02-04 reserve 0.000000
1 2 2020-01-01 reserve 0.003021
2 2 2020-01-02 sold 1.000000
3 3 2020-01-03 reserve 0.000000
4 4 2020-01-03 booked 0.000000
5 3 2020-02-05 reserve 0.030303
6 4 2020-02-06 sold 1.000000
7 3 2020-02-07 sold 1.000000