I am trying to create a new column in a DataFrame, but the new column comes out with incorrect results. The data is as follows:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,30,size=10),
                  columns=["Random"],
                  index=pd.date_range("20180101", periods=10))
df=df.reset_index()
df.loc[:,'Random'] = 20  # integer, so it can be added to 'diff'
df['Recommandation']=['No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
df['diff']=[3,2,4,1,6,1,2,2,3,1]
df
I am trying to create another column, 'new', using the following conditions:
If the 'index' is one of the first three dates, then 'new' = 'Random',
elif 'Recommandation' is 'Yes', then 'new' = value of the previous row of 'new' + 'diff',
else 'new' = value of the previous row of 'new'.
My code is as follows:
import numpy as np
df['new'] = 0
df['new'] = np.select([df['index'].isin(df['index'].iloc[:3]), df['Recommandation'].eq('Yes')],
[df['new'], df['diff']+df['new'].shift(1)],
df['new'].shift(1)
)
# The expected output
df['new']=[20,20,20,21,27,28,28,28,31,31]
df
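A note on why the np.select attempt cannot give this result: every entry of the choice list is evaluated in full before any selection happens, so df['new'].shift(1) is the shift of the all-zero placeholder column, not of values computed on earlier rows. A minimal sketch on the same 'Recommandation' and 'diff' values (the names first_three, prev and attempt are illustrative, and the date mask is replaced by a simple positional stand-in):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Recommandation": ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})
df["new"] = 0  # placeholder column, all zeros

# Assumption: the first three rows stand in for "the first three dates".
first_three = df.index < 3

# Computed once, from the zeros: NaN, 0, 0, ... (never from updated values).
prev = df["new"].shift(1)

attempt = np.select(
    [first_three, df["Recommandation"].eq("Yes")],
    [df["new"], df["diff"] + prev],
    prev,
)
print(attempt)
```

The 'Yes' rows end up holding only their own 'diff' and every other row holds 0, so nothing accumulates. A running value needs either an explicit loop or a cumulative operation such as cumsum.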
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,30,size=10),
columns=["Random"],
index=pd.date_range("20180101", periods=10))
df = df.reset_index()
df.loc[:,'Random'] = 20
df['Recommandation'] = ['No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
df['diff'] = [3,2,4,1,6,1,2,2,3,1]
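df.loc[5, 'index'] = pd.to_datetime('2018-01-02') # I modified this row so that one of the first three dates repeats
# start from 'diff', keeping it only on 'Yes' rows (NaN elsewhere)
df['new'] = df['diff']
df['new'] = df['new'].where(df.Recommandation.eq('Yes'))
# mask: rows whose 'index' is one of the first three dates
m = df['index'].isin(df['index'][:3])
df.loc[m, 'new'] = df.Random
# split points: every masked row except the very first row
idx = m[m].index.drop([df.index.min()], errors='ignore')
# within each piece, take the cumulative sum, then forward-fill the NaN ('No') rows
df['new'] = pd.concat(map(lambda x: x.cumsum().ffill(), np.split(df.new, idx)))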
df
>>>
index Random Recommandation diff new
0 2018-01-01 20 No 3 20.0
1 2018-01-02 20 Yes 2 20.0
2 2018-01-03 20 No 4 20.0
3 2018-01-04 20 Yes 1 21.0
4 2018-01-05 20 Yes 6 27.0
5 2018-01-02 20 Yes 1 20.0
6 2018-01-07 20 No 2 20.0
7 2018-01-08 20 No 2 20.0
8 2018-01-09 20 Yes 3 23.0
9 2018-01-10 20 No 1 23.0
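The same idea (restart at each "first three dates" row, accumulate 'diff' on 'Yes' rows, carry the value forward otherwise) can also be written with groupby and cumsum instead of np.split. This is only a sketch under the assumptions of the example above; the helper name add_new and the intermediate name step are hypothetical, not part of the original code:

```python
import numpy as np
import pandas as pd

def add_new(df):
    """Return a copy of df with the running 'new' column (illustrative helper)."""
    out = df.copy()
    # Rows whose date is one of the first three dates restart the running total.
    m = out["index"].isin(out["index"].iloc[:3])
    # Per-row contribution: 'Random' on restart rows, 'diff' on 'Yes' rows, else 0.
    step = np.where(
        m,
        out["Random"],
        np.where(out["Recommandation"].eq("Yes"), out["diff"], 0),
    )
    # Each restart row opens a new group; the cumulative sum runs within a group.
    out["new"] = pd.Series(step, index=out.index).groupby(m.cumsum()).cumsum()
    return out

df = pd.DataFrame({
    "index": pd.date_range("20180101", periods=10),
    "Random": 20,
    "Recommandation": ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
})

result = add_new(df)
print(result["new"].tolist())
```

On the unmodified data this should reproduce the expected output from the question. 'No' rows contribute 0, so the previous value carries forward without a separate ffill step.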