NumPy / pandas: nested conditions across two columns



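>Summary: I have a pandas DataFrame df with the columns [openDate, high, low, open, close, volume, is_eligible], where:

  • openDate is of datetime type,
  • high, low, open, close and volume are of type int64,
  • is_eligible is of boolean type.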

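Problem statement: I want to add one more column, end_date, which is computed as follows:

If is_eligible is True for a row, its end_date should be the openDate of the nearest following row whose high is >= the high of that eligible row.

Example: suppose row 3 has is_eligible == True, high=20, low=10, open=15, close=12. Then we have to find the next row after it that has a high value >= 20.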
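To make the rule concrete, here is a minimal sketch for a single eligible row. The sample values and dates are hypothetical and only the columns relevant to the rule are included; it assumes a default integer index and rows already ordered by openDate.

```python
import pandas as pd

# Hypothetical sample data; row position 2 plays the role of the eligible row with high == 20
df = pd.DataFrame({
    "openDate": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05"]
    ),
    "high": [18, 16, 20, 19, 22],
    "is_eligible": [False, False, True, False, False],
})

row = 2                                   # position of the eligible row
later = df.iloc[row + 1:]                 # only rows that come after it
match = later[later["high"] >= df.loc[row, "high"]]

# First later row with high >= 20, or NaT when no such row exists
end_date = match["openDate"].iloc[0] if not match.empty else pd.NaT
print(end_date)  # 2021-01-05 00:00:00
```

The row with high == 19 is skipped because it is below 20; the row with high == 22 is the first one that qualifies.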

Approach: I tried the following vectorized solution, but it does not work.

temp_var = df[["openDate", "is_eligible", "high"]].copy()
df["end_date"] = np.where(
    temp_var['is_eligible'] == True,
    np.where(
        df['high'] > temp_var["high"],
        df["openDate"],
        datetime.now()
    ),
    datetime.now()
)
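One likely reason the attempt above does not work: np.where evaluates its condition element-wise, so df['high'] > temp_var['high'] compares each row with its own copy rather than with later rows. The sketch below illustrates this on hypothetical sample values.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"high": [18, 25, 20, 19, 22]})
temp_var = df[["high"]].copy()

# Row k of df is compared with row k of temp_var, which holds the same value,
# so no row can be strictly greater than itself.
mask = df["high"] > temp_var["high"]
print(mask.any())  # False
```

With a condition that is never true, every row ends up with the fallback value datetime.now().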

I may have another approach:

import pandas as pd

# Sort by openDate so that the next matching row is also the closest in time,
# and reset the index so that labels equal positions.
# openDate must already be in a datetime format.
df = df.sort_values('openDate').reset_index(drop=True)
# Initialise the new column with missing datetime values
df['end_date'] = pd.NaT
# Index labels of the eligible rows, in sorted order
list_of_indexes = list(df[df['is_eligible'] == True].index)
for i in list_of_indexes:
    # For each eligible row, scan every later row (eligible or not) for the first match
    for j in range(i + 1, len(df)):
        if df.at[j, 'high'] >= df.at[i, 'high']:
            # Once found, assign the value to the new column and break the inner loop
            df.at[i, 'end_date'] = df.at[j, 'openDate']
            break
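If the nested loop is too slow on a large frame, a single pass with a stack (the classic "next greater or equal element" technique) gives the same result in linear time. This is only a sketch under some assumptions: the function name next_high_dates and the sample data are hypothetical, openDate is timezone-naive, and the index is unique.

```python
import numpy as np
import pandas as pd

def next_high_dates(df: pd.DataFrame) -> pd.Series:
    """For each eligible row, openDate of the first later row with high >= its own high."""
    df = df.sort_values("openDate")
    highs = df["high"].to_numpy()
    dates = df["openDate"].to_numpy()
    eligible = df["is_eligible"].to_numpy(dtype=bool)

    result = np.full(len(df), np.datetime64("NaT"), dtype="datetime64[ns]")
    stack = []  # positions still waiting for a match; their highs strictly decrease
    for pos in range(len(df)):
        # The current row resolves every waiting row whose high it reaches or exceeds
        while stack and highs[stack[-1]] <= highs[pos]:
            result[stack.pop()] = dates[pos]
        stack.append(pos)

    # Only eligible rows keep a value
    result[~eligible] = np.datetime64("NaT")
    return pd.Series(result, index=df.index, name="end_date")

# Hypothetical sample data
sample = pd.DataFrame({
    "openDate": pd.date_range("2021-01-01", periods=6, freq="D"),
    "high": [18, 25, 20, 19, 22, 30],
    "is_eligible": [True, False, True, False, True, True],
})
sample["end_date"] = next_high_dates(sample)
print(sample)
```

Each row is pushed and popped at most once, so the cost grows linearly with the number of rows instead of quadratically.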
