How to use pandas to find the minimum value between two dates and put it in a new column



I have 2 datasets:

# df1 - minute based dataset
date                  Open
2018-01-01 00:00:00   1.0516 
2018-01-01 00:01:00   1.0516 
2018-01-01 00:02:00   1.0516 
2018-01-01 00:03:00   1.0516 
2018-01-01 00:04:00   1.0516 
....
# df2 - daily based dataset
date_from             date_to
2018-01-01            2018-01-01 02:21:00
2018-01-02            2018-01-02 01:43:00
2018-01-03            NA
2018-01-04            2018-01-04 03:11:00
2018-01-05            2018-01-05 00:19:00

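For each date_from / date_to pair in df2, I want to get the minimum (low) value of Open in df1 over that range and put it in a new column of df2 called min_value.

df1 is a sorted, minute-based dataset.

For rows of df2 where date_to is NA, we can skip them entirely and move on to the next row.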

What have I tried?

  • First, I tried to find the values between the two dates.

  • After that I used this code:

    df2['min_value'] =
    df1[df1['date'].dt.hour.between(df2['date_from'], df2['date_to'])].min()
    

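I want to search between the two dates, but I'm not sure this is the way to do it.

In any case, it doesn't work. Can you help me figure out how I should do this?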

Does this work for you?

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'date': ['2018-01-01 00:00:00', '2018-01-01 00:01:00', '2018-01-01 00:02:00', '2018-01-01 00:03:00', '2018-01-01 00:04:00'],
                    'Open': [1.0516, 1.0516, 1.0516, 1.0516, 1.0516]})
df2 = pd.DataFrame({'date_from': ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05'],
                    'date_to': ['2018-01-01 02:21:00', '2018-01-02 01:43:00', np.nan,
                                '2018-01-04 03:11:00', '2018-01-05 00:19:00']})

# Convert to datetime; index df1 by date so it can be sliced by timestamp
df1['date'] = pd.to_datetime(df1['date'])
df1.set_index('date', inplace=True)
df2['date_from'] = pd.to_datetime(df2['date_from'])
df2['date_to'] = pd.to_datetime(df2['date_to'])

def func(val):
    minimum_val = np.nan
    minimum_date = pd.NaT
    # Only look up the window when both bounds are present
    if pd.notna(val['date_from']) and pd.notna(val['date_to']):
        window = df1.loc[val['date_from']:val['date_to'], 'Open']
        if not window.empty:
            minimum_val = window.min()
            # Earliest timestamp inside the window
            minimum_date = window.index[0]
    return pd.DataFrame({'date_from': [val['date_from']], 'date_to': [val['date_to']], 'Open': [minimum_val], 'min_date': [minimum_date]})

# Build one single-row frame per row of df2 and stack them
df3 = pd.concat([func(row) for _, row in df2.iterrows()], ignore_index=True)
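
If you only need the min_value column from the question, a shorter variant of the same slicing idea might look like the sketch below. It assumes df1 is sorted and indexed by date, so that label slicing with .loc includes both endpoints. The Open values are made up for illustration, and window_min is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Minute data indexed by timestamp (values are illustrative only).
df1 = pd.DataFrame(
    {"Open": [1.0516, 1.0512, 1.0520, 1.0509, 1.0518]},
    index=pd.to_datetime([
        "2018-01-01 00:00:00",
        "2018-01-01 00:01:00",
        "2018-01-01 00:02:00",
        "2018-01-02 00:00:00",
        "2018-01-02 02:00:00",
    ]),
)
df2 = pd.DataFrame({
    "date_from": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "date_to": pd.to_datetime(["2018-01-01 02:21:00", "2018-01-02 01:43:00", None]),
})

def window_min(row):
    # Skip rows whose upper bound is missing, as the question asks.
    if pd.isna(row["date_to"]):
        return np.nan
    # Label slicing on a sorted DatetimeIndex includes both endpoints.
    return df1.loc[row["date_from"]:row["date_to"], "Open"].min()

df2["min_value"] = df2.apply(window_min, axis=1)
print(df2)
```

Rows with a missing date_to, and windows that contain no minute data, end up as NaN in min_value.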

The following code snippet is more readable.

import pandas as pd

def get_minimum_value(row, df):
    # Keep only the rows of df whose date lies within [date_from, date_to]
    temp = df[(df['date'] >= row['date_from']) & (df['date'] <= row['date_to'])]
    # 'value' is the price column in test.csv; use 'Open' for the df1 in the question
    return temp['value'].min()

df1 = pd.read_csv("test.csv")
df2 = pd.read_csv("test2.csv")
df1['date'] = pd.to_datetime(df1['date'])
df2['date_from'] = pd.to_datetime(df2['date_from'])
df2['date_to'] = pd.to_datetime(df2['date_to'])
df2['value'] = df2.apply(func=get_minimum_value, df=df1, axis=1)

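Here, df2.apply() passes each row as the first argument to get_minimum_value, and the extra df=df1 keyword is forwarded along with it. Applied to the given data, the result is: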

   date_from             date_to   value
0 2018-01-01 2018-01-01 02:21:00  1.0512
1 2018-01-02 2018-01-02 01:43:00     NaN
2 2018-01-03                 NaT     NaN
3 2018-01-04 2018-01-04 03:11:00     NaN
4 2018-01-05 2018-01-05 00:19:00     NaN
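
To make the row-passing mechanics concrete, here is a minimal sketch that counts how many rows fall into each window instead of taking a minimum. The frames minutes and windows and the helper count_in_window are hypothetical names used only for this illustration. It also suggests why the NaT row above comes out as NaN: any comparison against NaT is False, so the selection is empty.

```python
import pandas as pd

def count_in_window(row, df):
    # row: one row of `windows` as a Series; df: forwarded by apply(df=...)
    mask = (df["date"] >= row["date_from"]) & (df["date"] <= row["date_to"])
    return int(mask.sum())

minutes = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01 00:00:00",
        "2018-01-01 00:01:00",
        "2018-01-02 00:00:00",
    ])
})
windows = pd.DataFrame({
    "date_from": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "date_to": pd.to_datetime(["2018-01-01 02:21:00", "2018-01-02 01:43:00", None]),
})

# axis=1 hands each row to the function; df=minutes is passed through as a keyword
windows["n_rows"] = windows.apply(func=count_in_window, df=minutes, axis=1)
print(windows)
```

An empty selection is also what .min() sees for windows with no matching minute data, which is why those rows show NaN rather than raising an error.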
