我有两个熊猫数据帧
df1
code start_date tank product
123 2019-02-23 06:30:00 1 MS
123 2019-02-23 11:35:00 2 HS
123 2019-02-24 06:45:00 1 MS
123 2019-05-12 05:39:00 1 HS
df2
code tank_ms product_ms update from_date to_date
123 1 HS Dealer 2019-01-01 00:00:00 2019-03-31 06:00:00
123 1 MS Dealer 2019-03-31 06:00:01 2019-05-30 06:00:00
123 2 HS Dealer 2019-01-01 06:00:01 2019-05-30 06:00:00
现在我想将 df1 与 df2 连接以进行product_ms并使用日期时间比较进行更新。我想要的数据帧如下
df1
code start_date tank product product_ms update
123 2019-02-23 06:30:00 1 MS HS Dealer
123 2019-02-23 11:35:00 2 HS HS Dealer
123 2019-02-24 06:45:00 1 MS HS Dealer
123 2019-05-12 05:39:00 1 HS MS Dealer
df1
的start_date
将与from_date
进行比较,to_date
来自df2
目前我尝试做以下事情,
for x in range(df2.shape[0]):
from_date = df2['from_date'][x]
to_date = df2['to_date'][x]
product_v = tank_data['product_ms'][x]
tank_status_v = tank_data['update'][x]
df1['prodcode_ms'] = [product_v if from_date <= t_time < to_date else s for t_time,s in
zip(df1['start_date'],df2['product_ms'])]
df1['update'] = [tank_status_v if
from_date <= t_time < to_date else s for t_time,s in zip(df1['start_date'],df2['update'])]
将DataFrame.merge
与外部连接一起使用,按boolean indexing
过滤,然后 lasr 删除不必要的列:
df1['start_date']= pd.to_datetime(df1['start_date'])
df2['from_date']= pd.to_datetime(df2['from_date']
df2['to_date']= pd.to_datetime(df2['to_date']
df = df1.merge(df2.rename(columns={'tank_ms':'tank'}), on=['code','tank'], how='outer')
df = df[(df.start_date > df.from_date) & (df.start_date < df.to_date)]
df = df.drop(['from_date','to_date'], axis=1)
print (df)
code start_date tank product product_ms update
0 123 2019-02-23 06:30:00 1 MS HS Dealer
2 123 2019-02-24 06:45:00 1 MS HS Dealer
5 123 2019-05-12 05:39:00 1 HS MS Dealer
6 123 2019-02-23 11:35:00 2 HS HS Dealer