I have two DataFrames, df and events, as shown below:
import pandas as pd

df = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital', 'university', 'residential', 'hospital'],
    'Date': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-01-02', '2017-01-02', '2017-01-02'],
    'Event': ['None', 'None', 'None', 'None', 'None', 'None'],
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Start_Date': ['2017-01-01', '2017-01-01', '2017-01-01'],
    'End_Date': ['2017-02-26', '2017-01-02', '2017-01-02'],
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Convert the date columns to datetime
events['Start_Date'] = pd.to_datetime(events['Start_Date'], format='%Y-%m-%d')
events['End_Date'] = pd.to_datetime(events['End_Date'], format='%Y-%m-%d')
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df has one record per place for each date in 2017:
df:
Date        Place        Event
2017-01-01  university   None
2017-01-01  residential  None
2017-01-01  hospital     None
2017-01-02  university   None
2017-01-02  residential  None
2017-01-02  hospital     None
The second DataFrame contains events for these places, but as date ranges:
events:
Place        Start_Date  End_Date    Event
university   2017-01-01  2017-02-26  UniHolidays
residential  2017-01-01  2017-01-02  PublicHoliday
hospital     2017-01-01  2017-01-02  PublicHoliday
The task is to update df using events so that, if df.Place equals events.Place and df.Date falls between events.Start_Date and events.End_Date, df.Event is updated with the corresponding events.Event.
Expected output:
Date        Place        Event
2017-01-01  university   UniHolidays
2017-01-01  residential  PublicHoliday
2017-01-01  hospital     PublicHoliday
2017-01-02  university   UniHolidays
2017-01-02  residential  PublicHoliday
2017-01-02  hospital     PublicHoliday
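There are no overlapping events, and each place has a unique event record.
So far I have been looking at "Fill a column in a DataFrame based on ranges found in another DataFrame", but I could not make sense of it. Any help is appreciated. Thanks!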
Solution 1:
Add:
df['Event'] = events['Event'].tolist() * 2
to the end of the code.
Now:
print(df)
gives:
        Date          Event        Place
0 2017-01-01    UniHolidays   university
1 2017-01-01  PublicHoliday  residential
2 2017-01-01  PublicHoliday     hospital
3 2017-01-02    UniHolidays   university
4 2017-01-02  PublicHoliday  residential
5 2017-01-02  PublicHoliday     hospital
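Note that the assignment above pairs events with rows purely by position, so it only works while df repeats its places in the same order as events. A minimal sketch of a lookup by Place instead, assuming each place has exactly one event (as the question states) and ignoring the date range for now; the name place_to_event is made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'] * 2,
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-01-02'] * 3),
    'Event': ['None'] * 6,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Build a Place -> Event lookup (a Series indexed by Place).
place_to_event = events.set_index('Place')['Event']

# Series.map looks up each row's Place in that index, so row order no longer matters.
df['Event'] = df['Place'].map(place_to_event)
print(df)
```

Places that do not appear in events end up as NaN with this approach.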
----------------------------------------
Solution 2:
If you want the Event column added in the right position, do the following:
df = df.drop(columns='Event')
df.insert(2, 'Event', events['Event'].tolist() * 2)
at the end of the code.
Now:
print(df)
Output:
        Date        Place          Event
0 2017-01-01   university    UniHolidays
1 2017-01-01  residential  PublicHoliday
2 2017-01-01     hospital  PublicHoliday
3 2017-01-02   university    UniHolidays
4 2017-01-02  residential  PublicHoliday
5 2017-01-02     hospital  PublicHoliday
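If the only goal is the column order, dropping and re-inserting the column is not strictly needed. A small sketch, assuming the frame already holds the filled-in Event values in the Date, Event, Place order printed under Solution 1:

```python
import pandas as pd

# Columns in the order printed under Solution 1: Date, Event, Place.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-01-02'] * 3),
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'] * 2,
    'Place': ['university', 'residential', 'hospital'] * 2,
})

# Selecting with a list of column names returns the columns in that order.
df = df[['Date', 'Place', 'Event']]
print(df.columns.tolist())  # ['Date', 'Place', 'Event']
```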
---------------------------------------------------------------
Applying Solution 1 and then Solution 2 also works,
but it is better to use just one of them.
Update:
To avoid hard-coding the repeat count, use:
df = df.drop(columns='Event')
df.insert(2, 'Event', events['Event'].tolist() * (len(df) // len(events)))
at the end of the code.
Now:
print(df)
Output:
        Date        Place          Event
0 2017-01-01   university    UniHolidays
1 2017-01-01  residential  PublicHoliday
2 2017-01-01     hospital  PublicHoliday
3 2017-01-02   university    UniHolidays
4 2017-01-02  residential  PublicHoliday
5 2017-01-02     hospital  PublicHoliday
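None of the snippets above actually compares df.Date with Start_Date and End_Date; they give the expected output only because every date in the sample happens to fall inside its place's range. A possible sketch that does check both the place and the date range, assuming at most one event row per place (more than one would duplicate rows in the merge). The seventh row of df is a hypothetical addition, not part of the question's data, included to show a date outside the range:

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital',
              'university', 'residential', 'hospital',
              'residential'],  # last row is hypothetical, outside the event range
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-01-02'] * 3 + ['2017-01-03']),
    'Event': ['None'] * 7,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Start_Date': pd.to_datetime(['2017-01-01'] * 3),
    'End_Date': pd.to_datetime(['2017-02-26', '2017-01-02', '2017-01-02']),
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Left-join on Place; the right-hand Event column is renamed to Event_new.
merged = df.merge(events, on='Place', how='left', suffixes=('', '_new'))

# True where Date lies in [Start_Date, End_Date], both ends included.
in_range = merged['Date'].between(merged['Start_Date'], merged['End_Date'])

# Use the event name where the date is in range, otherwise keep the old value.
df['Event'] = merged['Event'].where(~in_range, merged['Event_new']).to_numpy()
print(df)
```

The first six rows get the same events as in the expected output, while the extra 2017-01-03 row for residential keeps its original 'None' because it lies after End_Date.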