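I have a DataFrame in which each row is one record generated by PBS. I want to know how many cores are running in each 30-minute time period. The first 4 rows of my table: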
datetime             walltime  ncores
2019-07-18 11:18:27  2:05:10   2
2019-07-18 11:18:45  00:50:27  1
2019-07-18 11:18:46  00:07:20  1
2019-07-18 11:18:50  00:31:34  1
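To make the question reproducible, here is a minimal sketch that types the four sample rows above into a DataFrame. It assumes `datetime` and `walltime` arrive as plain strings (as they would from a CSV export of the PBS log), which the question does not state, and it treats each job as running from `datetime` to `datetime + walltime`, the same reading the answer uses.

```python
import pandas as pd

# The four sample rows from the question, as raw strings.
df = pd.DataFrame({
    'datetime': ['2019-07-18 11:18:27', '2019-07-18 11:18:45',
                 '2019-07-18 11:18:46', '2019-07-18 11:18:50'],
    'walltime': ['2:05:10', '00:50:27', '00:07:20', '00:31:34'],
    'ncores': [2, 1, 1, 1],
})

# Parse to real types; to_timedelta reads both "2:05:10" and "00:50:27" as H:MM:SS.
df['datetime'] = pd.to_datetime(df['datetime'])
df['walltime'] = pd.to_timedelta(df['walltime'])

# Assumed end of each job: record time plus walltime.
df['end'] = df['datetime'] + df['walltime']
print(df)
```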
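I found that I can't build a PeriodIndex out of Period elements, because the walltime differs from record to record. I thought I could instead create a PeriodIndex with a frequency of 30 minutes and then add the core count of every record that falls within a given Period to that Period, but I don't know how to do that.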
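The idea in the question (a 30-minute PeriodIndex, with each record's cores added to every Period it falls in) can be sketched directly: build the grid of periods, then for each Period sum `ncores` over the jobs whose run interval overlaps it. This assumes a job counts as running in a slot if it runs at any moment inside it; the variable names `grid`, `slot` and `running` are illustrative. It loops over the periods, so it favours clarity over speed on large logs.

```python
import pandas as pd

# The four sample records from the question, already converted.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-07-18 11:18:27', '2019-07-18 11:18:45',
                                '2019-07-18 11:18:46', '2019-07-18 11:18:50']),
    'walltime': pd.to_timedelta(['2:05:10', '00:50:27', '00:07:20', '00:31:34']),
    'ncores': [2, 1, 1, 1],
})
start = df['datetime']
end = df['datetime'] + df['walltime']

# The 30-minute PeriodIndex, from the first start (floored) to the last end.
grid = pd.period_range(start.min().floor('30min'), end.max(), freq='30min')
slot = pd.Timedelta(minutes=30)

# A job is counted in a slot [t, t + 30min) if it runs at any moment in it,
# i.e. it starts before the slot closes and ends at or after the slot opens.
running = pd.Series(
    [df.loc[(start < p.start_time + slot) & (end >= p.start_time), 'ncores'].sum()
     for p in grid],
    index=grid,
)
print(running)  # 5, 4, 3, 2, 2 for the slots 11:00 .. 13:00
```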
What I expect:
datetime cputime     ncores
2019-07-18 11:00:00  5
2019-07-18 11:30:00  4
2019-07-18 12:00:00  3
2019-07-18 12:30:00  2
我认为你需要:
import pandas as pd

#convert to datetimes and timedeltas
df['datetime'] = pd.to_datetime(df['datetime'])
df['walltime'] = pd.to_timedelta(df['walltime'])
#floor the start time to the 30min grid; the end time is start + walltime
df['start'] = df['datetime'].dt.floor('30min')
df['end'] = df['datetime'] + df['walltime']
#list of (30min period, ncores) pairs for every period each job touches
zipped = zip(df['start'], df['end'], df['ncores'])
L = [(i, n) for s, e, n in zipped for i in pd.period_range(s, e, freq='30min')]
#DataFrame is aggregated by sum
df1 = (pd.DataFrame(L, columns=['datetime cputime', 'summed'])
         .groupby('datetime cputime', as_index=False)['summed']
         .sum())
print(df1)
datetime cputime summed
0 2019-07-18 11:00 5
1 2019-07-18 11:30 4
2 2019-07-18 12:00 3
3 2019-07-18 12:30 2
4 2019-07-18 13:00 2
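The expansion of each job into the 30-minute periods it touches (from the floor of its start time up to its end time) can also be written with `DataFrame.explode` instead of a hand-built list of tuples. This is a sketch assuming pandas 0.25 or later, where `explode` was added; the helper column name `period` is illustrative.

```python
import pandas as pd

# The four sample records from the question, already converted.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-07-18 11:18:27', '2019-07-18 11:18:45',
                                '2019-07-18 11:18:46', '2019-07-18 11:18:50']),
    'walltime': pd.to_timedelta(['2:05:10', '00:50:27', '00:07:20', '00:31:34']),
    'ncores': [2, 1, 1, 1],
})

# Every 30-minute Period a job touches: from the floor of its start time
# up to its end time (start + walltime).
start = df['datetime'].dt.floor('30min')
end = df['datetime'] + df['walltime']
periods = pd.Series(
    [list(pd.period_range(s, e, freq='30min')) for s, e in zip(start, end)],
    index=df.index,
)

# One row per (job, period) pair, then sum the cores per period.
df1 = (df.assign(period=periods)
         .explode('period')
         .groupby('period', as_index=False)['ncores']
         .sum())
print(df1)
```

Compared with the list comprehension, this keeps every other column of the record available after the expansion, which helps if you later want to split the totals by user or queue.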