我有一个dataframe_1
,如下所示:
Index Time Label
0 0.000 ns Segment 1
1 2.749 sec baseline
2 3.459 min begin test
3 7.009 min end of test
我想在dataframe_1
的每一行之间添加多个新行,其中每一新行的Time列将增加一分钟,直到到达dataframe_1
的下一行的时间(以及相应的Label(。例如,上表最终应该是这样的:
Index Time Label
0 0.000 ns Segment 1
1 2.749 sec baseline
2 00:01:02.749000 baseline + 1min
3 00:02:02.749000 baseline + 2min
4 00:03:02.749000 baseline + 3min
5 3.459 min begin test
6 00:04:27.540000 begin test + 1min
7 00:05:27.540000 begin test + 2min
8 00:06:27.540000 begin test + 3min
9 7.009 min end of test
通过pd.to_timedelta()
使用Timedelta
类型是非常好的。
我认为最好的方法是将dataframe_1
的每一行分解为自己的数据帧,然后为每增加一分钟添加行,然后concat
将数据帧重新组合在一起。然而,我不确定如何做到这一点。
我应该使用嵌套的for循环来[首先]迭代dataframe_1
的行,然后[第二]迭代计数器,这样我就可以创建添加了分钟数的新行吗?
我之前没有将单独的行拆分为新的数据帧,我正在进行第二次迭代,如下所示:
baseline_row = df_legend[df_legend['Label'] == 'baseline']
[baseline_index] = baseline_row.index
baseline_time = baseline_row['Time']
interval_mins = 1
new_time = baseline_time + pd.Timedelta(minutes=interval_mins)
cutoff_time_np = df_legend.iloc[baseline_row.index + 1]['Time']
cutoff_time = pd.to_timedelta(cutoff_time_np)
while new_time.reset_index(drop=True).get(0) < cutoff_time.reset_index(drop=True).get(0):
new_row = baseline_row.copy()
new_row['Label'] = f'minute {interval_mins}'
new_row['Time'] = baseline_time + pd.Timedelta(minutes=interval_mins)
new_row.index = [baseline_index + interval_mins - 0.5]
df_legend = df_legend.append(new_row, ignore_index=False)
df_legend = df_legend.sort_index().reset_index(drop=True)
pdb.set_trace()
interval_mins += 1
new_time = baseline_time + pd.Timedelta(minutes=interval_mins)
但由于我想对原始dataframe_1
中的每一行都这样做,所以我想把它拆分成单独的数据帧,然后再放回一起。我只是不确定最好的方法是什么,尤其是因为熊猫在各行中迭代的速度显然非常慢。
我真的很感谢你的指导。
这可能比您的解决方案更快。
df.Time = pd.to_timedelta(df.Time)
df['counts'] = df.Time.diff().apply(lambda x: x.total_seconds()) / 60
df['counts'] = np.floor(df.counts.shift(-1)).fillna(0).astype(int)
df.drop(columns='Index', inplace=True)
df
Time Label counts
0 00:00:00 Segment 1 0
1 00:00:02.749000 baseline 3
2 00:03:27.540000 begin test 3
3 00:07:00.540000 end of test 0
然后使用iterrows
来获得你想要的输出。
new_df = []
for _, row in df.iterrows():
val = row.counts
if val == 0:
new_df.append(row)
else:
new_df.append(row)
new_row = row.copy()
label = row.Label
for i in range(val):
new_row = new_row.copy()
new_row.Time += pd.Timedelta('1 min')
new_row.Label = f'{label} + {i+1}min'
new_df.append(new_row)
new_df = pd.DataFrame(new_df)
new_df
Time Label counts
0 00:00:00 Segment 1 0
1 00:00:02.749000 baseline 3
1 00:01:02.749000 baseline + 1min 3
1 00:02:02.749000 baseline + 2min 3
1 00:03:02.749000 baseline + 3min 3
2 00:03:27.540000 begin test 3
2 00:04:27.540000 begin test + 1min 3
2 00:05:27.540000 begin test + 2min 3
2 00:06:27.540000 begin test + 3min 3
3 00:07:00.540000 end of test 0
我假设您将Time列从";数字单元";格式化为字符串时间的表示。类似于:
Time Label
Index
0 00:00:00.000 Segment 1
1 00:00:02.749 baseline
2 00:03:27.540 begin test
3 00:07:00.540 end of test
然后,得到你的结果:
计算timNxt-时间列移动1个位置并转换至日期时间:
timNxt = pd.to_datetime(df.Time.shift(-1))
定义以下";复制";功能:
def myRepl(row): timCurr = pd.to_datetime(row.Time) timNext = timNxt[row.name] tbl = [[timCurr.strftime('%H:%M:%S.%f'), row.Label]] if pd.notna(timNext): n = (timNext - timCurr) // np.timedelta64(1, 'm') + 1 tbl.extend([ [(timCurr + np.timedelta64(i, 'm')).strftime('%H:%M:%S.%f'), row.Label + f' + {i}min'] for i in range(1, n)]) return pd.DataFrame(tbl, columns=row.index)
将其应用于df的每一行,并连接结果:
result = pd.concat(df.apply(myRepl, axis=1).tolist(), ignore_index=True)
结果是:
Time Label
0 00:00:00.000000 Segment 1
1 00:00:02.749000 baseline
2 00:01:02.749000 baseline + 1min
3 00:02:02.749000 baseline + 2min
4 00:03:02.749000 baseline + 3min
5 00:03:27.540000 begin test
6 00:04:27.540000 begin test + 1min
7 00:05:27.540000 begin test + 2min
8 00:06:27.540000 begin test + 3min
9 00:07:00.540000 end of test
生成的DataFrame具有Time列,也称为字符串,但在秒的小数部分至少有6位数字。