Create a new DataFrame using pandas date_range



I have the following DataFrame:

            date_start            date_end
0  2023-01-01 16:00:00 2023-01-01 17:00:00
1  2023-01-02 16:00:00 2023-01-02 17:00:00
2  2023-01-03 16:00:00 2023-01-03 17:00:00
3  2023-01-04 17:00:00 2023-01-04 19:00:00
4  NaN                 NaN

I want to create a new DataFrame that contains the timestamps running from date_start to date_end for each row. For the first row, I use the code below:

new_df = pd.Series(pd.date_range(start=df['date_start'][0], end=df['date_end'][0], freq= '15min'))

This gives the following result:

0   2023-01-01 16:00:00
1   2023-01-01 16:15:00
2   2023-01-01 16:30:00
3   2023-01-01 16:45:00
4   2023-01-01 17:00:00

How can I get the same result for every row of df, combined into a single new DataFrame?

You can use a list comprehension and concat:

out = pd.concat([pd.DataFrame({'date': pd.date_range(start=start, end=end,
                                                     freq='15min')})
                 for start, end in zip(df['date_start'], df['date_end'])],
                ignore_index=True)

Output:

                  date
0  2023-01-01 16:00:00
1  2023-01-01 16:15:00
2  2023-01-01 16:30:00
3  2023-01-01 16:45:00
4  2023-01-01 17:00:00
5  2023-01-02 16:00:00
6  2023-01-02 16:15:00
7  2023-01-02 16:30:00
8  2023-01-02 16:45:00
9  2023-01-02 17:00:00
10 2023-01-03 16:00:00
11 2023-01-03 16:15:00
12 2023-01-03 16:30:00
13 2023-01-03 16:45:00
14 2023-01-03 17:00:00
15 2023-01-04 17:00:00
16 2023-01-04 17:15:00
17 2023-01-04 17:30:00
18 2023-01-04 17:45:00
19 2023-01-04 18:00:00
20 2023-01-04 18:15:00
21 2023-01-04 18:30:00
22 2023-01-04 18:45:00
23 2023-01-04 19:00:00

To handle NAs (date_range raises an error when start or end is missing), skip those rows:

out = pd.concat([pd.DataFrame({'date': pd.date_range(start=start, end=end,
                                                     freq='15min')})
                 for start, end in zip(df['date_start'], df['date_end'])
                 if pd.notna(start) and pd.notna(end)
                 ],
                ignore_index=True)
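
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the frame from the question is rebuilt with pd.to_datetime (so the missing row becomes NaT), and it swaps the inline pd.notna check for an up-front dropna, which should be equivalent here. The name `valid` is only illustrative.

```python
import pandas as pd

# Sample frame mirroring the question; the missing row becomes NaT.
df = pd.DataFrame({
    "date_start": pd.to_datetime([
        "2023-01-01 16:00:00", "2023-01-02 16:00:00",
        "2023-01-03 16:00:00", "2023-01-04 17:00:00", None,
    ]),
    "date_end": pd.to_datetime([
        "2023-01-01 17:00:00", "2023-01-02 17:00:00",
        "2023-01-03 17:00:00", "2023-01-04 19:00:00", None,
    ]),
})

# Drop incomplete rows first (assumption: same effect as the pd.notna filter),
# then build one 15-minute range per remaining row and stack them.
valid = df.dropna(subset=["date_start", "date_end"])
out = pd.concat(
    [
        pd.DataFrame({"date": pd.date_range(start=start, end=end, freq="15min")})
        for start, end in zip(valid["date_start"], valid["date_end"])
    ],
    ignore_index=True,
)

print(len(out))  # 24 rows: 5 + 5 + 5 + 9
```

Filtering before the comprehension keeps the loop body short; the inline `if` version avoids the intermediate frame. Either should be fine for data of this size.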

Adding to the previous answer: the DatetimeIndex returned by date_range has a to_series() method, so you can also do it this way:

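pd.concat(
    [
        pd.date_range(start=row['date_start'], end=row['date_end'], freq='15min').to_series()
        # dropna() skips rows with a missing start or end, which date_range rejects
        for _, row in df.dropna(subset=['date_start', 'date_end']).iterrows()
    ],
    ignore_index=True,
)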
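
Note that this variant returns a Series rather than a DataFrame. If a DataFrame with a `date` column is wanted, as in the question, one possible sketch is to finish with Series.to_frame. The two-row sample frame and the names `series` and `out` below are just for illustration.

```python
import pandas as pd

# Tiny illustrative frame: one complete row and one missing row (NaT).
df = pd.DataFrame({
    "date_start": pd.to_datetime(["2023-01-01 16:00:00", None]),
    "date_end": pd.to_datetime(["2023-01-01 17:00:00", None]),
})

# Build one Series per complete row, then stack them with a fresh index.
series = pd.concat(
    [
        pd.date_range(start=row["date_start"], end=row["date_end"], freq="15min").to_series()
        for _, row in df.dropna().iterrows()
    ],
    ignore_index=True,
)

# Convert the stacked Series into a single-column DataFrame named 'date'.
out = series.to_frame(name="date")
print(out)
```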
