Select the previous row at every hour in pandas



I'm trying to get, for every full hour, the most recent prior data point in a pandas dataframe. For example:

       time  value
0  14:59:58     15
1  15:00:10     20
2  15:57:42     14
3  16:00:30      9

should return
       time  value
0  15:00:00     15
1  16:00:00     14

i.e. rows 0 and 2 of the original dataframe. How can I do this? Thanks!

With the following toy dataframe:

import pandas as pd
df = pd.DataFrame(
{"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)

Here is one way to do it:

# Setup
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
# Round each timestamp to the nearest hour and keep the unique hour marks, with empty values
# ("h" is the hourly alias; older pandas versions also spell it "H")
temp_df = pd.DataFrame(df["time"].dt.round("h").drop_duplicates()).assign(value=pd.NA)
# Add the hour marks to df, forward-fill each one from the latest earlier data point,
# then drop the original rows so that only the hour marks remain
new_df = (
    pd.concat([df, temp_df])
    .sort_values(by="time")
    .ffill()
    .pipe(lambda df_: df_[~df_["time"].isin(df["time"])])
    .reset_index(drop=True)
)
# Cleanup
new_df["time"] = new_df["time"].dt.time
print(new_df)
# Output
       time  value
0  15:00:00     15
1  16:00:00     14
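
As an alternative (not the approach used above), the same lookup can probably be expressed with `pd.merge_asof`, which matches each key on the left to the nearest earlier key on the right. The sketch below is only a minimal illustration on the toy dataframe: the names `hours` and `result` are made up for this example, and it assumes the hour marks of interest are all full hours between the first and the last observation.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
df = df.sort_values("time")  # merge_asof requires both keys to be sorted

# Hypothetical helper frame: every full hour between the first and last observation
hours = pd.DataFrame(
    {
        "time": pd.date_range(
            df["time"].min().ceil("h"), df["time"].max().floor("h"), freq="h"
        )
    }
)
# Keep both key columns on the same datetime dtype, which merge_asof expects
hours["time"] = hours["time"].astype(df["time"].dtype)

# direction="backward" picks, for each hour mark, the last row with time <= that mark
result = pd.merge_asof(hours, df, on="time", direction="backward")
result["time"] = result["time"].dt.time
print(result)
```

On the toy data this gives the rows for 15:00:00 and 16:00:00 with values 15 and 14. One difference from the concat/ffill approach is that a data point falling exactly on a full hour is matched to that hour rather than dropped, because the backward match allows equal keys by default.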
