如何过滤给定特定小时的数据帧?



给定两个数据帧:

df1:

datetime              v
2020-10-01 12:00:00   15
2020-10-02            4
2020-10-03 07:00:00   3
2020-10-03 08:01:00   51
2020-10-03 09:00:00   9

df2:

datetime              p
2020-10-01 11:00:00   1
2020-10-01 12:00:00   2
2020-10-02 13:00:00   14
2020-10-02 13:01:00   5
2020-10-03 20:00:00   12
2020-10-03 02:01:00   30
2020-10-03 07:00:00   7

我想将这两个数据帧合并为一个,并且策略是每天查找08:00周围最近的值。最后的结果应该是

datetime              v       p
2020-10-01 08:00:00   15      1
2020-10-02 08:00:00   4       14
2020-10-03 08:00:00   51      7

我如何实现这个?

给定以下数据帧:

import pandas as pd
df1 = pd.DataFrame(
{
"datetime": [
"2020-10-01 12:00:00",
"2020-10-02",
"2020-10-03 07:00:00",
"2020-10-03 08:01:00",
"2020-10-03 09:00:00",
],
"v": [15, 4, 3, 51, 9],
}
)
df2 = pd.DataFrame(
{
"datetime": [
"2020-10-01 11:00:00",
"2020-10-01 12:00:00",
"2020-10-02 13:00:00",
"2020-10-02 13:01:00",
"2020-10-03 20:00:00",
"2020-10-03 02:01:00",
"2020-10-03 07:00:00",
],
"p": [1, 2, 14, 5, 12, 30, 7],
}
)

你可以定义一个辅助函数:

def align(df):
# Set proper type
df["datetime"] = pd.to_datetime(df["datetime"])
# Slice df by day
dfs = [
df.copy().loc[df["datetime"].dt.date == item, :]
for item in df["datetime"].dt.date.unique()
]
# Evaluate distance in seconds between given hour and 08:00:00 and filter on min
for i, df in enumerate(dfs):
df["target"] = pd.to_datetime(df["datetime"].dt.date.astype(str) + " 08:00:00")
df["distance"] = (
df["target"].map(lambda x: x.hour * 3600 + x.minute * 60 + x.second)
- df["datetime"].map(lambda x: x.hour * 3600 + x.minute * 60 + x.second)
).abs()
dfs[i] = df.loc[df["distance"].idxmin(), :]
# Concatenate filtered dataframes
df = (
pd.concat(dfs, axis=1)
.T.drop(columns=["datetime", "distance"])
.rename(columns={"target": "datetime"})
.set_index("datetime")
)
return df

应用于df1和df2,然后合并:

df = pd.merge(
right=align(df1), left=align(df2), how="outer", right_index=True, left_index=True
).reindex(columns=["v", "p"])
print(df)
# Output
v   p
datetime
2020-10-01 08:00:00  15   1
2020-10-02 08:00:00   4  14
2020-10-03 08:00:00  51   7

最新更新