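Python beginner here.
I couldn't find anything similar, but I feel it shouldn't be that hard.
I have a large Excel sheet with values from different sensors, but some values are missing because of errors in the measurement. So when I load everything into a pandas DataFrame, I get something like this: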
Sensor2 NaN NaN
@ti7's suggestion is spot on: split the DataFrame into separate frames, merge them, and fill in the gaps.
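# Split by sensor: filter(like=...) keeps the columns whose name contains the substring
sensor1 = df.filter(like='1')
sensor2 = df.filter(like='2')

# Outer merge on the timestamps, fill the missing TimeStamp2 values from TimeStamp1,
# then drop the padding rows that have no TimeStamp1
(sensor1.merge(sensor2,
               how='outer',
               left_on='TimeStamp1',
               right_on='TimeStamp2',
               sort=True)
    .fillna({"TimeStamp2": df.TimeStamp1})
    .dropna(subset=['TimeStamp1'])
)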
  TimeStamp1  Sensor1 TimeStamp2  Sensor2
0      08:00    100.0      08:00     60.0
1      08:05    102.0      08:05      NaN
2      08:10    105.0      08:10     40.0
3      08:15    101.0      08:15     50.0
4      08:20    103.0      08:20      NaN
5      08:25    104.0      08:25     31.0
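For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample frame is an assumption: it is rebuilt from the output above, since the question's full table is not shown here. It is also a slight variant rather than the exact code above: it drops the padding rows first and uses a left merge, so the fill no longer depends on the merged index lining up with the original `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, reconstructed from the output shown above
df = pd.DataFrame({
    "TimeStamp1": ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"],
    "Sensor1": [100, 102, 105, 101, 103, 104],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", np.nan, np.nan],
    "Sensor2": [60, 40, 50, 31, np.nan, np.nan],
})

sensor1 = df.filter(like="1")
# Drop the all-NaN padding rows before merging
sensor2 = df.filter(like="2").dropna(subset=["TimeStamp2"])

# A left merge keeps every Sensor1 timestamp, in its original order
merged = sensor1.merge(sensor2, how="left",
                       left_on="TimeStamp1", right_on="TimeStamp2")

# Fill the missing TimeStamp2 values from the same row's TimeStamp1
merged["TimeStamp2"] = merged["TimeStamp2"].fillna(merged["TimeStamp1"])

print(merged)
```

This assumes Sensor1 has a reading for every timestamp. If Sensor2 can have timestamps that Sensor1 lacks, the outer merge from the answer is the safer choice.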
This will work if your data is set up exactly like the example; otherwise you will have to adapt it to your data.
import pandas as pd

# Change the timestamp columns to datetime. You don't say if there's a date component,
# so you may have to get your timestamps in order before moving on.
timestamps = df.filter(regex='TimeStamp').columns.tolist()
for t in timestamps:
    df[t] = pd.to_datetime(df[t])

# Get the max and min of all datetimes in the timestamp columns
end = df[timestamps].max().max()
start = df[timestamps].min().min()

# Create a new date range with one entry per expected 5-minute reading
new_dates = pd.date_range(start=start, end=end, freq='5min')

# Number of columns to iterate over. It should be even: the frame should contain only
# (timestamp, sensor) column pairs, as your example shows.
num_columns = df.shape[1]

# For each pair: index by timestamp, drop missing readings, reindex, then concat
dflist = []
for i in range(0, num_columns, 2):
    d = df.iloc[:, i:i+2].set_index(df.columns[i]).dropna().reindex(new_dates)
    dflist.append(d)

pd.concat(dflist, axis=1)
                     Sensor1  Sensor2
2021-10-18 08:00:00      100     60.0
2021-10-18 08:05:00      102      NaN
2021-10-18 08:10:00      105     40.0
2021-10-18 08:15:00      101     50.0
2021-10-18 08:20:00      103      NaN
2021-10-18 08:25:00      104     31.0
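If you have more than two sensors, the same reindex-and-concat idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name `align_sensors` and the sample frame are made up for illustration, the columns come in (timestamp, value) pairs, and the timestamps are time-only strings like `08:00` (hence the explicit `format="%H:%M"`, which puts every reading on pandas' default date of 1900-01-01 instead of today's date).

```python
import numpy as np
import pandas as pd

def align_sensors(df: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
    """Align (timestamp, value) column pairs on one regular time grid."""
    pieces = []
    for i in range(0, df.shape[1], 2):
        ts_col, val_col = df.columns[i], df.columns[i + 1]
        # Keep only the rows that actually carry a timestamp for this sensor
        pair = df[[ts_col, val_col]].dropna(subset=[ts_col])
        index = pd.DatetimeIndex(pd.to_datetime(pair[ts_col], format="%H:%M"))
        pieces.append(pd.Series(pair[val_col].to_numpy(), index=index, name=val_col))

    # One shared grid from the earliest to the latest reading of any sensor
    start = min(s.index.min() for s in pieces)
    end = max(s.index.max() for s in pieces)
    grid = pd.date_range(start=start, end=end, freq=freq)

    # Reindexing leaves NaN wherever a sensor has no reading for a grid point
    return pd.concat([s.reindex(grid) for s in pieces], axis=1)

# Hypothetical sample data, reconstructed from the output shown above
df = pd.DataFrame({
    "TimeStamp1": ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"],
    "Sensor1": [100, 102, 105, 101, 103, 104],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", np.nan, np.nan],
    "Sensor2": [60, 40, 50, 31, np.nan, np.nan],
})

aligned = align_sensors(df)
print(aligned)
```

Note that `reindex` needs unique timestamps per sensor, so duplicated readings would have to be dropped or aggregated first.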