Moving specific rows to correct missing values in a Pandas DataFrame



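Python beginner here.

I couldn't find anything similar, but I feel this shouldn't be that hard.

I have a large Excel sheet with values from different sensors, but some values are missing because of measurement errors. So when I load everything into a pandas DataFrame, I get something like this: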

TimeStamp1  Sensor1  TimeStamp2  Sensor2

@ti7's suggestion is right: split the DataFrame into separate frames, one per sensor, then merge them and fill the gaps.

sensor1 = df.filter(like='1')
sensor2 = df.filter(like='2')
(sensor1.merge(sensor2,
               how='outer',
               left_on='TimeStamp1',
               right_on='TimeStamp2',
               sort=True)
        .fillna({'TimeStamp2': df.TimeStamp1})
        .dropna(subset=['TimeStamp1'])
)
  TimeStamp1  Sensor1 TimeStamp2  Sensor2
0      08:00    100.0      08:00     60.0
1      08:05    102.0      08:05      NaN
2      08:10    105.0      08:10     40.0
3      08:15    101.0      08:15     50.0
4      08:20    103.0      08:20      NaN
5      08:25    104.0      08:25     31.0
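
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption, reconstructed from the output above (the original table is not shown here), and this variant uses a left merge on the cleaned per-sensor frames instead of the outer merge, so no rows have to be dropped afterwards. The name `aligned` is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the output shown above:
# Sensor2 missed two readings, so its rows are shifted up against Sensor1.
df = pd.DataFrame({
    "TimeStamp1": ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"],
    "Sensor1": [100, 102, 105, 101, 103, 104],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", np.nan, np.nan],
    "Sensor2": [60, 40, 50, 31, np.nan, np.nan],
})

# Split into one frame per sensor and drop the empty padding rows.
sensor1 = df[["TimeStamp1", "Sensor1"]].dropna()
sensor2 = df[["TimeStamp2", "Sensor2"]].dropna()

# Left merge: every Sensor1 row is kept, and each Sensor2 reading
# lands on the row whose timestamp matches.
aligned = sensor1.merge(
    sensor2, how="left", left_on="TimeStamp1", right_on="TimeStamp2"
)

# Rows without a Sensor2 reading keep NaN in Sensor2; copy the
# timestamp across so both timestamp columns agree.
aligned["TimeStamp2"] = aligned["TimeStamp2"].fillna(aligned["TimeStamp1"])

print(aligned)
```

The left merge treats Sensor1 as the reference clock. If Sensor1 can also miss readings, the outer merge shown above (or a shared time grid) is the safer choice.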

The code below works if your data is laid out exactly as in the example; otherwise you will have to adapt it to your data.

import pandas as pd

# Convert the timestamp columns to datetime. You don't say whether there is
# a date component, so you may have to get your timestamps in order first.
timestamps = df.filter(regex='TimeStamp').columns.tolist()
for t in timestamps:
    df[t] = pd.to_datetime(df[t])

# Get the earliest and latest datetime across all timestamp columns.
start = df[timestamps].min().min()
end = df[timestamps].max().max()

# Create a regular 5-minute range covering the whole period.
new_dates = pd.date_range(start=start, end=end, freq='5min')

# Columns must come in (timestamp, sensor) pairs, as in your example,
# so the column count should be even.
num_columns = df.shape[1]

# Re-index each pair onto the new range, then concatenate side by side.
dflist = []
for i in range(0, num_columns, 2):
    d = (df.iloc[:, i:i+2]
           .dropna()                  # drop padding rows before indexing
           .set_index(df.columns[i])
           .reindex(new_dates))
    dflist.append(d)

pd.concat(dflist, axis=1)
                     Sensor1  Sensor2
2021-10-18 08:00:00      100     60.0
2021-10-18 08:05:00      102      NaN
2021-10-18 08:10:00      105     40.0
2021-10-18 08:15:00      101     50.0
2021-10-18 08:20:00      103      NaN
2021-10-18 08:25:00      104     31.0
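
The same idea can be wrapped in a small function that leaves the original frame untouched. This is only a sketch under assumptions: `align_sensors` is a hypothetical helper name, the sample frame is reconstructed from the output above, columns are assumed to alternate timestamp, sensor, and the timestamps are assumed to be time-of-day strings such as "08:05" (parsed with an explicit format, so they all land on the same placeholder date).

```python
import numpy as np
import pandas as pd


def align_sensors(df, freq="5min", time_format="%H:%M"):
    """Put every (timestamp, sensor) column pair on one shared time grid.

    Hypothetical helper: assumes columns alternate timestamp, sensor,
    and that each sensor has at most one reading per timestamp.
    """
    pieces = []
    for i in range(0, df.shape[1], 2):
        # Drop padding rows where the timestamp or the value is missing.
        pair = df.iloc[:, i:i + 2].dropna()
        times = pd.DatetimeIndex(
            pd.to_datetime(pair.iloc[:, 0], format=time_format)
        )
        pieces.append(
            pd.Series(pair.iloc[:, 1].to_numpy(), index=times,
                      name=pair.columns[1])
        )
    # Line the sensors up on the union of their timestamps.
    wide = pd.concat(pieces, axis=1)
    # Reindex onto a regular grid so missed readings show up as NaN rows.
    grid = pd.date_range(wide.index.min(), wide.index.max(), freq=freq)
    return wide.reindex(grid)


# Hypothetical input, reconstructed from the output shown above.
df = pd.DataFrame({
    "TimeStamp1": ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"],
    "Sensor1": [100, 102, 105, 101, 103, 104],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", np.nan, np.nan],
    "Sensor2": [60, 40, 50, 31, np.nan, np.nan],
})

result = align_sensors(df)
print(result)
```

If your timestamps already carry a date, pass a matching `time_format` (or parse them beforehand) so the grid spans the real measurement period.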
