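Suppose sensors are attached to 3 climbers, and each sensor records a measurement at random times. The data is captured in the dataframe below (the real dataframe is much longer than this):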
import numpy as np
import pandas as pd

df = pd.DataFrame({
'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})
Name Timestamp Category Body Temp Altitude Heart Rate
0 Cody 08:10:23 Body Temp 35.9 NaN NaN
1 Dustin 08:12:58 Altitude NaN 7.0 NaN
2 Dustin 08:15:02 Heart Rate NaN NaN 75.0
3 Cody 08:19:43 Body Temp 36.2 NaN NaN
4 Ryan 08:21:00 Heart Rate NaN NaN 71.0
5 Dustin 08:30:17 Heart Rate NaN NaN 69.0
6 Ryan 08:34:01 Altitude NaN 12.0 NaN
7 Cody 08:34:59 Altitude NaN 6.0 NaN
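The goal is to carry each climber's measurements forward in timestamp order, so that every row for a climber shows that climber's most recent reading for each measurement.
So the result should look like this: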
Name Timestamp Category Body Temp Altitude Heart Rate
0 Cody 08:10:23 Body Temp 35.9 NaN NaN
1 Dustin 08:12:58 Altitude NaN 7.0 NaN
2 Dustin 08:15:02 Heart Rate NaN 7.0 75.0
3 Cody 08:19:43 Body Temp 36.2 NaN NaN
4 Ryan 08:21:00 Heart Rate NaN NaN 71.0
5 Dustin 08:30:17 Heart Rate NaN 7.0 69.0
6 Ryan 08:34:01 Altitude NaN 12.0 71.0
7 Cody 08:34:59 Altitude 36.2 6.0 NaN
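So far, I've thought of using .sort_values() to separate the climbers and work from there, but I'm struggling to figure out how to keep updating each row. Does this require a function or iteration?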
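On the iteration question: a plain loop does work, and writing one out makes the intended semantics concrete. The sketch below uses a hypothetical helper, carry_forward_loop (along with the illustrative name measure_cols; neither is part of pandas). It walks the rows in timestamp order, remembers each climber's latest non-missing reading per column, and writes that reading into later gaps. It assumes the timestamps are zero-padded HH:MM:SS strings from a single day, so sorting them as strings gives chronological order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})

# Hypothetical name for the measurement columns to carry forward.
measure_cols = ['Body Temp', 'Altitude', 'Heart Rate']

def carry_forward_loop(frame, cols):
    """Hypothetical helper: fill gaps with each climber's latest earlier reading."""
    out = frame.copy()
    last_seen = {}  # climber name -> {column: latest non-missing value}
    # Assumes same-day, zero-padded 'HH:MM:SS' strings, so string order is time order.
    for idx, row in frame.sort_values('Timestamp').iterrows():
        latest = last_seen.setdefault(row['Name'], {})
        for col in cols:
            if pd.notna(row[col]):
                latest[col] = row[col]          # new reading: remember it
            elif col in latest:
                out.at[idx, col] = latest[col]  # gap: reuse the remembered reading
    return out

result = carry_forward_loop(df, measure_cols)
print(result)
```

A row-by-row loop like this gets slow on a long dataframe, which is why a vectorized groupby approach is usually preferred in pandas.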
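Essentially, the task is to fill each missing value with the same climber's previous value for that measurement, if one exists, so groupby.ffill should do the job. This relies on the rows already being in chronological order within each climber, as they are in the sample: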
out = df[['Name']].join(df.groupby('Name').ffill())
Output:
Name Timestamp Category Body Temp Altitude Heart Rate
0 Cody 08:10:23 Body Temp 35.9 NaN NaN
1 Dustin 08:12:58 Altitude NaN 7.0 NaN
2 Dustin 08:15:02 Heart Rate NaN 7.0 75.0
3 Cody 08:19:43 Body Temp 36.2 NaN NaN
4 Ryan 08:21:00 Heart Rate NaN NaN 71.0
5 Dustin 08:30:17 Heart Rate NaN 7.0 69.0
6 Ryan 08:34:01 Altitude NaN 12.0 71.0
7 Cody 08:34:59 Altitude 36.2 6.0 NaN
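If the full dataframe is not already in time order, ffill would carry values forward in row order rather than time order. Below is a minimal sketch for that case, with fill_latest and measure_cols as hypothetical names. It sorts by Timestamp (parsed with pd.to_timedelta, assuming same-day HH:MM:SS strings), forward-fills only the measurement columns within each climber, and then writes the filled values back in the original row order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})

# Hypothetical name for the measurement columns to carry forward.
measure_cols = ['Body Temp', 'Altitude', 'Heart Rate']

def fill_latest(frame, cols):
    """Hypothetical helper: per-climber forward fill in timestamp order."""
    # Parse 'HH:MM:SS' as durations so the sort is chronological; stable keeps ties in row order.
    ordered = frame.sort_values('Timestamp', key=pd.to_timedelta, kind='stable')
    # Forward-fill only the measurement columns within each climber.
    filled = ordered.groupby('Name')[cols].ffill()
    out = frame.copy()
    # Reindex to the caller's row order before writing the filled columns back.
    out[cols] = filled.reindex(frame.index)
    return out

res = fill_latest(df, measure_cols)
print(res)
```

On the sample data this yields the same table as the output above. Shuffling the rows first (for example with df.sample(frac=1)) does not change the filled values, only the row order.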