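I want to convert True and False into specific values in a DataFrame. Rows whose "time" (the gap to the previous row, in seconds) is less than 300 should get "1". Every row up to and including the first row with a gap of more than 300 seconds gets that same number, "1". The rows after it (whose gaps are again less than 300 seconds) should get another specific number, e.g. "2", then "3", and so on.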
Here is my code:
import pandas as pd
import numpy as np

# Parse the timestamps and compute the gap to the previous row in seconds
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['delta'] = df['timestamp'] - df['timestamp'].shift()
df['time'] = df['delta'].dt.total_seconds()
# Flag gaps longer than 300 seconds
df['outlier'] = df['time'] > 300
# Current attempt: only the outlier rows get '1', every other row gets 'na'
df['output'] = np.where(df['outlier'], '1', 'na')
This is the input, a sample of the DataFrame I have:
            timestamp           delta      time  outlier output
0 2020-11-08 17:54:53             NaT       NaN    False     na
1 2020-11-08 17:54:56 0 days 00:00:03       3.0    False     na
2 2020-11-08 17:54:57 0 days 00:00:01       1.0    False     na
3 2020-11-08 21:04:41 0 days 03:09:44   11384.0     True      1
4 2020-11-08 21:04:52 0 days 00:00:11      11.0    False     na
5 2020-11-08 21:04:53 0 days 00:00:01       1.0    False     na
6 2020-11-10 20:36:32 1 days 23:31:39  171099.0     True      1
7 2020-11-10 20:37:01 0 days 00:00:29      29.0    False     na
8 2020-11-10 20:37:04 0 days 00:00:03       3.0    False     na
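To make the problem reproducible, the sample above can be rebuilt from its timestamps alone. This is a sketch that assumes "timestamp" is the only raw column; it uses diff(), which gives the same result as subtracting shift():

```python
import pandas as pd

# Timestamps copied from the sample DataFrame in the question
df = pd.DataFrame({
    "timestamp": [
        "2020-11-08 17:54:53", "2020-11-08 17:54:56", "2020-11-08 17:54:57",
        "2020-11-08 21:04:41", "2020-11-08 21:04:52", "2020-11-08 21:04:53",
        "2020-11-10 20:36:32", "2020-11-10 20:37:01", "2020-11-10 20:37:04",
    ]
})

df["timestamp"] = pd.to_datetime(df["timestamp"])
# Gap to the previous row; diff() is equivalent to x - x.shift()
df["delta"] = df["timestamp"].diff()
df["time"] = df["delta"].dt.total_seconds()
# NaN > 300 evaluates to False, so the first row is never an outlier
df["outlier"] = df["time"] > 300
print(df)
```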
This is the output I am actually looking for:
            timestamp           delta      time  outlier output
0 2020-11-08 17:54:53             NaT       NaN    False    NaN
1 2020-11-08 17:54:56 0 days 00:00:03       3.0    False      1
2 2020-11-08 17:54:57 0 days 00:00:01       1.0    False      1
3 2020-11-08 21:04:41 0 days 03:09:44   11384.0     True      1
4 2020-11-08 21:04:52 0 days 00:00:11      11.0    False      2
5 2020-11-08 21:04:53 0 days 00:00:01       1.0    False      2
6 2020-11-10 20:36:32 1 days 23:31:39  171099.0     True      2
7 2020-11-10 20:37:01 0 days 00:00:29      29.0    False      3
8 2020-11-10 20:37:04 0 days 00:00:03       3.0    False      3
Please note that this is only a sample of the DataFrame, so please help me fix the code above and make it work for a DataFrame with a large number of rows.
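For reference, the rule behind the expected output can be spelled out as a plain loop: the first row has no label, and the group number only goes up on the row *after* a gap of more than 300 seconds (the outlier row itself keeps the old number). This is a slow, illustrative version; `label_groups` is a hypothetical helper name, not part of pandas, and a vectorized version is what you want for large DataFrames:

```python
import math


def label_groups(seconds, threshold=300):
    """Hypothetical reference implementation of the grouping rule.

    seconds: gap to the previous row for each row (NaN for the first row).
    Returns one label per row; the first row gets None.
    """
    labels = []
    group = 1
    prev_was_gap = False
    for i, s in enumerate(seconds):
        if i == 0:
            labels.append(None)  # first row has no previous timestamp
        else:
            if prev_was_gap:
                group += 1  # a new group starts on the row after a long gap
            labels.append(group)
        prev_was_gap = not math.isnan(s) and s > threshold
    return labels


nan = float("nan")
print(label_groups([nan, 3.0, 1.0, 11384.0, 11.0, 1.0, 171099.0, 29.0, 3.0]))
```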
Something like this?
# Number the groups with a running count of outliers, then shift the labels down one row
df['output'] = (df.outlier.cumsum() + 1).map(str).shift()
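To see why this works, here is the same one-liner run on just the outlier flags from the sample (a minimal check, assuming only the `outlier` column matters). `cumsum()` counts the outliers seen so far, `+ 1` makes the first group "1", and `shift()` moves every label down one row, so the outlier row keeps the previous group and the new group starts on the next row:

```python
import pandas as pd

# Outlier flags from the question's sample (time > 300 seconds)
df = pd.DataFrame({"outlier": [False, False, False, True, False, False, True, False, False]})

# cumsum():       0, 0, 0, 1, 1, 1, 2, 2, 2
# + 1:            1, 1, 1, 2, 2, 2, 3, 3, 3
# shift():      NaN, 1, 1, 1, 2, 2, 2, 3, 3
df["output"] = (df.outlier.cumsum() + 1).map(str).shift()
print(df["output"].tolist())
```

Because every step is a vectorized column operation, this scales to large DataFrames without a Python loop.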
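If you prefer integers:
# astype(object) stops shift() from upcasting the integers to floats (1.0, 2.0, ...)
df['output'] = (df.outlier.cumsum() + 1).map(int).astype(object).shift()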
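Another option for integers, swapped in here rather than taken from the answer above, is pandas' nullable `Int64` dtype: it can hold a missing value (`<NA>`) directly, so `shift()` keeps the column as integers without falling back to `object`. A sketch, again using only the sample's outlier flags:

```python
import pandas as pd

df = pd.DataFrame({"outlier": [False, False, False, True, False, False, True, False, False]})

# Cast to the nullable Int64 dtype before shifting; the first row becomes <NA>
# instead of forcing the whole column to float64 or object.
df["output"] = (df["outlier"].cumsum() + 1).astype("Int64").shift()
print(df["output"])
```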