Pandas DataFrame masking question: referencing previous rows and selecting values



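I'm new to Pandas, and I'm trying to avoid iterating over a DataFrame by using vectorization instead. I can't get the result I want; I need help with more complex masking and selection statements.

Here is my code: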

import random
from datetime import datetime, timedelta
import pandas as pd
dates = []
temp = []
press = []
vel   = []
fmt = '%Y-%m-%d %H:%M:%S'
stime = datetime.strptime('2020-01-06 10:28:16', fmt)
etime = datetime.strptime('2020-04-10 03:43:12', fmt)
td = etime - stime
l = set([random.random() for x in range(0, 1000)])
# random.sample() needs a sequence (sets are rejected on Python 3.11+), so sort first
dates = [((td * x) + stime) for x in random.sample(sorted(l), 100)]
for i in range(100):
    press.append(random.uniform(14,95.5))
    temp.append(random.uniform(-15,45))
    vel.append(random.uniform(50,153))

measurements = {
    'date' :     dates,
    'pressure' : press,
    'velocity' : vel,
    'temperature': temp
}
df = pd.DataFrame(measurements)
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df = df.sort_index()
df2 = pd.DataFrame()
# if temp increased from previous row, set flag      
df2['temp_inc'] =  df['temperature'] - df.shift(1)['temperature'] > 0
df2['temp_inc'] = df2['temp_inc'].replace({True: 1, False: 0})
# need to fetch velocity where pressure has increased from previous row, else 0
press_up_mask    =  df.where( (df['pressure'] - df.shift(1)['pressure']) > 0)
#df2['press_spike_velocity']  =  df[press_up_mask]['velocity']
# Need to perform calc based on 'temp_inc' column: if 'temp_inc' column is 1: calculate pressure * velocity, else 0
temp_inc_mask = df2['temp_inc'] == 1
df2['boyle_fact']  =  df[temp_inc_mask]['pressure'] * df[temp_inc_mask]['velocity']
# Get some stats
df2['short_max_temp']     =  df['temperature'].rolling(3).max()
df2['long_min_pressure']  =  df['pressure'].rolling(30).min()
print(df.head())
print(df2.head())

How do I correctly compute the columns "press_spike_velocity" and "boyle_fact"?

Starting from the calculations:

# if temp increased from previous row, set flag
df2['temp_inc'] = df['temperature'] - df.shift(1)['temperature'] > 0
# setting int type instead of replace
df2['temp_inc'] = df2['temp_inc'].astype(int)
# need to fetch velocity where pressure has increased from previous row, else 0
# build a boolean Series here; df.where(...) would return a DataFrame, not a mask
press_up_mask = (df['pressure'] - df['pressure'].shift(1)) > 0
# set column to velocity then mask in zeros via assignment
df2['press_spike_velocity'] = df['velocity'].copy()
# use .loc rather than chained indexing so the assignment reliably hits df2
df2.loc[~press_up_mask, 'press_spike_velocity'] = 0
# Need to perform calc based on 'temp_inc' column: if 'temp_inc' column is 1: calculate pressure * velocity, else 0
temp_inc_mask = df2['temp_inc'] == 1
# same masking approach as above
df2['boyle_fact'] = df['pressure'] * df['velocity']
df2.loc[~temp_inc_mask, 'boyle_fact'] = 0

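This is the simplest way to solve the problem, with only minimal changes to the code itself. If you dig deeper into pandas, you may find ways to do this in 1-2 fewer lines using in-place operations, but I don't know how much performance or readability you would gain from that.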
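For what it's worth, one shorter variant might look like the sketch below. It is not the code from the answer above: it swaps the subtract-and-shift pattern for `Series.diff()` and the copy-then-assign step for `Series.where(cond, 0)`. The small hand-made DataFrame is an assumption for illustration only, standing in for the random data in the question.

```python
import pandas as pd

# Illustrative values only (an assumption, not the question's random data).
df = pd.DataFrame(
    {
        "pressure": [10.0, 12.0, 11.0, 15.0],
        "velocity": [100.0, 110.0, 120.0, 130.0],
        "temperature": [5.0, 4.0, 6.0, 7.0],
    },
    index=pd.date_range("2020-01-06", periods=4, freq="D", name="date"),
)

# diff() is shorthand for "value minus previous row". The first row has no
# predecessor, so its difference is NaN and the comparison yields False.
press_up = df["pressure"].diff() > 0
temp_up = df["temperature"].diff() > 0

df2 = pd.DataFrame(index=df.index)
df2["temp_inc"] = temp_up.astype(int)
# Series.where keeps values where the condition is True and puts 0 elsewhere.
df2["press_spike_velocity"] = df["velocity"].where(press_up, 0)
df2["boyle_fact"] = (df["pressure"] * df["velocity"]).where(temp_up, 0)

print(df2)
```

Whether this reads better than the explicit mask-and-assign version is a matter of taste; both give the first row a 0 because it has no previous row to compare against.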
