I have a function peak_value that takes two inputs, an area and a DataFrame, and returns the data with a new column flagging the potential peaks. I want to apply this function to a list of DataFrames, e.g. data = [df1, df2, df3, ..., dfn], where each DataFrame has its own area value, e.g. area = [a1, a2, a3, ..., an]. I have already tried the argrelextrema function to speed up the processing, but with no success so far. Is there any way to make it faster?
import numpy as np
from scipy.signal import argrelextrema

def peak_value(data, area):
    # exclusion window around each candidate peak, scaled by the area
    lag = np.round(5 + np.log10(area))
    # keep only the local maxima and rank them by value, highest first
    data_tmp = data.loc[data['loc_max'] == 1]
    data_sorted = data_tmp.sort_values(by='value', ascending=False)
    data_sorted['idx'] = data_sorted.index
    data_sorted = data_sorted.reset_index(drop=True)
    flag = 0
    i = 0
    updated = len(data_sorted)
    while i < updated and flag == 0:
        # dates within +/- lag of the current candidate, excluding the candidate itself
        lag_pre = np.arange(data_sorted['date'][i] - lag, data_sorted['date'][i])
        lag_post = np.arange(data_sorted['date'][i] + 1, data_sorted['date'][i] + lag + 1)
        lag_interval = np.concatenate((lag_pre, lag_post))
        # drop the weaker candidates that fall inside the exclusion window
        ind_del = data_sorted.iloc[np.isin(data_sorted['date'], lag_interval)].index
        data_sorted = data_sorted.drop(data_sorted.index[ind_del])
        data_sorted = data_sorted.reset_index(drop=True)
        updated = len(data_sorted)
        if i < updated:
            flag = 0
        else:
            flag = 1
        i = i + 1
    # adds a column that says which rows are the potential peaks
    # (assigned via .loc on the frame itself to avoid chained-indexing issues)
    data['Potential_peaks'] = np.zeros(len(data))
    data.loc[data_sorted['idx'], 'Potential_peaks'] = 1
    return data
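For reference, the sequential application described above, the part I want to speed up, would presumably look something like this (a sketch using the data and area lists mentioned earlier):

# one peak_value call per (DataFrame, area) pair, run one after another
results = [peak_value(df, a) for df, a in zip(data, area)]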
def max_new(data):
    # find the local maxima; this assumes the signal to search is the 'value' column,
    # as in peak_value (the original code mistakenly passed the loop variable df here)
    loc_opt_ind = argrelextrema(data['value'].values, np.greater)
    Potential_peaks = np.zeros(len(data))
    Potential_peaks[loc_opt_ind] = 1
    data['Potential_peaks'] = Potential_peaks
    return data
new_max = []
for index, df in enumerate(data):
    max_values = max_new(df).Potential_peaks
    new_max.append(max_values)
If the solution can be run in parallel, then I think Joblib is a viable option.
I have tried it myself and I like it; the amount of modification required for this kind of job is very low.
Here is an example of how it works:
from joblib import Parallel, delayed
numbers = list(range(10))
def square(x):
    return x ** 2
result = Parallel(n_jobs=3)(delayed(square)(number) for number in numbers)
print(result) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
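Applied to your case, a minimal sketch could look like the following (assuming peak_value and the data/area lists are exactly as defined in your question):

from joblib import Parallel, delayed

# one peak_value call per (DataFrame, area) pair, spread over 4 worker processes
results = Parallel(n_jobs=4)(
    delayed(peak_value)(df, a) for df, a in zip(data, area)
)

Note that joblib's default backend uses separate worker processes, so each DataFrame is serialized and sent to a worker; this pays off when peak_value itself dominates the runtime rather than the data transfer.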
If this solution does not work for you, please share more details about your problem.