Pandas: add a calculated column based on a condition



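I want to compute a new column from the row mean and standard deviation calculated in the previous step. I can't get the lambda function to work correctly.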

# Import necessary modules
import pandas as pd
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
data_mean = df.mean(axis=1)
data_stdev = df.std(axis=1)
# Calculate LV column for data df
df['LV'] = df.apply(
    lambda row : 0
    if data_mean < 55.5:
        LV = (55.5-data_mean)+(3.1*data_stdev)
    elif data_mean > 57.5:
        LV = (data_mean-57.5)+(3.1*data_stdev)
    else:
        LV = (3.1*data_stdev),
    axis = 1)
display(df)

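Another approach you can try. On a small frame like this one its speed is similar to the other answer's (if not slightly faster), although a row-wise apply generally scales worse than vectorized code on large frames: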


# Import necessary modules
import pandas as pd

def calculate_lv(x):
    # x is one row of the DataFrame, passed in by df.apply(..., axis=1)
    if x['MEAN'] < 55.5:
        return (55.5 - x['MEAN']) + (3.1 * x['STDEV'])
    elif x['MEAN'] > 57.5:
        return (x['MEAN'] - 57.5) + (3.1 * x['STDEV'])
    else:
        return x['STDEV'] * 3.1

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Compute both statistics from the original columns only,
# so that STDEV does not include the newly added MEAN column
cols = ['A', 'B', 'C']
df['MEAN'] = df[cols].mean(axis=1)
df['STDEV'] = df[cols].std(axis=1)

df['LV'] = df.apply(calculate_lv, axis=1)
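The speed claim above can be checked with a rough timing sketch. The frame size (10,000 rows), the random test data, and the helper names `make_frame`, `lv_apply` and `lv_vectorized` are assumptions made for illustration and are not part of the original answers. Actual timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

COLS = ['A', 'B', 'C']


def calculate_lv(x):
    # Same row-wise rule as in the answer above
    if x['MEAN'] < 55.5:
        return (55.5 - x['MEAN']) + (3.1 * x['STDEV'])
    elif x['MEAN'] > 57.5:
        return (x['MEAN'] - 57.5) + (3.1 * x['STDEV'])
    else:
        return x['STDEV'] * 3.1


def make_frame(n_rows=10_000, seed=0):
    # Hypothetical test data: values spread around the 55.5..57.5 band
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.uniform(50, 63, size=(n_rows, 3)), columns=COLS)
    df['MEAN'] = df[COLS].mean(axis=1)
    df['STDEV'] = df[COLS].std(axis=1)
    return df


def lv_apply(df):
    # Row-wise: calls calculate_lv once per row
    return df.apply(calculate_lv, axis=1)


def lv_vectorized(df):
    # Column-wise: start from the base value, then overwrite using masks
    lv = 3.1 * df['STDEV']
    low = df['MEAN'] < 55.5
    high = df['MEAN'] > 57.5
    lv = lv.mask(low, (55.5 - df['MEAN']) + 3.1 * df['STDEV'])
    lv = lv.mask(high, (df['MEAN'] - 57.5) + 3.1 * df['STDEV'])
    return lv


df = make_frame()
print('apply:     ', timeit.timeit(lambda: lv_apply(df), number=3))
print('vectorized:', timeit.timeit(lambda: lv_vectorized(df), number=3))
```

Both helpers return the same values, so any difference in the printed numbers is purely a difference in execution time.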

I recommend a vectorized approach, since it runs faster:


# Import necessary modules
import pandas as pd
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
data_mean = df.mean(axis=1)
data_stdev = df.std(axis=1)
# Calculate LV column for data df
# Base value, used when 55.5 <= mean <= 57.5
df['LV'] = 3.1 * data_stdev
# Overwrite the rows whose mean falls outside that band
df.loc[data_mean < 55.5, 'LV'] = (55.5 - data_mean) + (3.1 * data_stdev)
df.loc[data_mean > 57.5, 'LV'] = (data_mean - 57.5) + (3.1 * data_stdev)
# display() is available in Jupyter/IPython; use print(df) elsewhere
display(df)
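The same vectorized rule can also be written in a single pass with `numpy.select`, which some readers find easier to follow than successive `df.loc` assignments. This is a minimal sketch of an alternative, not the answerer's method; the names `conditions`, `offsets` and `offset` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
data_mean = df.mean(axis=1)
data_stdev = df.std(axis=1)

# Conditions are evaluated in order; the first one that matches wins
conditions = [data_mean < 55.5, data_mean > 57.5]
# Distance of the mean from the 55.5..57.5 band for each condition
offsets = [55.5 - data_mean, data_mean - 57.5]
# Rows matching neither condition lie inside the band: offset 0
offset = np.select(conditions, offsets, default=0.0)

df['LV'] = offset + 3.1 * data_stdev
print(df)
```

For the sample data every row mean is below 55.5, so only the first branch is exercised; the other two branches only matter for data with larger values.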
