Run a simulation based on the values in a column



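I have written some code that simulates values in a pandas DataFrame based on a number of conditions. Now I want to run this code only for specific values in the column df['Use Type']. This is what I currently have: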

def l_sim():
    n = 100
    for i in range(n):
        # One uniform random draw per row
        df['RAND'] = np.random.uniform(0, 1, size=df.index.size)
        conditions = [df['RAND'] >= (1 - 0.8062), (df['RAND'] < (1 - 0.8062)) & (df['RAND'] >= 0.1),
                      (df['RAND'] < 0.1) & (df['RAND'] >= 0.05), (df['RAND'] < 0.05) &
                      (df['RAND'] >= 0.025), (df['RAND'] < 0.025) & (df['RAND'] >= 0.0125),
                      (df['RAND'] < 0.0125)]
        choices = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']
        df['L'] = np.select(conditions, choices)
        conditions = [df['L'] == 'L0', df['L'] == 'L1', df['L'] == 'L2', df['L'] == 'L3',
                      df['L'] == 'L4', df['L'] == 'L5']
        choices = [df['A'] * 0.02, df['A'] * 0.15, df['A'] * 0.20, df['A'] * 0.50,
                   df['A'] * 1, df['A'] * 1]
        df['AL'] = np.select(conditions, choices)

l_sim()

How can I run this code only for the rows selected by df.loc[df['Use Type'] == 'Commercial Property']?

Thanks in advance.

I think you need to structure your code differently. In general, though, you can use df.apply with a lambda function, following this pattern:

df['L'] = df.apply(lambda row: l_sim(row), axis=1)
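As a minimal sketch of that pattern: with axis=1, df.apply passes each row to the function as a Series indexed by column name. The frame df_demo, the function scale_commercial and the column half_A below are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame; only the column names mirror the question
df_demo = pd.DataFrame({'Use Type': ['Commercial Property', 'other'],
                        'A': [10, 20]})

def scale_commercial(row):
    # row is a Series, so columns are read by name
    if row['Use Type'] == 'Commercial Property':
        return row['A'] * 0.5
    # Rows of any other use type get a missing value
    return float('nan')

# axis=1 means "call the function once per row"
df_demo['half_A'] = df_demo.apply(scale_commercial, axis=1)
```

Row-wise apply is easy to read but calls the function once per row, so it can be slow on large frames.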

I would split your code into three functions. One for df['L']:

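def l_logic():
    # Draw one uniform random number for the current row
    random_num = np.random.uniform(0, 1)
    conditions = [random_num >= (1 - 0.8062), (random_num < (1 - 0.8062)) & (random_num >= 0.1),
                  (random_num < 0.1) & (random_num >= 0.05), (random_num < 0.05) &
                  (random_num >= 0.025), (random_num < 0.025) & (random_num >= 0.0125),
                  (random_num < 0.0125)]
    choices = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']
    # With scalar conditions np.select returns a 0-d array; .item() unwraps it to a plain str.
    # The explicit string default avoids mixing the integer default 0 with string choices.
    L = np.select(conditions, choices, default='').item()
    return L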

One for df['AL']. Because you used df['A'] before assigning it, I replaced it with some_number:

def al_logic(row):
    # Placeholder standing in for df['A']
    some_number = 1
    conditions = [row['L'] == 'L0', row['L'] == 'L1', row['L'] == 'L2',
                  row['L'] == 'L3', row['L'] == 'L4', row['L'] == 'L5']
    choices = [some_number * 0.02, some_number * 0.15, some_number * 0.20,
               some_number * 0.50, some_number * 1, some_number * 1]
    # .item() unwraps the 0-d array returned by np.select into a plain float
    AL = np.select(conditions, choices).item()
    return AL

And a third that holds the logic for creating values only when row['Use Type'] == 'Commercial Property':

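def l_sim(row):
    if row['Use Type'] == 'Commercial Property':
        # Once 'L' exists, the second pass computes AL; otherwise compute L first
        if 'L' in row.index:
            return al_logic(row)
        else:
            return l_logic()
    else:
        # Use a real missing value rather than the string 'NaN'
        return np.nan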

Run it:

df['L'] = df.apply(lambda row: l_sim(row), axis=1)
df['AL'] = df.apply(lambda row: l_sim(row), axis=1)

Assuming the DataFrame has at least the two columns 'A' and 'Use Type', for example:

df = pd.DataFrame({'Use Type':['Commercial Property']*3+['other']*2, 'A':1})

then you can modify your function like this:

def l_sim(df, use_type=None):
    # Check whether to run on the whole DataFrame or only on a specific use type
    if use_type:
        mask = df['Use Type'] == use_type
    else:
        mask = slice(None)
    # Generate the random values
    df.loc[mask, 'RAND'] = np.random.uniform(0, 1, size=df[mask].index.size)
    # Create the conditions (the same for both L and AL, by the way)
    conditions = [df['RAND'] >= (1 - 0.8062), (df['RAND'] >= 0.1), (df['RAND'] >= 0.05),
                  (df['RAND'] >= 0.025), (df['RAND'] >= 0.0125), (df['RAND'] < 0.0125)]
    # Choices for column L, then create the column
    # (the explicit string default avoids mixing the integer default 0 with string choices)
    choices_L = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']
    df.loc[mask, 'L'] = np.select(conditions, choices_L, default='')[mask]
    # Choices for column AL, then create the column
    choices_A = [df['A'] * 0.02, df['A'] * 0.15, df['A'] * 0.20, df['A'] * 0.50,
                 df['A'] * 1, df['A'] * 1]
    df.loc[mask, 'AL'] = np.select(conditions, choices_A)[mask]

If you then run:

l_sim(df, 'Commercial Property')
print(df)
              Use Type  A      RAND    L    AL
0  Commercial Property  1  0.036593   L3  0.50
1  Commercial Property  1  0.114773   L1  0.15
2  Commercial Property  1  0.651873   L0  0.02
3                other  1       NaN  NaN   NaN
4                other  1       NaN  NaN   NaN

l_sim(df)
print(df)
              Use Type  A      RAND   L    AL
0  Commercial Property  1  0.123265  L1  0.15
1  Commercial Property  1  0.906185  L0  0.02
2  Commercial Property  1  0.107588  L1  0.15
3                other  1  0.434560  L0  0.02
4                other  1  0.304901  L0  0.02
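The fallback mask = slice(None) in l_sim works because .loc accepts either a boolean Series or a slice as the row selector, and slice(None) is the same as writing ":" (all rows). A small sketch of just that part, using a made-up frame demo and a hypothetical helper pick_a:

```python
import pandas as pd

# Hypothetical toy frame; only the column names mirror the question
demo = pd.DataFrame({'Use Type': ['Commercial Property', 'other', 'other'],
                     'A': [1, 2, 3]})

def pick_a(frame, use_type=None):
    # Boolean mask when a use type is given, otherwise "all rows"
    mask = (frame['Use Type'] == use_type) if use_type else slice(None)
    return frame.loc[mask, 'A'].tolist()

only_other = pick_a(demo, 'other')  # rows whose use type is 'other'
everything = pick_a(demo)           # slice(None) selects every row
```

This keeps a single code path for both cases, so there is no need to duplicate the assignment logic.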

I removed the for loop because I could not see its purpose, and I simplified your conditions, as in the answer to your previous question.
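The simplification is safe because np.select takes, for each element, the choice of the first condition that is true, so the upper bounds in the original conditions are redundant. A quick check on a few hand-picked values (the array rand is made up for illustration):

```python
import numpy as np

# One hand-picked value per band
rand = np.array([0.90, 0.15, 0.07, 0.03, 0.02, 0.001])
choices = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']

# Explicit ranges, as written in the question
explicit = [rand >= (1 - 0.8062),
            (rand < (1 - 0.8062)) & (rand >= 0.1),
            (rand < 0.1) & (rand >= 0.05),
            (rand < 0.05) & (rand >= 0.025),
            (rand < 0.025) & (rand >= 0.0125),
            rand < 0.0125]

# Lower bounds only; the order of the list does the rest
simplified = [rand >= (1 - 0.8062), rand >= 0.1, rand >= 0.05,
              rand >= 0.025, rand >= 0.0125, rand < 0.0125]

labels_explicit = np.select(explicit, choices, default='')
labels_simplified = np.select(simplified, choices, default='')
```

Both calls give the same labels; the simplified form only depends on the conditions staying in descending order of threshold.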
