根据其他两个列的复杂条件更新列值



对于下方的DataFrame df

df = pd.DataFrame([('Tesla','Model3', '-', 'Motor'),
('Tesla', 'ModelS', '-', 'MotorMD3'),
('Tesla', 'ModelX', '-', 'MotorMD3'),
('Tesla', 'ModelY', '-', 'Motor'),
('Jeep',  'Wrangler','Grouped','Engine'),
('Jeep',  'Compass', 'Grouped','EngineMD3'),
('Jeep',  'Patriot', 'Grouped','Engine'),
('Jeep',  'Cherokee','Grouped','Engine'),
('Ford',  'Mustang', 'Grouped','Engine'),
('Ford',  'F150',     'Grouped','Engine')  ],columns=['Make','Model','Status','Type'])

df
Make    Model        Status    Type
0   Tesla   Model3             -    Motor
1   Tesla   ModelS             -    MotorMD3
2   Tesla   ModelX             -    MotorMD3
3   Tesla   ModelY             -    Motor
4   Jeep    Wrangler     Grouped    Engine
5   Jeep    Compass      Grouped    EngineMD3
6   Jeep    Patriot      Grouped    Engine
7   Jeep    Cherokee     Grouped    Engine
8   Ford    Mustang      Grouped    Engine
9   Ford    F150         Grouped    Engine

如果EngineMD3存在于该Make中的任何一个Models中,并且如果Status是该MakeGrouped,则我正在尝试用EngineMD3为所有相同的Make更新列Type。但是,如果Status不是Grouped,则对于每个ModelsType应当保持原样。如果"EngineMD3"不存在,则Type应保持为Engine

例如,Tesla不是Grouped,因此每个模型的Type保持不变。但是JeepGrouped,而CompassTypeEngineMD3,所以EngineMD3被更新为所有JeepModelsTypeFordGrouped,但Models都不具有类型EngineMD3,因此Type保持为Engine

预期输出

Make    Model    Status       Type
0   Tesla   Model3      -           Motor  #For Tesla Type is maintained for each model seperately since it is not grouped
1   Tesla   ModelS      -         MotorMD3
2   Tesla   ModelX      -         MotorMD3
3   Tesla   ModelY      -            Motor
4   Jeep    Wrangler   Grouped   EngineMD3 #Since Jeep is grouped, all its Type is changed to EngineMD3 since one of the model had EngineMD3
5   Jeep    Compass    Grouped   EngineMD3
6   Jeep    Patriot    Grouped   EngineMD3
7   Jeep    Cherokee   Grouped   EngineMD3
8   Ford    Mustang    Grouped      Engine #Even though Ford is grouped, since there is no EngineMD3 the Type is maintained as Engine.
9   Ford    F150       Grouped      Engine

换句话说,条件适用于所有的makes(例如吉普车(如果make被分组,并且如果FD3被附加到任何一个模型types,那么同一make中的所有分组模型将具有附加到它们的FD3

我尝试使用np.select来更新具有多个条件的Type列,但我不能给出一个条件来同时选择所有相同的Make并更新Type。请帮忙,我这里没有选择了。

df.loc[df.Status.eq('Grouped'), 'Type'] = df[df.Status.eq('Grouped')].groupby('Make').Type.transform(
lambda x: 'EngineMD3' if x.eq('EngineMD3').any() else x)
Make     Model   Status       Type
0  Tesla    Model3        -      Motor
1  Tesla    ModelS        -   MotorMD3
2  Tesla    ModelX        -   MotorMD3
3  Tesla    ModelY        -      Motor
4   Jeep  Wrangler  Grouped  EngineMD3
5   Jeep   Compass  Grouped  EngineMD3
6   Jeep   Patriot  Grouped  EngineMD3
7   Jeep  Cherokee  Grouped  EngineMD3
8   Ford   Mustang  Grouped     Engine
9   Ford      F150  Grouped     Engine

你可以试试这个:

def process(g):
if sum(g.Status=='Grouped')>0:
g['Type'] = 'EngineMD3' if sum(g.Type=='EngineMD3')>0 else 'Engine'
return g
df.groupby('Make').apply(process)

输出:

Make    Model       Status  Type
0   Tesla   Model3      -       Motor
1   Tesla   ModelS      -       MotorMD3
2   Tesla   ModelX      -       MotorMD3
3   Tesla   ModelY      -       Motor
4   Jeep    Wrangler    Grouped EngineMD3
5   Jeep    Compass     Grouped EngineMD3
6   Jeep    Patriot     Grouped EngineMD3
7   Jeep    Cherokee    Grouped EngineMD3
8   Ford    Mustang     Grouped Engine
9   Ford    F150        Grouped Engine

我们可以使用以下函数:

def trans(s_1, s_2, s_3):

Make_values, new_col = [], []

for i in range(len(s_1)):
if s_3[i]=="EngineMD3" and s_2[i]=="Grouped":
Make_values.append(s_1[i])

for i in range(len(Make_values)):

for j in range(len(s_3)):

if s_1[j]==Make_values[i]:

new_col.append("EngineMD3")
else:
new_col.append(s_3[j])
return new_col

我们应用它:

df['Type']=trans(df['Make'], df['Status'], df['Type'])

我不是python专家,我相信有更多的最佳解决方案。。。但这里有一个。

  1. 检查查询的长度>0
  2. 如果是>0,然后我们存储满足您在y变量中提到的条件的Make的值
  3. 然后我们替换特定Make的Type
if len(df.query("Status == 'Grouped' & Type.str.startswith('EngineMD3')")) > 0:
x = df.query("Status == 'Grouped' & Type.str.startswith('EngineMD3')")['Make'].values
for i in range(len(x)):
y = x[i]
df['Type'].loc[df['Make'] == y] = 'EnginerMD3'
display(df)

最新更新