Conditionally replace values inside a column of lists in a pandas DataFrame



I want to conditionally replace values in a column where each cell holds a list.

Sample dataset below (my real dataset has many more columns and rows):

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', '1 pumpkin']   A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['1 pumpkin']                          B
...     ...                                    ...

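For example, if the condition is A and the row's list contains '1 pumpkin', I want to replace that value with XXX. If the condition is B and the list contains '1 pumpkin', I want to replace it with YYY instead.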

Desired output:

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', 'XXX']         A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['YYY']                                B
...     ...                                    ...

In practice the goal is to replace many such values; '1 pumpkin' is just one example. Importantly, I want to keep the list structure. Thanks.

Let's use explode first, then np.select:

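import numpy as np

# One row per list element; the original index is repeated for each element
s = df.explode('lists')
cond = s['lists'] == '1 pumpkin'
c1 = cond & s['condition'].eq('A')
c2 = cond & s['condition'].eq('B')
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)
# Regroup the elements into one list per original row
df['lists'] = s.groupby(level=0)['lists'].agg(list)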
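For anyone who wants to try this end to end, here is a self-contained sketch of the same explode / np.select / regroup round trip. The DataFrame is reconstructed from the first five rows of the question's sample, and the name `is_pumpkin` is only illustrative. It assumes the default integer index with no duplicate labels, since the regrouping step relies on the index that explode repeats.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (first five rows only).
df = pd.DataFrame({
    "lists": [
        ["5 apples", "2 pears"],
        ["3 apples", "3 pears", "1 pumpkin"],
        ["4 blueberries"],
        ["5 kiwis"],
        ["1 pumpkin"],
    ],
    "condition": ["B", "A", "A", "C", "B"],
})

# explode gives one row per list element and repeats the original
# index label, so the index becomes 0, 0, 1, 1, 1, 2, 3, 4.
s = df.explode("lists")
print(s)

# Build one boolean mask per rule, then pick the replacement per row.
is_pumpkin = s["lists"].eq("1 pumpkin")
s["lists"] = np.select(
    [is_pumpkin & s["condition"].eq("A"), is_pumpkin & s["condition"].eq("B")],
    ["XXX", "YYY"],
    default=s["lists"].to_numpy(),  # keep every other element unchanged
)

# Group on the repeated index to collapse back to one list per row.
df["lists"] = s.groupby(level=0)["lists"].agg(list)
print(df)
```

One thing to watch for: explode turns an empty list into a single NaN row, so a row that started as [] would come back as [nan] after regrouping.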

You can define a function containing the logic you want to apply to the DataFrame, then call df.apply(function, axis=1) to run it on every row:

def pumpkin(row):
    if '1 pumpkin' in row['lists']:
        data = row['lists'][:]
        if row['condition'] == 'A':
            data[data.index('1 pumpkin')] = 'XXX'
        elif row['condition'] == 'B':
            data[data.index('1 pumpkin')] = 'YYY'
        return data
    return row['lists']

df['lists'] = df.apply(pumpkin, axis=1)

Output:

                      lists condition
0       [5 apples, 2 pears]         B
1  [3 apples, 3 pears, XXX]         A
2           [4 blueberries]         A
3                 [5 kiwis]         C
4                     [YYY]         B
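Since the question mentions that '1 pumpkin' is only one of many values to replace, the per-row idea might be generalised with a lookup table instead of hard-coded if/elif branches. The sketch below is one possible way, not the answerer's method: `RULES` and `replace_items` are hypothetical names, and the rule table contains only the two replacements given in the question. Unlike the function above, it replaces every matching element in a list, not just the first occurrence.

```python
import pandas as pd

# Hypothetical lookup table: (condition, original value) -> replacement.
# Only the two rules stated in the question are listed here.
RULES = {
    ("A", "1 pumpkin"): "XXX",
    ("B", "1 pumpkin"): "YYY",
}


def replace_items(items, condition, rules=RULES):
    """Return a new list; items with a rule for `condition` are replaced."""
    return [rules.get((condition, item), item) for item in items]


# Sample data reconstructed from the question (first five rows only).
df = pd.DataFrame({
    "lists": [
        ["5 apples", "2 pears"],
        ["3 apples", "3 pears", "1 pumpkin"],
        ["4 blueberries"],
        ["5 kiwis"],
        ["1 pumpkin"],
    ],
    "condition": ["B", "A", "A", "C", "B"],
})

# Walk the two columns in parallel and build the new column in one pass.
df["lists"] = [
    replace_items(items, cond)
    for items, cond in zip(df["lists"], df["condition"])
]
print(df)
```

Adding another replacement then only means adding an entry to `RULES`. Empty lists stay empty, and the original lists are not mutated because each row gets a new list.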
