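I want to conditionally replace values in a column whose cells contain lists.
Example dataset below (my real dataset has many more columns and rows):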
index lists condition
0 ['5 apples', '2 pears'] B
1 ['3 apples', '3 pears', '1 pumpkin'] A
2 ['4 blueberries'] A
3 ['5 kiwis'] C
4 ['1 pumpkin'] B
... ... ...
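For example, if the condition is A and the row contains '1 pumpkin', I want to replace that value with XXX. However, if the condition is B and the row contains '1 pumpkin', I want to replace that value with YYY.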
Desired output:
index lists condition
0 ['5 apples', '2 pears'] B
1 ['3 apples', '3 pears', 'XXX'] A
2 ['4 blueberries'] A
3 ['5 kiwis'] C
4 ['YYY'] B
... ... ...
In fact, the goal is to replace all of these values; '1 pumpkin' is just an example. Importantly, I want to keep the list structure. Thanks.
Let's use explode first, then np.select:
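import numpy as np

# One row per list element; the original index labels are repeated.
s = df.explode('lists')
cond = s['lists'] == '1 pumpkin'
c1 = cond & s['condition'].eq('A')
c2 = cond & s['condition'].eq('B')
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)
# Regroup the elements into lists by the original index labels.
df['lists'] = s.groupby(level=0)['lists'].agg(list)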
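For anyone who wants to run the explode approach end to end, here is a minimal self-contained sketch. It assumes the sample frame from the question, rebuilt by hand, and a unique index: `groupby(level=0)` regroups by index label, so rows sharing a label would be merged into one list. The names `is_pumpkin` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the real frame has more rows/columns).
df = pd.DataFrame({
    "lists": [
        ["5 apples", "2 pears"],
        ["3 apples", "3 pears", "1 pumpkin"],
        ["4 blueberries"],
        ["5 kiwis"],
        ["1 pumpkin"],
    ],
    "condition": ["B", "A", "A", "C", "B"],
})

# explode gives one row per list element and repeats the index label.
s = df.explode("lists")

# Element-level masks: the item matches AND the row's condition matches.
is_pumpkin = s["lists"].eq("1 pumpkin")
c1 = is_pumpkin & s["condition"].eq("A")
c2 = is_pumpkin & s["condition"].eq("B")

# Pick the replacement where a mask is True, otherwise keep the element.
s["lists"] = np.select([c1, c2], ["XXX", "YYY"], default=s["lists"].to_numpy())

# Collect the elements back into one list per original index label.
result = s.groupby(level=0)["lists"].agg(list)
df["lists"] = result
```

Rows whose condition is neither A nor B (row 3 here) fall through to the `default` and come back unchanged.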
You can define a function containing the logic you want to apply to the DataFrame, then call df.apply(function) to run that logic on each row of df:
def pumpkin(row):
    if '1 pumpkin' in row['lists']:
        data = row['lists'][:]
        if row['condition'] == 'A':
            data[data.index('1 pumpkin')] = 'XXX'
        elif row['condition'] == 'B':
            data[data.index('1 pumpkin')] = 'YYY'
        return data
    return row['lists']

df['lists'] = df.apply(pumpkin, axis=1)
Output:
lists condition
0 [5 apples, 2 pears] B
1 [3 apples, 3 pears, XXX] A
2 [4 blueberries] A
3 [5 kiwis] C
4 [YYY] B
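Since the question says '1 pumpkin' is only one of many values to replace, the row-wise idea could be generalized with a lookup table instead of one branch per value. The sketch below is an assumption about how that might look, not part of the original answer: `replacements` and `replace_items` are hypothetical names, and the mapping holds only the two rules given in the question. Unlike `data.index(...)`, which touches only the first match, the list comprehension replaces every matching element in a row.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "lists": [
        ["5 apples", "2 pears"],
        ["3 apples", "3 pears", "1 pumpkin"],
        ["4 blueberries"],
        ["5 kiwis"],
        ["1 pumpkin"],
    ],
    "condition": ["B", "A", "A", "C", "B"],
})

# Hypothetical lookup table: (condition, original item) -> replacement.
# Extend it with whatever other rules the real data needs.
replacements = {
    ("A", "1 pumpkin"): "XXX",
    ("B", "1 pumpkin"): "YYY",
}

def replace_items(row):
    # Build a new list so the original list object is not mutated;
    # items without a matching rule are kept as they are.
    return [
        replacements.get((row["condition"], item), item)
        for item in row["lists"]
    ]

df["lists"] = df.apply(replace_items, axis=1)
```

Each cell is still a plain Python list afterwards, so the list structure the question asks for is preserved.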