Python np.select: matching some conditions to multiple choices

I have a pandas DataFrame that looks like this:

id variable value
1    x        5
1    y        5
2    x        7
2    y        7

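Now I want to rename some of the variables to a single new name, and map the remaining variables to two different names each (the rest of the row would be copied as is). For example, in the DataFrame above I want to rename x to x1, and y to both a and b. I am looking for something like this: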

conditions = [(df['variable']=='x'),(df['variable']=='y')]
choices = ['x1',['a','b']]
df['variable'] = np.select(conditions, choices, default='NA')

So the final DataFrame would look like this:

id variable value
1    x1       5
1    a        5
1    b        5
2    x1       7
2    a        7
2    b        7

How can I achieve this?

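You are trying to change the shape of the data: one input row has to become several output rows, while np.select returns exactly one value per row. You can try this approach, which joins the list of names with a delimiter inside np.select, then splits and explodes the resulting column and joins it back to the DataFrame: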

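import numpy as np
import pandas as pd
conditions = [(df['variable']=='x'),(df['variable']=='y')]
# default='NA' (as in the question) keeps the result a string array; assumes df has the default RangeIndex
s = pd.Series(np.select(conditions, ['x1', '|'.join(['a','b'])], default='NA')).str.split('|').explode()
out = df.join(s.rename("variable_new"))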


print(out)
   id variable  value variable_new
0   1        x      5           x1
1   1        y      5            a
1   1        y      5            b
2   2        x      7           x1
3   2        y      7            a
3   2        y      7            b
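For anyone who wants to run this end to end, here is a minimal self-contained sketch. It is a variation on the approach above, not the same code: instead of np.select with a delimiter, it assumes the renames can be written as a dict of lists (the name `mapping` and the `['NA']` fallback are my own illustrative choices), and it needs pandas 0.25 or later for explode.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "variable": ["x", "y", "x", "y"],
    "value": [5, 5, 7, 7],
})

# Hypothetical mapping: every old name maps to a list of one or more new names
mapping = {"x": ["x1"], "y": ["a", "b"]}

# Names missing from the mapping fall back to ['NA'], mirroring default='NA'
new_names = df["variable"].map(lambda v: mapping.get(v, ["NA"]))

# explode turns each list element into its own row and copies the other columns
out = (df.assign(variable=new_names)
         .explode("variable")
         .reset_index(drop=True))
print(out)
```

Unlike the join above, this overwrites variable directly, which matches the layout asked for in the question. Drop reset_index if you want to keep the original row labels.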

Edit: for pandas versions below 0.25 (which do not have explode):

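conditions = [(df['variable']=='x'),(df['variable']=='y')]
# Store a list of new names per row (default='NA' keeps the array dtype a string)
df['variable'] = (pd.Series(np.select(conditions,
                                      ['x1', '|'.join(['a','b'])],
                                      default='NA')).str.split('|'))
# Repeat each row once per list element, then flatten the lists into the column
out = (df.loc[df.index.repeat(df['variable'].str.len())]
         .assign(variable=np.concatenate(df['variable'].values)))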
print(out)
   id variable  value
0   1       x1      5
1   1        a      5
1   1        b      5
2   2       x1      7
3   2        a      7
3   2        b      7
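The repeat-and-flatten idea can also be sketched as a small self-contained script that leaves df untouched. The dict `mapping` and the `['NA']` fallback are illustrative assumptions of mine, not part of the question; the only pandas and NumPy calls used are Index.repeat, DataFrame.loc, assign and np.concatenate.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "variable": ["x", "y", "x", "y"],
    "value": [5, 5, 7, 7],
})

# Hypothetical mapping: every old name maps to a list of one or more new names
mapping = {"x": ["x1"], "y": ["a", "b"]}
lists = df["variable"].map(lambda v: mapping.get(v, ["NA"]))

# Number of output rows each input row should produce
counts = lists.map(len)

# Repeat each row label counts[i] times, then overwrite the column
# with the flattened list of new names (same total length)
out = (df.loc[df.index.repeat(counts)]
         .assign(variable=np.concatenate(lists.tolist())))
print(out)
```

As in the output above, the index keeps the original row labels (0, 1, 1, 2, 3, 3), so you can still tell which source row each new row came from.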
