Compare a row with the previous row when a value changes in a pandas DataFrame

I have longitudinal data of the following form:

import pandas as pd
df = pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1]
})

and a function that compares lists (the details aren't important):

def compare(old_fruit, new_fruit):
    if set(new_fruit) - set(old_fruit) == {'pears'}:
        return 1
    else:
        return 0

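c is 1 when a and b change in a way I'm interested in. I want to find the rows where c = 1, take the values of a and b in that row together with the values of a and b in the previous row, compare them with my function, and add a new Series to the DataFrame holding the result of the comparison.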

For the example above, the operation I want would run compare(['plums', 'grapes'], ['pears', 'grapes']) and compare(['pears', 'grapes'], ['pears', 'bananas']) and add the Series [0, 0, 1, 0, 0] to the DataFrame, i.e. the desired output is a DataFrame like this:

pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1],
    'd': [0, 0, 1, 0, 0]
})

To do exactly the comparison you describe in a vectorized way, build a set per row and subtract the previous row's set:

df_set = df[['a', 'b']].apply(set, axis=1)
df_set
Out[38]: 
0    {grapes, apples}
1     {grapes, plums}
2     {grapes, pears}
3     {grapes, pears}
4    {bananas, pears}
dtype: object
(df_set - df_set.shift()) == {'pears'}
Out[39]: 
0    False
1    False
2     True
3    False
4    False
dtype: bool
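
The boolean result above checks every row and is not yet stored as column d. As a minimal sketch of one way to get the exact output asked for, assuming it is acceptable to call compare row by row instead of vectorizing, the snippet below pairs each row with its predecessor via shift() and only runs compare where c == 1. The names old_pairs and new_pairs are introduced here for illustration and are not part of the original question or answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1],
})

def compare(old_fruit, new_fruit):
    # Same rule as in the question: 1 only if 'pears' is the sole new item.
    return 1 if set(new_fruit) - set(old_fruit) == {'pears'} else 0

# Current (a, b) pairs and the pairs from the previous row, aligned by position.
# The first entry of old_pairs holds NaN values because row 0 has no predecessor.
new_pairs = df[['a', 'b']].values.tolist()
old_pairs = df[['a', 'b']].shift().values.tolist()

# Call compare only where c == 1 and a previous row exists; otherwise store 0.
df['d'] = [
    compare(old, new) if flag == 1 and i > 0 else 0
    for i, (old, new, flag) in enumerate(zip(old_pairs, new_pairs, df['c']))
]

print(df)
```

This is a plain Python loop, so it trades speed for flexibility: compare can be any function of two lists, which the set-subtraction approach cannot express in general.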
