While loop that repeatedly re-checks a pandas DataFrame for changes



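I have two identical dataframes, new and old. The new dataframe is updated at random times throughout the day. The code below checks whether anything has changed.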

import pandas as pd
import numpy as np
new = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
'episodes': [42, 24, 31, 29, 37, 40],
'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
'episodes': [12, 32, 31, 32, 37, 40],
'gender': ['male', 'female', 'female', 'female', 'male', 'male']}    
df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])    
df = pd.DataFrame(old, columns = ['name','episodes', 'gender'])
while True:
    df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
    print(df[~df.episodes.eq(df1.episodes)])
    df1 = df

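I need to write a condition inside the while loop so that df[~df.episodes.eq(df1.episodes)] is printed only when a change is detected. After printing the new data, it should set the dataframes to the same value (the old data is no longer needed) and go back to checking for changes. The code above outputs: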

Columns: [name, episodes, gender]
Index: []
Empty DataFrame
Columns: [name, episodes, gender]
Index: []
Empty DataFrame
Columns: [name, episodes, gender]
Index: []
Empty DataFrame

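So even if the change was actually printed, it gets overlooked among the empty output. Can you suggest a more efficient way to accomplish this?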

== Edit ==

Following @BENY's answer, if I do this:

import pandas as pd
import numpy as np
new = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Sheldon'],
'episodes': [42, 24, 31, 29, 37, 40],
'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Sheldon'],
'episodes': [12, 32, 31, 32, 37, 40],
'gender': ['male', 'female', 'female', 'female', 'male', 'male']}    
df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])    
df = pd.DataFrame(old, columns = ['name','episodes', 'gender'])
while True:
    df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
    out = df.merge(df1[['name','episodes']],on=['name','episodes'],how='left',indicator=True).loc[lambda x : x['_merge']=='left_only']
    print(out)
    df = df1

It prints this on every pass through the while loop:

         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only

Is it possible to print it only once, until there is another change? If I put df = df1, it prints the following and I will miss the change:

Columns: [name, episodes, gender, _merge]
Index: []
Empty DataFrame
Columns: [name, episodes, gender, _merge]

I need a clean way to get this data at the point where a change is detected.

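If you want to compare two dataframes and check for any changes or differences, why not use the DataFrame.compare() function (available since pandas 1.1)?

Here is the output for the sample data: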

df.compare(df1)

Output:

  episodes      
      self other
0     12.0  42.0
1     32.0  24.0
3     32.0  29.0

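By default, it shows only the differences. Here, it shows that only the episodes column differs.
self corresponds to df and other corresponds to df1.

The index on the left, i.e. 0, 1, 3, gives the row labels of the rows that differ.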
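
If only the positions of the changed rows are needed, rather than the whole comparison table, they can be read from the index of the compare() result. A minimal sketch on the sample data from the question; the names changed_rows and changed_cols are only illustrative:

```python
import pandas as pd

names = ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard']
genders = ['male', 'female', 'female', 'female', 'male', 'male']
df = pd.DataFrame({'name': names, 'episodes': [12, 32, 31, 32, 37, 40], 'gender': genders})
df1 = pd.DataFrame({'name': names, 'episodes': [42, 24, 31, 29, 37, 40], 'gender': genders})

diff = df.compare(df1)  # requires pandas >= 1.1

# Row labels where at least one column differs.
changed_rows = diff.index.tolist()
# Top level of the column MultiIndex holds the names of the differing columns.
changed_cols = diff.columns.get_level_values(0).unique().tolist()

print(changed_rows)  # [0, 1, 3]
print(changed_cols)  # ['episodes']
```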

If you want to keep the full original shape, you can also use the keep_shape= parameter, as follows:

df.compare(df1, keep_shape=True)

Output:

  name       episodes       gender      
  self other     self other   self other
0  NaN   NaN     12.0  42.0    NaN   NaN
1  NaN   NaN     32.0  24.0    NaN   NaN
2  NaN   NaN      NaN   NaN    NaN   NaN
3  NaN   NaN     32.0  29.0    NaN   NaN
4  NaN   NaN      NaN   NaN    NaN   NaN
5  NaN   NaN      NaN   NaN    NaN   NaN

Only the differing values are shown; NaN marks the values that are equal. Of course, if you prefer, you can also show all values, including the equal ones, as follows:

df.compare(df1, keep_shape=True, keep_equal=True)

         name             episodes        gender        
         self       other     self other    self   other
0     Sheldon     Sheldon       12    42    male    male
1       Penny       Penny       32    24  female  female
2         Amy         Amy       31    31  female  female
3  Bernadette  Bernadette       32    29  female  female
4         Raj         Raj       37    37    male    male
5      Howard      Howard       40    40    male    male

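This option lets you compare the two frames side by side. However, the differences are harder to spot this way.

I suggest starting with the default option, which shows only the differences (perhaps noting down the index of the rows that differ), and using the other two options only if you want to inspect the surrounding values that are equal.

To use it inside the while loop, you can write: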

while True:
    df1 = pd.DataFrame(new, columns = ['name','episodes', 'gender'])
    out = df.compare(df1)
    print(out)
    df = df1
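
As written, this loop prints on every pass, including an empty frame when nothing has changed. One way to print only when a change is detected is to test out.empty before printing. The sketch below is an illustration, not part of the original answer: load_new() is a hypothetical stand-in for wherever the updated data comes from (here it just rebuilds the sample frame), the number of passes is bounded so the example terminates, and the polling interval is arbitrary.

```python
import time
import pandas as pd

cols = ['name', 'episodes', 'gender']
names = ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard']
genders = ['male', 'female', 'female', 'female', 'male', 'male']
old = {'name': names, 'episodes': [12, 32, 31, 32, 37, 40], 'gender': genders}
new = {'name': names, 'episodes': [42, 24, 31, 29, 37, 40], 'gender': genders}

def load_new():
    # Hypothetical data source; replace with the real refresh logic.
    return pd.DataFrame(new, columns=cols)

def poll_changes(df, passes=3, interval=0.0):
    """Check for changes `passes` times and return the non-empty diffs."""
    seen = []
    for _ in range(passes):
        df1 = load_new()
        out = df.compare(df1)
        if not out.empty:   # report only when something actually changed
            print(out)
            seen.append(out)
        df = df1            # the new data becomes the baseline
        time.sleep(interval)
    return seen

reports = poll_changes(pd.DataFrame(old, columns=cols))
```

With the sample data, the difference is printed on the first pass only; later passes find nothing new and stay silent. Note that compare() raises a ValueError when the two frames do not have identical row and column labels, so this sketch assumes rows are modified in place rather than added or removed.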

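Edit

If you want to see the name as well while still showing only the differences in the other columns, you can add name to the index with append=True, as follows: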

df.set_index('name', append=True).compare(df1.set_index('name', append=True))

             episodes      
                 self other
  name                     
0 Sheldon        12.0  42.0
1 Penny          32.0  24.0
3 Bernadette     32.0  29.0

This way, you can see the name alongside the row index for each difference.

Let's try merge

out = df.merge(df1[['name','episodes']],on=['name','episodes'],how='left',indicator=True).loc[lambda x : x['_merge']=='left_only']
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
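
The same merge can be wrapped in a small helper so it is easy to call from a loop. This is only a sketch; the function name rows_changed and its keys parameter are hypothetical and not part of pandas. Unlike compare(), this approach matches rows by value instead of by position, so it does not require both frames to have the same labels.

```python
import pandas as pd

def rows_changed(df, df1, keys):
    """Return the rows of df whose `keys` values have no match in df1."""
    merged = df.merge(df1[keys], on=keys, how='left', indicator=True)
    # 'left_only' marks rows found in df but not in df1.
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

names = ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard']
genders = ['male', 'female', 'female', 'female', 'male', 'male']
df = pd.DataFrame({'name': names, 'episodes': [12, 32, 31, 32, 37, 40], 'gender': genders})
df1 = pd.DataFrame({'name': names, 'episodes': [42, 24, 31, 29, 37, 40], 'gender': genders})

out = rows_changed(df, df1, ['name', 'episodes'])
if not out.empty:   # print only when there is something to report
    print(out)
```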
