Unexpected results when comparing row-wise values of a DataFrame column



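I am trying to compare the data in two DataFrames to find rows that are missing or that have different values under the header "A-Score". However, my script gives somewhat unexpected results. How can I fix this?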

import pandas as pd
print(df1)
Ensembl_ID  length   score  identity       p_value  R-Site  
0   ENSG00000000457     208   42.98    92.857  4.390000e-34     110   
1   ENSG00000000457     208   42.98    92.857  4.390000e-34     133   
2   ENSG00000000457     208   42.98    92.857  4.390000e-34     149   
3   ENSG00000000457     208   42.98    92.857  4.390000e-34     164   
4   ENSG00000000460     349   56.10   100.000  0.000000e+00      90   
5   ENSG00000000460     349   56.10   100.000  0.000000e+00     168   
6   ENSG00000000460     349   56.10   100.000  0.000000e+00     187   
7   ENSG00000000460     349   56.10   100.000  0.000000e+00     297   
8   ENSG00000000460     349   56.10   100.000  0.000000e+00     317   
9   ENSG00000000460     349   56.10   100.000  0.000000e+00     336   
10  ENSG00000004399     656  130.45   100.000  0.000000e+00     134   
11  ENSG00000004399     656  130.45   100.000  0.000000e+00     151   
12  ENSG00000004399     656  130.45   100.000  0.000000e+00     153   
13  ENSG00000004399     656  130.45   100.000  0.000000e+00     204   
14  ENSG00000004399     656  130.45   100.000  0.000000e+00     290   
15  ENSG00000004399     656  130.45   100.000  0.000000e+00     298   
16  ENSG00000004399     656  130.45   100.000  0.000000e+00     342   
17  ENSG00000004399     656  130.45   100.000  0.000000e+00     362   
18  ENSG00000004399     656  130.45   100.000  0.000000e+00     431   
19  ENSG00000004399     656  130.45   100.000  0.000000e+00     434   
20  ENSG00000004399     656  130.45   100.000  0.000000e+00     514   
21  ENSG00000004399     656  130.45   100.000  0.000000e+00     516   
22  ENSG00000004399     656  130.45   100.000  0.000000e+00     556   
23  ENSG00000004399     656  130.45   100.000  0.000000e+00     576
R-PercentPosition  R-Score  
0           52.884615    0.147  
1           63.942308    0.040  
2           71.634615    0.105  
3           78.846154    0.063  
4           25.787966    0.711  
5           48.137536    0.094  
6           53.581662    0.252  
7           85.100287    0.726  
8           90.830946    0.024  
9           96.275072    0.001  
10          20.426829    0.015  
11          23.018293    0.017  
12          23.323171    0.528  
13          31.097561    0.044  
14          44.207317    0.008  
15          45.426829    0.111  
16          52.134146    0.382  
17          55.182927    0.042  
18          65.701220    0.002  
19          66.158537    0.001  
20          78.353659    0.014  
21          78.658537    0.872  
22          84.756098    0.243  
23          87.347561    0.115 
print(df2)

Ensembl_ID  length   score  identity       p_value  A-Site  
0   ENSG00000000457     208   42.98    92.857  4.390000e-34     133   
1   ENSG00000000457     208   42.98    92.857  4.390000e-34     149   
2   ENSG00000000457     208   42.98    92.857  4.390000e-34     164   
3   ENSG00000000460     349   56.61   100.000  0.000000e+00      90   
4   ENSG00000000460     349   56.61   100.000  0.000000e+00     168   
5   ENSG00000000460     349   56.61   100.000  0.000000e+00     187   
6   ENSG00000000460     349   56.61   100.000  0.000000e+00     297   
7   ENSG00000000460     349   56.61   100.000  0.000000e+00     317   
8   ENSG00000000460     349   56.61   100.000  0.000000e+00     336   
9   ENSG00000004399     656  131.30   100.000  0.000000e+00     134   
10  ENSG00000004399     656  131.30   100.000  0.000000e+00     151   
11  ENSG00000004399     656  131.30   100.000  0.000000e+00     153   
12  ENSG00000004399     656  131.30   100.000  0.000000e+00     204   
13  ENSG00000004399     656  131.30   100.000  0.000000e+00     290   
14  ENSG00000004399     656  131.30   100.000  0.000000e+00     298   
15  ENSG00000004399     656  131.30   100.000  0.000000e+00     342   
16  ENSG00000004399     656  131.30   100.000  0.000000e+00     362   
17  ENSG00000004399     656  131.30   100.000  0.000000e+00     431   
18  ENSG00000004399     656  131.30   100.000  0.000000e+00     434   
19  ENSG00000004399     656  131.30   100.000  0.000000e+00     514   
20  ENSG00000004399     656  131.30   100.000  0.000000e+00     516   
21  ENSG00000004399     656  131.30   100.000  0.000000e+00     556   
22  ENSG00000004399     656  131.30   100.000  0.000000e+00     573 
A-PercentPosition  A-Score  
0           63.942308    0.040  
1           71.634615    0.105  
2           78.846154    0.063  
3           25.787966    0.711  
4           48.137536    0.094  
5           53.581662    0.252  
6           85.100287    0.726  
7           90.830946    0.024  
8           96.275072    0.001  
9           20.426829    0.251  
10          23.018293    0.148  
11          23.323171    0.021  
12          31.097561    0.099  
13          44.207317    0.070  
14          45.426829    0.065  
15          52.134146    0.115  
16          55.182927    0.024  
17          65.701220    0.425  
18          66.158537    0.413  
19          78.353659    0.469  
20          78.658537    0.519  
21          84.756098    0.506  
22          87.347561    0.169 
df1['compare_Scores'] = df1['R-Score'].isin(df2['A-Score'])
print(df1)
Ensembl_ID  length   score  identity       p_value  R-Site  
0   ENSG00000000457     208   42.98    92.857  4.390000e-34     110   
1   ENSG00000000457     208   42.98    92.857  4.390000e-34     133   
2   ENSG00000000457     208   42.98    92.857  4.390000e-34     149   
3   ENSG00000000457     208   42.98    92.857  4.390000e-34     164   
4   ENSG00000000460     349   56.10   100.000  0.000000e+00      90   
5   ENSG00000000460     349   56.10   100.000  0.000000e+00     168   
6   ENSG00000000460     349   56.10   100.000  0.000000e+00     187   
7   ENSG00000000460     349   56.10   100.000  0.000000e+00     297   
8   ENSG00000000460     349   56.10   100.000  0.000000e+00     317   
9   ENSG00000000460     349   56.10   100.000  0.000000e+00     336   
10  ENSG00000004399     656  130.45   100.000  0.000000e+00     134   
11  ENSG00000004399     656  130.45   100.000  0.000000e+00     151   
12  ENSG00000004399     656  130.45   100.000  0.000000e+00     153   
13  ENSG00000004399     656  130.45   100.000  0.000000e+00     204   
14  ENSG00000004399     656  130.45   100.000  0.000000e+00     290   
15  ENSG00000004399     656  130.45   100.000  0.000000e+00     298   
16  ENSG00000004399     656  130.45   100.000  0.000000e+00     342   
17  ENSG00000004399     656  130.45   100.000  0.000000e+00     362   
18  ENSG00000004399     656  130.45   100.000  0.000000e+00     431   
19  ENSG00000004399     656  130.45   100.000  0.000000e+00     434   
20  ENSG00000004399     656  130.45   100.000  0.000000e+00     514   
21  ENSG00000004399     656  130.45   100.000  0.000000e+00     516   
22  ENSG00000004399     656  130.45   100.000  0.000000e+00     556   
23  ENSG00000004399     656  130.45   100.000  0.000000e+00     573 
R-PercentPosition  R-Score  compare_Scores  
0           52.884615    0.147           False  
1           63.942308    0.040            True  
2           71.634615    0.105            True  
3           78.846154    0.063            True  
4           25.787966    0.711            True  
5           48.137536    0.094            True  
6           53.581662    0.252            True  
7           85.100287    0.726            True  
8           90.830946    0.024            True  
9           96.275072    0.001            True  
10          20.426829    0.015           False  
11          23.018293    0.017           False  
12          23.323171    0.528           False  
13          31.097561    0.044           False  
14          44.207317    0.008           False  
15          45.426829    0.111           False  
16          52.134146    0.382           False  
17          55.182927    0.042           False  
18          65.701220    0.002           False  
19          66.158537    0.001            True  
20          78.353659    0.014           False  
21          78.658537    0.872           False  
22          84.756098    0.243           False  
23          87.347561    0.115            True

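In the result, as expected, row 0 shows False, because there is no R-Site value 110 in df2.

But for rows 19 and 23, the R-Score values differ between df1 and df2, and yet the result shows True.

Is there a better way to find the differences between df1 and df2 based on the values in the "R-Score" column?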

I don't think you are doing what you think you are doing.

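By issuing df1['compare_Scores'] = df1['R-Score'].isin(df2['A-Score']), you are looking at each value in the R-Score column of df1 (in your example you talk about the R-Site value 110, but you are using R-Score, not R-Site) and checking whether that value exists anywhere in the A-Score column of df2 (not necessarily at the same row index). So for row 19, the R-Score is 0.001, and that value appears in the A-Score column of df2 at row 8, hence the answer is True. The same happens for row 23: its R-Score of 0.115 appears at row 15 of df2.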
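To make the difference concrete, here is a minimal sketch contrasting membership with a position-by-position comparison. The frames `left` and `right` are small hypothetical stand-ins, not your real data.

```python
import pandas as pd

# Tiny hypothetical frames; only the score columns matter here.
left = pd.DataFrame({"R-Score": [0.147, 0.040, 0.001]})
right = pd.DataFrame({"A-Score": [0.040, 0.001, 0.413]})

# isin asks: does each left value occur ANYWHERE in right['A-Score']?
membership = left["R-Score"].isin(right["A-Score"])

# Position-by-position comparison of the underlying values.
positional = left["R-Score"].to_numpy() == right["A-Score"].to_numpy()

print(membership.tolist())
print(positional.tolist())
```

The second and third values are reported as present by isin even though no position holds the same score in both frames, which is the effect you are seeing in rows 19 and 23.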

If what you want is False when row x of df1['R-Score'] differs from the same row x of df2['A-Score'], and True otherwise, then you can do something like df1['compare_Scores'] = df1['R-Score'] == df2['A-Score'].

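Note that for this to work, df1 and df2 need to have aligned indexes, which is not the case in your example (df1 has 24 rows indexed 0 to 23, while df2 has 23 rows indexed 0 to 22).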
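Since the row positions do not line up, one possible approach is to match rows on a key instead of on the index. The sketch below assumes that Ensembl_ID together with the site number (R-Site in df1, A-Site in df2) identifies the same record in both frames; that is my assumption, not something stated in the question. The names `df1_demo`, `df2_demo`, `missing_in_df2` and `score_differs` are made up for illustration, and the three rows are taken from the sample data.

```python
import pandas as pd

# Three rows lifted from the sample data (hypothetical reduced frames).
df1_demo = pd.DataFrame({
    "Ensembl_ID": ["ENSG00000000457", "ENSG00000000457", "ENSG00000004399"],
    "R-Site": [110, 133, 134],
    "R-Score": [0.147, 0.040, 0.015],
})
df2_demo = pd.DataFrame({
    "Ensembl_ID": ["ENSG00000000457", "ENSG00000004399"],
    "A-Site": [133, 134],
    "A-Score": [0.040, 0.251],
})

# Left merge keeps every df1 row; indicator=True adds a _merge column
# that records whether a matching df2 row was found.
merged = df1_demo.merge(
    df2_demo,
    how="left",
    left_on=["Ensembl_ID", "R-Site"],
    right_on=["Ensembl_ID", "A-Site"],
    indicator=True,
)

# Rows of df1 with no counterpart in df2.
merged["missing_in_df2"] = merged["_merge"] == "left_only"
# Rows present in both frames but with a different score.
merged["score_differs"] = ~merged["missing_in_df2"] & (
    merged["R-Score"] != merged["A-Score"]
)

print(merged[["Ensembl_ID", "R-Site", "R-Score", "A-Score",
              "missing_in_df2", "score_differs"]])
```

Here site 110 is flagged as missing and site 134 as having a different score, while site 133 matches. If the scores come from separate calculations, an exact != on floats may be too strict and a tolerance-based comparison might suit you better.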

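The problem with your logic is that isin checks whether the column value is found at any index, and returns True if it is. In your sample data, the value 0.001 at index 19 of df1 appears at index 8 of df2.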

df1['compare_Scores'] = df1['R-Score'].isin(df2['A-Score'])

If you want an index-wise comparison, the logic below should work for you.

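# Hyphenated names need brackets; .eq() aligns on the index, whereas == raises if the labels differ.
df1['compare_Scores'] = df1['R-Score'].eq(df2['A-Score'])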
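As a small illustration of why the index matters for an index-wise comparison, the sketch below uses two hypothetical Series of different lengths (`r_scores` and `a_scores` are made-up names). In the pandas versions I am aware of, == between Series with different labels raises a ValueError, while Series.eq aligns on the index first and treats a missing counterpart as not equal.

```python
import pandas as pd

# Hypothetical score columns with different lengths, as in the question.
r_scores = pd.Series([0.711, 0.094, 0.252], index=[0, 1, 2], name="R-Score")
a_scores = pd.Series([0.711, 0.099], index=[0, 1], name="A-Score")

# The == operator refuses to compare Series whose labels differ.
try:
    r_scores == a_scores
    raised = False
except ValueError:
    raised = True

# .eq() aligns on the index; label 2 has no partner, so it compares as False.
aligned = r_scores.eq(a_scores)

print(raised)
print(aligned.tolist())
```

Note that this still compares by index label only, so it is meaningful only when the same label refers to the same record in both frames.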

Workaround:

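# Flag each R-Score that occurs anywhere in df2['A-Score'] (membership, not row by row).
df1['compare_Sites'] = df1['R-Score'].isin(df2['A-Score'])
# `False in series` tests the index labels, not the values, so check the values directly.
if not df1['compare_Sites'].all():
    print(df1[~df1['compare_Sites']])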
Ensembl_ID  length   score  identity       p_value  R-Site  
0   ENSG00000000457     208   42.98    92.857  4.390000e-34     110   
10  ENSG00000004399     656  130.45   100.000  0.000000e+00     134   
11  ENSG00000004399     656  130.45   100.000  0.000000e+00     151   
12  ENSG00000004399     656  130.45   100.000  0.000000e+00     153   
13  ENSG00000004399     656  130.45   100.000  0.000000e+00     204   
14  ENSG00000004399     656  130.45   100.000  0.000000e+00     290   
15  ENSG00000004399     656  130.45   100.000  0.000000e+00     298   
16  ENSG00000004399     656  130.45   100.000  0.000000e+00     342   
17  ENSG00000004399     656  130.45   100.000  0.000000e+00     362   
18  ENSG00000004399     656  130.45   100.000  0.000000e+00     431   
20  ENSG00000004399     656  130.45   100.000  0.000000e+00     514   
21  ENSG00000004399     656  130.45   100.000  0.000000e+00     516   
22  ENSG00000004399     656  130.45   100.000  0.000000e+00     556   
R-PercentPosition  R-Score  compare_Sites  
0           52.884615    0.147          False  
10          20.426829    0.015          False  
11          23.018293    0.017          False  
12          23.323171    0.528          False  
13          31.097561    0.044          False  
14          44.207317    0.008          False  
15          45.426829    0.111          False  
16          52.134146    0.382          False  
17          55.182927    0.042          False  
18          65.701220    0.002          False  
20          78.353659    0.014          False  
21          78.658537    0.872          False  
22          84.756098    0.243          False