根据列值的组合获取所有单元格值的最佳方法是什么?
示例数据帧一:
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Bank of America Corporation 300
3 AAPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
示例数据帧二:
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Branch of America Corporation 300
3 AAPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
例如:我想使用(Stock和Name(作为关键列,然后比较数据集。目标是打印两个数据集之间不匹配的条目,并将Stock+Name列用作组合键。
我正在使用Pandas/Python3.7
样本输出:
BAC美国银行股份有限公司300-BAC美国分行Corporation 300
也许,使用merge
+query
的完整INNER JOIN?
df1.merge(df2, on='Stock').query('Name_x != Name_y')
Stock Name_x Price_x Name_y Price_y
2 BAC Bank of America Corporation 300 Branch of America Corporation 300
或者,map
的解决方案略有不同,您可以使用它来获得股票符号:
m = df1.Stock.map(df2.set_index('Stock').Name).ne(df1.Name)
symbols = df1.loc[m, 'Stock']
print(symbols)
2 BAC
Name: Stock, dtype: object
然后通过股票符号访问每个DataFrame行:
df1[df1.Stock.isin(symbols)]
Stock Name Price
2 BAC Bank of America Corporation 300
df2[df2.Stock.isin(symbols)]
Stock Name Price
2 BAC Branch of America Corporation 300
如果它们在两个数据帧中,那么使用.concat
可以非常简单地无条件地合并它们。一旦他们加入,这里有一种方法可以得到不匹配:
import pandas as pd
df1 = pd.DataFrame({
"Ticker_y": list("qwerty"),
"Name_y": list("asdfgh"),
"Ticker_x": list("qw3r7y"),
"Name_x": list("as6f8h")
})
mismatch = df1[(df1["Ticker_y"] != df1["Ticker_x"]) & (df1["Name_y"] != df1["Name_x"])]
最后一行只是说"df只有在满足这些条件的情况下。">
我们可以使用isin
使用值序列进行测试,因为它确保DataFrame中的每个元素都包含在值中
第一个数据帧
>>> df1
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Bank of America Corporation 300
3 APPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
第二个数据帧
>>> df2
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Branch of America Corporation 300
3 APPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
给你。。
>>> df2[~df2.Name.isin(df1.Name.values)]
Stock Name Price
2 BAC Branch of America Corporation 300
或
>>> df1[~df1.Name.isin(df2.Name.values)]
Stock Name Price
2 BAC Bank of America Corporation 300