Pandas - Match a column between two DataFrames and return the value from another column



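I have a CSV of bank transactions, plus a second CSV I created in which I assigned categories to the most frequent transactions. I want to categorize the bank CSV by matching the descriptions between the two DataFrames. Not every transaction will be categorized/matched.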

df1 (~2.5k rows):

Description, Amount    
Corner Store, 30
Cinema, 10
Trampoline Store, 20

df2 (~100 rows):

Description, Category
Corner Store, Groceries
Cinema, Recreation
The Pub, Alcohol

Desired result:

Description, Amount, Category
Corner Store, 30, Groceries
Cinema, 10, Recreation
Trampoline Store, 20,

I tried the following, but all I get in my df is a Category column full of NaN:

df1['Category'] = df1['Description'].map(df2.set_index('Description')['Category'])
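On the sample data the map() line above does fill in Groceries and Recreation, so the all-NaN result likely comes from the real files rather than the approach. One possible cause (an assumption; the question doesn't confirm it) is stray whitespace: the samples put a space after every comma, and an exact string lookup treats "Corner Store" and " Corner Store" as different keys. The sketch below uses hypothetical in-memory CSV text shaped like the samples, reads it with read_csv(skipinitialspace=True), strips headers and keys, and then runs the same map():

```python
import io

import pandas as pd

# Hypothetical file contents shaped like the samples in the question
# (note the space after each comma).
bank_csv = io.StringIO(
    "Description, Amount\n"
    "Corner Store, 30\n"
    "Cinema, 10\n"
    "Trampoline Store, 20\n"
)
categories_csv = io.StringIO(
    "Description, Category\n"
    "Corner Store, Groceries\n"
    "Cinema, Recreation\n"
    "The Pub, Alcohol\n"
)

# skipinitialspace=True drops the blank that follows each delimiter.
df1 = pd.read_csv(bank_csv, skipinitialspace=True)
df2 = pd.read_csv(categories_csv, skipinitialspace=True)

# Defensive cleanup: trailing blanks in headers or keys also break exact matches.
for df in (df1, df2):
    df.columns = df.columns.str.strip()
    df["Description"] = df["Description"].str.strip()

# Same lookup as in the question: Description -> Category.
lookup = df2.set_index("Description")["Category"]
df1["Category"] = df1["Description"].map(lookup)
print(df1)
```

If the real descriptions also differ in letter case, normalizing both sides (for example with .str.lower()) before the lookup would be a similar precaution.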

You can use pandas' join, but you need to set 'Description' as the index:

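import pandas as pd

# Sample data from the question
data1 = {'Description': ['Corner Store', 'Cinema', 'Trampoline Store'],
         'Amount': [30, 10, 20]}
df1 = pd.DataFrame(data1)
data2 = {'Description': ['Corner Store', 'Cinema', 'The Pub'],
         'Category': ['Groceries', 'Recreation', 'Alcohol']}
df2 = pd.DataFrame(data2)

# join aligns rows on the index, so make Description the index of both frames
df1.set_index('Description', inplace=True)
df2.set_index('Description', inplace=True)

# join defaults to how='left': every row of df1 is kept, unmatched rows get NaN
df3 = df1.join(df2)
print(df3)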

Output:

                  Amount    Category
Description                         
Corner Store          30   Groceries
Cinema                10  Recreation
Trampoline Store      20         NaN
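If you'd rather keep Description as an ordinary column, as in the desired result, a left merge is an alternative to join. This is a sketch on the question's sample data; validate="many_to_one" is an optional safety check that raises if a description appears more than once in df2 (which would otherwise duplicate transactions), and the fillna("") step is only there to show unmatched rows as blank:

```python
import pandas as pd

df1 = pd.DataFrame({"Description": ["Corner Store", "Cinema", "Trampoline Store"],
                    "Amount": [30, 10, 20]})
df2 = pd.DataFrame({"Description": ["Corner Store", "Cinema", "The Pub"],
                    "Category": ["Groceries", "Recreation", "Alcohol"]})

# how="left" keeps every bank transaction, matched or not;
# unmatched rows get NaN in Category.
df3 = df1.merge(df2, on="Description", how="left", validate="many_to_one")

# Optional: show unmatched transactions with a blank category.
df3["Category"] = df3["Category"].fillna("")
print(df3)
```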
