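I have a CSV of bank transactions, and another CSV I created in which I categorized the most frequently occurring transactions. I want to assign a category to each row of the bank CSV by matching the descriptions between the two dataframes. Not every transaction will be categorized/matched.
df1 (~2.5k rows):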
Description, Amount
Corner Store, 30
Cinema, 10
Trampoline Store, 20
df2 (~100 rows):
Description, Category
Corner Store, Groceries
Cinema, Recreation
The Pub, Alcohol
Desired result:
Description, Amount, Category
Corner Store, 30, Groceries
Cinema, 10, Recreation
Trampoline Store, 20,
I have already tried this, but I only get a Category column filled with "nan" in my df:
df1['Category'] = df1['Description'].map(df2.set_index('Description')['Category'])
You can use pandas' join, but 'Description' needs to be set as the index:
import pandas as pd
data1 = {'Description':['Corner Store','Cinema','Trampoline Store'],
'Amount':[30,10,20]}
df1 = pd.DataFrame(data1)
data2 = {'Description':['Corner Store','Cinema', 'The Pub'],
'Category':['Groceries','Recreation','Alcohol']}
df2 = pd.DataFrame(data2)
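# Make 'Description' the index of both frames so join can align rows on it
df1.set_index('Description',inplace=True)
df2.set_index('Description',inplace=True)
# join defaults to a left join: every row of df1 is kept, unmatched rows get NaN
df3 = df1.join(df2)
print(df3)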
Output:
                  Amount    Category
Description
Corner Store          30   Groceries
Cinema                10  Recreation
Trampoline Store      20         NaN
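If every row still comes back as NaN on the real data, one possible cause (an assumption, since the actual CSVs are not shown) is that the descriptions differ in surrounding whitespace or letter case, so no key matches exactly. Below is a minimal sketch that normalizes the key before looking it up, and keeps 'Description' as a regular column as in the desired result. The `normalize` helper and the messy `" cinema "` value are hypothetical, added only for illustration.

```python
import pandas as pd

# Sample data; " cinema " is a hypothetical messy value to show the effect
df1 = pd.DataFrame({
    "Description": ["Corner Store", " cinema ", "Trampoline Store"],
    "Amount": [30, 10, 20],
})
df2 = pd.DataFrame({
    "Description": ["Corner Store", "Cinema", "The Pub"],
    "Category": ["Groceries", "Recreation", "Alcohol"],
})

def normalize(s: pd.Series) -> pd.Series:
    """Hypothetical helper: strip surrounding whitespace and lowercase."""
    return s.str.strip().str.lower()

# Build a lookup Series indexed by the normalized description.
# Duplicates are dropped because map() needs a unique index.
lookup = (
    df2.assign(key=normalize(df2["Description"]))
       .drop_duplicates("key")
       .set_index("key")["Category"]
)

# Map each normalized description in df1 to its category; unmatched rows stay NaN
result = df1.assign(Category=normalize(df1["Description"]).map(lookup))
print(result)
```

This leaves the row order and row count of df1 unchanged. If the keys already match exactly, the original `map` line from the question should work as written, so it may be worth checking `df1['Description'].isin(df2['Description']).sum()` first to see how many descriptions match at all.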