Dividing a DataFrame column by a column from another DataFrame



I have two DataFrames that look similar, and I want to divide a column of df1 by a column of df2.

Some sample data:

import pandas as pd

dict1 = {'category': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0},
         'Id': {0: 24108, 1: 24307, 2: 24307, 3: 24411, 4: 24411},
         'count': {0: 3, 1: 2, 2: 33, 3: 98, 4: 33}}
df1 = pd.DataFrame(dict1)
dict2 = {'Id': {0: 24108, 1: 24307, 2: 24411},
         'count': {0: 3, 1: 35, 2: 131}}
df2 = pd.DataFrame(dict2)

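I am trying to create a new column named weights in the first DataFrame (df1) by dividing df1['count'] by df2['count']. Apart from the category and count columns, the values in the other columns are the same in both DataFrames.

I have the following code, but I can't figure out where the error is: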

df1['weights'] = (df1['count']
.div(df1.merge(df2, on = 'Id', how = 'left')
['count'].to_numpy())
)

When I run the code, I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
3362             except KeyError as err:
/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'count'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_354/1318629977.py in <module>
1 complete['weights'] = (complete['count']
----> 2                       .div(complete.merge(totals, on = 'companyId', how = 'left')['count'].to_numpy())
3                       )

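Any idea why this happens?

Both DataFrames have a count column, so after the merge they are renamed count_x and count_y and there is no longer a column called count. You need to specify which one to use: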

df1['weights'] = (df1['count'].div(df1.merge(df2, on = 'Id', how = 'left')['count_y'].to_numpy()))
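To see where the KeyError comes from, it may help to inspect the merged columns directly. The sketch below rebuilds the sample data from the question, shows the default `_x` / `_y` suffixes that `merge` applies to the overlapping `count` column, and then computes the same weights with `Series.map` instead of a merge. The names `merged`, `merged_columns` and `totals` are only illustrative, and the `map` variant assumes each `Id` appears exactly once in df2.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "category": [0.0, 1.0, 0.0, 0.0, 1.0],
    "Id": [24108, 24307, 24307, 24411, 24411],
    "count": [3, 2, 33, 98, 33],
})
df2 = pd.DataFrame({
    "Id": [24108, 24307, 24411],
    "count": [3, 35, 131],
})

# Both frames carry a non-key column named "count", so merge() renames them
# with its default suffixes: "_x" for the left frame, "_y" for the right one.
merged = df1.merge(df2, on="Id", how="left")
merged_columns = list(merged.columns)
print(merged_columns)  # ['category', 'Id', 'count_x', 'count_y']

# Alternative without a merge: look up each row's total by Id, then divide.
# Assumes Id is unique in df2 (illustrative approach, not the only option).
totals = df2.set_index("Id")["count"]
df1["weights"] = df1["count"] / df1["Id"].map(totals)
print(df1)
```

Because `map` aligns on df1's own index, this version does not need `.to_numpy()` to line the rows up.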
