合并相同数据类型的列上的两个数据帧,但出现 ValueError



我想合并收入 df 中第 profile_ID 列上的两个数据帧和 CompProfile df 中的索引商品类。

收入

profile_ID   type          col1        col2
0         O-COMP-1006  Small_Off   4.1427e+07   4.0027e+07
1         O-COMP-1006  Small_Off   4.7915e+07   4.6515e+07
2         O-COMP-1006  Small_Off  6.10424e+07  5.96424e+07
3         O-COMP-1006  Small_Off  6.83726e+07  6.69726e+07
4         O-COMP-1008  Small_Off  7.28167e+07  7.14167e+07
5         O-COMP-1009  Small_Off   7.6147e+07   7.4747e+07
7         O-COMP-1006  Small_Off  8.02798e+07  7.88798e+07
8         O-COMP-1006  Small_Off  8.17172e+07  8.03172e+07
9         O-COMP-1006  Small_Off  8.42322e+07  8.28322e+07
10        O-COMP-1005  Small_Off  8.54957e+07  8.40747e+07
11        O-COMP-1006  Small_Off  8.67782e+07  8.53358e+07
12        O-COMP-1006  Small_Off  8.80798e+07  8.66159e+07
13        O-COMP-1007  Small_Off   8.9401e+07  8.79151e+07
14        O-COMP-1006  Small_Off   9.0742e+07  8.92338e+07

和 CompProfile

col1     col2        col3
CommodityClass          
profile_ID         NaN      NaN          NaN
O-COMP-1001        0.0      0.0          0.0
O-COMP-1002        0.0      0.0          0.0
O-COMP-1003        0.0      0.0          0.0
O-COMP-1004        0.0      0.0          0.0
O-COMP-1005        0.0      0.0          0.0
O-COMP-1006        1.0      0.0          0.0
O-COMP-1007        0.0      0.0          1.0
O-COMP-1008        0.0      0.0          0.0
O-COMP-1009        0.0      1.0          0.0

我使用

pd.merge( Income, CompProfile, how='left', 
left_on = 'profile_ID', right_index=True, 
suffixes = ("_USD","_frac") )

并收到错误

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

我检查了要合并的列和索引,它们的类型是 object。 我尝试使用连接,但得到同样的错误。

Income.join(CompProfile, on= 'profile_ID',lsuffix = "_USD",rsuffix = "_frac")

我还尝试重置 CompProfile 的索引并合并列:

CompProfile.reset_index()
pd.merge( Income, CompProfile, how='left', 
left_on = 'profile_ID', right_on='CommodityClass', 
suffixes = ("_USD","_frac") )

在这种情况下,我得到

KeyError: 'CommodityClass'

我还尝试从 CompProfile 中删除"profile_ID"行,但它没有改变任何东西。

CompProfile.head(10).to_dict()
{'col1': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 1.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}, 'col2': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 1.0}, 'col3': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 1.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}}

您的第一次尝试工作正常。下面是一个完整的工作示例。您需要退后一步,找出为什么您的数据与您下面提供的示例数据完全不同。

import pandas as pd
from numpy import nan
d1 = {'profile_ID': {0: 'O-COMP-1006', 1: 'O-COMP-1006', 2: 'O-COMP-1006', 3: 'O-COMP-1006', 4: 'O-COMP-1008', 5: 'O-COMP-1009', 7: 'O-COMP-1006', 8: 'O-COMP-1006', 9: 'O-COMP-1006', 10: 'O-COMP-1005'}, 'type': {0: 'Small_Off', 1: 'Small_Off', 2: 'Small_Off', 3: 'Small_Off', 4: 'Small_Off', 5: 'Small_Off', 7: 'Small_Off', 8: 'Small_Off', 9: 'Small_Off', 10: 'Small_Off'}, 'col1': {0: 41427000.0, 1: 47915000.0, 2: 61042400.0, 3: 68372600.0, 4: 72816700.0, 5: 76147000.0, 7: 80279800.0, 8: 81717200.0, 9: 84232200.0, 10: 85495700.0}, 'col2': {0: 40027000.0, 1: 46515000.0, 2: 59642400.0, 3: 66972600.0, 4: 71416700.0, 5: 74747000.0, 7: 78879800.0, 8: 80317200.0, 9: 82832200.0, 10: 84074700.0}}
d2 = {'col1': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 1.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}, 'col2': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 1.0}, 'col3': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 1.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}}
Income = pd.DataFrame.from_dict(d1)
CompProfile = pd.DataFrame.from_dict(d2)
res = pd.merge(Income, CompProfile, how='left',
left_on='profile_ID', right_index=True,
suffixes=('_USD', '_frac'))
print(res)
profile_ID       type    col1_USD    col2_USD  col1_frac  col2_frac  col3
0   O-COMP-1006  Small_Off  41427000.0  40027000.0        1.0        0.0   0.0
1   O-COMP-1006  Small_Off  47915000.0  46515000.0        1.0        0.0   0.0
2   O-COMP-1006  Small_Off  61042400.0  59642400.0        1.0        0.0   0.0
3   O-COMP-1006  Small_Off  68372600.0  66972600.0        1.0        0.0   0.0
4   O-COMP-1008  Small_Off  72816700.0  71416700.0        0.0        0.0   0.0
5   O-COMP-1009  Small_Off  76147000.0  74747000.0        0.0        1.0   0.0
7   O-COMP-1006  Small_Off  80279800.0  78879800.0        1.0        0.0   0.0
8   O-COMP-1006  Small_Off  81717200.0  80317200.0        1.0        0.0   0.0
9   O-COMP-1006  Small_Off  84232200.0  82832200.0        1.0        0.0   0.0
10  O-COMP-1005  Small_Off  85495700.0  84074700.0        0.0        0.0   0.0

最新更新