Pandas merge fills the new dataframe with null values

I am trying to merge two dataframes:

The first dataframe, control, is filled with ints/strings.
The left dataframe, together, is filled with ints/lists.

When I use the pandas merge() function, the new dataframe fills the columns from the right dataframe with NaN instead of the lists:

final_dataset = pd.merge(control, together, on="zip_code", how="left")

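I expected a new merged dataframe containing the values from both original dataframes. Instead, in the new dataframe all the values from the "control" dataframe are correct, but all the lists from the "together" dataframe are NaN.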

Here is some sample data:

control                                       together
-------------------------------              -------------------------------
payment             zip_code                   age                  zip_code
   Rent                 94053                    [25, 64, 24]         12583
   Mortgage             47283                    [78, 39, 35]         47283
   Rent                 25769                    [82, 33, 19]         25769

The final dataset looks like this:

final_dataset
-----------------------------------------------------------
zip_code             payment                 age                  
47283                  Mortgage               NaN                 
25769                  Rent                   NaN

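I think you have a few things going on here. When you say left dataframe, I assume you mean it should be left joined onto the right dataframe, not that "together" is on the left in the sample?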

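I think it is safe to assume that zip_code in "together" is a string rather than an int. You are getting NaN because the keys do not match across the 2 dataframes; for example, 47283 does not equal "47283".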
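A quick way to test that assumption is to look at the key column's dtype in each dataframe. This is only a sketch: the string-typed zip_code in "together" below is assumed for illustration, since the question does not show the real dtypes.

```python
import pandas as pd

control = pd.DataFrame({
    "payment": ["Rent", "Mortgage", "Rent"],
    "zip_code": [94053, 47283, 25769],
})
# Assumption for illustration: zip_code was loaded as strings here.
together = pd.DataFrame({
    "age": [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    "zip_code": ["12583", "47283", "25769"],
})

# Compare the key dtypes before merging.
print(control["zip_code"].dtype, together["zip_code"].dtype)

control_is_int = pd.api.types.is_integer_dtype(control["zip_code"])
together_is_int = pd.api.types.is_integer_dtype(together["zip_code"])
print(control_is_int, together_is_int)

# An integer never compares equal to its string form.
print(47283 == "47283")  # False
```

If the two checks disagree, the keys cannot match even when they look identical when printed.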

Also, if it were a left join with "together" on the left, you should get 1 NaN in payment, because only 2 zip_codes match once both columns have the same datatype.

If you want "control" on the left (which I think you do), here is what I would suggest:

import pandas as pd

control = pd.DataFrame({
    'payment': ['Rent', 'Mortgage', 'Rent'],
    'zip_code': [94053, 47283, 25769]
})
together = pd.DataFrame({
    'age': [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    'zip_code': [12583, 47283, 25769]
})
# Both zip_code columns are ints here, so the keys can match.
control.merge(together, on='zip_code', how='left')

This gives you the following result:

    payment  zip_code           age
0      Rent     94053           NaN
1  Mortgage     47283  [78, 39, 35]
2      Rent     25769  [82, 33, 19]

As you can see, there is 1 NaN in age, because 94053 is not in the "together" dataframe.
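To see which rows found a partner rather than inferring it from the NaN values, merge() accepts indicator=True, which adds a _merge column. A minimal sketch using the same sample data (the names checked and unmatched are mine, not from the question):

```python
import pandas as pd

control = pd.DataFrame({
    "payment": ["Rent", "Mortgage", "Rent"],
    "zip_code": [94053, 47283, 25769],
})
together = pd.DataFrame({
    "age": [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    "zip_code": [12583, 47283, 25769],
})

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for rows that exist only in the left dataframe.
checked = control.merge(together, on="zip_code", how="left", indicator=True)

# zip_codes from control that had no match in together
unmatched = checked.loc[checked["_merge"] == "left_only", "zip_code"].tolist()
print(unmatched)  # [94053]
```

If every row comes back as left_only even though the zip_codes look the same, a dtype mismatch on the key is the likely cause.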

This can happen if the zip_code column has a different type in each dataframe, one being int64 and the other object, for example:

import pandas as pd

# "key" holds strings in a ...
a = pd.DataFrame([
    {"colA": 1, "key": "1"},
    {"colA": 2, "key": "2"},
    {"colA": 3, "key": "3"}
])
# ... and ints in b
b = pd.DataFrame([
    {"colB": [25, 64, 24], "key": 1},
    {"colB": [25, 64, 24], "key": 2},
    {"colB": [25, 64, 24], "key": 4}
])

If you merge these two dataframes, you will get

res = pd.merge(a, b, on="key", how='left')

   colA key colB
0   1   1   NaN
1   2   2   NaN
2   3   3   NaN

So you need to make sure that zip_code has the same type in both dataframes.
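One way to do that is to cast the key column before merging. The sketch below assumes zip_code in "together" arrived as digit-only strings, which the question does not confirm. As far as I know, newer pandas versions refuse a string-versus-integer merge with a ValueError instead of returning NaN, and the same cast resolves that case too.

```python
import pandas as pd

control = pd.DataFrame({
    "payment": ["Rent", "Mortgage", "Rent"],
    "zip_code": [94053, 47283, 25769],
})
# Assumption for illustration: zip_code is stored as strings here.
together = pd.DataFrame({
    "age": [[25, 64, 24], [78, 39, 35], [82, 33, 19]],
    "zip_code": ["12583", "47283", "25769"],
})

# Cast the string keys to int64 so both key columns share one dtype.
together["zip_code"] = together["zip_code"].astype("int64")

final_dataset = pd.merge(control, together, on="zip_code", how="left")
print(final_dataset)
```

If your real ZIP codes can start with a zero, casting both columns to str instead is probably safer, since converting to int drops leading zeros.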
