Left merge on a list of dictionaries not working



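As an exercise, I want to reduce my dependence on pandas and build a custom merge function that works on lists of dictionaries. Essentially it is a left merge: the original list is kept, and if a key has multiple matches, extra rows are added. In my case, however, the extra rows do get added, but they all contain exactly the same information.

Can anyone point me in the right direction as to where this code goes wrong?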

def merge(self, l2, key):
    #self.data is a list of dictionaries
    #l2 is the second list of dictionaries to merge
    headers = l2[0]
    found = {}
    append_list = []
    for row in self.data:
        for row_b in l2:
            if row_b[key] == row[key] and row[key] not in found:
                found[row[key]] = ""
                for header in headers:
                    row[header] = row_b[header]
            elif row_b[key] == row[key]:
                new_row = row
                for header in headers:
                    new_row[header] = row_b[header]
                    append_list.append(new_row)

    self.data.extend(append_list)

Edit: here is some sample input and the expected output:

self.data = [{'Name':'James', 'Country':'Australia'}, {'Name':'Tom', 'Country':'France'}]
l2 = [{'Country':'France', 'Food':'Frog Legs'}, {'Country':'Australia', 'Food':'Meat Pie'},{'Country':'Australia', 'Food':'Pavlova'}]

After passing through the function with 'Country' as the key argument, I expect self.data to equal the following:

[{'Name':'James', 'Country':'Australia', 'Food':'Meat Pie'}, {'Name':'James', 'Country':'Australia', 'Food':'Pavlova'}, {'Name':'Tom', 'Country':'France', 'Food':'Frog Legs'}]

The function below takes two lists of dictionaries, where every dictionary should have keyprop as one of its keys:

from collections import defaultdict
from itertools import product

def left_join(left_table, right_table, keyprop):
    # create a dictionary indexed by `keyprop` on the left
    left = defaultdict(list)
    for row in left_table:
        left[row[keyprop]].append(row)
    # create a dictionary indexed by `keyprop` on the right
    right = defaultdict(list)
    for row in right_table:
        right[row[keyprop]].append(row)
    # now simply iterate through the "left side",
    # grabbing rows from the "right side" if they are available
    result = []
    for key, left_rows in left.items():
        right_rows = right.get(key)
        if right_rows:
            # one output row per (left, right) pair sharing this key
            for left_row, right_row in product(left_rows, right_rows):
                result.append({**left_row, **right_row})
        else:
            # no match on the right: keep the left rows unchanged
            result.extend(left_rows)
    return result

sample1 = [{'Name':'James', 'Country':'Australia'}, {'Name':'Tom', 'Country':'France'}]
sample2 = [{'Country':'France', 'Food':'Frog Legs'}, {'Country':'Australia', 'Food':'Meat Pie'},{'Country':'Australia', 'Food':'Pavlova'}]
print(left_join(sample1, sample2, 'Country'))
# outputs:
# [{'Name': 'James', 'Country': 'Australia', 'Food': 'Meat Pie'},
#  {'Name': 'James', 'Country': 'Australia', 'Food': 'Pavlova'},
#  {'Name': 'Tom', 'Country': 'France', 'Food': 'Frog Legs'}]
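As for why the original merge added rows with identical information: one likely cause is the line `new_row = row`. It does not copy the dictionary. It only binds a second name to the same object, so every later write to `new_row` also changes `row`. The snippet below is a minimal sketch of that aliasing behaviour on its own, using a made-up `alias` variable. It is not a full fix for the original method.

```python
row = {'Name': 'James', 'Country': 'Australia'}

# Plain assignment binds a second name to the same dict object.
alias = row
alias['Food'] = 'Meat Pie'
print(alias is row)   # True: both names refer to one dictionary
print(row['Food'])    # Meat Pie: the "original" row changed too

# A shallow copy gives an independent dict for the extra row.
new_row = dict(row)
new_row['Food'] = 'Pavlova'
print(row['Food'], new_row['Food'])  # Meat Pie Pavlova
```

Building each merged row as a fresh dictionary, as `{**left_row, **right_row}` does in `left_join`, avoids the problem because no row object is shared between output entries.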

For datasets where rows can be assumed to be unique on their keyprop value within each dataset, the implementation is much simpler:

def left_join(left_table, right_table, keyprop):
    # create a dictionary indexed by `keyprop` on the left
    left = {row[keyprop]: row for row in left_table}
    # create a dictionary indexed by `keyprop` on the right
    right = {row[keyprop]: row for row in right_table}
    # now simply iterate through the "left side",
    # grabbing rows from the "right side" if they are available
    return [{**leftrow, **right.get(key, {})} for key, leftrow in left.items()]
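To see what the uniqueness assumption means in practice, here is a small sketch that runs the simplified logic on data that breaks it. The function is renamed `left_join_unique` purely for illustration, and the `people` and `foods` lists (including the extra 'Ana' row) are made-up sample data.

```python
def left_join_unique(left_table, right_table, keyprop):
    # Same logic as the simplified version above; a later row with the
    # same key silently replaces an earlier one in each lookup dict.
    left = {row[keyprop]: row for row in left_table}
    right = {row[keyprop]: row for row in right_table}
    return [{**leftrow, **right.get(key, {})} for key, leftrow in left.items()]

people = [{'Name': 'James', 'Country': 'Australia'},
          {'Name': 'Tom', 'Country': 'France'},
          {'Name': 'Ana', 'Country': 'Peru'}]
foods = [{'Country': 'France', 'Food': 'Frog Legs'},
         {'Country': 'Australia', 'Food': 'Meat Pie'},
         {'Country': 'Australia', 'Food': 'Pavlova'}]

joined = left_join_unique(people, foods, 'Country')
for row in joined:
    print(row)
# {'Name': 'James', 'Country': 'Australia', 'Food': 'Pavlova'}
# {'Name': 'Tom', 'Country': 'France', 'Food': 'Frog Legs'}
# {'Name': 'Ana', 'Country': 'Peru'}
```

Because 'Australia' appears twice on the right, only the last match ('Pavlova') survives, so this version suits only data where keys really are unique. A left row with no match ('Ana') is kept unchanged, which is the expected left-join behaviour.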
