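As an exercise, I want to reduce my reliance on pandas and build a custom merge function that works on lists of dictionaries. Essentially it is a left merge: the original list is preserved, and if a key has multiple matches, extra rows are added. In my case, however, the extra rows are added but they all contain exactly the same information.
Can anyone point me in the right direction as to where this code goes wrong?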
def merge(self, l2, key):
    # self.data is a list of dictionaries
    # l2 is the second list of dictionaries to merge
    headers = l2[0]
    found = {}
    append_list = []
    for row in self.data:
        for row_b in l2:
            if row_b[key] == row[key] and row[key] not in found:
                found[row[key]] = ""
                for header in headers:
                    row[header] = row_b[header]
            elif row_b[key] == row[key]:
                new_row = row
                for header in headers:
                    new_row[header] = row_b[header]
                append_list.append(new_row)
    self.data.extend(append_list)
Edit: here is some sample input and the expected output:
self.data = [{'Name':'James', 'Country':'Australia'}, {'Name':'Tom', 'Country':'France'}]
l2 = [{'Country':'France', 'Food':'Frog Legs'}, {'Country':'Australia', 'Food':'Meat Pie'},{'Country':'Australia', 'Food':'Pavlova'}]
I would expect self.data to equal the following after passing through the function with 'Country' as the key argument:
[{'Name':'James', 'Country':'Australia', 'Food':'Meat Pie'}, {'Name':'James', 'Country':'Australia', 'Food':'Pavlova'}, {'Name':'Tom', 'Country':'France', 'Food':'Frog Legs'}]
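On why the extra rows come out identical: in the `merge` code above, `new_row = row` does not copy the dictionary. It binds a second name to the same object, so every assignment through `new_row` also rewrites `row`, and the list ends up holding several references to one dictionary. The short sketch below illustrates only that aliasing behavior; the names `alias` and `copied` are made up for the illustration.

```python
row = {'Name': 'James', 'Country': 'Australia', 'Food': 'Meat Pie'}

# Plain assignment binds a second name to the same dict object.
alias = row
alias['Food'] = 'Pavlova'
print(row['Food'])   # Pavlova: the "original" row changed as well
print(alias is row)  # True

# dict(row) (or row.copy()) creates a shallow copy that can diverge.
row['Food'] = 'Meat Pie'
copied = dict(row)
copied['Food'] = 'Pavlova'
print(row['Food'], copied['Food'])  # Meat Pie Pavlova
```

Copying the row before modifying it would address the identical-rows symptom, although the `found` bookkeeping would probably still need attention when several left rows share a key.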
The function below takes two lists of dictionaries, where every dictionary is expected to have `keyprop` as one of its keys:
from collections import defaultdict
from itertools import product

def left_join(left_table, right_table, keyprop):
    # create a dictionary indexed by `keyprop` on the left
    left = defaultdict(list)
    for row in left_table:
        left[row[keyprop]].append(row)
    # create a dictionary indexed by `keyprop` on the right
    right = defaultdict(list)
    for row in right_table:
        right[row[keyprop]].append(row)
    # now simply iterate through the "left side",
    # grabbing rows from the "right side" if they are available
    result = []
    for key, left_rows in left.items():
        right_rows = right.get(key)
        if right_rows:
            for left_row, right_row in product(left_rows, right_rows):
                result.append({**left_row, **right_row})
        else:
            result.extend(left_rows)
    return result

sample1 = [{'Name':'James', 'Country':'Australia'}, {'Name':'Tom', 'Country':'France'}]
sample2 = [{'Country':'France', 'Food':'Frog Legs'}, {'Country':'Australia', 'Food':'Meat Pie'},{'Country':'Australia', 'Food':'Pavlova'}]
print(left_join(sample1, sample2, 'Country'))
# outputs:
# [{'Name': 'James', 'Country': 'Australia', 'Food': 'Meat Pie'},
# {'Name': 'James', 'Country': 'Australia', 'Food': 'Pavlova'},
# {'Name': 'Tom', 'Country': 'France', 'Food': 'Frog Legs'}]
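Because the function above groups the left rows by key, left rows that share a key but are not adjacent come out grouped together rather than in their original order. If the original left order matters, one possible variant (a sketch, not part of the answer above) indexes only the right side and walks the left list directly. The name `left_join_ordered` and the extra `'Ana'` row are invented here to show an unmatched left row being kept.

```python
from collections import defaultdict

def left_join_ordered(left_table, right_table, keyprop):
    # Index only the right side by `keyprop`.
    right = defaultdict(list)
    for row in right_table:
        right[row[keyprop]].append(row)

    result = []
    # Walk the left rows in their original order.
    for left_row in left_table:
        matches = right.get(left_row[keyprop])
        if matches:
            # One output row per matching right row.
            result.extend({**left_row, **right_row} for right_row in matches)
        else:
            # No match: keep a copy of the left row unchanged.
            result.append(dict(left_row))
    return result

people = [{'Name': 'James', 'Country': 'Australia'},
          {'Name': 'Tom', 'Country': 'France'},
          {'Name': 'Ana', 'Country': 'Peru'}]  # hypothetical unmatched row
foods = [{'Country': 'France', 'Food': 'Frog Legs'},
         {'Country': 'Australia', 'Food': 'Meat Pie'},
         {'Country': 'Australia', 'Food': 'Pavlova'}]

joined = left_join_ordered(people, foods, 'Country')
for row in joined:
    print(row)
```

Every output row here is a new dictionary, so later changes to the result do not leak back into the input lists.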
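
For datasets where rows can be assumed to be unique on their `keyprop` value within their respective dataset, the implementation is much simpler: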
def left_join(left_table, right_table, keyprop):
    # create a dictionary indexed by `keyprop` on the left
    left = {row[keyprop]: row for row in left_table}
    # create a dictionary indexed by `keyprop` on the right
    right = {row[keyprop]: row for row in right_table}
    # now simply iterate through the "left side",
    # grabbing rows from the "right side" if they are available
    return [{**leftrow, **right.get(key, {})} for key, leftrow in left.items()]
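
The uniqueness assumption matters. As a quick illustration, the sketch below runs the same simplified logic (renamed `left_join_unique` here only to keep the example self-contained) on the sample data from the question, where 'Australia' appears twice on the right. The dictionary comprehension keeps only the last row seen for each key, so one match is silently dropped.

```python
def left_join_unique(left_table, right_table, keyprop):
    # Same logic as the simplified version above; a later row with a
    # repeated key overwrites the earlier one in each index.
    left = {row[keyprop]: row for row in left_table}
    right = {row[keyprop]: row for row in right_table}
    return [{**leftrow, **right.get(key, {})} for key, leftrow in left.items()]

people = [{'Name': 'James', 'Country': 'Australia'},
          {'Name': 'Tom', 'Country': 'France'}]
foods = [{'Country': 'France', 'Food': 'Frog Legs'},
         {'Country': 'Australia', 'Food': 'Meat Pie'},
         {'Country': 'Australia', 'Food': 'Pavlova'}]

joined = left_join_unique(people, foods, 'Country')
for row in joined:
    print(row)
# {'Name': 'James', 'Country': 'Australia', 'Food': 'Pavlova'}
# {'Name': 'Tom', 'Country': 'France', 'Food': 'Frog Legs'}
```

The 'Meat Pie' row is lost, so for data like the sample in the question the first, list-based version is the one to use.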