Merging lists based on values in a separate "key" list



I have a list like this:

test = [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7], [4, 6, 3, 2, 4, 5, 3, 5], [5, 3, 2, 4], [4, 3, 5, 2, 6]]

Another list, key, describes how the original lists should be merged:

key = ["one", "two", "one", "two"]

I want to merge the "one" lists together and the "two" lists together, taken from the original test list.

The output should look like this:

[[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7, 5, 3, 2, 4], [4, 6, 3, 2, 4, 5, 3, 5, 4, 3, 5, 2, 6]]

How can I do this?

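I believe a dictionary is the most suitable solution. A dictionary lets you easily keep track of which partition is associated with which key. If you only use a list containing the values, it may be harder to map partitions back to keys.

Here is a solution using collections.defaultdict: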

from collections import defaultdict

dct = defaultdict(list)
for i, e in enumerate(key):
    # append the i-th sub-list to the list collected for its key
    dct[e].extend(test[i])
# defaultdict(list,
#        {'one': [2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7, 5, 3, 2, 4],
#         'two': [4, 6, 3, 2, 4, 5, 3, 5, 4, 3, 5, 2, 6]})
# If you want the values
print(list(dct.values()))

Output:

[[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7, 5, 3, 2, 4], [4, 6, 3, 2, 4, 5, 3, 5, 4, 3, 5, 2, 6]]
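
The same idea can be wrapped in a small reusable function. This is only a sketch: the name merge_by_key is hypothetical (it is not part of the answer above), and it assumes Python 3.7+ so that the returned dict keeps the keys in first-seen order.

```python
from collections import defaultdict


def merge_by_key(keys, lists):
    """Concatenate the sub-lists of `lists` that share the same entry in `keys`."""
    merged = defaultdict(list)
    for k, sub in zip(keys, lists):
        # extend() copies the items, so the input sub-lists are left untouched
        merged[k].extend(sub)
    return dict(merged)


test = [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7], [4, 6, 3, 2, 4, 5, 3, 5], [5, 3, 2, 4], [4, 3, 5, 2, 6]]
key = ["one", "two", "one", "two"]

result = merge_by_key(key, test)
print(list(result.values()))
```

Returning a plain dict keeps the key-to-list mapping available, and list(result.values()) gives the nested list asked for in the question.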

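I suggest the following answer, which needs no imports and keeps the order of the keys in the result list. It is not optimized for execution time, but it is easy to read. Also note that if the lists key and test have different lengths, the algorithm runs up to the shorter length without raising any error (this is how zip behaves):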

test = [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7], [4, 6, 3, 2, 4, 5, 3, 5], [5, 3, 2, 4], [4, 3, 5, 2, 6]]
key = ["one", "two", "one", "two"]
d = {}
orderedKeys = []
for k, t in zip(key, test):
    if k in d:
        d[k] += t
    else:
        # copy the sub-list so that the later += does not modify test in place
        d[k] = list(t)
        orderedKeys.append(k)

print([d[k] for k in orderedKeys])
# [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7, 5, 3, 2, 4], [4, 6, 3, 2, 4, 5, 3, 5, 4, 3, 5, 2, 6]]
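
The zip behaviour mentioned above can be seen in a short sketch. The data here is made up for illustration, and checked_pairs is a hypothetical helper showing one way to fail loudly instead of truncating silently.

```python
key = ["one", "two", "one"]        # deliberately one entry short
test = [[1, 2], [3], [4], [5, 6]]

# zip stops at the shorter input, so the last sub-list is silently dropped
pairs = list(zip(key, test))
print(len(pairs))


def checked_pairs(keys, lists):
    """Pair keys with lists, refusing inputs of different lengths."""
    if len(keys) != len(lists):
        raise ValueError("keys and lists must have the same length")
    return list(zip(keys, lists))
```

Whether silent truncation or an explicit error is preferable depends on where the two lists come from; the check costs almost nothing if mismatched lengths would indicate a bug.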

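You can:

  • zip() the two lists to create [(first, second), ...], where first comes from key and second is the value you want to group
  • Sort (sorted()) on first
  • Group (itertools.groupby()) on first
  • Flatten (itertools.chain.from_iterable()) the second values

For example: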

In []:
import operator as op
import itertools as it
first, second = op.itemgetter(0), op.itemgetter(1)
[list(it.chain.from_iterable(map(second, g)))
 for k, g in it.groupby(sorted(zip(key, test), key=first), first)]
Out[]:
[[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7, 5, 3, 2, 4], [4, 6, 3, 2, 4, 5, 3, 5, 4, 3, 5, 2, 6]]
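
One consequence of the sorting step is that the groups come out in sorted key order rather than in order of first appearance. With the question's data the two happen to coincide ("one" sorts before "two"). The sketch below uses made-up data where "two" appears first, to make the difference visible.

```python
import itertools as it
import operator as op

key = ["two", "one", "two", "one"]
test = [[1], [2], [3], [4]]

first, second = op.itemgetter(0), op.itemgetter(1)

# sorted() is stable, so sub-lists with equal keys keep their relative order
pairs = sorted(zip(key, test), key=first)
result = [list(it.chain.from_iterable(map(second, g)))
          for _, g in it.groupby(pairs, first)]
print(result)  # [[2, 4], [1, 3]]: the "one" group comes first
```

If first-appearance order matters, a dictionary-based approach avoids the sort altogether.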

A scalable vanilla solution:

This solution does not import anything.

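# all_keys is complete and ordered
all_keys = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
max_keys = len(all_keys)
# one independent empty list per possible key
output = [[] for _ in range(max_keys)]
test = [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7], [4, 6, 3, 2, 4, 5, 3, 5], [5, 3, 2, 4], [4, 3, 5, 2, 6]]
key = ["one", "two", "one", "two"]
for i, entry in enumerate(test):
    output[all_keys.index(key[i])] += entry
# output[0] holds the merged "one" lists, output[1] the merged "two" lists;
# slots for keys that never occur stay empty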

I propose this solution:

test = [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7],
        [4, 6, 3, 2, 4, 5, 3, 5],
        [5, 3, 2, 4],
        [4, 3, 5, 2, 6]]
key = ["one", "two", "one", "two"]
if len(test) != len(key):
    raise ValueError("test and key must have the same length")
else:
    # note: a set has no guaranteed order, so the order of total may vary
    unique = list(set(key))
    total = []
    for x in unique:
        pair = (x, [])
        total.append(pair)
    for i in range(len(key)):
        s = (key[i], test[i])
        for j in range(len(total)):
            if total[j][0] == s[0]:
                # tuples are immutable, so build a new (key, merged_list) pair
                total[j] = (total[j][0], total[j][1] + s[1])

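First, I use set to avoid repeated values in the key list. Once I have the unique values, I iterate over both lists, build a tuple of (key, array_value), and find where in my total array the block should be appended.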
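
Since set does not guarantee any particular order, the groups above may come out in a different order than the keys first appear. A minimal sketch of the same tuple-based idea, swapping set for dict.fromkeys to deduplicate while keeping first-seen order (this assumes Python 3.7+, where dicts preserve insertion order):

```python
test = [[2, 4, 2, 4, 3, 5, 6, 6, 3, 2, 3, 3, 3, 7],
        [4, 6, 3, 2, 4, 5, 3, 5],
        [5, 3, 2, 4],
        [4, 3, 5, 2, 6]]
key = ["one", "two", "one", "two"]

# unique keys in order of first appearance
unique = list(dict.fromkeys(key))

total = []
for u in unique:
    merged = []
    for k, t in zip(key, test):
        if k == u:
            merged.extend(t)  # collect every sub-list that belongs to this key
    total.append((u, merged))

print(total)
```

Each entry of total is a (key, merged_list) tuple, as in the solution above; [lst for _, lst in total] gives just the merged lists.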
