Conditionally splitting a list into 2 sublists, in successive passes



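I want to filter values from one list into sublists step by step. Each time a condition matches, I want that value to be excluded from the next filter.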

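For example, suppose I want to a) take the items divisible by 3, b) take the odd items, and c) keep the rest.

```python
li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

I want to get:

```python
divby3 = [3, 6, 9]
odd = [1, 5, 7]
rest = [0, 2, 4, 8]
```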
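For reference, the intended semantics (each item lands in the first bucket whose condition matches, so later conditions never see it) can be written as one loop with an `if`/`elif`/`else` chain. This is a minimal sketch for comparison, not code from the question; `split_three` is a hypothetical name, and it assumes 0 should not count as divisible by 3, matching the expected output above.

```python
def split_three(items):
    """Single pass: each item goes to the first bucket whose test matches."""
    divby3, odd, rest = [], [], []
    for v in items:
        if v and v % 3 == 0:   # 0 is excluded, matching the expected output
            divby3.append(v)
        elif v % 2:            # only reached when the first test failed
            odd.append(v)
        else:                  # matched neither condition
            rest.append(v)
    return divby3, odd, rest


divby3, odd, rest = split_three([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Because the conditions are checked in order inside one loop, there is no need to prune already-matched items between passes.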

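Is there anything in itertools that does this? I wrote some test code, but it seems like something like this may already exist. The most expressive version I came up with, and close to the fastest in my timings, is: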

```python
def append_split(li, cond):
    """ list comp appends to 2 separate lists """
    hits, miss = [], []
    [hits.append(v) if cond(v) else miss.append(v) for v in li]
    return hits, miss

p1_divby3, li = append_split(li, is_3)
p2_odd, p3_rest = append_split(li, is_odd)
```

Or the alternative suggested in the comments:

```python
def looped_append(li, cond):
    """ for-loop to avoid side-effects within list comp """
    hits, miss = [], []
    for v in li:
        (hits if cond(v) else miss).append(v)
    return hits, miss
```

Is there a better way to do this in the standard library?
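The standard library has no ready-made partition function, but the itertools documentation lists a `partition` recipe built from `tee`, `filterfalse` and `filter`. It returns `(false_items, true_items)`, the same order as `more_itertools.partition`. A sketch chaining it twice with the question's predicates:

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    """itertools-docs recipe: return (items where pred is false, items where pred is true)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


def is_3(v):
    return v and not (v % 3)


def is_odd(v):
    return bool(v % 2)


li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
not_div3, divby3 = partition(is_3, li)    # first pass
rest, odd = partition(is_odd, not_div3)   # second pass only sees the leftovers
# the results are lazy iterators; materialize them as lists
divby3, odd, rest = list(divby3), list(odd), list(rest)
```

Note that `tee` buffers items internally, so this trades the explicit intermediate lists for hidden buffers rather than saving memory.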

My timings (on a 10,000-item list):

```
timings:
0.00301504 by_filter_prune_set
0.00335598 by_append_split
0.00498891 by_tupling
0.56877589 by_memberships
```
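A single `time()` measurement per function is noisy at this scale. A sketch of a steadier comparison using `timeit.repeat` and keeping the fastest run; it re-declares two of the question's functions (returning tuples instead of using `get_result`), and the `REPEAT`/`NUMBER` values are arbitrary assumptions:

```python
import timeit
from functools import partial


def is_3(v):
    return v and not (v % 3)


def is_odd(v):
    return bool(v % 2)


def looped_append(li, cond):
    hits, miss = [], []
    for v in li:
        (hits if cond(v) else miss).append(v)
    return hits, miss


def by_looped_append(li):
    p1_divby3, li = looped_append(li, is_3)
    p2_odd, p3_rest = looped_append(li, is_odd)
    return p1_divby3, p2_odd, p3_rest


def by_filter_prune_set(li):
    p1_divby3 = [v for v in li if is_3(v)]
    seen = set(p1_divby3)                 # set membership is O(1) on average
    li = [v for v in li if v not in seen]
    p2_odd = [v for v in li if is_odd(v)]
    seen = set(p2_odd)
    p3_rest = [v for v in li if v not in seen]
    return p1_divby3, p2_odd, p3_rest


data = list(range(10_000))
REPEAT, NUMBER = 5, 20  # arbitrary; raise them for steadier numbers

results = {}
for fn in (by_looped_append, by_filter_prune_set):
    # timeit.repeat accepts a callable; keep the fastest repeat, per call
    runs = timeit.repeat(partial(fn, data), repeat=REPEAT, number=NUMBER)
    results[fn.__name__] = min(runs) / NUMBER

for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{secs:010.8f} {name}")
```

Taking the minimum of several repeats filters out interference from other processes, which is the usual advice in the `timeit` documentation.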

Full test code:

```python
import sys
from time import time

# Expected results for the default 10-item list. Defined unconditionally,
# because get_result() filters on these keys even when not comparing.
exp = dict(
    p1_divby3=[3, 6, 9],
    p2_odd=[1, 5, 7],
    p3_rest=[0, 2, 4, 8],
)
if len(sys.argv) >= 2:
    li = range(0, int(sys.argv[1]))
    do_compare = False
else:
    li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    do_compare = True

def is_3(v):
    # 0 is deliberately excluded: `0 and ...` evaluates to 0, which is falsy
    return v and not (v % 3)

def is_odd(v):
    return bool(v % 2)

def get_result(di):
    # keep only the p1/p2/p3 locals of the calling function
    return {k: v for k, v in sorted(di.items()) if k in exp}

def by_memberships(li):
    """ SLOWEST.  filter checks that item wasn't previously extracted """
    p1_divby3 = [v for v in li if is_3(v)]
    p2_odd = [v for v in li if is_odd(v) and v not in p1_divby3]
    p3_rest = [v for v in li if v not in p1_divby3 and v not in p2_odd]
    return get_result(locals())

def prune_set(candidates, seen):
    """ filter, then prune found from list."""
    seen = set(seen)  # really slow if you don't cast to a set
    return [v for v in candidates if v not in seen]

def by_filter_prune_set(li):
    p1_divby3 = [v for v in li if is_3(v)]
    li = prune_set(li, p1_divby3)
    p2_odd = [v for v in li if is_odd(v)]
    p3_rest = prune_set(li, p2_odd)
    return get_result(locals())

def looped_append(li, cond):
    # from comments, also slightly faster than append_split
    hits, miss = [], []
    for v in li:
        (hits if cond(v) else miss).append(v)
    return hits, miss

def by_looped_append(li):
    p1_divby3, li = looped_append(li, is_3)
    p2_odd, p3_rest = looped_append(li, is_odd)
    return get_result(locals())

def append_split(li, cond):
    """ list comp appends to 2 separate lists """
    hits, miss = [], []
    [hits.append(v) if cond(v) else miss.append(v) for v in li]
    return hits, miss

def by_append_split(li):
    p1_divby3, li = append_split(li, is_3)
    p2_odd, p3_rest = append_split(li, is_odd)
    return get_result(locals())

def split_tupling(li, cond):
    """ put into a (hit, miss) tuple then re-filter into 2 lists"""
    undefined = NotImplemented  # sentinel marking the empty slot
    li = [(v, undefined) if cond(v) else (undefined, v) for v in li]
    hits = [v[0] for v in li if v[0] is not undefined]
    miss = [v[1] for v in li if v[0] is undefined]
    return hits, miss

def by_tupling(li):
    p1_divby3, li = split_tupling(li, is_3)
    p2_odd, p3_rest = split_tupling(li, is_odd)
    return get_result(locals())

timings = {}
for fn in [by_memberships, by_looped_append, by_append_split, by_tupling, by_filter_prune_set]:
    sys.stdout.write(f"\n\n{fn.__name__:20.20}")
    start = time()
    got = fn(li)
    duration = time() - start
    sys.stdout.write(f" {duration:10.8f}\n")
    timings[fn.__name__] = duration
    if do_compare:
        flag = "✅" if got == exp else "❌"
        print(f"{flag}{exp=}\n{flag}{got=}")

ranked = sorted((v, k) for k, v in timings.items())
print("\n\ntimings:")
for duration, name in ranked:
    print(f"{duration:010.8f} {name}")
```

Below is the minimal working example answer, using more_itertools.partition.

```python
from more_itertools import partition

li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def is_3(v):
    return v and not (v % 3)

def is_odd(v):
    return bool(v % 2)

def by_partition(li):
    """ using more_itertools.partition(pred, iterable) """
    # partition returns (items where pred is false, items where pred is true)
    li2, p1_divby3 = partition(is_3, li)
    p3_rest, p2_odd = partition(is_odd, li2)
    # the results are lazy iterators, so materialize them as lists
    return tuple(map(list, [p1_divby3, p2_odd, p3_rest]))

div_by_3, odd, rest = by_partition(li)
```

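I can only add that if you run into this often, it may be worth writing a more general function that splits one iterable into several iterables based on several conditions.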
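One way such a general function might look, as a sketch: `split_by` is a hypothetical name, not a standard-library or more_itertools function. It takes any number of predicates, puts each item in the bucket of the first predicate it satisfies, and returns one extra bucket for items that match none, all in a single pass.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def split_by(iterable: Iterable[T], *preds: Callable[[T], object]) -> List[List[T]]:
    """Put each item in the bucket of the first predicate it satisfies.

    Returns len(preds) + 1 lists; the last one holds items matching no predicate.
    """
    buckets: List[List[T]] = [[] for _ in range(len(preds) + 1)]
    for item in iterable:
        for i, pred in enumerate(preds):
            if pred(item):
                buckets[i].append(item)
                break  # later predicates never see this item
        else:  # loop finished without break: no predicate matched
            buckets[-1].append(item)
    return buckets


def is_3(v):
    return v and not (v % 3)


def is_odd(v):
    return bool(v % 2)


div_by_3, odd, rest = split_by(range(10), is_3, is_odd)
```

Unlike chained two-way partitions, this visits every item once and evaluates only as many predicates as needed to place it.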

P.S. Thanks for the code!
