我有一个列表,例如["scissors", "rock", "rock", "paper", "rock" ... (more than 100,000 items)]
并希望在列表中查找诸如["rock", rock", "paper"]
之类的项目数组,找到所有相同的模式,并在所有情况下识别遵循该模式的下一个项目。
例如
original list = ["scissors", "rock", "rock", "paper", "rock", "scissors", "rock", "paper", "scissors"]
我要识别的模式 = ["rock", "paper"]
(上面的列表中有 2 个(
我正在寻找的图案的最终下一项="岩石"和"剪刀"。
我该如何编码?
这个怎么样,
def indexofsublist(l, sublist):
l1, l2 = len(l), len(sublist)
for idx in range(l1):
if l[idx:idx+l2] == sublist:
return idx
original_list = ["scissors", "rock", "rock", "paper", "rock", "scissors", "rock", "paper", "scissors"]
identify = ["rock", "paper"]
idx = indexofsublist(original_list, identify) # 2
next_idx = idx + len(identify) # 4
target = ["rock", "paper"]
original = ["scissors", "rock", "rock", "paper", "rock", "scissors", "rock", "paper", "scissors"]
def combinations(iterable, length):
return [iterable[i: i + length] for i in range(len(iterable) - length + 1)]
combinations = combinations(original, 2)
indices = [i for i, x in enumerate(combinations) if x == target]
for index in indices:
print(combinations[index+1][-1])
输出:
rock
scissors
代码做了什么:
- 使用方法
combinations
打印所有 2 个连续元素组合 - 查找所有匹配项索引。
- 打印
combinations[index+1]
的最后一个元素,这是您要查找的内容。
这将收集模式第一部分发生的索引:
original = ["scissors", "rock", "rock", "paper",
"rock", "scissors", "rock", "paper", "scissors"]
pattern = ["rock", "paper"]
indices = []
for index, item in enumerate(original):
if item == pattern[0] and index + 1 < len(original):
if original[index + 1] == pattern[1]:
indices.append(index)
# where in the original is the pattern found? starting at index...
print(indices)
# how many times was the pattern found?
print(len(indices))
# Result:
# [[2, 6]
# 2
如果要查找多个模式并确定每个模式在原始列表中出现的位置以及出现频率:
original = ["scissors", "rock", "rock", "paper",
"rock", "scissors", "rock", "paper", "scissors"]
patterns = [["rock", "paper"], ["scissors", "rock"]]
def look_for_patterns(ori, pat):
indices = []
length = len(ori)
for p in pat:
sublst = []
for index, item in enumerate(ori):
if item == p[0] and index + 1 < length:
if ori[index + 1] == p[1]:
sublst.append(index)
indices.append(sublst)
return indices, [len(i) for i in indices]
# where in the original is the pattern found? starting at index...
print(look_for_patterns(original, patterns)[0])
# how many times was the pattern found?
print(look_for_patterns(original, patterns)[1])
# Result:
# [[2, 6], [0, 5]]
# [2, 2]
或者这个:
pattern = ["rock", "paper"]
lenpat=len(pattern)
original = ["scissors", "rock", "rock", "paper","rock", "scissors", "rock", "paper", "scissors"]
index=[]
for i in range(len(original)-lenpat):
if original[i:i+lenpat]==pattern:
index.append(i)
print original
print pattern
print index
到目前为止,答案的问题在于,不幸的是,每个查询的答案都很棒,但都是 O(n(。您需要一个后缀 trie 来回答 O(k( 中的查询,其中 k 是模式长度。
这是一个很好的库,用于相同的 https://github.com/ptrus/suffix-trees pip 安装后缀树
但是,在您的情况下,还需要做更多的处理。由于选择只能来自"岩石","纸张"和"剪刀"(假设蜥蜴和斯波克稍后不会加入:-P(规范化并用"r","p"和"s"替换它们
使用" .join(new_arr( 并准备好使用上面的 github 链接。如果您有疑问或想要更多解释,请发表评论。
如果你有一个很长的列表要浏览,递归可能会被调整(而且更有趣(:
def find_pattern_and_followings(pattern, rest, followings=None,
pattern_indexes=None, curr_index=0):
if len(rest) < len(target):
return followings, pattern_indexes
followings = [] if followings is None else followings
pattern_indexes = [] if pattern_indexes is None else pattern_indexes
# Check if the first elements match the pattern
# and move to next elements
if len(rest) >= len(target):
if rest[0:len(target)] == target:
pattern_indexes.append(curr_index)
if len(rest) > len(target):
followings.append(rest[len(target)])
rest = rest[len(target):]
curr_index += len(target)
else:
rest = rest[1:]
curr_index += 1
return(find_pattern_and_followings(pattern, rest,
followings=followings,
pattern_indexes=pattern_indexes,
curr_index=curr_index))
返回:
(['rock', 'scissors'], [2, 6])
该函数的作用是逐项浏览列表项,如果列表的第一项与模式匹配,则它存储有趣的信息。然后,它会删除已扫描的元素并重新开始,直到列表太短而无法包含图案。