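I have a string S (with words indexed from 0) and a substring Q. I want to find the smallest range [L, R] in S that contains all the words in Q. There are no duplicate words in Q. How do I approach this?
For example:
Input:
S: what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox
Q: lazy brown dog
Output: [11,15]
My code: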
S = raw_input().strip().split(' ')
Q = raw_input().strip().split(' ')
count = [0 for x in range(len(Q))]
smallest_index = [0 for x in range(len(Q))]
largest_index = [0 for x in range(len(Q))]
for i in range(len(S)):
    for j in range(len(Q)):
        if S[i] == Q[j]:
            count[j] += 1
            if count[j] <= 1:
                smallest_index[j] = i
                largest_index[j] = i
            if count[j] > 1:
                largest_index[j] = i
largest_index.sort()
print "[%d," % largest_index[0],
print "%d]" % largest_index[len(Q)-1]
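This code isn't particularly efficient, but it does work correctly. Perhaps someone will come up with a better way of handling the position information than using product. In the meantime, you can use this code to test other algorithms against.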
from itertools import product
def words_range(src, query):
    # Create a dict to store the word positions in src of each query word
    pos = {s: [] for s in query}
    for i, s in enumerate(src):
        if s in pos:
            pos[s].append(i)
    print(pos)
    # Find all the ranges that hold all the query words
    ranges = ((min(t), max(t)) for t in product(*pos.values()))
    # Find the smallest range
    return min(ranges, key=lambda t: t[1] - t[0])
# Test
src = '''what about the lazy brown fox that jumped over the other
brown one which lazy dog ate the food of the fox'''.split()
for i, s in enumerate(src):
    print(i, s)
query = 'lazy brown dog'.split()
print(words_range(src, query))
query = 'the lazy brown fox'.split()
print(words_range(src, query))
Output
0 what
1 about
2 the
3 lazy
4 brown
5 fox
6 that
7 jumped
8 over
9 the
10 other
11 brown
12 one
13 which
14 lazy
15 dog
16 ate
17 the
18 food
19 of
20 the
21 fox
{'lazy': [3, 14], 'brown': [4, 11], 'dog': [15]}
(11, 15)
{'the': [2, 9, 17, 20], 'lazy': [3, 14], 'brown': [4, 11], 'fox': [5, 21]}
(2, 5)
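Since the answer suggests using the product-based code as a reference for testing other algorithms, here is a rough sketch of what such a cross-check might look like. The names `reference_range`, `candidate_range` and `cross_check` are made up for this illustration, and the candidate is just a simple quadratic scan standing in for whatever algorithm you actually want to test. Query words are always drawn from the words present in `src`, because the product-based reference raises `ValueError` when a query word is missing.

```python
import random
from itertools import product

def reference_range(src, query):
    """Brute-force reference: the product-based approach from the answer above."""
    pos = {s: [] for s in query}
    for i, s in enumerate(src):
        if s in pos:
            pos[s].append(i)
    ranges = ((min(t), max(t)) for t in product(*pos.values()))
    return min(ranges, key=lambda t: t[1] - t[0])

def candidate_range(src, query):
    """Hypothetical algorithm under test: try each start index and extend right."""
    needed = set(query)
    best = None
    for left in range(len(src)):
        if src[left] not in needed:
            continue  # an optimal range always starts on a query word
        seen = set()
        for right in range(left, len(src)):
            if src[right] in needed:
                seen.add(src[right])
                if len(seen) == len(needed):
                    if best is None or right - left < best[1] - best[0]:
                        best = (left, right)
                    break
    return best

def cross_check(candidate, trials=200, seed=0):
    """Run random inputs through both functions; return the number of mismatches."""
    rng = random.Random(seed)
    vocab = list("abcdef")
    mismatches = 0
    for _ in range(trials):
        src = [rng.choice(vocab) for _ in range(rng.randint(1, 12))]
        present = sorted(set(src))
        query = rng.sample(present, rng.randint(1, min(3, len(present))))
        ref = reference_range(src, query)
        got = candidate(src, query)
        # Ties may be broken differently, so compare widths rather than endpoints
        if got is None or got[1] - got[0] != ref[1] - ref[0]:
            mismatches += 1
    return mismatches

print(cross_check(candidate_range))
```

Comparing widths instead of exact endpoints is a deliberate choice here: two correct algorithms can legitimately return different ranges of the same minimal length.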
Here's a slightly more efficient version of PM 2Ring's solution, which replaces the call to product with a loop:
def words_range(src, query):
    query = set(query)
    # Iterate over the input words and keep track of
    # where each query word appeared last
    last_pos = {}
    ranges = []
    for i, word in enumerate(src):
        if word in query:
            last_pos[word] = i
            # Once every query word has been seen, the smallest range
            # ending at i starts at the oldest "last seen" position
            if len(last_pos) == len(query):
                ranges.append((min(last_pos.values()), i))
    # Find the smallest range
    return min(ranges, key=lambda t: t[1] - t[0])
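This isn't linear time (because of the min(last_pos.values()) call in the loop), but it's a step in the right direction. There's probably a way to get rid of the min call (I can't think of one right now), which would make it linear.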
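One possible way to avoid the min call, offered only as a sketch and not as the answerer's own method, is a sliding window with a per-word counter: the left edge only ever moves forward, so the total work is amortized linear in the number of words (assuming constant-time set and dict lookups). The function name `words_range_linear` is invented for this example, and unlike the versions above it returns `None` when some query word never occurs.

```python
from collections import Counter

def words_range_linear(src, query):
    """Return (L, R) of the smallest window of src containing every query word."""
    needed = set(query)
    counts = Counter()  # occurrences of each query word inside src[left..right]
    best = None
    left = 0
    for right, word in enumerate(src):
        if word not in needed:
            continue
        counts[word] += 1
        # Advance the left edge past irrelevant words and past query words
        # that occur again further right inside the window
        while src[left] not in needed or counts[src[left]] > 1:
            if src[left] in needed:
                counts[src[left]] -= 1
            left += 1
        # Counts never drop to zero, so len(counts) is the number of
        # distinct query words currently inside the window
        if len(counts) == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
    return best

src = '''what about the lazy brown fox that jumped over the other
brown one which lazy dog ate the food of the fox'''.split()
print(words_range_linear(src, 'lazy brown dog'.split()))
print(words_range_linear(src, 'the lazy brown fox'.split()))
```

After the inner loop, `left` sits on the query word whose most recent occurrence is oldest, which is the same value that `min(last_pos.values())` computes in the version above.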
Here's another approach based on @PM 2Ring's answer:
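import itertools

S = 'what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox'
Q = 'lazy brown dog'

words = S.split()
query = Q.split()

# Record every position at which each query word occurs in S
track = {}
for index, value in enumerate(words):
    if value in query:  # compare against the word list; the raw string Q would match substrings
        if value not in track:
            track[value] = [index]
        else:
            track[value].append(index)

# Each combination of one position per query word gives a candidate range
combination = [(min(item), max(item)) for item in itertools.product(*track.values())]
# Keep only ranges that contain every query word, then take the narrowest one
result = min([(i[1] - i[0], (i[0], i[1])) for i in combination if set(query).issubset(words[i[0]:i[1] + 1])])
print(result[1])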
Output:
(11, 15)