Filter a list of tuples based on the first two elements of each tuple



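I have a list of tuples, each containing two integers followed by a string. I want to filter the substrings out of this list, but based on the first two integers of each tuple rather than on the string itself.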

For example, given:

[(0, 7, 'Lorenzo'), (0, 16, 'Lorenzo Malburto'), (3, 7, 'enzo'), (8, 16, 'Malburto'), (9, 13, 'albu'), (24, 32, 'American'), (25, 32, 'merican'), (33, 50, 'singer-songwriter'), (34, 50, 'inger-songwriter'), (44, 47, 'wri'), (44, 50, 'writer'), (53, 61, 'Malburto'), (54, 58, 'albu'), (90, 97, 'Lorenzo'), (93, 97, 'enzo')]

I want the final list to be:

['Lorenzo Malburto', 'American', 'singer-songwriter', 'Lorenzo', 'Malburto']

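I made a copy of the list and tried checking whether each tuple's string element is a substring of any other string while not being equal to that string: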

    for sub in duplicates:
        if any(sub in s and sub != s for s in original_list):
            ...  # further actions

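But this resulted in:

['Lorenzo Malburto', 'American', 'singer-songwriter']

The standalone 'Lorenzo' and 'Malburto' are missing. That is why I am wondering whether this can be done based on the integers instead. That way, 'enzo' would be filtered out because 3-7 is contained in the range 0-16, which is 'Lorenzo Malburto', but one of the 'Lorenzo' entries would not be filtered out because 90-97 is not contained in any other range.

How can this be achieved? Or is there a smarter way?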

You can do the comparison using only the indices:

    items = [
        (0, 7, 'Lorenzo'),
        (0, 16, 'Lorenzo Malburto'),
        (3, 7, 'enzo'),
        (8, 16, 'Malburto'),
        (9, 13, 'albu'),
        (24, 32, 'American'),
        (25, 32, 'merican'),
        (33, 50, 'singer-songwriter'),
        (34, 50, 'inger-songwriter'),
        (44, 47, 'wri'),
        (44, 50, 'writer'),
        (53, 61, 'Malburto'),
        (54, 58, 'albu'),
        (90, 97, 'Lorenzo'),
        (93, 97, 'enzo')]

    # Added solely for readability
    # If you decide to use tuples, you can replace
    #   - item.start with item[0]
    #   - item.end with item[1]
    #   - item.value with item[2]
    import collections
    Item = collections.namedtuple('Item', ('start', 'end', 'value'))
    items = [Item(*value) for value in items]
    def uniquefy(items):
        # Relies on items being ordered by start offset, as in the example
        results = []
        previous_item = None

        for current_item in items:
            # If it's the first iteration, define the current range
            if previous_item is None:
                previous_item = current_item
                continue

            # Detect if the current item corresponds to a new range
            # If the previous range is [0, 5] and the current range is [7, 10]
            # (note that 7 > 5), add the previous range to results and update
            # the current range
            if current_item.start > previous_item.end:
                results.append(previous_item)
                previous_item = current_item
                continue

            # Detect if the current item covers the whole previous range
            # If the previous range is [0, 7] and the current range is [0, 16]
            # (it starts no later and ends no earlier), update the current range
            if current_item.start <= previous_item.start <= previous_item.end <= current_item.end:
                previous_item = current_item

        # If there's still a value to be added to results
        if previous_item is not None:
            results.append(previous_item)

        return [item.value for item in results]

    print(uniquefy(items))
    # Outputs ['Lorenzo Malburto', 'American', 'singer-songwriter', 'Malburto', 'Lorenzo']
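
The function above depends on the order of the input and silently drops a range that only partially overlaps the previous one. If neither is acceptable, one possible variant is sketched below. It is not part of the original answer, and the names `Span` and `drop_contained` are made up for illustration. It sorts by start ascending and end descending, so a range always comes before anything it contains, and then keeps a range only when it ends later than every range seen so far. It assumes that exact duplicate ranges should be kept once and that partially overlapping ranges should all be kept.

```python
from typing import List, Tuple

# Hypothetical alias for illustration: (start, end, text)
Span = Tuple[int, int, str]


def drop_contained(spans: List[Span]) -> List[str]:
    """Return the strings whose (start, end) range is not inside another range."""
    # Start ascending, end descending: a containing range sorts before
    # every range it contains, regardless of the input order.
    ordered = sorted(spans, key=lambda s: (s[0], -s[1]))

    kept: List[Span] = []
    max_end = None  # largest end offset among the ranges seen so far
    for start, end, value in ordered:
        # All earlier ranges start at or before `start`, so this range is
        # contained in one of them exactly when end <= max_end.
        if max_end is None or end > max_end:
            kept.append((start, end, value))
            max_end = end
    return [value for _, _, value in kept]


sample = [
    (93, 97, 'enzo'),
    (90, 97, 'Lorenzo'),
    (3, 7, 'enzo'),
    (0, 16, 'Lorenzo Malburto'),
    (0, 7, 'Lorenzo'),
]
print(drop_contained(sample))
# Outputs ['Lorenzo Malburto', 'Lorenzo']
```

Because of the sort, the result is ordered by start offset, so on the full list from the question it gives the same output as `uniquefy`.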
