将列表拆分为开始和关闭令牌之间的元素子列表



我有一个字符串列表(见下文)。我想通过查找两个特定的标记(开始和结束)来获取列表中的元素,然后保存这些标记之间存在的所有字符串。

例如,我有下面的列表,我想获取字符串'RATED''Like'之间的所有字符串。这些子序列也可以多次出现。

['RATED',
 '  Awesome food at a good price .',
 'Delivery was very quick even on New Yearxe2x80x99s Eve .',
 'Please try crispy corn and veg noodles From this place .',
 'Taste maintained .',
 'Like',
 '1',
 'Comment',
 '0',
 'Share',
 'Divyansh Agarwal',
 '1 Review',
 'Follow',
 '3 days ago',
 'RATED',
 '  I have tried schezwan noodles and the momos with kitkat shake',
 "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone",
 'Like']

我尝试了不同的方法,例如正则表达式,但没有一种可以解决问题。

您可以使用正则表达式首先,您需要使用一些不会出现在文本中的分隔符来连接列表

delimiter = "#$#"
bigString = delimiter + delimiter.join(yourList) + delimiter

之后,您可以使用正则表达式

results = re.findall(r'#$#RATED#$#(.*?)#$#Like#$#', bigString)

现在你只需要迭代所有结果并使用分隔符拆分字符串

for s in results:
    print(s.split(delimiter))

我建议您了解序列类型的索引查找和切片:

  • https://docs.python.org/3.7/library/stdtypes.html#common-sequence-operations

例:

def group_between(lst, start_token, end_token):
    while lst:
        try:
            # find opening token
            start_idx = lst.index(start_token) + 1
            # find closing token
            end_idx = lst.index(end_token, start_idx)
            # output sublist
            yield lst[start_idx:end_idx]
            # continue with the remaining items
            lst = lst[end_idx+1:]
        except ValueError:
            # begin or end not found, just skip the rest
            break
l = ['RATED','  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', 'Like', 
     '1', 'Comment', '0', 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', '3 days ago',
     'RATED', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 'Like'
]
for i in group_between(l, 'RATED', 'Like'):
    print(i)

输出为:

['  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .']
['  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]

使用正则表达式你可以做到这一点的方式.

a= ['RATED','  Awesome food at a good price .', 
 'Delivery was very quick even on New Year’s Eve .', 
 'Please try crispy corn and veg noodles From this place .', 
 'Taste maintained .', 'Like', '1', 'Comment', '0', 
 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', 
 '3 days ago', 'RATED', 
 '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 
 'Like']

import re
string = ' '.join(a)
b = re.compile(r'(?<=RATED).*?(?=Like)').findall(string)
print(b)

输出

['   Awesome food at a good price . Delivery was very quick even on New Year’s Eve . Please try crispy corn and veg noodles From this place . Taste maintained . ',
 "   I have tried schezwan noodles and the momos with kitkat shake And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone "]

你可以尝试例如

rec = False
result = []
for s in lst:
    if s == 'Like':
        rec = False
    if rec:
        result.append(s)
    if s == 'RATED':
        rec = True

结果

#[' Awesome food at a good price .',
# 'Delivery was very quick even on New Year’s Eve .',
# 'Please try crispy corn and veg noodles From this place .',
# 'Taste maintained .',
# ' I have tried schezwan noodles and the momos with kitkat shake',
# "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]
def find_between(old_list, first_word, last_word):
    new_list = []
    flag = False
    for i in old_list:
        if i is last_word:
            break
        if i is first_word:
            flag = True
            continue
        if flag:
            new_list.append(i)
    return new_list

没有标志的选项:

new_list = []
group = [] # don’t need if the list starts with 'RATED'
for i in your_list:
    if i == 'RATED':
        group = []
    elif i == 'Like':
        new_list.append(group[:])
    else:
        group.append(i)
您可以使用

以下代码,该代码使用简单的for循环:

l = ['RATED','  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', 'Like', 
     '1', 'Comment', '0', 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', '3 days ago',
     'RATED', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 'Like'
]
st, ed, aa = None, None, []
for k, v in enumerate(l):
    if v == "RATED":
        st = k
    if v == "Like":
        ed = k
    if st != None and ed!= None:
        aa.extend(l[st+1: ed])
        st = None
        ed = None
print (aa)
# ['  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]

最新更新