How do I split the data in a list based on some key elements with Python?



Here is the list I get back:

['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']

Starting right after the element 'ccc' (or right after 'ddd', if 'ddd' follows 'ccc') and continuing up to the next 'aaa', I want to get the following strings:

ABN AMRO Bank N.V.
Your monthly statement is
available under Self service >
Download statements or u
receive them by mail.
/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF
Settlement FX/MM
Trans. Ref. 0035979579
Deal Ticket ID 6225447
Can anyone help me? I got tangled up in nested for loops while trying this. Thanks!
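The rule described above can be handled in a single pass over the list, keeping a little state instead of nesting loops. This is a minimal sketch, assuming the markers are exactly 'aaa', 'bbb', 'ccc' and an optional 'ddd' directly after 'ccc'; the function name `split_sections` is hypothetical:

```python
def split_sections(tokens):
    """Return one string per section: from after 'ccc' (or after a 'ddd'
    that directly follows 'ccc') up to, but not including, the next 'aaa'."""
    sections = []
    current = None          # None means we are not inside a section
    just_after_ccc = False  # True only for the token right after 'ccc'
    for tok in tokens:
        if tok == 'ccc':
            current = []    # a new section starts after this marker
            just_after_ccc = True
            continue
        if tok == 'aaa':
            if current is not None:     # 'aaa' closes the open section
                sections.append(' '.join(current))
            current = None
        elif tok == 'ddd' and just_after_ccc:
            pass                        # optional 'ddd' marker: skip it
        elif current is not None:
            current.append(tok)         # ordinary token inside a section
        just_after_ccc = False
    if current is not None:             # the last section has no closing 'aaa'
        sections.append(' '.join(current))
    return sections


data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly',
        'statement', 'is', 'available', 'under', 'Self', 'service', '>',
        '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.',
        'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA',
        'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar',
        'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc',
        'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal',
        'Ticket', 'ID', '6225447']

for section in split_sections(data):
    print(section)
```

Tokens such as 'bbb' are dropped simply because no section is open when they appear.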

You can try:

L = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
     '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
i = 0      # where the next search starts in L
S = None   # tokens of the section currently being collected
while True:
    try:
        _L = L[i:]
        o = _L.index('ccc') + 1      # first token after 'ccc'
        if _L[o] == 'ddd':           # skip an optional 'ddd'
            o += 1
        S = []
        while _L[o] != 'aaa':        # collect tokens until the next 'aaa'
            S.append(_L[o])
            o += 1
        print(' '.join(S))
        S = None
        i += o                       # continue searching from that 'aaa'
    except (IndexError, ValueError):
        # IndexError: the last section ran past the end of the list
        # ValueError: no more 'ccc' markers
        if S:
            print(' '.join(S))
        break
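The same scan can be written without copying slices of the list, because `list.index` accepts a start position. A sketch under the same assumptions about the markers; `iter_sections` and its keyword parameters are hypothetical names, and it yields the strings instead of printing them:

```python
def iter_sections(tokens, start_marker='ccc', skip_marker='ddd', end_marker='aaa'):
    """Yield the tokens after start_marker (and an optional skip_marker right
    after it) up to the next end_marker, joined with spaces."""
    pos = 0
    while True:
        try:
            begin = tokens.index(start_marker, pos) + 1
        except ValueError:
            return                      # no more sections
        if begin < len(tokens) and tokens[begin] == skip_marker:
            begin += 1                  # skip the optional marker
        try:
            end = tokens.index(end_marker, begin)
        except ValueError:
            end = len(tokens)           # last section runs to the end of the list
        yield ' '.join(tokens[begin:end])
        pos = end                       # resume the search at the closing marker


data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly',
        'statement', 'is', 'available', 'under', 'Self', 'service', '>',
        '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.',
        'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA',
        'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar',
        'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc',
        'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal',
        'Ticket', 'ID', '6225447']

sections = list(iter_sections(data))
print('\n\n'.join(sections))
```

Because each search starts past the previous one, the loop always makes progress and stops once no 'ccc' remains.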

You can try using a regex, as follows:


import re
data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.',
        '\nYour', 'monthly', 'statement', 'is', 'available', 'under',
        'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u',
        'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd',
        '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/',
        '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb',
        'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal',
        'Ticket', 'ID', '6225447']
# flatten the list into one string
one_line = ' '.join(data)
# replace each 'aaa bbb ccc' or 'aaa bbb ccc ddd' run, together with the
# whitespace before it, with a blank line ('ddd ' is optional in the pattern)
print(re.sub(r'\s*aaa bbb ccc (?:ddd )?', '\n\n', one_line).lstrip())

Output:

ABN AMRO Bank N.V. 
Your monthly statement is available under Self service > 
Download statements or u receive them by mail.

/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/ 
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF

Settlement FX/MM 
Trans. Ref. 0035979579 Deal Ticket ID 6225447
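If you need the groups as a list of strings rather than one printed block, the same marker pattern also works with `re.split`. A sketch, assuming the marker text 'aaa bbb ccc' never occurs inside ordinary tokens; `MARKER` is a hypothetical name:

```python
import re

data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly',
        'statement', 'is', 'available', 'under', 'Self', 'service', '>',
        '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.',
        'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA',
        'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar',
        'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc',
        'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal',
        'Ticket', 'ID', '6225447']

# 'aaa bbb ccc', optionally followed by ' ddd', plus surrounding whitespace
MARKER = re.compile(r'\s*aaa bbb ccc(?: ddd)?\s*')

one_line = ' '.join(data)
# split() leaves an empty string before the first marker; drop empty pieces
groups = [g for g in MARKER.split(one_line) if g]
for g in groups:
    print(g)
```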

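You can replace the markers aaa, bbb, ccc and ddd with blanks, join everything, and then split wherever three or more whitespace characters occur in a row:

import re
data = ['aaa', 'bbb', 'ccc', 'ABN', 'AMRO', 'Bank', 'N.V.', '\nYour', 'monthly', 'statement', 'is', 'available', 'under', 'Self', 'service', '>', '\nDownload', 'statements', 'or', 'u', 'receive', 'them', 'by', 'mail.', 'aaa', 'bbb', 'ccc', 'ddd', '/TRTP/SEPA', 'OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/', '\nPointbar', 'B.V./REMI/INV', '121-10005/EREF/NONREF', 'aaa', 'bbb', 'ccc', 'Settlement', 'FX/MM', '\nTrans.', 'Ref.', '0035979579', 'Deal', 'Ticket', 'ID', '6225447']
# replace every marker token with a single space
data = [' ' if i in ['aaa', 'bbb', 'ccc', 'ddd'] else i for i in data]
# after joining, each marker run becomes a run of 3+ spaces
data = ' '.join(data).strip()
# split on 3+ consecutive whitespace characters; a normal gap is one space,
# or ' \n' before a token that starts with a newline
data = re.split(r'\s{3,}', data)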

This gives you a list of the desired groups. print('\n\n'.join(data)) prints:

ABN AMRO Bank N.V. 
Your monthly statement is available under Self service > 
Download statements or u receive them by mail.

/TRTP/SEPA OVERBOEKING/IBAN/NL93RABO0127299726/BIC/RABONL2U/NAME/ 
Pointbar B.V./REMI/INV 121-10005/EREF/NONREF

Settlement FX/MM 
Trans. Ref. 0035979579 Deal Ticket ID 6225447