How to split a string when the delimiter is repeated twice



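I need to convert the string 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG' into the list of tuples [("apple", "SP"), ("+", "SW"), ("orange", "NNG"), ("+", "FG"), ("melon", "SL"), ("food", "JKG")]. My idea was to split the string on the delimiter "+" first and then split each piece on the delimiter "/".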

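The problem is that the plus sign sometimes appears twice in a row. In that case the first plus is the delimiter and the second plus is data that must be kept. If I simply split on "+", every plus sign is removed: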

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
x = s.split('+')
print(x)
#['apple/SP', '', '/SW', 'orange/NNG', '', '/FG', 'melon/SL', 'food/JKG']

If I split on the delimiter "++" instead:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
splitted_s = s.split('++')
print(splitted_s)
#['apple/SP', '/SW+orange/NNG', '/FG+melon/SL+food/JKG']

I can't figure out how to get the result [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')].

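You can use a regular expression:

  • \+(?=\+) - a plus followed by another plus (positive lookahead)
  • | - or
  • \+(?!/) - a plus not followed by a forward slash (negative lookahead)

Code: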

import re
pattern = r"\+(?=\+)|\+(?!/)"
string = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
print([s.split("/") for s in re.split(pattern, string)])

Output:

[['apple', 'SP'], ['+', 'SW'], ['orange', 'NNG'], ['+', 'FG'], ['melon', 'SL'], ['food', 'JKG']]
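The question asks for tuples rather than lists, so here is a minimal sketch of the same lookahead idea wrapped in a helper. The function name split_tagged is hypothetical. It also relies on one observation: a plus followed by another plus is, by definition, not followed by a slash, so for this input the single negative lookahead seems to be enough on its own.

```python
import re

def split_tagged(text):
    # Split on every '+' that is not directly followed by '/'.
    # The '+' in '+/SW' is followed by '/', so it survives as data.
    parts = re.split(r"\+(?!/)", text)
    # Each part looks like 'word/TAG'; turn it into a (word, TAG) tuple.
    return [tuple(part.split("/")) for part in parts]

result = split_tagged("apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG")
print(result)
# [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
```

This assumes every piece contains exactly one slash; a piece with more slashes would produce a longer tuple.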

Here is a solution without regular expressions:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
# mark the plus sign that must be kept with a placeholder
# (this only works if '*' never occurs in the input)
x = s.replace("++", "+/*")
x = x.split('+')
x = [item.replace("*", "+") for item in x]
x = [item.split('/') for item in x]
# flatten the list of lists
y = []
for item in x:
    y += item
# remove the list items that are ''
for i in range(y.count('')):
    y.remove('')
# modified from https://stackoverflow.com/questions/53990075/convert-list-into-list-of-tuples-of-every-two-elements
out = []
for i in range(len(y)):
    if i % 2 == 0 and i < len(y) - 1:
        out.append((y[i], y[i + 1]))
print(out)

Result:

[('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
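The flattening and pairing steps of this placeholder approach can probably be shortened. The following is only a sketch: the helper name pair_up is hypothetical, and like the code above it assumes that '*' never occurs in the input.

```python
def pair_up(flat):
    # Pair even-indexed items with the odd-indexed items that follow them.
    # A trailing item without a partner is silently dropped by zip.
    return list(zip(flat[::2], flat[1::2]))

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
# Protect the plus sign that must be kept with a '*' placeholder, then split on '+'.
pieces = s.replace("++", "+/*").split("+")
# Split each piece on '/', skip empty strings, and restore the protected plus.
flat = [part.replace("*", "+")
        for piece in pieces
        for part in piece.split("/")
        if part]
out = pair_up(flat)
print(out)
# [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
```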

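This answer is similar to the one Paul posted, but I think mine is simpler. Instead of splitting, it matches each word/tag pair directly: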

import re
s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
# a literal plus or a run of word characters, then a slash, then the tag
pattern = r"((?:\+|\w+)/\w+)"
res = [tuple(m.split("/")) for m in re.findall(pattern, s)]
print(res)
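For readers who find the compact pattern hard to parse, the matching approach can be spelled out piece by piece with re.VERBOSE. This is a sketch, not the answerer's code: the names TOKEN and parse are hypothetical, and the capturing group is dropped because re.findall returns the whole match when the pattern has no groups.

```python
import re

TOKEN = re.compile(r"""
    (?:\+|\w+)   # the word: a single literal plus, or a run of word characters
    /            # the slash separating the word from its tag
    \w+          # the tag
""", re.VERBOSE)

def parse(text):
    # Every match has the form 'word/TAG'; delimiter plus signs never
    # match because they are not directly followed by a slash.
    return [tuple(m.split("/")) for m in TOKEN.findall(text)]

print(parse("apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"))
# [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
```

One design difference from the split-based answers: text that does not fit the word/tag shape is skipped silently here instead of showing up as an odd-looking piece in the result.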
