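I need to split a string based on a list of regex patterns. I can apply each split individually, but the problem is applying them recursively to the pieces that were already split.
For example, I have this sentence: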
"I want to be splitted using different patterns. It is a complex task, and not easy to solve; so, I would need help."
I need to split on ",", ";" and ".".
The result should be 5 sentences, like this:
"I want to be splitted using different patterns."
"It is a complex task,"
"and not easy to solve;"
"so,"
"I would need help."
My code so far:
import re
sample_sentence = "I want to be splitted using different patterns. It is a complex task, and not easy to solve; so, I would need help."
patterns = [re.compile(r'(?<=\.) '),
            re.compile(r'(?<=,) '),
            re.compile(r'(?<=;) ')]
for pattern in patterns:
    # Each pass splits the original sentence again, so earlier results are lost
    splitted_sentences = pattern.split(sample_sentence)
    print(f'Pattern used: {pattern}')
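How can I apply the different patterns without losing the earlier results, and get the expected output?
Edit: I need to apply the patterns one by one, because I have to run some checks on the result of each pattern, so it works like a kind of tree algorithm. Sorry for not explaining this fully; it was clear in my head, and I did not think it would have side effects.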
You can use | (alternation):
import re
s = "I want to be splitted using different patterns. It is a complex task, and not easy to solve; so, I would need help."
result = re.split(r'(?<=\.)\s|,\s*|;\s*', s)
Output:
['I want to be splitted using different patterns.', 'It is a complex task', 'and not easy to solve', 'so', 'I would need help.']
Python has this in the re module. Try:
re.split(r'; |, |\. ', ourString)
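To see what an alternation of literal delimiters produces on the sample sentence, here is a minimal sketch. It assumes each delimiter is followed by exactly one space; the variable name `parts` is only illustrative.

```python
import re

s = ("I want to be splitted using different patterns. "
     "It is a complex task, and not easy to solve; so, I would need help.")

# Split on "; ", ", " or ". ". The matched delimiter and its space are consumed.
parts = re.split(r'; |, |\. ', s)
print(parts)
```

The delimiters are dropped from the pieces, and the final period stays attached to the last piece because no space follows it. If the delimiters have to be kept, a lookbehind is the better fit.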
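I can't think of a single regex that does this. What you can do instead is replace every delimiter with a custom delimiter, for example $DELIMITER$, and then split the sentence on that delimiter.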
new_sent = re.sub('[.,;]', '$DELIMITER$', sent)
new_sent.split('$DELIMITER$')
This results in the following:
['I want to be splitted using different patterns',
' It is a complex task',
' and not easy to solve',
' so',
' I would need help',
'']
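Note: the output above has an extra empty string at the end. This is because the sentence ends with a period. To avoid it, you can either remove the empty element from the list or strip the custom delimiter when it occurs at the end of the sentence:
new_sent = re.sub('[.,;]', '$DELIMITER$', sent)
new_sent = re.sub(r'\$DELIMITER\$$', '', new_sent)  # drop the trailing delimiter
new_sent.split('$DELIMITER$')
If you have a list of delimiters, you can build the regex pattern with the following code: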
delimiter_list = [',', '.', ':', ';']
pattern = '[' + ''.join(delimiter_list) + ']' #will result in [,.:;]
new_sent = re.sub(pattern, '$DELIMITER$', sent)
new_sent = re.sub(r'\$DELIMITER\$$', '', new_sent)  # drop the trailing delimiter
new_sent.split('$DELIMITER$')
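A character class built by plain concatenation only works while every delimiter is a single character with no special meaning inside `[...]`. As a hedged alternative, the sketch below escapes each delimiter with `re.escape` and joins them with `|`, then splits directly without the intermediate `$DELIMITER$` step. The name `pieces` and the decision to strip whitespace and drop empty strings are assumptions, not part of the answer above.

```python
import re

sent = ("I want to be splitted using different patterns. "
        "It is a complex task, and not easy to solve; so, I would need help.")
delimiter_list = [',', '.', ':', ';']

# Escape each delimiter so it is matched literally; longer ones go first so a
# multi-character delimiter is not shadowed by a shorter prefix.
ordered = sorted(delimiter_list, key=len, reverse=True)
pattern = '|'.join(re.escape(d) for d in ordered)

# Split, then trim whitespace and discard the empty piece left by the final period.
pieces = [p.strip() for p in re.split(pattern, sent) if p.strip()]
print(pieces)
```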
I hope this helps!
Use a lookbehind with a character class:
import re
s = "I want to be splitted using different patterns. It is a complex task, and not easy to solve; so, I would need help."
result = re.split(r'(?<=[.,;])\s', s)
print(result)
Output:
['I want to be splitted using different patterns.',
'It is a complex task,',
'and not easy to solve;',
'so,',
'I would need help.']
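If the patterns really have to be applied one at a time, as the edit to the question asks, one possible approach is to feed the fragments from each pattern into the next one instead of restarting from the original sentence. This is only a sketch: the helper `split_stepwise` is a hypothetical name, and the per-pattern checks the question mentions are left as a comment because they are not described.

```python
import re

sample_sentence = ("I want to be splitted using different patterns. "
                   "It is a complex task, and not easy to solve; so, I would need help.")

# One pattern per delimiter; each splits on the space that follows the delimiter.
patterns = [re.compile(r'(?<=\.) '),
            re.compile(r'(?<=,) '),
            re.compile(r'(?<=;) ')]

def split_stepwise(text, patterns):
    """Apply each pattern in turn to every fragment produced so far."""
    fragments = [text]
    for pattern in patterns:
        next_fragments = []
        for fragment in fragments:
            # Hypothetical hook: inspect pattern.split(fragment) here
            # before deciding to keep the result.
            next_fragments.extend(pattern.split(fragment))
        fragments = next_fragments
    return fragments

stepwise_result = split_stepwise(sample_sentence, patterns)
print(stepwise_result)
```

Because each pass works on the output of the previous one, the order of the pieces is preserved and no earlier split is lost.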