Python regex: split on a comma that follows a number

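I have a large file that I need to load into a list of strings. Each element should contain the text up to a comma that immediately follows a number.

For example: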

this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,

This should become:

["this is some text, value 45789", "followed by, 1245", "and more text 78965", "more random text 5252"]

I am currently doing re.sub(r'([0-9]+),','~', <input-string>) and then splitting on "~" (my file does not contain any ~), but this throws away the number before the comma. Any ideas?

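You can use re.split with a positive lookbehind assertion, which matches only a comma preceded by a digit without consuming the digit: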

>>> import re
>>> 
>>> text = 'this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,'
>>> re.split(r'(?<=\d),', text)
['this is some text, value 45789',
 ' followed by, 1245',
 ' and more text 78965',
 ' more random text 5252',
 '']
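
For comparison, the re.sub approach from the question loses the number because the replacement string discards the captured group. A minimal sketch that keeps it with a backreference, assuming as the question states that "~" never occurs in the input (the names marked and parts are only illustrative):

```python
import re

text = 'this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,'

# \1 puts the captured digits back, so only the comma is replaced by the marker.
marked = re.sub(r'([0-9]+),', r'\1~', text)

# Splitting on the marker now yields the same pieces as the lookbehind split,
# including the leading spaces and the trailing empty string.
parts = marked.split('~')
print(parts)
```

The lookbehind version avoids the placeholder character altogether, so it is probably the safer choice when the file contents are not fully known.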

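If you also want it to handle whitespace, you can do something like this (note that this pattern splits on every comma, not only on commas that follow a digit):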

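import re

string = "  blah, lots  ,  of ,  spaces, here "
# Match leading whitespace, a comma with optional whitespace around it, or trailing whitespace.
pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
# Drop the empty strings produced at the start and end of the input.
result = [x for x in pattern.split(string) if x]
print(result)
# Output: ['blah', 'lots', 'of', 'spaces', 'here']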
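
The two ideas can be combined to get exactly the list the question asks for. The following is a sketch rather than a definitive solution: it assumes the only whitespace to remove is the run right after each splitting comma, and the names pieces and result are illustrative.

```python
import re

text = 'this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,'

# Split on a comma that is preceded by a digit, and consume any whitespace
# after that comma so the next piece does not start with a space.
pieces = re.split(r'(?<=\d),\s*', text)

# The input ends with a splitting comma, so the last piece is empty; drop it.
result = [p for p in pieces if p]
print(result)
```

Commas that do not follow a digit, such as the one in "followed by, 1245", are left untouched because the lookbehind fails there.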
