How to parse a multi-line text block into a dict with a regex



I have this multi-line text:

1. fef w fwe fwe
fewfa 2. fwa f
fwefwfw gw
2 2f 23. f
g gegwg
32. gre34 g3 1. gr
egsg

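I want to use the number at the start of a line as the key (with "." as the separator character), and everything up to the next numbered line as its value.
So the result should be: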

{
    "1": "fef w fwe fwe fewfa 2. fwa f fwefwfw gw",
    "2": "2f 23. f g gegwg",
    "32": "gre34 g3 1. gr egsg"
}

You can use this regex:

/^(\d+)\.?\s+(.*?)(?=(?:^\d+\.?)|\Z)/gms
 ^                                       assert start of line
    ^                                    capture 1 or more digits
       ^                                 optional literal .
           ^                             one or more whitespace characters
               ^                         lazily capture every character, including \n
                    ^                    lookahead to the next block start or the end of the string
                                     ^   flags: m (multiline) so ^ matches at every line start,
                                         s (dotall) so . also matches newlines
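As a sketch, the same pattern can be written with Python's re.VERBOSE flag so that each piece carries its own comment. In verbose mode, whitespace inside the pattern is ignored and # starts a comment. The names BLOCK_RE and text are illustrative, and text is the sample from the question. findall returns one (key, value) tuple per block, and the values still contain their line breaks.

```python
import re

# Same pattern as above, spelled out with re.X (verbose) so each part is commented.
# Whitespace and "# ..." comments inside the pattern are ignored in verbose mode.
BLOCK_RE = re.compile(
    r"""
    ^(\d+)          # key: one or more digits at the start of a line (re.M)
    \.?             # optional literal dot after the number
    \s+             # one or more whitespace characters
    (.*?)           # value: lazily take everything, newlines included (re.S)
    (?=             # stop just before...
        ^\d+\.?     # ...the next line that starts with a number
      | \Z          # ...or the end of the string
    )
    """,
    re.M | re.S | re.X,
)

# The sample text from the question, with explicit line breaks.
text = (
    "1. fef w fwe fwe\n"
    "fewfa 2. fwa f\n"
    "fwefwfw gw\n"
    "2 2f 23. f\n"
    "g gegwg\n"
    "32. gre34 g3 1. gr\n"
    "egsg"
)

# One (key, value) tuple per numbered block.
print(BLOCK_RE.findall(text))
```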

Demo

Then you can build the dict like this (s holds the text above):

>>> import re
>>> dict(re.findall(r'^(\d+)\.?\s+(.*?)(?=(?:^\d+\.?)|\Z)', s, re.M|re.S))
{'1': 'fef w fwe fwe\nfewfa 2. fwa f\nfwefwfw gw\n', '32': 'gre34 g3 1. gr\negsg', '2': '2f 23. f\ng gegwg\n'}
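The values returned by findall keep the original line breaks, whereas the desired result joins each block with single spaces. A minimal sketch of one way to close that gap, assuming that collapsing every run of whitespace to one space is acceptable; parse_blocks is a hypothetical helper, not part of the answer above:

```python
import json
import re
from typing import Dict

# Same pattern as in the answer: key = leading number, value = everything up to
# the next numbered line or the end of the string (re.M + re.S).
BLOCK_RE = re.compile(r"^(\d+)\.?\s+(.*?)(?=(?:^\d+\.?)|\Z)", re.M | re.S)


def parse_blocks(text: str) -> Dict[str, str]:
    """Hypothetical helper: map each leading number to its block text.

    str.split() with no arguments splits on any run of whitespace (including
    the newlines inside a block) and drops leading/trailing whitespace, so
    joining the pieces with " " gives single-spaced values.
    """
    return {key: " ".join(value.split()) for key, value in BLOCK_RE.findall(text)}


sample = (
    "1. fef w fwe fwe\n"
    "fewfa 2. fwa f\n"
    "fwefwfw gw\n"
    "2 2f 23. f\n"
    "g gegwg\n"
    "32. gre34 g3 1. gr\n"
    "egsg"
)

result = parse_blocks(sample)
print(json.dumps(result, indent=4))
```

On Python 3.7+ the dict keeps the blocks in input order, so the printed JSON has the same shape as the desired result. If the same number starts two blocks, the later block overwrites the earlier one.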
