Regular expression for a phrase that spans multiple line breaks

I am learning regular expressions with Python, and I want to write an RE that matches and collects the sentences from the following input:

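1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.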

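My expected output should give me the category, the item, and the description of that item. So for the first item, Cake, the RE should capture the groups "Food", "Cake", and "Baked sweet food made from flour, sugar and other ingredients."

My current RE is as follows:

[0-9]+\s*\.\s*(\w*)\s*:\s*(\w*)\s*:\s*(.*)

This seems to work for items whose description has no line break. If the description does contain a line break, as with Computer in this example, the RE only matches the description up to the line break and discards its second sentence.

Please help me understand what I am missing here.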

If your entries are separated by a double newline (a blank line), you can use this example to parse them (regex101):

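import re

txt = '''1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.'''

# re.M lets ^ match at the start of every line; re.S lets . match newlines,
# so the lazy (.*?) runs until the next blank line (\n\n) or the end of the text (\Z).
pattern = r'^(?:\d+)\.([^:]+):([^:]+):(.*?)(?:\n\n|\Z)'

for cat, item, desc in re.findall(pattern, txt, flags=re.M | re.S):
    # The groups keep the spaces around each ':' separator, so strip them.
    print(cat.strip())
    print(item.strip())
    print(desc.strip())
    print('-' * 80)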

Prints:

Food
Cake
Baked sweet food made from flour, sugar and other ingredients.
--------------------------------------------------------------------------------
Electronics
Computer
A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
--------------------------------------------------------------------------------
Automobile
Car
Car is a four wheeled motor vehicle used for transportation.
--------------------------------------------------------------------------------
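
For what it's worth, the pattern from the question stops at the first line of a description because `.` does not match a newline unless `re.S` (`re.DOTALL`) is set. Here is a minimal sketch of that difference; the `sample` string is a shortened, made-up stand-in for the Computer entry, not the full input:

```python
import re

# Shortened sample (an assumption for illustration, not the full input).
sample = "2. Electronics : Computer : A machine.\nComputers have a CPU."

# The pattern from the question, with its backslashes written out.
pattern = r'[0-9]+\s*\.\s*(\w*)\s*:\s*(\w*)\s*:\s*(.*)'

# Without re.S, '.' stops at the newline, so only the first line is captured.
without_dotall = re.search(pattern, sample).group(3)

# With re.S, '.' also matches '\n', so the description spans both lines.
with_dotall = re.search(pattern, sample, flags=re.S).group(3)

print(repr(without_dotall))
print(repr(with_dotall))
```

Note that with `re.S` alone the greedy `(.*)` would swallow every following entry as well, which is why the answer's pattern pairs it with a lazy `(.*?)` and an explicit terminator.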

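This may be a basic approach, but it works for the sample input you provided:

[0-9]+\s*\.\s*(\w*)\s*:\s*(\w*)\s*:\s*((?:.*[\n\r]?)+?)(?=$|\d+\s*\.)

Basically, the description group keeps consuming text one line at a time (line breaks included) until the lookahead sees either the end of the input or another numeric index.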
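
As a rough sketch of how the lookahead-based pattern could be used from Python (the variable names `pattern` and `entries` are only illustrative, and the trailing line breaks that the description group picks up are stripped afterwards):

```python
import re

txt = """1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation."""

# No flags are needed: the description group consumes whole lines (each '.*'
# plus an optional line break) lazily, and the lookahead stops it at the end
# of the input or right before the next "<number>." index.
pattern = re.compile(
    r'[0-9]+\s*\.\s*(\w*)\s*:\s*(\w*)\s*:\s*((?:.*[\n\r]?)+?)(?=$|\d+\s*\.)'
)

# The description group includes the trailing newlines, so strip them.
entries = [(cat, item, desc.strip()) for cat, item, desc in pattern.findall(txt)]

for cat, item, desc in entries:
    print(cat, '|', item, '|', desc)
```

One caveat with this approach: a description line that itself begins with a number followed by a period would be mistaken for the start of the next entry.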

