How to split a numbered list using a regular expression



I am trying to split a large number of strings in the following format into a list of dictionaries in Python:

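1) Qianfeigong 钱妃宫 was originally called the Zhenhuimiao 贞惠庙, and later the Qianlinggong 钱灵宫. The temple was built during the Northern Song in Yuanfeng 7 (1083). The temple was renovated during the early Ming. In 1967 the temple was demolished, but it was rebuilt in 1985. The main god is Qianshi shengfei 钱氏圣妃. Secondary gods are Guangping Zhouwang 广平周王 and Taishan Kongwang 泰山孔王. The stone inscription composed in the Xianchun period (1265–1274) by Liu Kezhuang 刘克庄 entitled 协应钱夫人庙记 (Record of the Temple to Lady Qian of Beneficial Assistance) (Epigraphical Materials, 1995:54, No. 48) is about this temple (stele no longer extant). 2) Xinglongshê 兴隆社: The main gods are Zunzhu mingwang 尊主明王 and Houtu furen 后土夫人.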

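I tried the following, but it also splits the string at "48)".

re.split(r"\d+\)", string)

Result: 1), 48), 2)

48) should not be a match.

I am thinking of excluding matches that come after an opening parenthesis "(", but I am not sure how to do that.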

Try this regex:

(?:^|\.\s)\d+\)(?=\s[A-Z])

Explanation:

(?:^|\.\s)(?#start of line or end of sentence)\d+\)(?#number followed by a closing bracket)(?=\s[A-Z])(?#whitespace, then a capital letter at the start of the sentence)

Regex101 demo: https://regex101.com/r/Fierhb/1
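As a rough illustration of how this pattern could feed the list of dictionaries asked for in the question, here is a minimal sketch using the standard `re` module. It assumes the pattern is meant to read `(?:^|\.\s)\d+\)(?=\s[A-Z])`, adds a capture group around the digits, and uses a hypothetical helper name (`split_items`), hypothetical dictionary keys (`number`, `text`), and a shortened made-up sample string.

```python
import re

# The answer's pattern, with the digits captured: a number plus ")" at the
# start of the string or right after ". ", followed by whitespace and a capital.
ITEM_START = re.compile(r'(?:^|\.\s)(\d+)\)(?=\s[A-Z])')


def split_items(text):
    """Return one {"number": ..., "text": ...} dict per numbered item (hypothetical helper)."""
    matches = list(ITEM_START.finditer(text))
    items = []
    # Pair each marker with the next one; the last marker runs to the end of the text.
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start(1) if following else len(text)
        items.append({
            "number": int(current.group(1)),
            "text": text[current.end():end].strip(),
        })
    return items


# Shortened, made-up sample in the same shape as the question's text.
sample = "1) First temple (Materials, 1995:54, No. 48) is old. 2) Second temple."
print(split_items(sample))
```

Note that this relies on every item starting with a capital letter and on "48)" being followed by a lowercase word; a parenthetical such as "No. 48) Is ..." would still be treated as a new item.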

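When parsing long strings, the PyPI regex module tends to give faster and more stable performance. It also supports the variable-width lookbehind used below, which the standard re module rejects.

I suggest installing it with pip install regex (or pip3 install regex), and then running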

import regex
text="1) Qianfeigong 钱妃宫 was originally called the Zhenhuimiao 贞惠庙, and later the Qianlinggong 钱灵宫. The temple was built during the Northern Song in Yuanfeng 7 (1083). The temple was renovated during the early Ming. In 1967 the temple was demolished, but it was rebuilt in 1985. The main god is Qianshi shengfei 钱氏圣妃. Secondary gods are Guangping Zhouwang 广平周王 and Taishan Kongwang 泰山孔王. The stone inscription composed in the Xianchun period (1265–1274) by Liu Kezhuang 刘克庄 entitled 协应钱夫人庙记 (Record of the Temple to Lady Qian of Beneficial Assistance) (Epigraphical Materials, 1995:54, No. 48) is about this temple (stele no longer extant). 2) Xinglongshê 兴隆社: The main gods are Zunzhu mingwang 尊主明王 and Houtu furen 后土夫人."
print(regex.split(r'(?<!\([^()]*)(?!^)(?=\d+\))', text))

Output under Python 3:

['1) Qianfeigong 钱妃宫 was originally called the Zhenhuimiao 贞惠庙, and later the Qianlinggong 钱灵宫. The temple was built during the Northern Song in Yuanfeng 7 (1083). The temple was renovated during the early Ming. In 1967 the temple was demolished, but it was rebuilt in 1985. The main god is Qianshi shengfei 钱氏圣妃. Secondary gods are Guangping Zhouwang 广平周王 and Taishan Kongwang 泰山孔王. The stone inscription composed in the Xianchun period (1265–1274) by Liu Kezhuang 刘克庄 entitled 协应钱夫人庙记 (Record of the Temple to Lady Qian of Beneficial Assistance) (Epigraphical Materials, 1995:54, No. 48) is about this temple (stele no longer extant). ', '2) Xinglongshê 兴隆社: The main gods are Zunzhu mingwang 尊主明王 and Houtu furen 后土夫人.']

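  • (?<!\([^()]*) - a negative lookbehind: there must be no ( followed by zero or more characters other than ( and ) immediately to the left of the current position
  • (?!^) - the current position must not be the start of the string
  • (?=\d+\)) - there must be one or more digits followed by ) immediately to the right of the current position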
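If installing a third-party module is not an option, the condition the lookbehind expresses ("the marker is not inside an open parenthesis") can be approximated with a plain depth counter and the standard `re` module. This is a different technique from the regex above, offered only as a sketch; the helper name `split_numbered` and the shortened sample string are hypothetical.

```python
import re

MARKER = re.compile(r"\d+\)")  # an item marker such as "1)" or "12)"


def split_numbered(text):
    """Split before every "N)" marker that is not inside parentheses (hypothetical helper)."""
    pieces, start, depth, i = [], 0, 0, 0
    while i < len(text):
        marker = None
        # Only look for a marker outside parentheses and at the first digit of a number.
        if depth == 0 and (i == 0 or not text[i - 1].isdigit()):
            marker = MARKER.match(text, i)
        if marker:
            if i > start:
                pieces.append(text[start:i])
            start = i
            i = marker.end()  # skip the marker so its ")" does not change the depth
            continue
        if text[i] == "(":
            depth += 1
        elif text[i] == ")" and depth > 0:
            depth -= 1
        i += 1
    pieces.append(text[start:])
    return pieces


# Shortened, made-up sample in the same shape as the question's text.
sample = "1) A (x, No. 48) b (1083). 2) C."
print(split_numbered(sample))
```

Like the regex.split call, this keeps each marker at the start of its piece and leaves trailing whitespace in place, so the pieces can be stripped or parsed into dictionaries afterwards.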
