Extracting the holding company with a regular expression

Given a string that follows this structure:

" (subsidiary of <holding_company>) <post_>"

where

holding_company may contain letters and some special characters, including parentheses
post_ may contain any characters

Example string: "google (subsidiary of alphabet (inc.)) xyz"

How can I extract the holding company name with a regular expression?

A regex that extracts it is:

"subsidiary of\s+(.*)\)\s+\S+"

In Python 2 code, you can do the following:

import re
regex = r"subsidiary of\s+(.*)\)\s+\S+"
test_str = " (subsidiary of <holding_company>) <post_>"
m = re.search(regex, test_str)
if m:
    # if the pattern was found, the company name is in group(1)
    print(m.group(1))  # the parenthesized form runs on Python 2 and 3

See it in action here: https://repl.it/repls/ShyFocusedInstructions#main.py
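One caveat with the greedy (.*): because post_ may contain any characters, a later ") " in post_ makes the match run past the end of the company name. Python's re module cannot match balanced parentheses, so here is a minimal sketch that counts nesting depth instead. The function name extract_holding_company and the sample input are my own assumptions for illustration, not part of the question.

```python
MARKER = "(subsidiary of"  # literal prefix taken from the question's structure


def extract_holding_company(text):
    """Return the text between '(subsidiary of' and its matching ')', or None.

    Hypothetical helper: assumes the parentheses inside the company name
    are balanced.
    """
    start = text.find(MARKER)
    if start == -1:
        return None
    begin = start + len(MARKER)
    depth = 1  # the '(' that opens the marker itself
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # found the ')' that closes '(subsidiary of'
                return text[begin:i].strip()
    return None  # parentheses never balanced


# Assumed sample input: post_ itself contains parentheses.
sample = "google (subsidiary of alphabet (inc.)) xyz (abc) def"
print(extract_holding_company(sample))
```

On that sample the regex above would capture "alphabet (inc.)) xyz (abc", while the depth counter stops at the parenthesis that actually closes the "(subsidiary of" group.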

This will get you there:

(?<=\(subsidiary of)(.*)(?=\) )
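For anyone who wants to try the lookaround version in Python, here is a rough sketch. It assumes the escaped form of the pattern (literal parentheses written as \( and \)); the helper name holding_company_via_lookaround is hypothetical. The lookbehind is fixed-width, which Python's re requires, and the captured group keeps the space after "of", hence the strip().

```python
import re

# Lookbehind: preceded by "(subsidiary of"; lookahead: followed by ") ".
# Neither assertion consumes characters, so group(1) is just the middle part.
LOOKAROUND = re.compile(r"(?<=\(subsidiary of)(.*)(?=\) )")


def holding_company_via_lookaround(text):
    """Hypothetical helper: return the holding company name, or None."""
    m = LOOKAROUND.search(text)
    # group(1) starts with the space that follows "of", so strip it
    return m.group(1).strip() if m else None


print(holding_company_via_lookaround(" (subsidiary of <holding_company>) <post_>"))
```

Note that the lookahead expects a space after the closing parenthesis, so a string that ends right at ")" would not match.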

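This creates capture groups for your holding company and post. You may need to extend the regex to cover other special characters. If you need to extend it, here is the regex on regex101: https://regex101.com/r/xpVfqU/1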

#!/usr/bin/python3
import re
text = " (subsidiary of <holding_company>) <post_>"
# group 1 is the holding company, group 2 is everything after the closing parenthesis
pattern = r'\s\(subsidiary of ([\w<>]*)\)\s*(.*)'
holding_company = re.sub(pattern, r'\1', text)
post = re.sub(pattern, r'\2', text)
print(holding_company)
print(post)
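If you would rather not run the substitution twice, a single search with named groups returns both parts at once. This is only a sketch: the group names and the function split_holding_and_post are my own, and I have swapped the [\w<>]* character class for a greedy .* so that names with spaces, dots and parentheses also match.

```python
import re

# Greedy .* backtracks to the last ')' in the string, so nested
# parentheses in the company name stay inside the first group.
PATTERN = re.compile(
    r"\(subsidiary of\s+(?P<holding_company>.*)\)\s*(?P<post>.*)"
)


def split_holding_and_post(text):
    """Hypothetical helper: return (holding_company, post) or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("holding_company"), m.group("post")


print(split_holding_and_post(" (subsidiary of <holding_company>) <post_>"))
```

Because the first group is greedy, this shares the limitation of the other regex answers: a ")" inside post_ will be absorbed into the company name.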
