Regex for a 9-digit number that is not preceded by a country-code-like prefix

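I am trying to pick out potential citizen service numbers (BSN in Dutch) from texts that are also full of Dutch phone numbers. Phone numbers start with the +31 country code, whereas BSN numbers do not.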

Can someone help me come up with a regular expression that matches any 9-digit number that is not preceded by +<country-code-like-prefix><space>?

For example, in the sentence:

The number is +31 713176319 and 650068168 is another one.

I want to extract 650068168 but not 713176319. This can probably be solved with a negative lookahead, but I have not been able to find the right solution.

Use a negative lookbehind:

(?<!\+\d\d )\b\d{9}\b

This ensures that the 9-digit number is not preceded by a "+" followed by two digits and a space character.
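As a minimal sketch of how this lookbehind could be applied in Python, the snippet below runs the pattern against the sentence from the question. The variable names `text` and `pattern` are illustrative choices, not part of the original answer.

```python
import re

# Sample sentence taken from the question.
text = "The number is +31 713176319 and 650068168 is another one."

# Negative lookbehind: skip any 9-digit run that is immediately
# preceded by "+", two digits, and a space (for example "+31 ").
pattern = re.compile(r'(?<!\+\d\d )\b\d{9}\b')

matches = pattern.findall(text)
print(matches)  # ['650068168']
```

The lookbehind has a fixed width of four characters here, which is what Python's `re` module requires.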


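Note that this only works when the country code has two digits, as in your example. Supporting one- or three-digit country codes is a little trickier, because Python's re module does not support variable-width lookbehinds. However, you can chain multiple lookbehinds, like so: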

(?<!\+\d )(?<!\+\d{2} )(?<!\+\d{3} )\b\d{9}\b
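To illustrate the chained lookbehinds, here is a short sketch that tries the pattern on a made-up sentence containing one-, two-, and three-digit prefixes. The sample text and the numbers in it are hypothetical and exist only for this demonstration.

```python
import re

# Hypothetical sample with 1-, 2-, and 3-digit country-code prefixes.
text = "+1 202555017 and +31 713176319 and +351 912345678 but 650068168 stays."

# Each lookbehind is fixed-width on its own, so re accepts all three.
# A candidate is rejected if any one of them finds a prefix before it.
pattern = re.compile(r'(?<!\+\d )(?<!\+\d{2} )(?<!\+\d{3} )\b\d{9}\b')

matches = pattern.findall(text)
print(matches)  # ['650068168']
```

Each additional prefix length needs its own lookbehind, so this approach stays manageable only for a small, known set of widths.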


I would suggest using re.findall here:

import re

inp = "The number is +31 713176319 and 650068168 is another one."
matches = re.findall(r'(?:^|(?<!\S)(?!\+\d+)\S+ )(\d{9})\b', inp)
print(matches)

This prints:

['650068168']

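The regex strategy here is to match a standalone 9-digit number either when it appears at the very start of the string, or when it is preceded by some "word" (loosely defined here as \S+) that is not a country code prefix.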

Here is an explanation of the regex used:

(?:
    ^          from the start of the string
    |          OR
    (?<!\S)    assert that what precedes is whitespace or the start of the string
    (?!\+\d+)  assert that what follows is NOT a country code prefix
    \S+        match the non-prefix "word", followed by a space
)
(\d{9})        match and capture the 9-digit number
\b             word boundary
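The same breakdown can be written directly into the pattern with `re.VERBOSE`, which lets the regex carry its own comments. This is a sketch of that idea, not the answerer's original code. Note one assumption forced by verbose mode: unescaped whitespace is ignored, so the literal space after the "word" is written as `[ ]`.

```python
import re

pattern = re.compile(r'''
    (?:
        ^            # from the start of the string
      |              # OR
        (?<!\S)      # preceded by whitespace or the start of the string
        (?!\+\d+)    # the upcoming "word" is not a country-code prefix
        \S+[ ]       # the non-prefix "word", followed by a literal space
    )
    (\d{9})          # capture the 9-digit number
    \b               # word boundary
''', re.VERBOSE)

inp = "The number is +31 713176319 and 650068168 is another one."
matches = pattern.findall(inp)
print(matches)  # ['650068168']
```

Because the pattern contains a single capturing group, `findall` returns only the captured 9-digit numbers rather than the full match including the preceding word.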
