Regex to extract negative numbers with unknown number formats



I am able to get the numbers out of this string:

string_p = 'seven 5 blah 6 decimal 6.5 thousands 8,999 with dollar signs $9,000 and $9,500,001.45 end ... lastly.... 8.4% now end'

using this code:

import re

def extractVal2(s, n):
    # n is 1-based when positive; zero or negative n indexes the match list directly
    if n > 0:
        return re.findall(r'[0-9$,.%]+\d*', s)[n-1]
    else:
        return re.findall(r'[0-9$,.%]+\d*', s)[n]

for i in range(1, 7):
    print(extractVal2(string_p, i))

But I can't get it to work for negative numbers. The negative numbers are the ones in parentheses.

string_n= 'seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)% now end'

I tried first replacing the () with a minus sign like this:

string_n = re.sub(r"\((\d*,?\d*)\)", r"-\1", string_n)

and then these patterns to get the negative numbers:

re.findall(r'[0-9$,.%-]+\d*', s)[n]
re.findall(r'[0-9$,.%]+-\d*', s)[n]
re.findall(r'[-0-9$,.%]+-\d*', s)[n]

I even tried a different approach:

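words = string_n.split(" ")
for i in words:
    try:
        # Python 2 form of str.translate: delete the characters "(" ")" ","
        print(-int(i.translate(None, "(),")))
    except ValueError:
        # skip words that are not plain integers
        pass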

You can change your regex to:

import re

def extractVal2(s, n):
    try:
        # optional $, optional (, a digit, then digits/commas/dots, optional ), optional %
        pattern = r'\$?\(?[0-9][0-9,.]*\)?%?'
        if n > 0:
            return re.findall(pattern, s)[n-1].replace("(", "-").replace(")", "")
        else:
            return re.findall(pattern, s)[n].replace("(", "-").replace(")", "")
    except IndexError as e:
        # fewer matches than requested
        return None

string_n = (',seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar '
            'signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)%')
for i in range(1, 9):
    print(extractVal2(string_n, i))

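It will also parse 9,500,001.45, capturing a leading ( after the $ and before the digits and replacing it with a - sign. It is a hack, though: it won't "see" whether your ( is missing its ), and it will also capture "illegal" numbers such as 2,200.200,22.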

Output:

-5
-6
-6.5
-8,999
$-9,000
$-9,500,001.45
-8.4%
None

You should probably also consider catching IndexError, in case your re.findall(..) captures nothing (or too little) and you index past the end of the returned list.
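The two limitations noted above (an opening parenthesis without its closing one, and "illegal" numbers such as 2,200.200,22) can be narrowed with a stricter pattern. The following is only a sketch, not part of the original answer: the names NUMBER, STRICT and extract_strict are hypothetical, and it assumes commas are used solely as thousands separators. An unbalanced "(7" is then read as a plain 7 rather than a negative, and 2,200.200,22 is split into two matches instead of being swallowed as one.

```python
import re

# Hypothetical stricter pattern (an assumption, not from the original answer):
# digits grouped in thousands by commas, or a plain run of digits,
# followed by an optional decimal part.
NUMBER = r'(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?'

# Either a fully parenthesised number (group 1) or a bare number (group 2),
# with an optional leading $ and an optional trailing %.
STRICT = re.compile(r'\$?(?:\((' + NUMBER + r')\)|(' + NUMBER + r'))%?')

def extract_strict(s):
    """Return every number found in s as text, rewriting (x) as -x."""
    results = []
    for m in STRICT.finditer(s):
        text = m.group(0)
        if m.group(1) is not None:  # the parenthesised branch matched
            text = text.replace("(", "-").replace(")", "")
        results.append(text)
    return results

print(extract_strict('seven (5) blah 6 $(9,000) and $(9,500,001.45) end (8.4)%'))
print(extract_strict('unbalanced (7 here'))
print(extract_strict('2,200.200,22'))
```

Returning the whole list leaves the indexing (and any IndexError handling) to the caller, which keeps the sketch short.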


The regex allows:

optional leading literal $   (escaped as \$, so not interpreted as the ^...$ end-of-string anchor)
optional literal (
[0-9]                   one digit
[0-9,.]*                any number (possibly zero) of the included characters, in any order,
                        to the extent that it would match something like 000,9.34,2
optional literal )
optional literal %
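The breakdown above can also be written directly into the pattern with re.VERBOSE, which lets each piece carry its own comment. This is a small illustrative sketch: the pattern is the answer's, with its backslash escapes written out, while the names PATTERN and matches and the sample string are made up for the demonstration.

```python
import re

# The answer's pattern, spread out with re.VERBOSE so each part is commented.
PATTERN = re.compile(r'''
    \$?          # optional literal $
    \(?          # optional literal (
    [0-9]        # exactly one digit
    [0-9,.]*     # any run (possibly empty) of digits, commas and dots
    \)?          # optional literal )
    %?           # optional literal %
''', re.VERBOSE)

# Sample input chosen for illustration, including the odd 000,9.34,2 case.
matches = PATTERN.findall('a 000,9.34,2 b $(9,000) c (8.4)%')
print(matches)
```

Because the pattern only checks which characters appear, not how they are grouped, 000,9.34,2 comes back as a single match.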
