Using regex in Python to add quantity identifiers before and after numeric quantity values



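I am using regex in Python to add quantity identifiers before and after numeric quantity values.

Basically, I have to add the words QtyOrd and units wherever they are missing before or after the numeric quantity in the text.

For example: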

'PartNo-001A description 20 units some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20 some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20' => 'PartNo-001A description QtyOrd 20 units'
'PartNo-001A QtyOrd 20' => 'PartNo-001A QtyOrd 20 units'
'PartNo-001A 20 units' => 'PartNo-001A QtyOrd 20 units'

The code I am using is as follows:

import re
def process_QtyOrd( text):
    for x in re.findall("(qtyord [0-9]+ units| [0-9]+ units|qtyord [0-9]+|qtyord[0-9]+units )", text.lower()):
        Text_Intermediate = "OrderQty " + str(re.search("[0-9]+", x).group()) + " Units"

    Text_Final = re.sub("(qtyord [0-9]+ units|[0-9]+ units|qtyord [0-9]+|qtyord [0-9]+ units)", Text_Intermediate, text, flags= re.IGNORECASE)
    return Text_Final
text1 = 'PartNo-001A description 20 units some other description'
text2 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 units some other description
''' 
text3 = 'PartNo-001A description QtyOrd 20 some other description'
text4 = 'PartNo-001A description QtyOrd 20'
text5 = 'PartNo-001A QtyOrd 20'
text6 = 'PartNo-001A 20 units'
text7 = ''' 
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 units some other description PartNo-001A

''' 
text8 = ''' 
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
PartNo-001A
QtyOrd 
20
''' 

Then:

print(process_QtyOrd(text1))
print(process_QtyOrd(text2))
print(process_QtyOrd(text3))
print(process_QtyOrd(text4))
print(process_QtyOrd(text5))
print(process_QtyOrd(text6))
print(process_QtyOrd(text7))
print(process_QtyOrd(text8))

The code does not work for text8. Can you help me?

The output should look like this:

1. PartNo-001A description QtyOrd 20 Units some other description

2. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 Units some other description

3. PartNo-001A description QtyOrd 20 Units some other description
4. PartNo-001A description QtyOrd 20 Units
5. PartNo-001A QtyOrd 20 Units
6. PartNo-001A QtyOrd 20 Units

7. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 Units some other description PartNo-001A

8. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
PartNo-001A
QtyOrd 
20
units

You can use

import re
text = r'PartNo-001A description 20 units some other description'
pattern = re.compile(r'\b(?:qtyord\s*(\d+)(?:\s*units)?|(\d+)\s*units)\b', re.I)
Text_Final = pattern.sub(r'QtyOrd \1\2 units', text)
print(Text_Final)
# => PartNo-001A description QtyOrd 20 units some other description

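See the regex demo and the Python demo.

Details

  • \b - a word boundary
  • (?: - start of a non-capturing group
    • qtyord - the literal text qtyord (case-insensitive because of re.I)
    • \s* - zero or more whitespace characters
    • (\d+) - Group 1 (referenced as \1 in the replacement pattern): one or more digits
    • (?:\s*units)? - an optional group matching zero or more whitespace characters and the word units (if you also want to match the singular unit, add ? after the s)
  • | - or
    • (\d+) - Group 2 (referenced as \2 in the replacement pattern): one or more digits
    • \s*units - zero or more whitespace characters and the word units (if you also want to match the singular unit, add ? after the s)
  • ) - end of the group
  • \b - a word boundary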
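
To see how the two alternatives share one replacement string, here is a minimal sketch that runs the pattern above over several of the question's inputs. The helper name `add_qty_markers` is made up for illustration, and the sketch assumes Python 3.5 or later, where a group that did not take part in the match expands to an empty string in the replacement.

```python
import re

# Same pattern as in the answer above, compiled once.
PATTERN = re.compile(r'\b(?:qtyord\s*(\d+)(?:\s*units)?|(\d+)\s*units)\b', re.I)

def add_qty_markers(text):
    # Hypothetical helper name. Only one of groups 1 and 2 takes part in
    # any given match; the other expands to '', so r'\1\2' is the number.
    return PATTERN.sub(r'QtyOrd \1\2 units', text)

samples = [
    'PartNo-001A description 20 units some other description',
    'PartNo-001A description QtyOrd 20 some other description',
    'PartNo-001A QtyOrd 20',
    'PartNo-001A 20 units',
    'PartNo-001A\nQtyOrd \n20\n',  # shortened version of text8
]
for sample in samples:
    print(repr(add_qty_markers(sample)))
```

Because \s* also matches newlines, the text8-style input is handled without any extra flag; note that the whitespace between QtyOrd and the number is normalized to a single space.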

Assuming the expected result for text8 is OrderQty 20 Units rather than QtyOrd 20 units, could you try:

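import re

def process_QtyOrd(text):
    # Split the text into: leading part, the quantity expression, trailing part.
    m = re.match(r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$', text, flags=re.IGNORECASE|re.DOTALL)
    if m:
        # Keep only the number and wrap it in the normalized markers.
        replaced = re.sub(r'\D*(\d+)\D*', r'OrderQty \1 Units', m.group(2))
        text = m.group(1) + replaced + m.group(3)
    return text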
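
  • re.match splits the input text into three substrings: the leading substring, the part of interest that we want to modify, and the trailing substring
  • I used \s instead of a literal space so that newlines are matched as well
  • The part of interest is captured by m.group(2). We can then modify it with the re.sub function
  • The final text is the concatenation of the substrings above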
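
The three-way split described above can be inspected directly. The following is a small sketch, using a shortened text8-style input as an assumed sample, that prints each captured piece before reassembling the text; the variable names are illustrative only.

```python
import re

# Same three-group pattern as in the function above.
SPLIT = re.compile(
    r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$',
    re.IGNORECASE | re.DOTALL,
)

sample = 'PartNo-001A\nQtyOrd \n20\n'  # shortened text8-style input (assumption)
m = SPLIT.match(sample)
head, target, tail = m.groups()
print(repr(head))    # text before the quantity expression
print(repr(target))  # the quantity expression itself, newline included
print(repr(tail))    # text after it

# Rewrite only the middle piece, then glue the three parts back together.
rewritten = re.sub(r'\D*(\d+)\D*', r'OrderQty \1 Units', target)
result = head + rewritten + tail
print(repr(result))
```

The lazy (.*?) in the first group is what makes the split stop at the first quantity expression, so this approach rewrites only one occurrence per call.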

[Update]

Could you try the following:

def process_QtyOrd(text):
    text = re.sub(r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)', lambda m: re.sub(r'\D*(\d+)\D*', r'QtyOrd \1 Units', m.group(1)), text, flags=re.IGNORECASE|re.DOTALL)
    return text

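It now works even if the text contains multiple occurrences of the pattern. I have also changed the word OrderQty in the replacement text to QtyOrd.

  • I used re.sub() because it replaces every occurrence of the pattern in the text in a single call
  • The lambda function is introduced so that the replacement is evaluated as an expression for each match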
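
As an illustration of the callable-replacement idea, here is a sketch that uses a named function in place of the lambda and extracts the number with re.search rather than a nested re.sub. This is a variation on the answer's code, not the answer's exact method; the names `QTY` and `normalize` and the sample text are assumptions.

```python
import re

QTY = re.compile(r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)', re.IGNORECASE)

def normalize(match):
    # re.sub calls this once per match and inserts whatever it returns.
    number = re.search(r'\d+', match.group(1)).group()
    return f'QtyOrd {number} Units'

# Hypothetical input with two quantity expressions on one line.
text = 'PartNo-001A 20 units, PartNo-002B QtyOrd 5'
result = QTY.sub(normalize, text)
print(result)
```

A named function is easier to test on its own than a lambda, which may matter if the normalization rules grow.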

[Update2]

If you want to include decimal numbers, try the following:

def process_QtyOrd(text):
    text = re.sub(r'(qtyord\s*\d+(?:\.\d+)?\s*units|\d+(?:\.\d+)?\s+units|qtyord\s+\d+(?:\.\d+)?)', lambda m: re.sub(r'\D*(\d+(?:\.\d+)?)\D*', r'QtyOrd \1 Units', m.group(1)), text, flags=re.IGNORECASE|re.DOTALL)
    return text

The idea is to replace \d+ with \d+(?:\.\d+)?, which matches digits followed by an optional dot and more digits.
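
Since the number sub-pattern is repeated several times, one way to keep the decimal version readable is to define it once and interpolate it. The sketch below assumes that approach; `NUM`, `QTY`, and `process_decimal` are illustrative names, not part of the original answer.

```python
import re

# Digits with an optional fractional part, e.g. 20 or 2.5.
NUM = r'\d+(?:\.\d+)?'

# Build the same three alternatives as above from the shared sub-pattern.
QTY = re.compile(
    rf'(qtyord\s*{NUM}\s*units|{NUM}\s+units|qtyord\s+{NUM})',
    re.IGNORECASE,
)

def process_decimal(text):
    # Pull the number out of each match and wrap it in the markers.
    return QTY.sub(
        lambda m: 'QtyOrd ' + re.search(NUM, m.group(1)).group() + ' Units',
        text,
    )

print(process_decimal('PartNo-001A 2.5 units'))
print(process_decimal('PartNo-001A QtyOrd 20'))
```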
