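I am using regex in Python to add quantity markers before and after a numeric quantity value.
Basically, I have to add the words QtyOrd and units around the numeric quantity wherever they are missing from the text.
For example: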
'PartNo-001A description 20 units some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20 some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20' => 'PartNo-001A description QtyOrd 20 units'
'PartNo-001A QtyOrd 20' => 'PartNo-001A QtyOrd 20 units'
'PartNo-001A 20 units' => 'PartNo-001A QtyOrd 20 units'
The code I am using is as follows:
import re
def process_QtyOrd( text):
    for x in re.findall("(qtyord [0-9]+ units| [0-9]+ units|qtyord [0-9]+|qtyord[0-9]+units )", text.lower()):
        Text_Intermediate = "OrderQty " + str(re.search("[0-9]+", x).group()) + " Units"
        Text_Final = re.sub("(qtyord [0-9]+ units|[0-9]+ units|qtyord [0-9]+|qtyord [0-9]+ units)", Text_Intermediate, text, flags= re.IGNORECASE)
    return Text_Final
text1 = 'PartNo-001A description 20 units some other description'
text2 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 units some other description
'''
text3 = 'PartNo-001A description QtyOrd 20 some other description'
text4 = 'PartNo-001A description QtyOrd 20'
text5 = 'PartNo-001A QtyOrd 20'
text6 = 'PartNo-001A 20 units'
text7 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 units some other description PartNo-001A
'''
text8 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
PartNo-001A
QtyOrd
20
'''
Then:
print(process_QtyOrd(text1))
print(process_QtyOrd(text2))
print(process_QtyOrd(text3))
print(process_QtyOrd(text4))
print(process_QtyOrd(text5))
print(process_QtyOrd(text6))
print(process_QtyOrd(text7))
print(process_QtyOrd(text8))
For text8, the code does not work. Can you help me?
The output should look like this:
1. PartNo-001A description QtyOrd 20 Units some other description
2. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 Units some other description
3. PartNo-001A description QtyOrd 20 Units some other description
4. PartNo-001A description QtyOrd 20 Units
5. PartNo-001A QtyOrd 20 Units
6. PartNo-001A QtyOrd 20 Units
7. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 Units some other description PartNo-001A
8. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
PartNo-001A
QtyOrd
20
units
You can use
import re
text = r'PartNo-001A description 20 units some other description'
pattern = re.compile(r'\b(?:qtyord\s*(\d+)(?:\s*units)?|(\d+)\s*units)\b', re.I)
Text_Final = pattern.sub(r'QtyOrd \1\2 units', text)
print(Text_Final)
# => PartNo-001A description QtyOrd 20 units some other description
See the regex demo and the Python demo.
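Details:
\b - a word boundary
(?: - start of a non-capturing group
qtyord\s* - the word qtyord followed by zero or more whitespace characters
(\d+) - Group 1 (referenced as \1 in the replacement pattern): one or more digits
(?:\s*units)? - an optional group matching zero or more whitespace characters and the word units (if you also want to match the singular form unit, add ? after the s)
| - or
(\d+) - Group 2 (referenced as \2 in the replacement pattern): one or more digits
\s*units - zero or more whitespace characters and the word units (if you also want to match the singular form unit, add ? after the s)
) - end of the group
\b - a word boundary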
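To check the pattern against the inputs from the question, a minimal sketch such as the following can be run. The helper name add_qty_markers and the shortened multi-line sample are assumptions made for illustration; the pattern and the replacement string are the ones from this answer.

```python
import re

# Pattern from this answer: either "qtyord <number> [units]" or "<number> units".
PATTERN = re.compile(r'\b(?:qtyord\s*(\d+)(?:\s*units)?|(\d+)\s*units)\b', re.I)

def add_qty_markers(text):
    # Hypothetical helper name. Only one of groups 1 and 2 takes part in a
    # match; the other expands to an empty string in the replacement
    # (Python 3.5 or later).
    return PATTERN.sub(r'QtyOrd \1\2 units', text)

samples = [
    'PartNo-001A description 20 units some other description',
    'PartNo-001A description QtyOrd 20 some other description',
    'PartNo-001A QtyOrd 20',
    'PartNo-001A 20 units',
    'PartNo-001A\nQtyOrd\n20\n',  # shortened stand-in for text8
]
for sample in samples:
    print(repr(add_qty_markers(sample)))
```

Note that for the multi-line sample the whole match, including the newline between QtyOrd and 20, is replaced, so the result places QtyOrd 20 units on a single line rather than keeping the original line breaks.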
Assuming the expected result for text8 is OrderQty 20 Units rather than QtyOrd 20 units, would you please try:
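import re

def process_QtyOrd(text):
    # Split the text into: leading part, quantity expression, trailing part.
    m = re.match(r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$', text, flags=re.IGNORECASE|re.DOTALL)
    if m:
        # Keep only the number and wrap it in the normalized markers.
        replaced = re.sub(r'\D*(\d+)\D*', r'OrderQty \1 Units', m.group(2))
        text = m.group(1) + replaced + m.group(3)
    return text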
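- re.match splits the input text into three substrings: the leading substring, the part of interest that we want to modify, and the trailing substring.
- I used \s rather than a literal space so that newline characters are matched as well.
- The part of interest is captured by m.group(2). We can then modify it with the re.sub function.
- The final text is the concatenation of the substrings above.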
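The three-way split described above can be inspected directly. This is a small sketch, assuming the same pattern as in the function; the helper name split_around_quantity and the shortened sample strings are made up for illustration.

```python
import re

# Same pattern as in process_QtyOrd: lazy leading part, quantity expression,
# greedy trailing part. re.DOTALL lets "." cross line breaks.
SPLIT = re.compile(
    r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$',
    flags=re.IGNORECASE | re.DOTALL,
)

def split_around_quantity(text):
    # Hypothetical helper: returns (leading, quantity, trailing) or None.
    m = SPLIT.match(text)
    return m.groups() if m else None

print(split_around_quantity('PartNo-001A\nQtyOrd\n20\n'))
print(split_around_quantity('PartNo-001A 20 units'))
print(split_around_quantity('no quantity here'))
```

The first call shows why \s matters: the quantity expression spans a line break, which a pattern written with literal spaces cannot match.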
[Update]
Would you please try the following:
def process_QtyOrd(text):
    text = re.sub(r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)', lambda m: re.sub(r'\D*(\d+)\D*', r'QtyOrd \1 Units', m.group(1)), text, flags=re.IGNORECASE|re.DOTALL)
    return text
Now it works even if the text contains multiple occurrences of the pattern. I have also changed the word OrderQty in the replacement text to QtyOrd.
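- I used re.sub() to replace all occurrences of the pattern in the text at once.
- The lambda function is introduced so that the replacement is evaluated as an expression for each match.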
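The same idea can be written with a named callback instead of a lambda, which may make it easier to see that re.sub calls the function once per match. This is a sketch rather than the answer's exact code: the names QTY, normalize and process_all and the sample string are assumptions, and the number is extracted with re.search instead of a nested re.sub.

```python
import re

# Same alternatives as above. re.DOTALL is omitted because the pattern
# contains no "." for it to affect.
QTY = re.compile(r'qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+', re.IGNORECASE)

def normalize(match):
    # Called once for every match; the returned string replaces that match.
    number = re.search(r'\d+', match.group(0)).group(0)
    return f'QtyOrd {number} Units'

def process_all(text):
    return QTY.sub(normalize, text)

print(process_all('PartNo-001A QtyOrd 20, PartNo-002B 5 units'))
```

Both quantities in the sample are rewritten in a single pass, which the earlier re.match version would not do because it only handles the first occurrence.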
[Update2]
If you want to include decimal numbers, please try the following:
def process_QtyOrd(text):
    text = re.sub(r'(qtyord\s*\d+(?:\.\d+)?\s*units|\d+(?:\.\d+)?\s+units|qtyord\s+\d+(?:\.\d+)?)', lambda m: re.sub(r'\D*(\d+(?:\.\d+)?)\D*', r'QtyOrd \1 Units', m.group(1)), text, flags=re.IGNORECASE|re.DOTALL)
    return text
The idea is to replace \d+ with \d+(?:\.\d+)?, which matches digits followed by an optional dot and more digits.