Suppose I have a coffee shop menu list. I want to take a piece of text and return the quantity and the item name.
menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']
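Now I want to extract the number and the name of the ordered item from the text, where the name is matched against my menu (the first match found in the menu).
Example text: Bring 1 Capputino
Output DataFrame:
text               Quantity  match
Bring 1 Capputino  1         Cappuccino
The spelling in the input text need not be exactly the same as in the menu, so the match column should return only the matching item from the menu list.
I have written the code below, but it returns NaN in the match column. I would appreciate your guidance.
Code: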
import pandas as pd
import numpy as np
import re

def ccd():
    global df
    menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']
    for i in range(len(menu)):
        menu[i] = menu[i].upper()
    order = input('Enter a substring: ').upper()
    args_dict = {'CAPUCINO':'CAPPUCCINO',
                 "COFFI":"COFFEE",
                 "COOKI":"COOKIE" }
    #order=order.split()
    for i,j in enumerate(order):
        if j in args_dict:
            order[i]=args_dict[j]
    df = pd.DataFrame({'text':[order]})
    df["Quantity"] = df.text.str.extract(r'(\d+)')
    df['match'] = df.text.str.extract('(' + '|'.join(menu) + ')')
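Take a look at the following:
import re
import pandas as pd

# Map a short, lower-case prefix to the menu name it stands for.
menu_map = {'cap': 'Cappucino',
            'caf': 'Café Latte',
            "cof": "Irish coffee",
            "cok": "Cookie",
            "cook": "Cookie"}
order = input('Enter a substring: ')
df = pd.DataFrame({'Text': [order]})
# First run of digits in the text.
df["Quantity"] = df.Text.str.extract(r'(\d+)')
# First prefix from menu_map found anywhere in the text, ignoring case.
df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
df['Replacement'] = df.Match.str.casefold().map(menu_map)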
The result for order == 'Bring 1 Caputino' is
               Text Quantity Match Replacement
0  Bring 1 Caputino        1   Cap   Cappucino
and for order == 'Bring 1 Caxutino' it is
               Text Quantity Match Replacement
0  Bring 1 Caxutino        1   NaN         NaN
because no pattern in CCD_ 3 captures CCD_ 4.
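To see why the second order produces NaN, the same alternation can be tried with plain `re`, outside pandas. This is only a minimal sketch: the helper name `first_menu_name` is made up for illustration, and `menu_map` is copied from the code above.

```python
import re

# Same prefix-to-name mapping as in the answer.
menu_map = {'cap': 'Cappucino',
            'caf': 'Café Latte',
            'cof': 'Irish coffee',
            'cok': 'Cookie',
            'cook': 'Cookie'}

# Joining a dict joins its keys: '(cap|caf|cof|cok|cook)'
pattern = re.compile('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)


def first_menu_name(text):
    """Hypothetical helper: menu name for the first prefix found in text,
    or None when none of the prefixes occurs."""
    found = pattern.search(text)
    if found is None:
        return None
    # The captured text keeps its original case, so fold it before the lookup.
    return menu_map[found.group(1).casefold()]


print(first_menu_name('Bring 1 Caputino'))  # Cappucino
print(first_menu_name('Bring 1 Caxutino'))  # None
```

Note that the pattern matches a prefix anywhere in the text, not only at the start of a word, so the keys should be chosen with that in mind.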
It seems to me this is what you want? Since you don't want the Replacement column (I only used it for transparency), you could do this instead:
df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
df.Match = df.Match.str.casefold().map(menu_map)
(I don't understand what you are trying to achieve with the for ... if ... part.)
EDIT: Now that I understand the for ... if ... part, I suggest the following approach:
args_dict = {'capu': 'Cappuccino', 'chap': 'Cappuccino',
             'cof': 'Coffee', 'coof': 'Coffee', 'chof': 'Coffee',
             'cok': 'Cookie', 'chok': 'Cookie', 'choo': 'Cookie'}
order = order.split()
for i, word in enumerate(order):
    word = word.casefold()
    for key in args_dict:
        if word.startswith(key):
            order[i] = args_dict[key]
            break
order = ' '.join(order)
Or:
args_dict = {('capu', 'chap'): 'Cappuccino',
             ('cof', 'coof', 'chof'): 'Coffee',
             ('cok', 'chok', 'choo'): 'Cookie'}
order = order.split()
for i, word in enumerate(order):
    word = word.casefold()
    for keys, replacement in args_dict.items():
        # str.startswith accepts a tuple and checks every prefix in it.
        if word.startswith(keys):
            order[i] = replacement
            break
order = ' '.join(order)
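Putting the pieces together, a rough end-to-end sketch might first normalize the order and then pull out the quantity and the menu item. Several things here are assumptions rather than part of the code above: the function names `normalize_order` and `parse_order`, the `prefix_map` entries (including the extra 'capp' prefix, which is what the example 'Capputino' actually starts with), and the menu with trailing spaces removed. It uses plain `re` instead of a DataFrame just to stay self-contained.

```python
import re

# Menu from the question, with trailing spaces removed (assumption).
menu = ['Cappuccino', 'Café Latte', 'Expresso', 'Macchiato', 'Irish coffee']

# Assumed prefix groups; extend with whatever misspellings you expect.
prefix_map = {('capu', 'capp', 'chap'): 'Cappuccino'}


def normalize_order(order):
    """Replace each word that starts with a known prefix by its menu name."""
    words = order.split()
    for i, word in enumerate(words):
        folded = word.casefold()
        for prefixes, replacement in prefix_map.items():
            if folded.startswith(prefixes):  # tuple of prefixes
                words[i] = replacement
                break
    return ' '.join(words)


def parse_order(order):
    """Return the original text, the first number, and the first menu item."""
    text = normalize_order(order)
    quantity = re.search(r'\d+', text)
    # re.escape guards against regex metacharacters in menu names.
    item = re.search('|'.join(map(re.escape, menu)), text, flags=re.IGNORECASE)
    return {'text': order,
            'Quantity': quantity.group() if quantity else None,
            'match': item.group() if item else None}


print(parse_order('Bring 1 Capputino'))
# {'text': 'Bring 1 Capputino', 'Quantity': '1', 'match': 'Cappuccino'}
```

The quantity is kept as a string, as `str.extract` would return it; convert it with `int` if you need to do arithmetic on it.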