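How to find specific text and print the next 2 words after it

My code is below.

I currently have an if statement that finds a specific word, in this case 'INGREDIENTS'.

Next, instead of print("True"), I need to print the 2 words/strings that follow 'INGREDIENTS'. This word/string appears only once in the image.

For example, when I run the .py file with print(text) included in the script, this is my output: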

Ground Almonds
INGREDIENTS: Ground Almonds(100%).
1kg

I only need to recode this section:

if 'INGREDIENTS' in text:
    print("True")
else:
    print("False")

So that the output looks like this:

INGREDIENTS: Ground Almonds

Because the next two words/strings are Ground and Almonds.

Python code

from PIL import Image
import pytesseract

# Point pytesseract at the local Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\gzi\AppData\Roaming\Python\Python37\site-packages\tesseract.exe'
img = Image.open('C:/Users/gzi/Desktop/work/lux.jpg')
# Run OCR on the image and get the recognized text as a string
text = pytesseract.image_to_string(img, lang='eng')

if 'INGREDIENTS' in text:
    print("True")
else:
    print("False")

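If you don't care about the percentage and want to avoid regex:

string = 'INGREDIENTS: Ground Almonds(100%).'
tokens = string.split()
for n, i in enumerate(tokens):
    if 'INGREDIENTS' in i:
        # Print the matching token and the two tokens that follow it
        print(' '.join(tokens[n:n+3]))

Output: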

INGREDIENTS: Ground Almonds(100%).
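
If the trailing "(100%)." is unwanted, the same split-based idea can be extended slightly. This is only a sketch: the helper name next_words and its parameters are made up for illustration, and it assumes the percentage always starts with "(" directly after the last word.

```python
def next_words(text, keyword="INGREDIENTS", count=2):
    """Return the keyword token plus the next `count` words, or None if absent.

    Hypothetical helper, not part of pytesseract or the original answer.
    """
    tokens = text.split()
    for n, token in enumerate(tokens):
        if keyword in token:
            following = tokens[n + 1:n + 1 + count]
            # Assumption: anything from the first "(" onward is the percentage,
            # e.g. "Almonds(100%)." becomes "Almonds"
            cleaned = [word.split("(")[0] for word in following]
            return " ".join([token] + cleaned)
    return None


text = "Ground Almonds\nINGREDIENTS: Ground Almonds(100%).\n1kg"
print(next_words(text))
```

Returning None when the keyword is missing lets the caller decide what to print in that case, in place of the original "False" branch.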

Use a regular expression to find all matches:

import re
txt = 'INGREDIENTS: Ground Almonds("100");'
# \s matches one whitespace character, (\w+) captures one word
x = re.findall(r"INGREDIENTS:\s(\w+)\s(\w+)", txt)
print(x)
# [('Ground', 'Almonds')]
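
To get the exact line asked for in the question, the same pattern could be applied to the multi-line OCR text with re.search. This is a sketch under the assumption that the label is always written as "INGREDIENTS:" and is followed by at least two plain words; the variable names are illustrative.

```python
import re

# Sample text as it appears in the question's print(text) output
text = """Ground Almonds
INGREDIENTS: Ground Almonds(100%).
1kg"""

# Capture the label as well, so the three groups can be joined directly
match = re.search(r"(INGREDIENTS:)\s+(\w+)\s+(\w+)", text)
result = " ".join(match.groups()) if match else None
print(result)
```

Here \s+ is used instead of \s so that runs of spaces in the OCR output, which are common, do not break the match.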

So, assuming we have extracted the following text using pytesseract:

text = '''Ground Almonds
INGREDIENTS: Ground Almonds(100%).
1kg'''

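We can achieve the desired result as follows:

first_index = text.find('INGREDIENTS')
# Start looking for '(' at the keyword, so an earlier '(' in the text is ignored
second_index = text.find('(', first_index)
my_string = f'{text[first_index:second_index]}'
print(my_string)

The output is: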

INGREDIENTS: Ground Almonds

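So, in the snippet, we use the find method to locate the word INGREDIENTS and the ( symbol (assuming it always comes right after the main ingredient and introduces its percentage).

Then we slice the string using those two indices and print the result, using an f-string to format it into the desired output.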
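
One thing to watch with this approach: str.find returns -1 when nothing is found, which would silently produce a wrong slice. A more defensive variant might look like the following. The function name extract_ingredient and the fallback to the end of the line are assumptions made for illustration, not part of the original answer.

```python
def extract_ingredient(text, keyword="INGREDIENTS", stop="("):
    """Return the text from `keyword` up to `stop`, or None if keyword is absent.

    Hypothetical helper built on str.find and slicing.
    """
    start = text.find(keyword)
    if start == -1:
        # Keyword not present in the OCR output
        return None
    end = text.find(stop, start)
    if end == -1:
        # Assumption: with no "(" after the keyword, stop at the end of the line
        end = text.find("\n", start)
        if end == -1:
            end = len(text)
    return text[start:end].strip()


text = "Ground Almonds\nINGREDIENTS: Ground Almonds(100%).\n1kg"
print(extract_ingredient(text))
```

Note that this returns everything between the keyword and the stop character, so it yields exactly two words only when the ingredient name is two words long.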
