Extracting space-separated words from a sentence in Python



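I have a list of strings, x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa'], and I need to extract the entries of x1 from several sentences.

My sentence is "eskimo lives as a wild man in wild jungle and he stands as a guard". In this sentence I need to extract the first word, eskimo, and the fifth and sixth words, wild man, because they appear as separate words, exactly as in x1. I should not extract "stands", even though sta is contained in stands.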

def get_name(input_str):
    prod_name = []
    for row in x1:
        if (row.strip().lower() in input_str.lower().strip()) or (len([x for x in input_str.split() if "\b"+x in row])>0):
            prod_name.append(row)
    return list(set(prod_name))

The call get_name("eskimo lives as a wild man in wild jungle and he stands as a guard") returns

[esk, eskimo, wild man, sta]

but the expected result is

[eskimo, wild man]

What do I need to change in the code?

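I have a slightly different approach. First, split the input sentence into words, and split each phrase you want to check into its component words as well. Then check whether all the words of a phrase occur in the sentence.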

x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
input_sentence = "eskimo lives as a wild man in wild jungle and he stands as a guard"
# Remove all punctuation marks from the sentence
input_sentence = input_sentence.replace('!', '').replace('.', '').replace('?', '').replace(',', '')
# Split the input sentence into its component words to check individually
input_words = input_sentence.split()
for ele in x1:
    # Split each element in x1 into words
    ele_words = ele.split()
    # Check that every word is one of the input words and that the
    # whole phrase also occurs in the sentence
    if all(word in input_words for word in ele_words) and ele in input_sentence:
        print(ele)
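
One caveat with this check is that `ele in input_sentence` is a plain substring test, so a phrase can still match across word fragments. A minimal sketch of a stricter variant, which compares the words of a phrase against consecutive words of the sentence, might look like the following. The helper names `contains_phrase` and `find_phrases` are made up for illustration, and lowercasing both sides is an assumption carried over from the question's code.

```python
def contains_phrase(sentence_words, phrase_words):
    """Return True if phrase_words occurs as a consecutive run in sentence_words."""
    n = len(phrase_words)
    if n == 0:
        return False
    return any(
        sentence_words[i:i + n] == phrase_words
        for i in range(len(sentence_words) - n + 1)
    )


def find_phrases(sentence, phrases):
    """Return the phrases that appear in sentence as whole, consecutive words."""
    # Tokenize on whitespace and trim sentence punctuation from each word.
    words = [w.strip("!.?,") for w in sentence.lower().split()]
    return [p for p in phrases if contains_phrase(words, p.lower().split())]


x1 = ['esk', 'wild man', 'eskimo', 'sta']
sentence = "eskimo lives as a wild man in wild jungle and he stands as a guard"
result = find_phrases(sentence, x1)
print(result)  # ['wild man', 'eskimo']
```

For a sentence such as "wildest man and wild mango", the substring-based check reports 'wild man' (both words are present, and "wild man" is a substring of "wild mango"), whereas this sketch returns an empty list.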

You can simply use str.split(" ") to get a list of all the words in the sentence, and then do the following:

s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
l = s.split(" ")
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
# Multi-word entries become lists of words; single-word entries stay strings
new_x1 = [word.split(" ") for word in x1 if " " in word] + [word for word in x1 if " " not in word]
ans = []
for x in new_x1:
    if isinstance(x, str):
        # Single word: it must be one of the sentence's words
        if x in l:
            ans.append(x)
    else:
        # Multi-word phrase: rebuild it, then require every word to be
        # present and the whole phrase to occur in the sentence
        temp = " ".join(x)
        if all(sub_x in l for sub_x in x) and temp in s:
            ans.append(temp)
print(ans)

You can use a regular expression:

import re
x1 = ['esk','wild man','eskimo', 'sta']
my_str = "eskimo lives as a wild man in wild jungle and he stands as a guard"
my_list = []
for words in x1:
    # \b is a word boundary, so 'esk' does not match inside 'eskimo'
    if re.search(r'\b' + words + r'\b', my_str):
        my_list.append(words)
print(my_list)

With the updated list, the string (+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa raises an error when it is used as a regular expression, so you can wrap the search in try/except:

for words in x1:
    try:
        if re.search(r'\b' + words + r'\b', my_str):
            my_list.append(words)
    except re.error:
        # Skip entries that are not valid regular expressions
        pass
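
Catching the error means the chemical name is skipped rather than searched for. A possible alternative, sketched here under the assumption that every entry should be matched literally, is to pass each entry through `re.escape`. Because `\b` needs a word character (letter, digit, or underscore) on one side, it cannot match just before an entry that starts with `(` when the preceding character is a space, so this sketch swaps it for the lookarounds `(?<!\w)` and `(?!\w)`. The function name `find_literal` and the sample sentence are illustrative only.

```python
import re


def find_literal(entries, text):
    """Return the entries found in text as literal, whole-token matches."""
    found = []
    for entry in entries:
        # re.escape neutralizes metacharacters such as ( ) [ ] and +.
        pattern = r'(?<!\w)' + re.escape(entry) + r'(?!\w)'
        if re.search(pattern, text):
            found.append(entry)
    return found


chem = '(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa'
x1 = ['esk', 'wild man', 'eskimo', 'sta', chem]
my_str = "eskimo lives as a wild man, stands guard and takes " + chem
matches = find_literal(x1, my_str)
print(matches)  # 'wild man', 'eskimo' and the chemical name
```

With the escaped pattern no exception is raised, 'esk' and 'sta' are still rejected because a word character follows them, and the chemical name is found as a literal string.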

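You can use a regex with whitespace boundaries on the left, (?<!\S), and on the right, (?!\S), so that you do not get partial matches, and join all the items of the x1 list into a single alternation.

Then use re.findall to get all the matches: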

import re
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
# re.escape makes every item match literally, including the chemical name
pattern = fr"(?<!\S)(?:{'|'.join(re.escape(x) for x in x1)})(?!\S)"
print(re.findall(pattern, s))

Output
['eskimo', 'wild man']
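
To mirror the `get_name` function from the question, the whitespace-boundary pattern can be wrapped in a small helper. The following is a sketch rather than a drop-in replacement. It assumes case-insensitive matching is wanted (the question lowercases both sides), it sorts the alternatives longest first so that a shorter entry, such as a hypothetical 'wild' alongside 'wild man', does not shadow the longer one, and it returns each entry once in order of first appearance. The `names` parameter is an addition for illustration; the question's version reads the global `x1`.

```python
import re


def get_name(input_str, names):
    """Return the entries of names that occur in input_str as whole tokens."""
    if not names:
        return []
    # Longest first, so 'wild man' is tried before a shorter entry like 'wild'.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = '|'.join(re.escape(n) for n in ordered)
    pattern = re.compile(rf"(?<!\S)(?:{alternatives})(?!\S)", re.IGNORECASE)
    # Map the lowercased matched text back to the spelling used in names.
    canonical = {n.lower(): n for n in names}
    result = []
    for match in pattern.finditer(input_str):
        name = canonical.get(match.group(0).lower(), match.group(0))
        if name not in result:
            result.append(name)
    return result


x1 = ['esk', 'wild man', 'eskimo', 'sta',
      '(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
s = "Eskimo lives as a wild man in wild jungle and he stands as a guard"
names_found = get_name(s, x1)
print(names_found)  # ['eskimo', 'wild man']
```

Matches found this way do not overlap, so when 'wild man' is matched at a position, a shorter entry starting at that same position is not reported separately.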

See the Python demo.
