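I have two lists: one is a list of languages, the other is a list of strings. For each string, I want to check whether any of the languages appears in it. If one does, I append the language that was found to a new list; otherwise I append "english" to that new list.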
```python
languages = ['afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish', ...]
texts = ['apple', 'orange in polish', 'grape in russian']
```
Desired output:

```
['english', 'polish', 'russian']
```
My first attempt was the code below, but it returns `['polish', 'russian']`!
```python
list_of_valid_langs = []
for lang in languages:
    for text in texts:
        if lang in text:
            # only matches are appended; nothing is added for 'apple'
            list_of_valid_langs.append(lang)
```
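For my second attempt I added a second condition, but that isn't what I need either (it appends 'english' once for every language/text pair that doesn't match):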
```python
list_of_valid_langs = []
for lang in languages:
    for text in texts:
        if lang in text:
            list_of_valid_langs.append(lang)
        elif lang not in text:
            list_of_valid_langs.append('english')
```
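To see why the second attempt produces too many entries, here is a small run of the same loops with a shortened, hypothetical `languages` list. Because the outer loop is over languages, every text is checked once per language, so the result has `len(languages) * len(texts)` entries instead of one per text (the `elif lang not in text` is just an `else`):

```python
# Shortened language list, chosen only for illustration.
languages = ['russian', 'polish']
texts = ['apple', 'orange in polish', 'grape in russian']

list_of_valid_langs = []
for lang in languages:
    for text in texts:
        if lang in text:
            list_of_valid_langs.append(lang)
        else:  # equivalent to `elif lang not in text`
            list_of_valid_langs.append('english')

print(list_of_valid_langs)
# ['english', 'english', 'russian', 'english', 'polish', 'english']
print(len(list_of_valid_langs) == len(languages) * len(texts))  # True
```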
I think your mistake is that you iterate over the languages first and then over the texts. Let's try flipping the loops:
```python
list_of_valid_langs = []
for text in texts:
    for lang in languages:
        if lang in text:
            list_of_valid_langs.append(lang)
            break  # lang is found, no need to keep searching
    else:  # if no lang was found, append 'english'
        list_of_valid_langs.append('english')
```
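The same "first match, or a default" logic can also be written with the built-in `next()` and a default value, which replaces the inner loop and its `else`. This is a sketch, not part of the original answer; `detect_language` is a hypothetical helper name, and the `languages` list is the question's list without the trailing `...`:

```python
languages = ['afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish']
texts = ['apple', 'orange in polish', 'grape in russian']

def detect_language(text, languages, default='english'):
    # next() returns the first language that occurs in text,
    # or the default when the generator produces nothing.
    return next((lang for lang in languages if lang in text), default)

list_of_valid_langs = [detect_language(t, languages) for t in texts]
print(list_of_valid_langs)  # ['english', 'polish', 'russian']
```

Like the loop version, this uses substring matching (`lang in text`) and stops at the first language in `languages` order.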
After seeing @fsimonjetz's answer, I found a better solution using a set:
```python
list_of_valid_langs = []
# first of all, turn languages into a set for fast membership tests
languages = set(languages)
# iterate over the texts
for text in texts:
    # check if one of the words in the text is a language
    for word in text.split():
        if word in languages:
            # if a language is found, append it and break
            list_of_valid_langs.append(word)
            break
    else:
        # if no language is found, append 'english'
        list_of_valid_langs.append('english')
```
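Besides speed, splitting into words changes what counts as a match. A substring test finds `'polish'` inside `'polished'`, while a whole-word test does not. The text `'polished apple'` below is a hypothetical input chosen to show the difference:

```python
languages = {'russian', 'polish'}
text = 'polished apple'

# Substring test: 'polish' is a substring of 'polished', so it matches.
substring_hits = [lang for lang in sorted(languages) if lang in text]

# Whole-word test: 'polished' is not in the set, so nothing matches.
word_hits = [w for w in text.split() if w in languages]

print(substring_hits)  # ['polish']
print(word_hits)       # []
```

The trade-off is that `str.split()` only splits on whitespace, so a word with attached punctuation such as `'polish.'`, or a capitalized `'Polish'`, will not match the set unless you normalize the text first.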
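A note on `for`/`else`: the code inside the `for` loop runs as usual, but the code in the `else` block runs only if the loop exits normally. Put another way, the `else` block runs only if the `break` statement was never reached.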
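A minimal, self-contained illustration of that rule, using a hypothetical `first_even` function unrelated to the language lists:

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = n
            break  # leaving via break skips the else block
    else:
        result = None  # runs only when the loop finished without break
    return result

print(first_even([1, 3, 4, 5]))  # 4
print(first_even([1, 3, 5]))     # None
print(first_even([]))            # None (the else block also runs for an empty iterable)
```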
If you prefer, you can replace the `for`/`else` with a plain `for` loop followed by an `if` block, using a bool variable. This should work:
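```python
list_of_valid_langs = []
for text in texts:
    lang_found = False
    for lang in languages:
        if lang in text:
            list_of_valid_langs.append(lang)
            lang_found = True
            break  # stop at the first match, like the for/else version
    if not lang_found:
        list_of_valid_langs.append('english')
```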
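In the flag version, stopping at the first match matters when a text mentions more than one language. Without a `break`, every matching language is appended, so the result can become longer than `texts`. The sketch below wraps the loop in a hypothetical helper `tag` with a `stop_early` switch, and uses a made-up text containing two languages:

```python
def tag(texts, languages, stop_early):
    result = []
    for text in texts:
        lang_found = False
        for lang in languages:
            if lang in text:
                result.append(lang)
                lang_found = True
                if stop_early:
                    break  # keep only the first match for this text
        if not lang_found:
            result.append('english')
    return result

languages = ['russian', 'polish']
texts = ['grape in russian and polish']  # hypothetical text with two languages

print(tag(texts, languages, stop_early=False))  # ['russian', 'polish']
print(tag(texts, languages, stop_early=True))   # ['russian']
```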
I think Roy Cohen's answer is a perfect solution to your problem, but I'd like to suggest a more efficient alternative that uses set intersection:
```python
languages = set(['afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish'])
texts = ['apple', 'orange in polish', 'grape in russian']
list_of_valid_langs = []
for t in texts:
    # this returns the set of language(s) occurring in the
    # string if there are any, otherwise it returns {'english'}
    lang = set(t.split()).intersection(languages) or {'english'}
    # pop an element from the set (an arbitrary one if several
    # languages matched) and append it to the list
    list_of_valid_langs.append(lang.pop())
```
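One caveat of the intersection approach: if a text mentions two languages, the intersection contains both, and `set.pop()` returns an arbitrary one of them. If you need a predictable choice, one option (a sketch, not part of the answer above) is to take the first language in reading order. The text below is a hypothetical example:

```python
languages = {'afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish'}
text = 'grape in russian and polish'  # hypothetical text with two languages

found = set(text.split()) & languages
print(sorted(found))  # ['polish', 'russian']: set.pop() could return either one

# Keep the first language in reading order instead of an arbitrary one.
first = next((w for w in text.split() if w in languages), 'english')
print(first)  # russian
```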