Efficiently checking whether part of a string matches a key in a list/dict



I have a lot (>100,000) of lowercase strings in a list, a subset of which might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dict like this (in reality it will have roughly 1,000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

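For every string in the list that contains any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result is therefore: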

str_list = ["dk", "us", "nothing here"]

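Given the number of strings I have and the length of the dict, what is the most efficient way to do this?

Extra info: a string will never contain more than one dict key.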

This seems like a good way to do it:

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
output_strings = []
for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            output_strings.append(value)
            break
    else:
        output_strings.append(string)
print(output_strings)

Assuming:

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

you can do:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

which returns:

print(res)  # -> ['dk', 'us', 'nothing here']

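The cool thing about this (apart from it being a list comprehension, a.k.a. the Python ninja's favorite weapon) is that get defaults to my_str, and next is given None to return instead of raising StopIteration, which in turn triggers that default.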
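To make the two defaults concrete, here is a small sketch of how next and dict.get behave when no key matches. The helper name first_matching_key is made up for illustration and is not part of the answer above.

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

def first_matching_key(text, mapping):
    # Hypothetical helper: next() returns its second argument (None here)
    # instead of raising StopIteration when the generator yields nothing.
    return next((k for k in mapping if k in text), None)

hit = first_matching_key("hello i am from denmark", dict_x)
miss = first_matching_key("nothing here", dict_x)
print(hit, miss)

# dict.get() falls back to its second argument when the key is absent.
# None is not a key of dict_x, so the original string comes back unchanged.
print(dict_x.get(hit, "hello i am from denmark"))
print(dict_x.get(miss, "nothing here"))
```

Note that this relies on None never being a key of the dict; if it could be, a different sentinel would be needed.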

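Something like this would work. Note that it replaces the string with the value for the first key it encounters that matches. If there can be more than one match, you may want to modify the logic to suit your use case.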

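strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
converted = []
for string in strings:
    updated_string = string  # keep the original unless a key matches
    for key, value in dict_x.items():
        if key in string:
            updated_string = value
            break  # stop at the first matching key
    converted.append(updated_string)
print(converted)  # -> ['dk', 'us', 'nothing here']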
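If several keys could match the same string, one possible adjustment is to check longer keys first so that the most specific one wins. This is only a sketch: the extra "states" key and the function name replace_with_longest are assumptions added for illustration, and the question states that multiple matches never occur in the real data.

```python
# "states" is a hypothetical extra key that overlaps with "united states".
dict_x = {"states": "st", "united states": "us", "denmark": "dk"}

# Sort once, longest key first, so "united states" is tried before "states".
keys_by_length = sorted(dict_x, key=len, reverse=True)

def replace_with_longest(text):
    for key in keys_by_length:
        if key in text:
            return dict_x[key]
    return text  # no key matched: keep the original string

strings = ["that was in the united states", "nothing here"]
converted = [replace_with_longest(s) for s in strings]
print(converted)  # -> ['us', 'nothing here']
```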

Try:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v
print(str_list)

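This iterates over the key-value pairs in the dict and checks whether each key is in each string. If it is, the string is replaced with the value.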

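You can subclass dict and use a list comprehension.

In terms of performance, I'd suggest trying a few different approaches and seeing which one works best.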

class dict_contains(dict):
    def __getitem__(self, value):
        key = next((k for k in self.keys() if k in value), None)
        return self.get(key)
str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"
lst = [str1, str2, str3]
dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})
res = [dict_x[i] or i for i in lst]
# ['dk', 'us', "nothing here"]
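One way to follow the "try a few approaches" advice is to time them with timeit. The sketch below compares the nested loop and the next-based comprehension from this thread against a compiled regex alternation. The regex variant does not come from any answer here and is included only as another candidate. The data is synthetic, and the function names are made up for illustration.

```python
import re
import timeit

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
# Synthetic workload: the three sample strings repeated many times.
str_list = ["hello i am from denmark",
            "that was in the united states",
            "nothing here"] * 1000

def with_loop(strings):
    out = []
    for s in strings:
        for key, value in dict_x.items():
            if key in s:
                out.append(value)
                break
        else:  # no key matched
            out.append(s)
    return out

def with_next(strings):
    return [dict_x.get(next((k for k in dict_x if k in s), None), s) for s in strings]

# Assumption, not from the answers: one alternation over all escaped keys.
pattern = re.compile("|".join(re.escape(k) for k in dict_x))

def with_regex(strings):
    out = []
    for s in strings:
        m = pattern.search(s)
        out.append(dict_x[m.group(0)] if m else s)
    return out

for fn in (with_loop, with_next, with_regex):
    seconds = timeit.timeit(lambda: fn(str_list), number=10)
    print(f"{fn.__name__}: {seconds:.3f}s")
```

The timings depend heavily on the number of keys and on how often strings match, so it is worth running this with data shaped like the real input (about 1,000 keys and more than 100,000 strings) before picking one.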
