Efficiently identify whether part of a string is in a list/dict keys

I have a lot (>100,000) of lowercase strings in a list, where a subset might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dictionary like this (in reality it will have around 1,000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

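For every string in the list that contains any of the dictionary's keys, I want to replace the entire string with the corresponding dictionary value. The expected result is therefore: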

str_list = ["dk", "us", "nothing here"]

What is the most efficient way to do this, considering the number of strings I have and the size of the dictionary?

Extra info: a string never contains more than one dictionary key.

This seems to be a good way:

input_strings = ["hello i am from denmark",
                 "that was in the united states",
                 "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
output_strings = []
for string in input_strings:
    for key, value in dict_x.items():
        if key in string:
            output_strings.append(value)
            break
    else:
        output_strings.append(string)
print(output_strings)

Assuming:

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

You can do:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

print(res)  # -> ['dk', 'us', 'nothing here']

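The cool thing about this (apart from it being a Python ninja's favorite weapon, a.k.a. the list comprehension) is how the two defaults work together: next is given None as its default, so it returns None instead of raising StopIteration when no key matches, and get is given my_str as its default, so that None lookup falls back to the original string.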
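To make the two defaults easier to see, here is a small sketch that splits the one-liner into its parts. The helper name first_matching_key is hypothetical and exists only for this illustration.

```python
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

def first_matching_key(my_str, mapping):
    # next() returns its second argument (None here) instead of raising
    # StopIteration when the generator yields nothing.
    return next((k for k in mapping if k in my_str), None)

key_hit = first_matching_key("hello i am from denmark", dict_x)
key_miss = first_matching_key("nothing here", dict_x)

# dict.get() falls back to its second argument when the key is absent.
# None is not a key of dict_x, so the original string comes back.
replaced_hit = dict_x.get(key_hit, "hello i am from denmark")
replaced_miss = dict_x.get(key_miss, "nothing here")

print(key_hit, key_miss)            # denmark None
print(replaced_hit, replaced_miss)  # dk nothing here
```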

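Something like this would work. Note that it replaces the string with the value of the first key encountered that matches. If a string can contain more than one key, you may want to modify the logic to suit your use case.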

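strings = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
converted = []
for string in strings:
    updated_string = string  # keep the original unless a key matches
    for key, value in dict_x.items():
        if key in string:
            updated_string = value
            break  # stop at the first matching key
    converted.append(updated_string)
print(converted)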
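If several keys can occur in one string, one possible rule is to prefer the longest matching key. The sketch below assumes that rule; the function name replace_with_longest_match and the extra "states" key are hypothetical and only there to show the difference from first-match behavior.

```python
# Hypothetical data: "states" and "united states" both occur in the first string.
dict_x = {"states": "st", "united states": "us", "denmark": "dk"}
strings = ["that was in the united states", "hello i am from denmark", "nothing here"]

def replace_with_longest_match(string, mapping):
    # Collect every key that occurs in the string, not just the first one.
    matches = [key for key in mapping if key in string]
    if not matches:
        return string  # no key found: keep the original string
    # Assumed tie-break rule: the longest key is the most specific match.
    return mapping[max(matches, key=len)]

converted = [replace_with_longest_match(s, dict_x) for s in strings]
print(converted)  # ['us', 'dk', 'nothing here']
```

With the first-match loop and this dictionary order, the first string would have become "st" instead.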

Try

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
for k, v in dict_x.items():
    for i in range(len(str_list)):
        if k in str_list[i]:
            str_list[i] = v
print(str_list)

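This iterates over the key, value pairs in the dictionary and checks whether the key is in each string. If it is, it replaces the string with the value.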
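One thing that may be worth checking with this in-place loop: an entry that has already been replaced is tested again against every later key. The sketch below uses hypothetical data, where the value "dk" is itself a later key, to show the effect, along with a variant that only ever tests the original strings.

```python
# Hypothetical data: the value "dk" is also a key that comes later in the dict.
str_list = ["hello i am from denmark", "nothing here"]
dict_x = {"denmark": "dk", "dk": "x"}

# In-place version: the replaced entry "dk" is matched again by the key "dk".
in_place = list(str_list)
for k, v in dict_x.items():
    for i in range(len(in_place)):
        if k in in_place[i]:
            in_place[i] = v

# Variant: always test the untouched original string, write to a copy.
result = list(str_list)
for k, v in dict_x.items():
    for i, original in enumerate(str_list):
        if k in original:
            result[i] = v

print(in_place)  # ['x', 'nothing here']
print(result)    # ['dk', 'nothing here']
```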

You can subclass dict and use a list comprehension.

In terms of performance, I advise you to try a few different methods and see which one works best on your data.

class dict_contains(dict):
    def __getitem__(self, value):
        key = next((k for k in self.keys() if k in value), None)
        return self.get(key)
str1 = "hello i am from denmark"
str2 = "that was in the united states"
str3 = "nothing here"
lst = [str1, str2, str3]
dict_x = dict_contains({"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"})
res = [dict_x[i] or i for i in lst]
# ['dk', 'us', "nothing here"]
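As a rough sketch of how such a comparison could be set up, the snippet below times the generator-based lookup against a different technique that does not appear in the answers above: a single compiled regular expression built from all the keys. The function names with_loop and with_regex and the repeated sample data are assumptions made for illustration; results will depend on your real strings and dictionary, so no timings are claimed here.

```python
import re
import timeit

dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}
sample = ["hello i am from denmark", "that was in the united states", "nothing here"]
str_list = sample * 1000  # stand-in for a large list

def with_loop(strings, mapping):
    # First key found in each string, or the string itself if none matches.
    return [mapping.get(next((k for k in mapping if k in s), None), s) for s in strings]

# One alternation of all keys. re.escape protects keys containing regex
# metacharacters; longer keys come first so that, at a given position,
# a longer key wins over a shorter key that is its prefix.
pattern = re.compile("|".join(re.escape(k) for k in sorted(dict_x, key=len, reverse=True)))

def with_regex(strings, mapping, pattern=pattern):
    out = []
    for s in strings:
        m = pattern.search(s)
        out.append(mapping[m.group(0)] if m else s)
    return out

# Sanity check: both approaches agree on the sample data.
assert with_loop(str_list, dict_x) == with_regex(str_list, dict_x)

for fn in (with_loop, with_regex):
    seconds = timeit.timeit(lambda: fn(str_list, dict_x), number=10)
    print(f"{fn.__name__}: {seconds:.3f}s")
```

Note that the two functions are only guaranteed to agree under the question's assumption that a string never contains more than one key; with overlapping keys, the loop picks by dictionary order and the regex picks the leftmost match.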
