Best way to find the most common dictionary in a list of dictionaries

I have a list of dictionaries where each dictionary has the keys 'shape' and 'colour'. For example:

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]

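I need to find the most commonly occurring shape/colour combination. I decided to do this by finding the most commonly occurring dictionary in the list. I've cut my method down to this: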

import json
from collections import defaultdict

frequency = defaultdict(int)
for i in info:
    hashed = json.dumps(i)  # Turn the dict into a string so it can be used as a key in frequency
    frequency[hashed] += 1
most_common = max(frequency, key=frequency.get)  # Get the most common key
print(json.loads(most_common))  # Unhash it back into a dict

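I'm fairly new to Python, and I keep finding some 1-2 line function that ends up doing what I wanted to do in the first place. Is there a quicker way in this case? Maybe this will end up helping another beginner, because I couldn't find anything after a long time of googling.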

If the items in the list have consistent keys, a better option is to use a namedtuple in place of a dict, e.g.:

from collections import namedtuple
# Define the named tuple
MyItem = namedtuple("MyItem", "shape colour")
# Create your list of data
info = [
    MyItem('pentagon', 'red'),
    MyItem('rectangle', 'white'),
    # etc etc
]

This provides a number of benefits:

# To instantiate
item = MyItem("pentagon", "red")
# or using keyword arguments
item = MyItem(shape="pentagon", colour="red")
# or from your existing dict
item = MyItem(**{'shape': 'pentagon', 'colour': 'red'})
# Accessors
print(item.shape)
print(item.colour)
# Decomposition
shape, colour = item

Getting back to the question of counting matching items: because a namedtuple is hashable, you can use collections.Counter, and the counting code then becomes:

from collections import Counter
frequency = Counter(info)
# Get the items in the order of most common
frequency.most_common()
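
To tie this back to the list of dicts in the question, here is a minimal end-to-end sketch. The sample data is made up for illustration, and wrapping `_asdict()` in `dict()` is only there so the result is a plain dict on older Python versions as well.

```python
from collections import Counter, namedtuple

MyItem = namedtuple("MyItem", "shape colour")

# Sample data, made up for illustration
raw = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    {'shape': 'pentagon', 'colour': 'red'},
]

# Convert each dict into a hashable named tuple
info = [MyItem(**d) for d in raw]

frequency = Counter(info)

# most_common(1) returns a one-element list of (item, count) pairs
top_item, top_count = frequency.most_common(1)[0]

# _asdict() turns the named tuple back into a dict
most_common_dict = dict(top_item._asdict())
print(most_common_dict, top_count)
```

Note that `most_common(1)[0]` raises an IndexError on an empty list, so guard for that if `info` can be empty.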

Enjoy!

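Rather than converting each dict into a particular string representation, take the data you need out of each dict. Make a tuple of the two string values; a tuple is hashable, so it can be used as a dict key.
The Python standard library provides collections.Counter for exactly this counting purpose.

So: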

from collections import Counter
# info = ...  (the list of dicts from the question)
histogram = Counter((item['shape'], item['colour']) for item in info)
# most_common(n) gives a list of the n most common (key, count) pairs.
(shape, colour), count = histogram.most_common(1)[0]
# re-assemble the dict, if desired, and print it.
print({'shape': shape, 'colour': colour})
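
If the dicts do not all share the same fixed pair of keys, one possible variation is to build the hashable key from every item in the dict instead of naming the keys. The sketch below is not from the answer above: `most_common_dict` is a hypothetical helper name, and it assumes every value in the dicts is itself hashable.

```python
from collections import Counter

def most_common_dict(dicts):
    """Return (dict, count) for the most frequent dict in a non-empty list.

    Hypothetical helper for illustration; assumes all values are hashable.
    """
    # frozenset(d.items()) is hashable and ignores key order
    counts = Counter(frozenset(d.items()) for d in dicts)
    items, count = counts.most_common(1)[0]
    return dict(items), count

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'colour': 'red', 'shape': 'pentagon'},  # same content, different key order
    {'shape': 'rectangle', 'colour': 'white'},
]
winner, count = most_common_dict(info)
print(winner, count)
```

Using a frozenset of items means two dicts with the same content count as equal regardless of key order, which a plain string representation does not guarantee.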

Using pandas makes the problem simpler.

import pandas as pd
info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]
df = pd.DataFrame(info)
# to get the count of each shape
print(df['shape'].value_counts())
# to get the most commonly occurring shape (idxmax returns the index label)
print(df['shape'].value_counts().idxmax())
# note: in recent pandas versions argmax returns the integer position
# rather than the label, so prefer idxmax here
print(df['shape'].value_counts().argmax())

To get the most common colour, just change shape to colour, e.g. change print(df['shape'].value_counts()) to print(df['colour'].value_counts()).
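
The question asks for the most common shape/colour combination rather than the most common value of each column on its own. One way this might be done in pandas is to group by both columns and count the rows in each group. This is a sketch with made-up sample data, and `pair_counts` is just an illustrative name.

```python
import pandas as pd

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    {'shape': 'pentagon', 'colour': 'red'},
]
df = pd.DataFrame(info)

# Count the rows for each (shape, colour) pair
pair_counts = df.groupby(['shape', 'colour']).size()

# idxmax returns the index label of the largest count; with a two-column
# groupby that label is a (shape, colour) tuple
shape, colour = pair_counts.idxmax()
print({'shape': shape, 'colour': colour}, pair_counts.max())
```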

Beyond that, pandas gives you many other useful built-in features. To learn more, just search for pandas online.

Note: install pandas before using it.

pip install pandas

or

pip3 install pandas
