I have the following list, which records how often each object occurs:
counter_obj= [('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12), ('strawberry', 10), ('watermelon', 10), ('avocado', 8) ... ('blueberry',1),('pineapple',1)]
I am trying to pick eight elements by randomly choosing two objects from each rank quartile.
For the first (25%) quartile, I tried the following:
from collections import Counter
dct = {('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12), ('strawberry', 10), ('watermelon', 10), ('avocado', 8) ... ('blueberry',1),('pineapple',1)}
[tup for tup in Counter(dct).most_common(len(dct)//4)] # 25th percentile by frequency count
Given that many of my values are 1 (those objects occur only once), how should I handle the remaining quartiles, 50% and 75%?
Bar chart of my original data: [bar chart image]
I would use pandas to solve this:
import pandas as pd

# Use a list of tuples: pandas refuses to build a DataFrame from an unordered set
dct = [('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12), ('strawberry', 10), ('watermelon', 10), ('avocado', 8), ('blueberry', 1), ('pineapple', 1)]
df = pd.DataFrame(dct, columns=['Fruit', 'Count'])  # convert to DataFrame
select = []
for quant in [.25, .5, .75, 1]:
    curr_q = df['Count'].quantile(quant)  # value of the Count column at this quantile
    # sample two rows whose count is at or below that value; the filter is cumulative,
    # so the same row can be drawn again in a later iteration
    curr_choice = df[df['Count'] <= curr_q].sample(2)
    select.append(curr_choice)
select = pd.concat(select).reset_index(drop=True)  # combine the samples into one DataFrame and reset the index
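If you want each pair to come from its own quartile rather than from everything at or below the quantile value, one possible variant is to bin the rows by rank instead of by raw count. This is only a sketch under some assumptions: the column name `Quartile` and the fixed `random_state` are my own choices, the data is the sample from the question without the elided entries, and ties are broken by order of appearance (`rank(method='first')`), which is an arbitrary rule that you may want to replace.

```python
import pandas as pd

# Sample data copied from the question (the elided entries are left out).
data = [('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12),
        ('strawberry', 10), ('watermelon', 10), ('avocado', 8),
        ('blueberry', 1), ('pineapple', 1)]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])

# Rank with method='first' so tied counts get distinct ranks; this keeps the
# four bins nearly equal in size even when many counts are 1.
ranks = df['Count'].rank(method='first')
df['Quartile'] = pd.qcut(ranks, 4, labels=[1, 2, 3, 4])

# Draw two rows from each quartile bin (random_state is only for repeatability).
samples = [group.sample(n=2, random_state=0)
           for _, group in df.groupby('Quartile', observed=True)]
picked = pd.concat(samples).reset_index(drop=True)
print(picked)
```

Because the bins are built from ranks, a long tail of objects with count 1 is spread over the lower bins instead of collapsing the quartile boundaries onto the same value. Note that `sample(n=2)` raises a `ValueError` if a bin ends up with fewer than two rows, which happens when there are fewer than eight objects in total.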