Using pandas to filter a DataFrame down to its most popular factors

Say I have a dataset in pandas with a factor column, where the levels 'a', 'b', and 'c' have 30 observations each and the remaining levels have only 5. There are other columns in this DataFrame, but I only care about this one column of factors (let's call it factor1).

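What operation do I use in pandas to filter this DataFrame so that the only rows left are those whose factor has more than 20 observations? And what operation would I use if I wanted only the top 3 most popular factors in factor1?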

Edit: here is a small, self-contained example

import pandas

data = {'factor1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'], 'factor2': ['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'orange', 'orange', 'orange'], 'response': list(range(9))}
df = pandas.DataFrame(data)

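How do I filter df so that it keeps only the three most popular factors in factor1, or only the factors in factor1 that have more than 5 (or n, or any other number of) observations?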

My attempt at the top three most popular factors:

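N = 3
# Count the rows for each level of factor1
handy = df.groupby('factor1')['factor1'].count()
# sort_values returns a new Series, so the result has to be assigned back
handy = handy.sort_values(ascending=False)
topNFactors = handy.head(N)
print(topNFactors)
# Keep only the rows whose factor1 level is among the top N
dataOfTopNFactors = df[df['factor1'].isin(topNFactors.index)]
print(dataOfTopNFactors)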
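
For comparison, the same top-N filter can probably be written more compactly with value_counts. This is only a minimal sketch, assuming a reasonably recent pandas; the names top_levels and top_rows are made up for illustration, and the sample frame is a cut-down copy of the one above.

```python
import pandas

# Cut-down copy of the sample data from the question
df = pandas.DataFrame({
    'factor1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
    'response': list(range(9)),
})

N = 3
# value_counts() tallies the rows per level; nlargest(N) keeps the N biggest tallies
top_levels = df['factor1'].value_counts().nlargest(N).index
# Keep only the rows whose level is one of those N
top_rows = df[df['factor1'].isin(top_levels)]
print(top_rows)
```

If several levels tie for the last place, nlargest keeps whichever it encounters first by default, so the exact set of levels can depend on ordering.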

Or my attempt at the factors with at least 2 observations:

M = 2
# Count the rows for each level of factor1
handy = df.groupby('factor1')['factor1'].count()
# Keep the levels that occur at least M times
minimumValueMFactors = handy[handy >= M]
dataOfMinimumValueMFactors = df[df['factor1'].isin(minimumValueMFactors.index)]
print(dataOfMinimumValueMFactors)
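
One possible alternative for the minimum-count case is to broadcast each level's size back onto its rows with groupby(...).transform('size') and compare directly, which avoids the separate isin step. This is a sketch under the same assumptions as the sample data above; level_sizes and frequent_rows are illustrative names.

```python
import pandas

# Cut-down copy of the sample data from the question
df = pandas.DataFrame({
    'factor1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
    'response': list(range(9)),
})

M = 2
# transform('size') returns one value per row: the size of that row's group
level_sizes = df.groupby('factor1')['factor1'].transform('size')
# Boolean mask: keep rows whose level occurs at least M times
frequent_rows = df[level_sizes >= M]
print(frequent_rows)
```

Because level_sizes is aligned with df's index, the comparison can be used directly as a row mask.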
