Filtering pandas data based on an equation



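Hi everyone, I have the following problem. I need to filter my data based on an equation. Here is what I mean.

For example, I have a DataFrame like this: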

       tonnage      period_year
5      2,462,297.5  2014
13     2,274,912.9  2015
19     2,181,492.2  2015
20     2,173,654.8  2016
21     2,158,043.7  2016
...    ...          ...
92885  5.0          2016
92886  5.0          2016
92901  5.0          2016
94814  0.0          2016
94861  0.0          2013

I have:

data[data.tonnage > 0.02e6]['tonnage'].sum()/data.tonnage.sum() * 100.0

97.08690080799717

data[data.tonnage > 5e6]['tonnage'].sum()/data.tonnage.sum() * 100.0

18.5541547916532426

So I need to find the largest x for which

data[data.tonnage > x]['tonnage'].sum()/data.tonnage.sum() * 100.0

gives a result greater than or equal to 40.

What is the best way to do this?

Try this:

import pandas as pd

# Your sample input
df = pd.DataFrame({
    'tonnage': [100, 100, 100, 200, 5, 5, 5, 5, 5]
})
# Get the sum of each unique value in `tonnage`, largest values first
t = df.groupby('tonnage')['tonnage'].sum().sort_index(ascending=False)
# Since your requirement is "> x", we have to subtract the current value from the cumsum
ratio = (t.cumsum() - t) / t.sum() * 100
# And voila! The first match is the largest value in the data that qualifies as x
x = ratio[ratio >= 40].index[0]
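
If you want to reuse this for other percentages, here is a minimal sketch that wraps the same idea in a function and checks it against the formula from the question. The names `share_above` and `largest_threshold` are made up for illustration, and returning `None` when no value in the data qualifies is an assumption on my part, not something the question asked for.

```python
import pandas as pd


def share_above(tonnage: pd.Series, x: float) -> float:
    """Percentage of total tonnage contributed by rows with tonnage > x."""
    return tonnage[tonnage > x].sum() / tonnage.sum() * 100.0


def largest_threshold(tonnage: pd.Series, pct: float):
    """Largest value present in `tonnage` whose share_above is >= pct, else None."""
    # Sum per unique value, largest values first
    t = tonnage.groupby(tonnage).sum().sort_index(ascending=False)
    # Share of the total held by values strictly greater than each unique value
    ratio = (t.cumsum() - t) / t.sum() * 100.0
    hits = ratio[ratio >= pct]
    # Assumption: return None instead of raising when nothing qualifies
    return hits.index[0] if len(hits) else None


df = pd.DataFrame({'tonnage': [100, 100, 100, 200, 5, 5, 5, 5, 5]})
x = largest_threshold(df['tonnage'], 40)
print(x, share_above(df['tonnage'], x))
```

On the sample input this gives x = 5: rows above 5 hold 500 of 525 (about 95.2%), while rows above 100 hold only 200 of 525 (about 38.1%). Note that the search only considers values that actually occur in the column; any x from 5 up to, but not including, 100 gives the same share.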
