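Hi everyone, I have the following problem. I need to filter my data based on an equation. Here is what I mean.
For example, I have a DataFrame like this: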
          tonnage  period_year
5     2,462,297.5         2014
13    2,274,912.9         2015
19    2,181,492.2         2015
20    2,173,654.8         2016
21    2,158,043.7         2016
...           ...          ...
92885         5.0         2016
92886         5.0         2016
92901         5.0         2016
94814         0.0         2016
94861         0.0         2013
I have:
data[data.tonnage > 0.02e6]['tonnage'].sum()/data.tonnage.sum() * 100.0
97.08690080799717
data[data.tonnage > 5e6]['tonnage'].sum()/data.tonnage.sum() * 100.0
18.5541547916532426
So I need to find the largest x for which
data[data.tonnage > x]['tonnage'].sum()/data.tonnage.sum() * 100.0
gives a result equal to or greater than 40.
What is the best way to do this?
Try this:
import pandas as pd

# Your sample input
df = pd.DataFrame({
    'tonnage': [100, 100, 100, 200, 5, 5, 5, 5, 5]
})
# Get the sum of each unique value in `tonnage`, largest value first
t = df.groupby('tonnage')['tonnage'].sum().sort_index(ascending=False)
# Since your requirement is "> x", we have to subtract the current value from the cumsum
ratio = (t.cumsum() - t) / t.sum() * 100
# The first match is the largest tonnage value x whose "> x" share is at least 40%
x = ratio[ratio >= 40].index[0]
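To check the result, the same steps can be wrapped in a function and compared against the expression from the question. This is only a sketch: the names `largest_threshold` and `pct_above` are made up for illustration, and the `ValueError` for the case where no value qualifies is an added assumption, not part of the original answer.

```python
import pandas as pd


def largest_threshold(tonnage: pd.Series, target_pct: float = 40.0):
    """Largest value x present in `tonnage` such that the rows with
    tonnage > x hold at least `target_pct` percent of the total."""
    # Total tonnage per distinct value, largest value first
    t = tonnage.groupby(tonnage.to_numpy()).sum().sort_index(ascending=False)
    # Percentage held by rows strictly greater than each distinct value
    ratio = (t.cumsum() - t) / t.sum() * 100
    hits = ratio[ratio >= target_pct]
    if hits.empty:
        # Assumption: fail loudly instead of raising an IndexError on .index[0]
        raise ValueError("no value in the data reaches the target share")
    return hits.index[0]


def pct_above(tonnage: pd.Series, x: float) -> float:
    """The expression from the question, wrapped as a function."""
    return tonnage[tonnage > x].sum() / tonnage.sum() * 100.0


df = pd.DataFrame({'tonnage': [100, 100, 100, 200, 5, 5, 5, 5, 5]})
x = largest_threshold(df['tonnage'])
print(x, pct_above(df['tonnage'], x))
```

On the sample input this gives x = 5, where the share above x is about 95.2%. The next distinct value, 100, gives only about 38.1%, so it does not qualify. Since the sample has no values between 5 and 100, any x from 5 up to (but not including) 100 produces the same share. The function returns the largest such x that actually occurs in the data.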