按级别对加权值进行分组和排序



我有由几个值组成的数据。每个值都有一个正权重和一个秩。我想返回这些值的子列表,这样通过增加列表的排名来添加值,直到它们的权重之和尽可能小,大于或等于给定的目标,并附加一个约束,即如果添加了一个值,则必须添加所有具有相同排名的值,即按排名组将值添加到列表中。

我在下面写了一篇文章,它很慢,不太容易阅读。输入是一个元组列表(我们可以假设它已经按秩排序(;每个元组包含秩、值和权重。所有需要返回的只是保留值的列表及其总重量。该算法按秩对数据进行分组,然后计算出要添加多少秩才能达到适当的累积权重。有什么加速和改进的建议吗?

对于输入列表[(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]和目标权重0.47,它返回子列表['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']和保持权重0.49138273560337803

weightGoal = 0.47 # Target weight

# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
print("Input", individualData, weightGoal)
# Determine cumulative weight for each value.
vCumulWeight = [0]
for ind in individualData: vCumulWeight.append(vCumulWeight[-1] + ind[-1])
vCumulWeight.pop(0)
# To group by rank    
ranks = list(set([v[0] for v in individualData])) # Unique values of the rank
ranks.sort(reverse=False) # Sort the list by rank
rankWeights = [] # Total weight of values with same rank
rankValues = [] # Values with same rank
# Group points by rank
for r in ranks:
rankW, rankV = 0, []
for ind in individualData:
if r == ind[0]:
rankW += ind[-1]
rankV.append(ind[1])
rankWeights.append(rankW)
rankValues.append(rankV)

# Get cumulative weight of each group
rankCumulWeight = [0]
for op in rankWeights: rankCumulWeight.append(rankCumulWeight[-1] + op)
rankCumulWeight.pop(0)
# Rank all values and all groups
fullRankingVals, fullRankingGroups, rnk, prevLen, prevCP = [], [], 0, 1, 0
for i, (o, pr, ov, cp) in enumerate(zip(ranks, rankWeights, rankValues, rankCumulWeight)):
fullRankingGroups.append((i+1, ov, o, pr, cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
rnk += prevLen
for v in ov: fullRankingVals.append((rnk, v, o, pr/len(ov), cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
prevLen = len(ov)
prevCP = cp
print("============ Check by value")
for v in fullRankingVals: print(v)
print("============ Check by group")
for v in fullRankingGroups: print(v)
print("============")
## Actual acceptance
tmpAccepted = [v for v in fullRankingVals if (v[-1])]
acceptedValues = [v[1] for v in tmpAccepted]
accdeptedWeight = max(v[-2] for v in tmpAccepted)
print("Result", acceptedValues, accdeptedWeight)

使用Pandas:

import pandas as pd
df = pd.DataFrame(l, columns=['rank', 'value', 'weight'])
df = df.groupby('rank').agg({'value': list, 'weight': sum})
m = ~((df.weight.cumsum() > 0.47).shift(
fill_value=False))
result = df[m]['value'].explode().to_list()
weight = df[m]['weight'].cumsum().iloc[-1]

我可以建议您采取尽可能简单的方法,只需小块地解决问题吗?

让我们把它分解为3个小步骤,这样你就可以了解实际发生的事情:

  1. 使用rank作为关键字按升序对数据进行排序
  2. 将数据添加到我们的storage,直到达到限制weightGoal
  3. 如果达到weightGoal,如果value与我们添加到storage的最后一个相同,则添加到我们的storage

现在让我们对上面的流程进行编码:

weightGoal = 0.47 # Target weight

# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
# step 1
individualData.sort(key=lambda x: x[0])
storage = 0
last = None
for data in individualData:
if storage < weightGoal:  # step 2
last = data
storage += data[2]
elif last and data[1] == last[1]:  # step 3
storage += data[2]
# no need to change `last` here unless you really want to
else:  # termination clause
break

现在,最终的summation存储在storage变量中,您还可以了解使用last变量最后添加的元素。

如果有任何歧义,请告诉我。

最新更新