Assigning a Series from a groupby operation to a DataFrame



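I am passing a function to a groupby operation. The function returns a list, apply collects those lists into a Series, and I then try to assign that Series back to the main DataFrame.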

df_SOT = pd.DataFrame({'Lane': {26055: 'L2', 26056: 'L2', 26057: 'L2', 26058: 'L2', 26059: 'L2', 25972: 'L1', 25973: 'L1', 25974: 'L1', 25975: 'L1', 25976: 'L1'}, 'Carrier SCAC': {26055: 'JNJR', 26056: 'WOSQ', 26057: 'BGME', 26058: 'ITSB', 26059: 'UCSB', 25972: 'BGME', 25973: 'SCNN', 25974: 'XPOL', 25975: 'SJRG', 25976: 'MTRK'}, 'Annual Volume': {26055: 5604.0, 26056: 5604.0, 26057: 5604.0, 26058: 5604.0, 26059: 5604.0, 25972: 4917.0, 25973: 4917.0, 25974: 4917.0, 25975: 4917.0, 25976: 4917.0}, 'Annual Capacity': {26055: 260.0, 26056: 1300.0, 26057: 2704.0, 26058: 2080.0, 26059: 4368.0, 25972: 5408.0, 25973: 3380.0, 25974: 4940.0, 25975: 156.0, 25976: 4940.0}})

The function:

def allocation(df_alloc):
    Annual_Volume = df_alloc['Annual Volume']
    Annual_Capacity = df_alloc['Annual Capacity']
    Allocation = []
    Cum_Capacity = 0
    for idx in df_alloc.index:
        Allocate = (min(0.5*Annual_Volume[idx],Annual_Capacity[idx],Annual_Volume[idx]-Cum_Capacity))
        Cum_Capacity+=Allocate
        Allocation.append(Allocate)
    return Allocation

The groupby operation:

df_SOT.groupby('Lane').apply(allocation)
>>> Output
Lane
L1         [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2    [260.0, 1300.0, 2704.0, 1340.0, 0.0]
dtype: object
df_SOT['Allocation'] = df_SOT.groupby('Lane').apply(allocation)

But the new column comes back as NaN. I'm not sure what I am doing wrong here.

      Lane Carrier SCAC  Annual Volume  Annual Capacity  Allocation
26055   L2         JNJR         5604.0            260.0         NaN
26056   L2         WOSQ         5604.0           1300.0         NaN
26057   L2         BGME         5604.0           2704.0         NaN
26058   L2         ITSB         5604.0           2080.0         NaN
26059   L2         UCSB         5604.0           4368.0         NaN
25972   L1         BGME         4917.0           5408.0         NaN
25973   L1         SCNN         4917.0           3380.0         NaN
25974   L1         XPOL         4917.0           4940.0         NaN
25975   L1         SJRG         4917.0            156.0         NaN
25976   L1         MTRK         4917.0           4940.0         NaN

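The Series returned by the groupby is indexed by Lane (L1, L2), while df_SOT is indexed by row labels (26055, ...). Assignment aligns on the index, no labels match, and every row becomes NaN. One alternative is to have allocation return a pandas Series with the correct index instead of a plain list, and to set group_keys to False in the groupby: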

def allocation(df_alloc):
    Annual_Volume = df_alloc["Annual Volume"]
    Annual_Capacity = df_alloc["Annual Capacity"]
    Allocation = []
    Cum_Capacity = 0
    for idx in df_alloc.index:
        Allocate = min(
            0.5 * Annual_Volume[idx],
            Annual_Capacity[idx],
            Annual_Volume[idx] - Cum_Capacity,
        )
        Cum_Capacity += Allocate
        Allocation.append(Allocate)
    # Reuse the group's own row labels so the result aligns with df_SOT
    return pd.Series(Allocation, index=df_alloc.index)

df_SOT["Allocation"] = df_SOT.groupby("Lane", group_keys=False).apply(allocation)
