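I am passing a function to a groupby operation. The function returns a list, which apply turns into a Series, and I then assign that Series back to the main DataFrame.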
import pandas as pd

df_SOT = pd.DataFrame({'Lane': {26055: 'L2', 26056: 'L2', 26057: 'L2', 26058: 'L2', 26059: 'L2', 25972: 'L1', 25973: 'L1', 25974: 'L1', 25975: 'L1', 25976: 'L1'}, 'Carrier SCAC': {26055: 'JNJR', 26056: 'WOSQ', 26057: 'BGME', 26058: 'ITSB', 26059: 'UCSB', 25972: 'BGME', 25973: 'SCNN', 25974: 'XPOL', 25975: 'SJRG', 25976: 'MTRK'}, 'Annual Volume': {26055: 5604.0, 26056: 5604.0, 26057: 5604.0, 26058: 5604.0, 26059: 5604.0, 25972: 4917.0, 25973: 4917.0, 25974: 4917.0, 25975: 4917.0, 25976: 4917.0}, 'Annual Capacity': {26055: 260.0, 26056: 1300.0, 26057: 2704.0, 26058: 2080.0, 26059: 4368.0, 25972: 5408.0, 25973: 3380.0, 25974: 4940.0, 25975: 156.0, 25976: 4940.0}})
The function:
def allocation(df_alloc):
    Annual_Volume = df_alloc['Annual Volume']
    Annual_Capacity = df_alloc['Annual Capacity']
    Allocation = []
    Cum_Capacity = 0
    for idx in df_alloc.index:
        Allocate = (min(0.5*Annual_Volume[idx],Annual_Capacity[idx],Annual_Volume[idx]-Cum_Capacity))
        Cum_Capacity+=Allocate
        Allocation.append(Allocate)
    return Allocation
The groupby operation:
df_SOT.groupby('Lane').apply(allocation)
>>>Output
Lane
L1 [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2 [260.0, 1300.0, 2704.0, 1340.0, 0.0]
dtype: object
df_SOT['Allocation'] = df_SOT.groupby('Lane').apply(allocation)
But the new column comes back as NaN. I'm not sure what I'm doing wrong here.
| | Lane | Carrier SCAC | Annual Volume | Annual Capacity | Allocation |
|---|---|---|---|---|---|
| 26055 | L2 | JNJR | 5604.0 | 260.0 | NaN |
| 26056 | L2 | WOSQ | 5604.0 | 1300.0 | NaN |
| 26057 | L2 | BGME | 5604.0 | 2704.0 | NaN |
| 26058 | L2 | ITSB | 5604.0 | 2080.0 | NaN |
| 26059 | L2 | UCSB | 5604.0 | 4368.0 | NaN |
| 25972 | L1 | BGME | 4917.0 | 5408.0 | NaN |
| 25973 | L1 | SCNN | 4917.0 | 3380.0 | NaN |
| 25974 | L1 | XPOL | 4917.0 | 4940.0 | NaN |
| 25975 | L1 | SJRG | 4917.0 | 156.0 | NaN |
| 25976 | L1 | MTRK | 4917.0 | 4940.0 | NaN |
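The NaN column is consistent with how pandas aligns on index labels when a Series is assigned to a column: the groupby/apply result is indexed by Lane (L1, L2), while df_SOT has an integer index, so no label matches. The snippet below is a minimal sketch of that effect on a toy frame; the names `df` and `per_lane` and the values in them are made up for illustration and are not from the question.

```python
import pandas as pd

# Toy frame with an integer index, standing in for df_SOT (hypothetical data).
df = pd.DataFrame({"Lane": ["L2", "L2", "L1"]}, index=[10, 11, 12])

# A Series indexed by group label, shaped like the groupby/apply result:
# one list per lane.
per_lane = pd.Series({"L1": [1.0], "L2": [2.0, 3.0]})

# Column assignment aligns on index labels. The labels 10, 11, 12 never
# match "L1" or "L2", so every row is filled with NaN.
df["Allocation"] = per_lane

print(df["Allocation"].isna().all())  # True
```

So the values are computed correctly; they are lost at assignment time because the two indexes have nothing in common.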
An alternative is to have allocation return a pandas Series with the correct index instead of a plain list, and to set group_keys to False in groupby:
def allocation(df_alloc):
    Annual_Volume = df_alloc["Annual Volume"]
    Annual_Capacity = df_alloc["Annual Capacity"]
    Allocation = []
    Cum_Capacity = 0
    for idx in df_alloc.index:
        Allocate = min(
            0.5 * Annual_Volume[idx],
            Annual_Capacity[idx],
            Annual_Volume[idx] - Cum_Capacity,
        )
        Cum_Capacity += Allocate
        Allocation.append(Allocate)
    return pd.Series(Allocation, index=df_alloc.index)
df_SOT["Allocation"] = df_SOT.groupby("Lane", group_keys=False).apply(allocation)
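If the Python loop becomes slow on larger data, the same numbers can probably be obtained without apply. The sketch below is a different technique from the one above, not a drop-in for it: it assumes capacities and volumes are non-negative and that Annual Volume is constant within each lane, in which case the running allocation equals the cumulative sum of the per-row caps, limited to the lane volume. The helper name `allocation_vectorized` is hypothetical.

```python
import numpy as np
import pandas as pd

# Same lanes, volumes and capacities as df_SOT in the question, written
# column by column. Carrier SCAC is left out because the calculation
# does not use it.
df_SOT = pd.DataFrame(
    {
        "Lane": ["L2"] * 5 + ["L1"] * 5,
        "Annual Volume": [5604.0] * 5 + [4917.0] * 5,
        "Annual Capacity": [260.0, 1300.0, 2704.0, 2080.0, 4368.0,
                            5408.0, 3380.0, 4940.0, 156.0, 4940.0],
    },
    index=[26055, 26056, 26057, 26058, 26059,
           25972, 25973, 25974, 25975, 25976],
)


def allocation_vectorized(df):
    """Hypothetical helper: loop-free version of the allocation rule."""
    lanes = df["Lane"]
    volume = df["Annual Volume"]
    # Per-row cap: at most half the lane volume, at most the carrier capacity.
    capped = np.minimum(0.5 * volume, df["Annual Capacity"])
    # Running total of the caps within each lane, limited to the lane volume.
    cum = np.minimum(capped.groupby(lanes).cumsum(), volume)
    # Each row's allocation is the increase in that limited running total;
    # the first row of each lane has no predecessor, so it takes cum itself.
    return cum.groupby(lanes).diff().fillna(cum)


# The result keeps the original integer index, so it aligns on assignment.
df_SOT["Allocation"] = allocation_vectorized(df_SOT)
print(df_SOT)
```

On this data the Allocation column matches the lists shown in the question's output (260.0, 1300.0, 2704.0, 1340.0, 0.0 for L2 and 2458.5, 2458.5, 0.0, 0.0, 0.0 for L1). Because the result is built from the frame's own index, no group_keys handling is needed.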