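How do I iterate over multiple subsets in Python, perform an operation on each, and bring the results back into the original DataFrame?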



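I have a DataFrame with several million rows and about 100k unique ID numbers. I want to perform an operation per unique ID. At the moment I build a subset for each unique ID and run some operations on it. The loop itself works, but how do I efficiently combine the subsets back into a single DataFrame?

Perhaps there is a more efficient way to operate on the subsets for each unique ID.

Thanks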

for ID in np.unique(df_fin['ID']):
    ID_subset = df_fin.loc[df_fin['ID'] == ID]
    for i in ID_subset.index:
        if ID_subset['date_diff'][i] > 0:
            for p in range(0, ID_subset['date_diff'][i]):
                if p == WIP:
                    sl.appendleft(ID_subset.return_bin[i-1])
                else:
                    sl.appendleft(0)
        lissa = list(sl)
        ID_subset.at[i, 'list_stock'] = lissa
    frames = [ID_subset]  # this does not work
final_mod = pd.concat(frames)  # this also does not work

This works: I also tried groupby.apply. See the code below.

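import numpy as np
from collections import deque

# DOS and WIP are not defined in this snippet; they are set elsewhere in my script.
def create_stocklist(x):
    # Whole days elapsed since the previous row within this ID group.
    x['date_diff'] = x['dates'] - x['dates'].shift()
    x['date_diff'] = x['date_diff'].fillna(0)
    x['date_diff'] = (x['date_diff'] / np.timedelta64(1, 'D')).astype(int)
    # Object dtype so that a cell can hold a Python list.
    x['list_stock'] = x['list_stock'].astype(object)
    x['stock_new'] = x['stock_new'].astype(object)
    var_stock = DOS*[0]
    # Fixed-length window: appendleft pushes the oldest entry off the right end.
    sl = deque([0], maxlen=DOS)
    for i in x.index:
        if x['date_diff'][i] > 0:
            for p in range(0, x['date_diff'][i]):
                if p == WIP:
                    sl.appendleft(x.return_bin[i-1])
                else:
                    sl.appendleft(0)
        lissa = list(sl)
        x.at[i, 'list_stock'] = lissa
    return x

df_fin.groupby(by=['ID']).apply(create_stocklist)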

One approach could be:

for _id, g in df_fin.groupby('ID'):
    # do stuff with g
    pass

g is a DataFrame containing all rows for which df_fin['ID'] == _id.
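To get the per-ID results back into one DataFrame, the subsets can be collected in a list that is created once, before the loop, and concatenated at the end. The following is only a minimal sketch under assumptions: the toy data, the `value` column and the running-total operation are placeholders that do not come from the question, and the real per-ID logic would go where the placeholder is.

```python
import pandas as pd

# Toy stand-in for df_fin; the 'value' column is hypothetical.
df_fin = pd.DataFrame({
    "ID": [2, 1, 2, 1, 3],
    "value": [10, 20, 30, 40, 50],
})

frames = []  # created once, outside the loop
for _id, g in df_fin.groupby("ID"):
    g = g.copy()  # work on a copy so df_fin itself is not modified
    # Placeholder per-ID operation: running total within the group.
    g["running_total"] = g["value"].cumsum()
    frames.append(g)  # append on every iteration instead of overwriting

# Stitch the subsets together; sort_index restores the original row order.
final_mod = pd.concat(frames).sort_index()
print(final_mod)
```

Each group keeps its original index labels, so after `sort_index()` the rows of `final_mod` line up with those of `df_fin`.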
