I have a df like the following:
a b c
x 2 3
y 2 3
z 3 2
w 1 5
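(up to thousands of records)
I want to group this DataFrame by b and c so that each group has only n rows. If a group contains more rows than that, I want the extra rows to go into a new group. That is the main problem statement. If possible, I would also like to remove these groups from the original DataFrame.
Sample output (with more explanation):
I basically want to loop over the df, and I am currently using the following code: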
for x, y in df.groupby(['b', 'c']):
    print(y)
With this code I'm getting the following groups:
a b c
x 2 3
y 2 3
a b c
z 3 2
a b c
w 1 5
Now let's say I want only 1 (n) row in each group; this is the output I'm looking for:
a b c
x 2 3
a b c
y 2 3
a b c
z 3 2
a b c
w 1 5
(If possible, I would also like to remove these groups from df.)
Thanks!
Based on the accepted answer here, I adapted the code to your question:
import pandas as pd
df = pd.DataFrame({"a": ["x", "y", "z", "w"],
                   "b": [2, 2, 3, 1],
                   "c": [3, 3, 2, 5]})
n = 1
for x, y in df.groupby(['b', 'c']):
    # slice each group into consecutive pieces of at most n rows
    list_df = [y[i: i+n] for i in range(0, y.shape[0], n)]
    for i in list_df:
        print(i)
#a b c
#w 1 5
#
#a b c
#x 2 3
#
#a b c
#y 2 3
#
#a b c
#z 3 2
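This splits each grouped DataFrame into pieces of n rows. If you also want to remove each group from the DataFrame as you go, you can add df.drop(i.index), which drops the rows with those index values (the pieces keep the index labels of the original df):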
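for x, y in df.groupby(['b', 'c']):
    list_df = [y[i: i+n] for i in range(0, y.shape[0], n)]
    for i in list_df:
        print(i)
        # drop the rows of this piece from the original df
        df = df.drop(i.index)
# rows that remain after the loop (none for this example)
print(df)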
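As an alternative to slicing each group in a Python loop, the same split can probably be expressed with a per-group row counter. The sketch below is not the approach from the answer above; it assumes a hypothetical helper column named chunk, built with groupby(...).cumcount() // n, so that rows beyond the first n in a (b, c) group get a new chunk number and therefore land in a new group.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z", "w"],
                   "b": [2, 2, 3, 1],
                   "c": [3, 3, 2, 5]})
n = 1  # maximum number of rows per piece

# cumcount() numbers the rows inside each (b, c) group as 0, 1, 2, ...
# Integer division by n turns that into a piece number.
# "chunk" is a hypothetical helper column name, not part of the original data.
keyed = df.assign(chunk=df.groupby(["b", "c"]).cumcount() // n)

# Grouping on b, c and chunk yields pieces of at most n rows each;
# the helper column is dropped again from every piece.
chunks = [part.drop(columns="chunk")
          for _, part in keyed.groupby(["b", "c", "chunk"])]

for part in chunks:
    print(part)
```

Each piece keeps the original index labels, so df.drop(part.index) can still be used to remove it from df, as in the loop above.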