I have a DataFrame in the following format:
A B
1990-02 1
1990-03 1
1999-05 1
1992-08 2
1996-12 2
2020-01 2
1990-05 3
1995-08 3
1999-11 3
2021-12 3
How can I split this DataFrame into groups based on the unique values of column B?
The result should be in this format:
[[[1990-02, 1],[1990-03, 1],[1999-05, 1]],
[[1992-08, 2],[1996-12, 2],[2020-01, 2]],
[[1990-05, 3],[1995-08, 3],[1999-11, 3],[2021-12, 3]]
]
This should do the job:
import pandas as pd
data = {"A": ["1990-02", "1990-03", "1999-05", "1992-08", "1996-12",
              "2020-01", "1990-05", "1995-08", "1999-11", "2021-12"],
        "B": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]}
df = pd.DataFrame(data=data)
out = df.groupby("B")['A'].apply(list)
output = [[[date, b_value] for date in block]
          for b_value, block in zip(out.index, out.values)]
print(output)
Here is one way to get an equivalent structure made of arrays:
>>> df.groupby("B").apply(pd.DataFrame.to_numpy).values
[array([['1990-02', 1],
['1990-03', 1],
['1999-05', 1]], dtype=object)
array([['1992-08', 2],
['1996-12', 2],
['2020-01', 2]], dtype=object)
array([['1990-05', 3],
['1995-08', 3],
['1999-11', 3],
['2021-12', 3]], dtype=object)]
Here is one way to get what you want:
df.assign(l=df.agg(list, axis=1)).groupby('B')['l'].agg(list).tolist()
Output:
[[['1990-02', 1], ['1990-03', 1], ['1999-05', 1]],
[['1992-08', 2], ['1996-12', 2], ['2020-01', 2]],
[['1990-05', 3], ['1995-08', 3], ['1999-11', 3], ['2021-12', 3]]]
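For comparison, here is a minimal sketch of another approach: iterate over the groups directly and convert each sub-frame to a nested list. It assumes the same sample data as in the question; the names `result` and `group` are only illustrative. It relies on `groupby` sorting the group keys by default, so the blocks come out in the order 1, 2, 3.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["1990-02", "1990-03", "1999-05", "1992-08", "1996-12",
          "2020-01", "1990-05", "1995-08", "1999-11", "2021-12"],
    "B": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
})

# Iterating over a groupby yields (key, sub-frame) pairs.
# Selecting ["A", "B"] explicitly fixes the column order of each row,
# and to_numpy().tolist() turns each sub-frame into a list of [A, B] rows.
result = [group[["A", "B"]].to_numpy().tolist()
          for _, group in df.groupby("B")]
print(result)
```

This stays close to the nested-list shape asked for in the question and avoids building an intermediate column of lists.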