How to expand numpy arrays in a DataFrame column into their own columns



I have an odd problem. I have the following DataFrame:

                                           embedding
0  [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
1  [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
2  [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...

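It is a DataFrame with a single column named "embedding". Each row holds an array of roughly 100 items, and every row's array has the same length.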

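How can I expand it so that each item in the array gets its own column in the DataFrame? Is that possible, or do I have to extract the numpy arrays and build a new DataFrame from the nested array?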

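Update: I don't have names for the columns, and that doesn't matter to me. What matters is that the order of the items in each numpy array is preserved.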

Update 2: as requested in the comments:

print(Xtest_e1.head(2).to_dict())
{'embedding': {0: array([0.        , 0.        , 0.        , 0.62235785, 0.        ,
0.27049118, 0.        , 0.31094068, 0.        , 0.        ,
0.        , 0.        , 0.        , 0.4330532 , 0.        ,
0.        , 0.25157961, 0.        , 0.        , 0.        ,
0.40683705, 0.01569915, 0.        , 0.        , 0.        ,
0.13090582, 0.        , 0.49955425, 0.06970194, 0.29155406,
0.        , 0.        , 0.27342197, 0.        , 0.        ,
0.        , 0.04415211, 0.        , 0.03908829, 0.        ,
0.07673171, 0.33199945, 0.        , 0.51759815, 0.        ,
0.47191489, 0.45380819, 0.13475986, 0.        , 0.        ,
0.        , 0.        , 0.        , 0.        , 0.08000553,
0.        , 0.02991109, 0.        , 0.50515431, 0.        ,
0.24663273, 0.        , 0.50839704, 0.        , 0.        ,
0.05281948, 0.44884402, 0.        , 0.44542992, 0.15376966,
0.        , 0.        , 0.        , 0.39128256, 0.49497205,
0.        , 0.        ]), 1: array([0.        , 0.        , 0.        , 0.62235785, 0.        ,
0.27049118, 0.        , 0.31094068, 0.        , 0.        ,
0.        , 0.        , 0.        , 0.4330532 , 0.        ,
0.        , 0.25157961, 0.        , 0.        , 0.        ,
0.40683705, 0.01569915, 0.        , 0.        , 0.        ,
0.13090582, 0.        , 0.49955425, 0.06970194, 0.29155406,
0.        , 0.        , 0.27342197, 0.        , 0.        ,
0.        , 0.04415211, 0.        , 0.03908829, 0.        ,
0.07673171, 0.33199945, 0.        , 0.51759815, 0.        ,
0.47191489, 0.45380819, 0.13475986, 0.        , 0.        ,
0.        , 0.        , 0.        , 0.        , 0.08000553,
0.        , 0.02991109, 0.        , 0.50515431, 0.        ,
0.24663273, 0.        , 0.50839704, 0.        , 0.        ,
0.05281948, 0.44884402, 0.        , 0.44542992, 0.15376966,
0.        , 0.        , 0.        , 0.39128256, 0.49497205,
0.        , 0.        ])}}

Is this what you expect?

>>> pd.DataFrame(Xtest_e1["embedding"].tolist()).add_prefix("c")
    c0   c1   c2        c3   c4  ...  c72       c73       c74  c75  c76
0  0.0  0.0  0.0  0.622358  0.0  ...  0.0  0.391283  0.494972  0.0  0.0
1  0.0  0.0  0.0  0.622358  0.0  ...  0.0  0.391283  0.494972  0.0  0.0
[2 rows x 77 columns]
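
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The small frame `df`, its index labels, and the three-element arrays are made up for illustration and only stand in for `Xtest_e1`. Passing `index=df.index` is an addition not shown above; it should keep the original row labels when the index is not the default 0..n-1. The `np.stack` variant is an alternative that assumes every row's array has the same length, as stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Xtest_e1: one "embedding" column of equal-length arrays.
df = pd.DataFrame(
    {"embedding": [np.array([0.0, 0.5, 1.0]), np.array([2.0, 2.5, 3.0])]},
    index=[10, 20],  # non-default labels, to show that they are carried over
)

# Variant 1: turn the column into a list of arrays; each array becomes one row
# and each position in the array becomes one column, in the original order.
wide = pd.DataFrame(df["embedding"].tolist(), index=df.index).add_prefix("c")

# Variant 2: stack the arrays into a single 2-D array first
# (only valid when all arrays have the same length).
wide_np = pd.DataFrame(
    np.stack(df["embedding"].tolist()), index=df.index
).add_prefix("c")

print(wide)
```

Column order follows the position within each array, so `c0` is the first item, `c1` the second, and so on. If the other columns of the original frame are needed alongside the expanded ones, `df.drop(columns="embedding").join(wide)` is one way to put them back together.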
