I have an odd problem. I have the following DataFrame:
embedding
0 [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
1 [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
2 [0.0, 0.0, 0.0, 0.6223578453063965, 0.0, 0.270...
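This is a DataFrame with a single column named embedding. Each row holds an array of roughly 100 items, and every row has the same length.
How can I expand it so that each item in the array gets its own column in the DataFrame? Is that possible, or do I have to extract the numpy arrays and build a new DataFrame from the nested array?
Update: I don't have names for the columns, and that doesn't matter to me. What matters is that the order from the numpy array is preserved.
Update 2: As requested in the comments: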
print(Xtest_e1.head(2).to_dict())
{'embedding': {0: array([0. , 0. , 0. , 0.62235785, 0. ,
0.27049118, 0. , 0.31094068, 0. , 0. ,
0. , 0. , 0. , 0.4330532 , 0. ,
0. , 0.25157961, 0. , 0. , 0. ,
0.40683705, 0.01569915, 0. , 0. , 0. ,
0.13090582, 0. , 0.49955425, 0.06970194, 0.29155406,
0. , 0. , 0.27342197, 0. , 0. ,
0. , 0.04415211, 0. , 0.03908829, 0. ,
0.07673171, 0.33199945, 0. , 0.51759815, 0. ,
0.47191489, 0.45380819, 0.13475986, 0. , 0. ,
0. , 0. , 0. , 0. , 0.08000553,
0. , 0.02991109, 0. , 0.50515431, 0. ,
0.24663273, 0. , 0.50839704, 0. , 0. ,
0.05281948, 0.44884402, 0. , 0.44542992, 0.15376966,
0. , 0. , 0. , 0.39128256, 0.49497205,
0. , 0. ]), 1: array([0. , 0. , 0. , 0.62235785, 0. ,
0.27049118, 0. , 0.31094068, 0. , 0. ,
0. , 0. , 0. , 0.4330532 , 0. ,
0. , 0.25157961, 0. , 0. , 0. ,
0.40683705, 0.01569915, 0. , 0. , 0. ,
0.13090582, 0. , 0.49955425, 0.06970194, 0.29155406,
0. , 0. , 0.27342197, 0. , 0. ,
0. , 0.04415211, 0. , 0.03908829, 0. ,
0.07673171, 0.33199945, 0. , 0.51759815, 0. ,
0.47191489, 0.45380819, 0.13475986, 0. , 0. ,
0. , 0. , 0. , 0. , 0.08000553,
0. , 0.02991109, 0. , 0.50515431, 0. ,
0.24663273, 0. , 0.50839704, 0. , 0. ,
0.05281948, 0.44884402, 0. , 0.44542992, 0.15376966,
0. , 0. , 0. , 0.39128256, 0.49497205,
0. , 0. ])}}
Is this what you expect:
>>> pd.DataFrame(Xtest_e1["embedding"].tolist()).add_prefix("c")
c0 c1 c2 c3 c4 ... c72 c73 c74 c75 c76
0 0.0 0.0 0.0 0.622358 0.0 ... 0.0 0.391283 0.494972 0.0 0.0
1 0.0 0.0 0.0 0.622358 0.0 ... 0.0 0.391283 0.494972 0.0 0.0
[2 rows x 77 columns]
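For anyone who wants to try this without the original data, here is a minimal sketch of the same idea. The tiny `df` below is a made-up stand-in for `Xtest_e1` (three values per row instead of 77), and passing `index=df.index` is my own addition, on the assumption that you want to keep the original row labels. It also shows the "extract the numpy arrays" route from the question via `np.vstack`, which should give the same result with the element order preserved.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Xtest_e1: one "embedding" column
# holding equal-length numpy arrays.
df = pd.DataFrame({
    "embedding": [np.array([0.0, 0.5, 0.25]),
                  np.array([1.0, 0.0, 0.75])]
})

# Option 1: turn the column into a list of arrays and let pandas
# build one column per array position (same approach as above).
wide = pd.DataFrame(df["embedding"].tolist(), index=df.index).add_prefix("c")

# Option 2: stack the arrays into a 2-D numpy array first,
# then wrap that array in a DataFrame.
matrix = np.vstack(df["embedding"].tolist())
wide_np = pd.DataFrame(matrix, index=df.index).add_prefix("c")

print(wide)
```

Both options number the columns by position, so `c0` is the first element of each array, `c1` the second, and so on.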