I have a nested, three-dimensional pandas DataFrame that looks like this:
data = pd.DataFrame([pd.Series([k for k in range(10)]) for j in range(5)] for i in range(8))
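I want to slice this DataFrame so that the Series along the k dimension are half their current length (10).
I tried data.iloc[:,:][0:6], but that only returns the first 6 rows (the i dimension). I also tried iterating over the whole DataFrame and replacing each cell, but I would like to know whether there is a neater way to do this.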
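To see why that attempt only trims rows, here is a small sketch that splits the expression into its two steps. The names same and rows are just illustrative, and the frame is built with list comprehensions rather than the generator used above, which should give the same layout.

```python
import pandas as pd

# Same layout as in the question: 8 rows (i) x 5 columns (j), each cell a Series of length 10 (k)
data = pd.DataFrame([[pd.Series(range(10)) for j in range(5)] for i in range(8)])

same = data.iloc[:, :]   # selects every row and every column, so nothing is trimmed
rows = same[0:6]         # a plain slice on a DataFrame selects rows, not cell contents

print(rows.shape)            # the i dimension shrank to 6
print(len(rows.loc[0, 0]))   # the Series inside each cell is untouched
```

Neither step ever reaches inside the cells, so the k dimension has to be sliced per cell or after converting to a real 3D array.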
UPDATED:
Here is one way to answer the question:
import pandas as pd
data = pd.DataFrame([pd.Series([k for k in range(10)]) for j in range(5)] for i in range(8))
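# Shape of the frame (i x j) and length of the Series stored in each cell (k)
ni, nj = data.shape
nk = len(data.loc[0, 0])
print(ni, nj, nk)
# Keep only the first half of the Series in every cell
data2 = data.applymap(lambda x: x[:len(x) // 2])
print(*data2.shape, len(data2.loc[0, 0]))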
Output:
8 5 10
8 5 5
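As far as I know, pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, so on a recent pandas the same idea may need a different method name. Below is a hedged sketch that picks whichever one is available; the variable name elementwise is my own, and .iloc is used only to make the positional slice explicit.

```python
import pandas as pd

data = pd.DataFrame([[pd.Series(range(10)) for j in range(5)] for i in range(8)])

# Assumption: DataFrame.map exists on pandas 2.1+; older versions only have applymap
elementwise = data.map if hasattr(data, "map") else data.applymap

# Halve the Series in every cell; .iloc slices by position regardless of the index labels
data2 = elementwise(lambda s: s.iloc[: len(s) // 2])

print(*data2.shape, len(data2.loc[0, 0]))
```

The frame keeps its 8 x 5 shape and only the Series inside each cell gets shorter.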
If your data were in a 3D numpy array instead, you could slice the 3D array directly.
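For illustration, here is a minimal sketch of that direct slicing, assuming the data already lives in a plain 3D array. The array is built with np.tile purely to mimic the shape of the example data.

```python
import numpy as np

# Build an (8, 5, 10) array in which every innermost row is 0..9
n = np.tile(np.arange(10), (8, 5, 1))

# Keep every i and j, but only the first half of the k axis
n2 = n[:, :, : n.shape[2] // 2]

print(n.shape, n2.shape)
```

Basic slicing like this returns a view, so no per-cell loop or copy is involved.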
Here is a round-trip solution that goes from pandas to numpy and back to pandas:
import pandas as pd
import numpy as np
data = pd.DataFrame([pd.Series([k for k in range(10)]) for j in range(5)] for i in range(8))
ni, nj = data.shape
nk = len(data.loc[0, 0])
print(ni, nj, nk)
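# Pull the numpy values out of every cell, then stack them into an (ni, nj, nk) array
xf = [y.to_numpy() for y in data.to_numpy().flatten()]
n = np.array(xf).reshape([ni, nj, nk])
print(n.shape)
print(n)
# Slice along the third (k) axis to keep the first half
n2 = n[:, :, :nk // 2]
print(n2.shape)
print(n2)
# Rebuild an i x j DataFrame whose cells are Series of length nk // 2
data2 = pd.DataFrame([pd.Series(n2[i, j, :]) for j in range(n2.shape[1])] for i in range(n2.shape[0]))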
ni, nj = data2.shape
nk = len(data2.loc[0, 0])
print(ni, nj, nk)
This is n, the input after conversion from an i x j DataFrame of length-k Series values into a 3D numpy array:
[[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]]
Here is the 3D numpy array n2, whose k extent (nk//2) is half that of the original 3D array:
[[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]]
As a final step, the sliced 3D numpy array n2 is converted back to an i x j DataFrame containing Series values of length nk//2.
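If the round trip is needed more than once, it could be wrapped in two small helpers. This is only a sketch: frame_to_cube and cube_to_frame are hypothetical names of my own, not pandas or numpy functions, and the code assumes every cell holds a Series of the same length.

```python
import numpy as np
import pandas as pd

def frame_to_cube(frame):
    """Stack an i x j frame of equal-length Series into an (i, j, k) array."""
    ni, nj = frame.shape
    cells = [cell.to_numpy() for cell in frame.to_numpy().flatten()]
    return np.array(cells).reshape(ni, nj, -1)

def cube_to_frame(cube):
    """Rebuild an i x j frame whose cells are Series of length k."""
    return pd.DataFrame(
        [[pd.Series(cube[i, j, :]) for j in range(cube.shape[1])]
         for i in range(cube.shape[0])]
    )

data = pd.DataFrame([[pd.Series(range(10)) for j in range(5)] for i in range(8)])
cube = frame_to_cube(data)
half = cube_to_frame(cube[:, :, : cube.shape[2] // 2])
print(cube.shape, half.shape, len(half.loc[0, 0]))
```

Any numpy slice along the third axis can then be applied between the two calls, not just the first half.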