How to sample a DataFrame every n steps along an axis



I have a DataFrame with this general structure (a subset of the original data):


   letter  number  wavelength       res
0       A       1         600  0.002809
1       A       1         610  0.003098
2       A       1         620  0.002338
3       A       1         630  0.002307
4       A       1         640  0.002453
5       A       1         650  0.003447
6       A       1         660  0.002596
7       A       1         670  0.002740
8       A       1         680  0.003353
9       A       1         690  0.002095
10      A       1         700  0.002912
11      A       2         600  0.001417
12      A       2         610  0.005712
13      A       2         620  0.007647
14      A       2         630  0.007952
15      A       2         640  0.007098
16      A       2         650  0.007043
17      A       2         660  0.006689
18      A       2         670  0.006314
19      A       2         680  0.006907
20      A       2         690  0.006896
21      A       2         700  0.007203

I want a new DataFrame that contains only every 4th wavelength, i.e. [600, 640, 680, …, 600, 640, 680, …].

This is dt.head(20).to_dict():

{'letter': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A', 8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A', 15: 'A', 16: 'A', 17: 'A', 18: 'A', 19: 'A'}, 'number': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 2, 12: 2, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2, 18: 2, 19: 2}, 'wavelength': {0: 600, 1: 610, 2: 620, 3: 630, 4: 640, 5: 650, 6: 660, 7: 670, 8: 680, 9: 690, 10: 700, 11: 600, 12: 610, 13: 620, 14: 630, 15: 640, 16: 650, 17: 660, 18: 670, 19: 680}, 'res': {0: 0.002809136, 1: 0.003098235, 2: 0.002337703, 3: 0.002307137, 4: 0.002452738, 5: 0.003447402, 6: 0.002595696, 7: 0.002739954, 8: 0.003353279, 9: 0.002095429, 10: 0.002911785, 11: 0.001416974, 12: 0.005712076, 13: 0.007646978, 14: 0.007951877, 15: 0.007097805, 16: 0.007042982, 17: 0.006689001, 18: 0.006313695, 19: 0.006906712}}

So, in this case:


  letter  number  wavelength       res
0      A       1         600  0.002809
1      A       1         640  0.002453
2      A       1         680  0.003353
3      A       2         600  0.001417
4      A       2         640  0.007098
5      A       2         680  0.006907

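How can I extract only these rows? I do not know in advance how the data will be sorted, so the solution should depend only on the "wavelength" axis.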

I have looked at pandas.DataFrame.sample, but it appears to sample randomly, and I want to select with a predetermined step.

Edit: a possible solution

import numpy as np
import pandas as pd

# First group the data by wavelength, as Jon Clements proposed
by_wl = data.groupby("wavelength")
# Get the sorted index of all distinct wavelengths
# (values="res" keeps the non-numeric "letter" column out of the aggregation)
index = pd.pivot_table(data, index="wavelength", values="res").index
# Select one wavelength every n steps (here n = 4)
to_get = np.arange(0, len(index), 4)
index_to_get = index[to_get]
new = []
# Loop over the groupby and keep only the groups whose wavelength is in index_to_get
for wl, frame in by_wl:
    if wl in index_to_get:
        new.append(frame)
# Finally create the new DataFrame, sort by the original index,
# and reset the index (because the other rows have been dropped)
result = pd.concat(new, sort=False).sort_index().reset_index(drop=True)
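
The same selection can probably be written without the explicit loop. The following is only a sketch under assumptions: the helper name every_nth_wavelength and the small stand-in frame are made up for illustration (the "res" column is left out), and the only property relied on is that the set of distinct wavelengths defines which rows to keep, regardless of row order.

```python
import pandas as pd


def every_nth_wavelength(data: pd.DataFrame, step: int = 4) -> pd.DataFrame:
    """Keep only rows whose wavelength is every `step`-th distinct value.

    Hypothetical helper, not part of pandas.
    """
    # Distinct wavelengths in ascending order, independent of row order
    wavelengths = sorted(data["wavelength"].unique())
    # Every `step`-th distinct wavelength, starting from the smallest
    keep = wavelengths[::step]
    # Boolean mask keeps the surviving rows in their original order
    return data[data["wavelength"].isin(keep)].reset_index(drop=True)


# Stand-in data shaped like the example: 11 wavelengths for each number
data = pd.DataFrame({
    "letter": "A",
    "number": [1] * 11 + [2] * 11,
    "wavelength": list(range(600, 710, 10)) * 2,
})

result = every_nth_wavelength(data)
print(result)
```

Because the mask is built from the wavelength values rather than from row positions, shuffling the rows beforehand should select the same set of rows.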

IIUC:

df[::4]

This comes from the slice notation start:stop:step.
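
As a small illustration of that notation (a sketch on a made-up single-column frame, not the asker's data), omitting start and stop and giving a step of 4 keeps the rows at positions 0, 4, 8, and so on:

```python
import pandas as pd

# Stand-in frame: one block of 11 wavelengths, already in ascending order
df = pd.DataFrame({"wavelength": list(range(600, 710, 10))})

# start and stop omitted, step = 4: rows at positions 0, 4, 8
subset = df[::4]

# The same selection spelled out with explicit bounds via iloc
same = df.iloc[0:len(df):4]

print(subset["wavelength"].tolist())  # [600, 640, 680]
```

Note that the slice counts row positions, not wavelength values, so the result depends on how the rows are ordered.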

Note: the GroupBy.nth index notation (nth[::4]) requires pandas 1.4+; plain df[::4] slicing has no such requirement.

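This solution assumes that you have already sorted the DataFrame by (number, wavelength) and that your wavelengths really do increase in steps of 10, which appears to be the case in your example DataFrame.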
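
To make the role of that assumption concrete, here is a sketch on a made-up stand-in frame with 11 wavelengths per number, mirroring the example. It suggests that sorting alone is not enough for a plain positional slice: since 11 is not a multiple of 4, the step drifts when it crosses from one number to the next.

```python
import pandas as pd

# Stand-in data: two numbers, 11 wavelengths each (600 to 700 in steps of 10)
df = pd.DataFrame({
    "number": [1] * 11 + [2] * 11,
    "wavelength": list(range(600, 710, 10)) * 2,
})

# Simulate unknown row order, then restore the assumed sort order
shuffled = df.sample(frac=1, random_state=0)
ordered = shuffled.sort_values(["number", "wavelength"]).reset_index(drop=True)

# Plain positional slicing takes rows 0, 4, 8, 12, 16, 20
plain = ordered[::4]["wavelength"].tolist()
print(plain)  # [600, 640, 680, 610, 650, 690]
```

The second half of the output starts at 610 rather than 600, which is why the selection has to restart within each number.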

In any case, you want every fourth wavelength, grouped by number:

In [4]: df.groupby("number").nth[::4]
Out[4]:
       letter  wavelength       res
number
1           A         600  0.002809
1           A         640  0.002453
1           A         680  0.003353
2           A         600  0.001417
2           A         640  0.007098
2           A         680  0.006907
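
For pandas versions where the nth[...] index notation is not available, a roughly equivalent selection can be sketched with GroupBy.cumcount. The helper name every_nth_per_group and the stand-in frame are assumptions made for illustration (the "res" column is left out); the explicit sort is added so that the result does not depend on the incoming row order.

```python
import pandas as pd


def every_nth_per_group(frame, key="number", order="wavelength", step=4):
    """Keep every `step`-th row within each `key` group, ordered by `order`.

    Hypothetical helper, not part of pandas.
    """
    # Sort so that positions inside each group follow ascending wavelength
    ordered = frame.sort_values([key, order])
    # 0-based position of each row inside its group
    position = ordered.groupby(key).cumcount()
    # Keep positions 0, step, 2*step, ... and renumber the rows
    return ordered[position % step == 0].reset_index(drop=True)


# Stand-in data shaped like the example: 11 wavelengths for each number
df = pd.DataFrame({
    "letter": "A",
    "number": [1] * 11 + [2] * 11,
    "wavelength": list(range(600, 710, 10)) * 2,
})

picked = every_nth_per_group(df)
print(picked)
```

Unlike the nth output shown above, this keeps "number" as an ordinary column instead of moving it into the index.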
