I have a DataFrame with this general structure (a subset of the original data):
```
    letter  number  wavelength       res
0        A       1         600  0.002809
1        A       1         610  0.003098
2        A       1         620  0.002338
3        A       1         630  0.002307
4        A       1         640  0.002453
5        A       1         650  0.003447
6        A       1         660  0.002596
7        A       1         670  0.002740
8        A       1         680  0.003353
9        A       1         690  0.002095
10       A       1         700  0.002912
11       A       2         600  0.001417
12       A       2         610  0.005712
13       A       2         620  0.007647
14       A       2         630  0.007952
15       A       2         640  0.007098
16       A       2         650  0.007043
17       A       2         660  0.006689
18       A       2         670  0.006314
19       A       2         680  0.006907
20       A       2         690  0.006896
21       A       2         700  0.007203
```
I want to get a new DataFrame that contains only every 4th wavelength, so: [600, 640, 680, …, 600, 640, 680, …, …].
Here is dt.head(20).to_dict():

```python
{'letter': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A', 8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A', 15: 'A', 16: 'A', 17: 'A', 18: 'A', 19: 'A'}, 'number': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 2, 12: 2, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2, 18: 2, 19: 2}, 'wavelength': {0: 600, 1: 610, 2: 620, 3: 630, 4: 640, 5: 650, 6: 660, 7: 670, 8: 680, 9: 690, 10: 700, 11: 600, 12: 610, 13: 620, 14: 630, 15: 640, 16: 650, 17: 660, 18: 670, 19: 680}, 'res': {0: 0.002809136, 1: 0.003098235, 2: 0.002337703, 3: 0.002307137, 4: 0.002452738, 5: 0.003447402, 6: 0.002595696, 7: 0.002739954, 8: 0.003353279, 9: 0.002095429, 10: 0.002911785, 11: 0.001416974, 12: 0.005712076, 13: 0.007646978, 14: 0.007951877, 15: 0.007097805, 16: 0.007042982, 17: 0.006689001, 18: 0.006313695, 19: 0.006906712}}
```
So, in this case, the expected result is:
```
   letter  number  wavelength       res
0       A       1         600  0.002809
1       A       1         640  0.002453
2       A       1         680  0.003353
3       A       2         600  0.001417
4       A       2         640  0.007098
5       A       2         680  0.006907
```
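How can I extract only these rows? I don't know in advance how the data will be sorted, so the solution should depend only on the "wavelength" axis.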
I have looked at pandas.DataFrame.sample, but it seems to sample randomly, whereas I want to select with a predetermined step.
EDIT: a possible solution
```python
import numpy as np
import pandas as pd

# first group the data as Jon Clements proposed
by_wl = data.groupby("wavelength")

# get all the wavelengths to select from (sorted and unique); this is the same
# index pd.pivot_table(data, index="wavelength") gives, without aggregating "letter"
index = np.sort(data["wavelength"].unique())

# select one every n (here 4) steps
to_get = np.arange(0, len(index), 4)
index_to_get = index[to_get]

new = []
# loop over the groupby and keep only the groups whose wavelength is in index_to_get
for wl, frame in by_wl:
    if wl in index_to_get:
        new.append(frame)

# finally create the new DataFrame, sort by the original index and reset it
# (drop=True so the old index is not added back as a column)
result = pd.concat(new, sort=False).sort_index().reset_index(drop=True)
```
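For comparison, the selection in the EDIT above can also be written without the loop: take every 4th value of the sorted unique wavelengths and keep the matching rows with Series.isin. This is a minimal sketch that rebuilds only the letter/number/wavelength structure of the sample (the res column is left out because it does not affect which rows are picked); the name to_keep is illustrative.

```python
import numpy as np
import pandas as pd

# Structure of the 22-row sample from the question ("res" omitted for brevity).
data = pd.DataFrame({
    "letter": ["A"] * 22,
    "number": [1] * 11 + [2] * 11,
    "wavelength": list(range(600, 701, 10)) * 2,
})

# Sorted unique wavelengths, then every 4th one: 600, 640, 680.
wavelengths = np.sort(data["wavelength"].unique())
to_keep = wavelengths[::4]

# Keep the rows whose wavelength is one of the selected values;
# this depends only on the wavelength values, not on the row order.
result = data[data["wavelength"].isin(to_keep)].reset_index(drop=True)
print(result)
```

Because the filter looks only at the wavelength values, it picks the same rows whatever order the frame is in; reset_index(drop=True) only renumbers the result as in the expected output.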
IIUC:

```python
df[::4]
```
It comes from the start:stop:step slicing notation.
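To see what df[::4] does on the question's data, here is a small sketch that rebuilds the 22-row sample (res omitted). The slice is positional, so it keeps rows 0, 4, 8, 12, 16 and 20 of the frame as currently ordered. Because each number group has 11 rows, the picks in the second group land on 610, 650 and 690 rather than 600, 640 and 680.

```python
import pandas as pd

# Structure of the 22-row sample from the question ("res" omitted for brevity).
df = pd.DataFrame({
    "letter": ["A"] * 22,
    "number": [1] * 11 + [2] * 11,
    "wavelength": list(range(600, 701, 10)) * 2,
})

# Positional slicing: every 4th row of the whole frame, ignoring the groups.
every_fourth_row = df[::4]
print(every_fourth_row)
```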
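Note that this syntax requires pandas 1.4+.

This solution assumes that you have already sorted your DataFrame by (number, wavelength) and that your wavelengths really do increase in steps of 10, which seems to be the case in your example DataFrame.

In any case, you want every fourth wavelength, grouped by number: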
```
In [4]: df.groupby("number").nth[::4]
Out[4]:
        letter  wavelength       res
number
1            A         600  0.002809
1            A         640  0.002453
1            A         680  0.003353
2            A         600  0.001417
2            A         640  0.007098
2            A         680  0.006907
```
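nth[::4] needs pandas 1.4+, and from pandas 2.0 on nth acts as a filter, so it returns the original row index with number kept as a column instead of the number-indexed output shown above. A sketch that also works on older versions uses GroupBy.cumcount, and sorting first drops the assumption that the frame is already ordered by (number, wavelength). Like nth[::4], it still assumes every group has the same evenly spaced wavelengths. The sample is rebuilt without res and shuffled on purpose to mimic an unknown order.

```python
import pandas as pd

# Structure of the 22-row sample from the question ("res" omitted for brevity),
# shuffled to mimic data whose order is not known in advance.
df = pd.DataFrame({
    "letter": ["A"] * 22,
    "number": [1] * 11 + [2] * 11,
    "wavelength": list(range(600, 701, 10)) * 2,
}).sample(frac=1, random_state=0)

# Sort so that positions inside each "number" group follow the wavelength.
sorted_df = df.sort_values(["number", "wavelength"])

# cumcount() numbers the rows 0, 1, 2, ... within each group; keep 0, 4, 8, ...
mask = sorted_df.groupby("number").cumcount() % 4 == 0
result = sorted_df[mask].reset_index(drop=True)
print(result)
```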