Get the daily nsmallest values of each column from hourly data



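I have a DataFrame df with n columns of hourly data (date_I X1_I X2_I…Xn_I).
For each day, I would like to get the nsmallest values of each column, but I can't find a way to do it without looping over the columns.
Taking the minimum with df.groupby(pd.Grouper(freq='D')).min() seems to do the trick, but when I try the nsmallest method I get the following error message:
"Cannot access callable attribute 'nsmallest' of 'DataFrameGroupBy' objects, try using the 'apply' method"
I tried using nsmallest with the 'apply' method, but was then asked to specify columns.
If anyone has an idea, it would be very helpful.
Thanks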

P.S.: Sorry for the formatting, this is my first post.

Edit: an illustration of what my data looks like:

                                0             1  ...          9678          9679
2022-01-08 00:00:00  18472.232746  28934.878033  ...  20668.503228  22079.457224
2022-01-08 01:00:00  19546.101746  30239.880033  ...  21789.779228  23330.190224
2022-01-08 02:00:00  22031.448746  33016.048033  ...  24278.199228  25990.503224
2022-01-08 03:00:00  24089.368644  36134.608919  ...  26327.332591  28089.134306
2022-01-08 04:00:00  24640.942644  36818.412919  ...  26894.204591  28736.705306
2022-01-08 05:00:00  23329.700644  35639.693919  ...  25555.199591  27379.323306
2022-01-08 06:00:00  20990.043644  33329.805919  ...  23137.500591  24917.126306
2022-01-08 07:00:00  18314.599644  30347.799919  ...  20167.500591  22022.524306
2022-01-08 08:00:00  17628.482226  31301.113041  ...  21665.296600  24202.625832
2022-01-08 09:00:00  15743.339226  29588.354041  ...  19912.297600  22341.947832
2022-01-08 10:00:00  15498.405226  29453.561041  ...  19799.009600  22131.170832
2022-01-08 11:00:00  14950.121226  28767.791041  ...  19328.678600  21507.167832
2022-01-08 12:00:00  13925.869226  27530.472041  ...  18404.139600  20460.316832
2022-01-08 13:00:00  17502.122226  30922.783041  ...  21990.380600  24008.382832
2022-01-08 14:00:00  19159.511385  34275.005187  ...  23961.590286  26460.214883
2022-01-08 15:00:00  20583.356385  35751.662187  ...  25315.380286  27793.800883
2022-01-08 16:00:00  20443.423385  35925.362187  ...  25184.576286  27672.536883
2022-01-08 17:00:00  15825.211385  31604.614187  ...  20646.669286  23145.311883
2022-01-08 18:00:00  11902.354052  28786.559805  ...  16028.363856  19313.677750
2022-01-08 19:00:00  13483.710052  30631.806805  ...  17635.338856  20948.556750
2022-01-08 20:00:00  16084.773323  33944.862396  ...  20627.810852  22763.962851
2022-01-08 21:00:00  18340.833323  36435.799396  ...  22920.037852  25240.320851
2022-01-08 22:00:00  15110.698323  33159.222396  ...  19794.355852  22102.416851
2022-01-08 23:00:00  15663.400323  33741.501396  ...  20180.693852  22605.909851
2022-01-09 00:00:00  19500.930751  39058.431760  ...  24127.257756  26919.289816
2022-01-09 01:00:00  20562.985751  40330.807760  ...  25123.488756  28051.573816
2022-01-09 02:00:00  23408.547751  43253.635760  ...  27840.447756  30960.372816
2022-01-09 03:00:00  25975.071191  45523.722743  ...  30274.316013  32276.174330
2022-01-09 04:00:00  27180.858191  46586.959743  ...  31348.131013  33414.631330
2022-01-09 05:00:00  26383.511191  45793.920743  ...  30598.931013  32605.280330
...           ...  ...           ...           ...

What I get with the min function:

2022-01-08  11902.354052  27530.472041  ...  16028.363856  19313.677750
2022-01-09  14491.281907  30293.870235  ...  16766.428013  21386.135041
...

What I want, for example with nsmallest(2):

2022-01-08  11902.354052  27530.472041  ...  16028.363856  19313.677750
            13483.710052  28767.791041  ...  17635.338856  20460.316832
2022-01-09  14491.281907  30293.870235  ...  16766.428013  21386.135041
            14721.392907  30722.928235  ...  17130.594013  21732.426041
...

Group by day, take the two smallest values of each column as a list, then explode all columns (pandas >= 1.3.0):

get_2smallest = lambda x: x.nsmallest(2).tolist()
out = df.resample('D').apply(get_2smallest).explode(df.columns.tolist())
print(out)
# Output
                       0             1          9678          9679
2022-01-08  11902.354052  27530.472041  16028.363856   19313.67775
2022-01-08  13483.710052  28767.791041  17635.338856  20460.316832
2022-01-09  19500.930751   39058.43176  24127.257756  26919.289816
2022-01-09  20562.985751   40330.80776  25123.488756  28051.573816
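
To try the approach above without the original data, here is a minimal sketch on synthetic hourly data. The helper name daily_nsmallest, the column names a and b, and the generated values are illustrative assumptions, not part of the question. The final astype(float) is added because explode leaves the columns with object dtype.

```python
import numpy as np
import pandas as pd


def daily_nsmallest(df, n=2):
    """Return the n smallest values of every column for each calendar day.

    Hypothetical helper; assumes a DatetimeIndex and at least n rows per day.
    """
    # Series.nsmallest needs no `columns` argument, unlike DataFrame.nsmallest.
    take_n = lambda col: col.nsmallest(n).tolist()
    # One list of length n per (day, column) cell.
    lists = df.resample("D").apply(take_n)
    # Multi-column explode (pandas >= 1.3.0) turns each list into n rows.
    return lists.explode(df.columns.tolist()).astype(float)


# Synthetic data (assumption): 48 hourly rows covering two days.
# Column "a" is a permutation of 0..47, column "b" is its mirror image.
idx = pd.date_range("2022-01-08", periods=48, freq=pd.Timedelta(hours=1))
a = (np.arange(48) * 7) % 48
demo = pd.DataFrame({"a": a, "b": 100 - a}, index=idx)

out = daily_nsmallest(demo, n=2)
print(out)
```

Each day appears n times in the resulting index, once per retained value, with the values of each column in ascending order.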

Update

Another version, maybe faster:

out = (
    df.set_index(df.index.date).stack().rename_axis(['Date', 'Col'])
      .rename('Val').sort_values().groupby(level=[0, 1]).head(2)
      .sort_index().reset_index().assign(Idx=lambda x: x.index % 2)
      .pivot(index=['Date', 'Idx'], columns='Col', values='Val')
      .droplevel('Idx').rename_axis(index=None, columns=None)
)
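
The chained version hard-codes the value 2 in both head(2) and x.index % 2. A sketch that exposes it as a parameter might look like the following. The function name daily_nsmallest_stacked and the synthetic data are assumptions made for illustration, and the % n step assumes every day has at least n rows for every column and no missing values.

```python
import numpy as np
import pandas as pd


def daily_nsmallest_stacked(df, n=2):
    """Hypothetical helper: n smallest values per day and column via stack/pivot."""
    long = (
        df.set_index(df.index.date)      # replace timestamps by their calendar date
          .stack()                       # long format: one row per (date, column)
          .rename_axis(["Date", "Col"])
          .rename("Val")
          .sort_values()                 # ascending, so head(n) keeps the n smallest
          .groupby(level=[0, 1]).head(n)
          .sort_index()                  # bring rows of the same (Date, Col) together
          .reset_index()
    )
    # Position inside each (Date, Col) block; assumes every block has exactly n rows.
    long["Idx"] = long.index % n
    return (
        long.pivot(index=["Date", "Idx"], columns="Col", values="Val")
            .droplevel("Idx")
            .rename_axis(index=None, columns=None)
    )


# Synthetic data (assumption): 48 hourly rows covering two days.
idx = pd.date_range("2022-01-08", periods=48, freq=pd.Timedelta(hours=1))
a = (np.arange(48) * 7) % 48
demo = pd.DataFrame({"a": a, "b": 100 - a}, index=idx)

out = daily_nsmallest_stacked(demo, n=2)
print(out)
```

Unlike the resample version, the index here holds plain date objects rather than timestamps, because the grouping key comes from df.index.date.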
