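I have a DataFrame df with n columns of hourly data (date_I X1_I X2_I…Xn_I).
For each day, I would like to get the nsmallest values of every column, but I can't find a way to do it without looping over the columns.
For the minimum, df.groupby(pd.Grouper(freq='D')).min() seems to do the job, but when I try the nsmallest method I get the following error message:
"Cannot access callable attribute 'nsmallest' of 'DataFrameGroupBy' objects, try using the 'apply' method"
I tried using nsmallest together with the 'apply' method, but then I am asked to specify columns.
If anyone has an idea, that would be very helpful.
Thanks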
P.S. Sorry for the formatting, this is my first post.
Edit: an illustration of what my data looks like:
0 1 ... 9678 9679
2022-01-08 00:00:00 18472.232746 28934.878033 ... 20668.503228 22079.457224
2022-01-08 01:00:00 19546.101746 30239.880033 ... 21789.779228 23330.190224
2022-01-08 02:00:00 22031.448746 33016.048033 ... 24278.199228 25990.503224
2022-01-08 03:00:00 24089.368644 36134.608919 ... 26327.332591 28089.134306
2022-01-08 04:00:00 24640.942644 36818.412919 ... 26894.204591 28736.705306
2022-01-08 05:00:00 23329.700644 35639.693919 ... 25555.199591 27379.323306
2022-01-08 06:00:00 20990.043644 33329.805919 ... 23137.500591 24917.126306
2022-01-08 07:00:00 18314.599644 30347.799919 ... 20167.500591 22022.524306
2022-01-08 08:00:00 17628.482226 31301.113041 ... 21665.296600 24202.625832
2022-01-08 09:00:00 15743.339226 29588.354041 ... 19912.297600 22341.947832
2022-01-08 10:00:00 15498.405226 29453.561041 ... 19799.009600 22131.170832
2022-01-08 11:00:00 14950.121226 28767.791041 ... 19328.678600 21507.167832
2022-01-08 12:00:00 13925.869226 27530.472041 ... 18404.139600 20460.316832
2022-01-08 13:00:00 17502.122226 30922.783041 ... 21990.380600 24008.382832
2022-01-08 14:00:00 19159.511385 34275.005187 ... 23961.590286 26460.214883
2022-01-08 15:00:00 20583.356385 35751.662187 ... 25315.380286 27793.800883
2022-01-08 16:00:00 20443.423385 35925.362187 ... 25184.576286 27672.536883
2022-01-08 17:00:00 15825.211385 31604.614187 ... 20646.669286 23145.311883
2022-01-08 18:00:00 11902.354052 28786.559805 ... 16028.363856 19313.677750
2022-01-08 19:00:00 13483.710052 30631.806805 ... 17635.338856 20948.556750
2022-01-08 20:00:00 16084.773323 33944.862396 ... 20627.810852 22763.962851
2022-01-08 21:00:00 18340.833323 36435.799396 ... 22920.037852 25240.320851
2022-01-08 22:00:00 15110.698323 33159.222396 ... 19794.355852 22102.416851
2022-01-08 23:00:00 15663.400323 33741.501396 ... 20180.693852 22605.909851
2022-01-09 00:00:00 19500.930751 39058.431760 ... 24127.257756 26919.289816
2022-01-09 01:00:00 20562.985751 40330.807760 ... 25123.488756 28051.573816
2022-01-09 02:00:00 23408.547751 43253.635760 ... 27840.447756 30960.372816
2022-01-09 03:00:00 25975.071191 45523.722743 ... 30274.316013 32276.174330
2022-01-09 04:00:00 27180.858191 46586.959743 ... 31348.131013 33414.631330
2022-01-09 05:00:00 26383.511191 45793.920743 ... 30598.931013 32605.280330
... ... ... ... ...
What I get with the min function:
2022-01-08 11902.354052 27530.472041 ... 16028.363856 19313.677750
2022-01-09 14491.281907 30293.870235 ... 16766.428013 21386.135041
...
What I want, for example with nsmallest(2):
2022-01-08 11902.354052 27530.472041 ... 16028.363856 19313.677750
13483.710052 28767.791041 ... 17635.338856 20460.316832
2022-01-09 14491.281907 30293.870235 ... 16766.428013 21386.135041
14721.392907 30722.928235 ... 17130.594013 21732.426041
...
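Group by day, collect the two smallest values of each column as a list, then explode all columns at once (multi-column explode requires pandas >= 1.3.0):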
get_2smallest = lambda x: x.nsmallest(2).tolist()
out = df.resample('D').apply(get_2smallest).explode(df.columns.tolist())
print(out)
# Output
0 1 9678 9679
2022-01-08 11902.354052 27530.472041 16028.363856 19313.67775
2022-01-08 13483.710052 28767.791041 17635.338856 20460.316832
2022-01-09 19500.930751 39058.43176 24127.257756 26919.289816
2022-01-09 20562.985751 40330.80776 25123.488756 28051.573816
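As for the error in the question: DataFrameGroupBy has no nsmallest, and DataFrame.nsmallest needs a columns argument, whereas Series.nsmallest does not. So one way around it is to call the Series version on every column inside each daily group. Here is a minimal sketch of that idea; the toy data and the helper name two_smallest are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two full days of hourly values in three columns.
idx = pd.Timestamp("2022-01-08") + pd.to_timedelta(np.arange(48), unit="h")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((48, 3)) * 1000, index=idx, columns=range(3))


def two_smallest(day: pd.DataFrame) -> pd.DataFrame:
    # Series.nsmallest takes no `columns` argument, so it can be applied
    # column by column; resetting the index aligns the results as rows 0 and 1.
    return day.apply(lambda col: col.nsmallest(2).reset_index(drop=True))


# Group on the calendar day, then drop the inner 0/1 level so that only
# the date remains as the index (each date appears twice).
out = df.groupby(df.index.normalize()).apply(two_smallest).droplevel(1)
print(out)
```

The column loop still happens, but inside DataFrame.apply rather than in your own code, so it is mostly a readability gain, not necessarily a speed gain.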
Update
Another version, which may be faster:
# The chain is wrapped in parentheses so it can span several lines.
out = (
    df.set_index(df.index.date).stack().rename_axis(['Date', 'Col'])
      .rename('Val').sort_values().groupby(level=[0, 1]).head(2)
      .sort_index().reset_index().assign(Idx=lambda x: x.index % 2)
      .pivot(index=['Date', 'Idx'], columns='Col', values='Val')
      .droplevel('Idx').rename_axis(index=None, columns=None)
)
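If speed matters with thousands of columns, one more option worth trying is to sort each day's block with NumPy and keep the first k rows. This loops over days, not over columns. It is only a sketch and I have not benchmarked it against the versions above; the function name k_smallest_per_day and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def k_smallest_per_day(df: pd.DataFrame, k: int = 2) -> pd.DataFrame:
    """Return the k smallest values of every column for each calendar day."""
    pieces = []
    for day, block in df.groupby(df.index.normalize()):
        # Sort every column of the day's block independently, keep k rows.
        values = np.sort(block.to_numpy(), axis=0)[:k]
        pieces.append(
            pd.DataFrame(values, index=[day] * len(values), columns=df.columns)
        )
    return pd.concat(pieces)


# Hypothetical toy data: two full days of hourly values in three columns.
idx = pd.Timestamp("2022-01-08") + pd.to_timedelta(np.arange(48), unit="h")
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((48, 3)) * 1000, index=idx, columns=range(3))

out = k_smallest_per_day(df, k=2)
print(out)
```

A day with fewer than k rows simply yields fewer rows. Note that np.sort places NaN at the end, so missing values are only returned when a day has fewer than k valid entries in a column.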