Finding ranges from a dataframe



I have a dataframe like the one below:

Date                         3tier1      3tier2
2013-01-01 08:00:00+08:00   20.97946282 20.97946282
2013-01-02 08:00:00+08:00   20.74539378 20.74539378
2013-01-03 08:00:00+08:00   20.51126054 20.51126054
2013-01-04 08:00:00+08:00   20.27707322 20.27707322
2013-01-05 08:00:00+08:00   20.04284112 20.04284112
2013-01-06 08:00:00+08:00   19.80857234 19.80857234
2013-01-07 08:00:00+08:00   19.57427331 19.57427331
2013-01-08 08:00:00+08:00   19.33994822 19.33994822
2013-01-09 08:00:00+08:00   19.10559849 19.10559849
2013-01-10 08:00:00+08:00   18.87122241 18.87122241
2013-01-11 08:00:00+08:00   18.63681507 18.63681507
2013-01-12 08:00:00+08:00   18.40236877 18.40236877
2013-01-13 08:00:00+08:00   18.16787383 18.16787383
2013-01-14 08:00:00+08:00   17.93331972 17.93331972
2013-01-15 08:00:00+08:00   17.69869612 17.69869612
2013-01-16 08:00:00+08:00   17.46399372 17.46399372
2013-01-17 08:00:00+08:00   17.22920466 17.22920466
2013-01-18 08:00:00+08:00   16.9943227  16.9943227
2013-01-19 08:00:00+08:00   17.27850867 16.7593431
2013-01-20 08:00:00+08:00   17.69762778 16.52426248
2013-01-21 08:00:00+08:00   18.12537837 16.28907864
2013-01-22 08:00:00+08:00   18.56180775 16.05379043
2013-01-23 08:00:00+08:00   19.00689471 15.81839767
2013-01-24 08:00:00+08:00   19.46053468 15.58290109
2013-01-25 08:00:00+08:00   19.92252218 15.3473024
2013-01-26 08:00:00+08:00   20.3925305  15.11160423
2013-01-27 08:00:00+08:00   20.87008788 14.87581016
2013-01-28 08:00:00+08:00   21.35454987 14.63992467
2013-01-29 08:00:00+08:00   21.84506726 14.40395298
2013-01-30 08:00:00+08:00   22.34054913 14.16790086
2013-01-31 08:00:00+08:00   22.83962058 13.93177434
2013-02-01 08:00:00+08:00   23.34057473 13.69557937
2013-02-02 08:00:00+08:00   23.84131896 13.45932144
2013-02-03 08:00:00+08:00   24.33931544 13.22300514
2013-02-04 08:00:00+08:00   24.8315166  12.98663374
2013-02-05 08:00:00+08:00   25.31429677 12.7502088
2013-02-06 08:00:00+08:00   25.78338191 12.51372976
2013-02-07 08:00:00+08:00   26.23378052 12.27719367
2013-02-08 08:00:00+08:00   26.65971992 12.04059517
2013-02-09 08:00:00+08:00   27.05459343 11.80392662
2013-02-10 08:00:00+08:00   27.41092527 11.56717871
2013-02-11 08:00:00+08:00   27.72036088 11.3303412
2013-02-12 08:00:00+08:00   27.97369094 11.09340384
2013-02-13 08:00:00+08:00   28.16091685 10.85635718
2013-02-14 08:00:00+08:00   28.27136466 10.61919323
2013-02-15 08:00:00+08:00   28.29385218 10.38190579
2013-02-16 08:00:00+08:00   28.21691143 10.14449064
2013-02-17 08:00:00+08:00   28.02906576 9.906945571
2013-02-18 08:00:00+08:00   27.71915819 9.669270289
2013-02-19 08:00:00+08:00   27.27672516 9.431466436
2013-02-20 08:00:00+08:00   26.69240919 9.193537583
2013-02-21 08:00:00+08:00   25.9584032  8.955489323
2013-02-22 08:00:00+08:00   25.06891975 8.717329426
2013-02-23 08:00:00+08:00   24.02067835 8.479068052
2013-02-24 08:00:00+08:00   22.81340411 8.240718006
2013-02-25 08:00:00+08:00   21.45033241 8.002294987
2013-02-26 08:00:00+08:00   19.93872048 7.763817801
2013-02-27 08:00:00+08:00   18.29038758 7.525308512
2013-02-28 08:00:00+08:00   16.5223583  7.286792516
2013-03-01 08:00:00+08:00   14.65781009 7.048298548
2013-03-02 08:00:00+08:00   12.72782154 6.809858708
2013-03-03 08:00:00+08:00   10.77512952 6.57150857
2013-03-04 08:00:00+08:00   8.862866684 6.333287469
2013-03-05 08:00:00+08:00   7.095368405 6.095239078
2013-03-06 08:00:00+08:00   5.857412338 5.857412338
2013-03-07 08:00:00+08:00   6.062085995 5.619862847
2013-03-08 08:00:00+08:00   7.707047277 5.382654808
2013-03-09 08:00:00+08:00   9.419192265 5.145863673
2013-03-10 08:00:00+08:00   11.12489254 4.909579657
2013-03-11 08:00:00+08:00   12.78439056 4.673912321
2013-03-12 08:00:00+08:00   14.37406958 4.438996486
2013-03-13 08:00:00+08:00   15.87932086 4.204999838
2013-03-14 08:00:00+08:00   17.29126015 3.97213278
2013-03-15 08:00:00+08:00   18.60496304 3.740661371
2013-03-16 08:00:00+08:00   19.81836754 3.510924673
2013-03-17 08:00:00+08:00   20.9315104  3.283358444
2013-03-18 08:00:00+08:00   21.94595693 3.058528064
2013-03-19 08:00:00+08:00   22.86436015 2.837174881
2013-03-20 08:00:00+08:00   23.69011593 2.620282024
2013-03-21 08:00:00+08:00   24.42709384 2.409168144
2013-03-22 08:00:00+08:00   25.07942941 2.205620134
2013-03-23 08:00:00+08:00   25.65136634 2.012076744
2013-03-24 08:00:00+08:00   26.14713926 1.831868652
2013-03-25 08:00:00+08:00   26.57088882 1.669492776
2013-03-26 08:00:00+08:00   26.92660259 1.53082259
2013-03-27 08:00:00+08:00   27.21807571 1.423006398
2013-03-28 08:00:00+08:00   27.44888683 1.353644799
2013-03-29 08:00:00+08:00   27.66626757 1.328979238
2013-03-30 08:00:00+08:00   28.03215155 1.351655979
2013-03-31 08:00:00+08:00   28.34758652 1.419589908

I want to find the range of a chosen column within each month, and group rows wherever the direction of that range changes. For example, in January, 3tier1 actually starts around 20, falls to about 16, and then climbs back to 22: from Jan 1 to Jan 18 it goes down from 20 to 16, then from Jan 19 to Feb 15 it goes up from 17 to 28, and so on.

Expected output:
2013-01-01 to 2013-01-18 - 20 to 16
2013-01-19 to 2013-02-15 - 17 to 28

Is there a built-in pandas function that can do this easily? Thanks in advance for your help.

I don't know of a single built-in function that does exactly what you want, but it can be pieced together with a handful of lines of code. I would use .diff() and .shift().
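As a quick illustration of what those two methods do on a toy series (values are made up for the demo):

```python
import pandas as pd

s = pd.Series([20.0, 19.0, 18.0, 19.5])

# .diff() gives the change from the previous row (NaN for the first row).
print(s.diff())     # NaN, -1.0, -1.0, 1.5

# .shift(-1) pulls each value up one row (NaN for the last row).
print(s.shift(-1))  # 19.0, 18.0, 19.5, NaN
```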

This is what I came up with.

import pandas as pd
import numpy as np

file = 'C:/path_to_file/data.csv'
df = pd.read_csv(file, parse_dates=['Date'])
# Now I have your dataframe loaded. ** Your procedures are below.
df['trend'] = np.where(df['3tier1'].diff() > 0, 1, -1)  # trend is increasing (1) or decreasing (-1)
df['again'] = df['trend'].diff()                        # get the difference in trend
df['again'] = df['again'].shift(periods=-1) + df['again']
df['change'] = np.where(df['again'].isin([2, -2, np.nan]), 2, 0)
# Get to the desired data; .copy() avoids a SettingWithCopyWarning on the assignments below.
dfc = df[df['change'] == 2].copy()
dfc['to_date'] = dfc['Date'].shift(periods=-1)
dfc['to_End'] = dfc['3tier1'].shift(periods=-1)
dfc.drop(columns=['trend', 'again', 'change'], inplace=True)
# Keep every other row: each remaining row holds one trend's start and end.
dfc = dfc.iloc[::2, :]
print(dfc)
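A more compact alternative is to label each run of constant trend with a group id and aggregate with groupby. This is a sketch on a small made-up series shaped like your data (falling, then rising); the column names mirror your dataframe:

```python
import pandas as pd
import numpy as np

# Hypothetical miniature of the question's data: decreasing, then increasing.
dates = pd.date_range('2013-01-01', periods=8, freq='D')
values = [20.0, 19.0, 18.0, 17.0, 18.5, 20.0, 21.5, 23.0]
df = pd.DataFrame({'Date': dates, '3tier1': values})

# Sign of the day-to-day change: +1 rising, -1 falling.
# bfill() fills the first row's NaN with the trend of the row after it.
trend = np.sign(df['3tier1'].diff()).bfill()

# A new group id starts every time the trend flips.
group = (trend != trend.shift()).cumsum()

# One row per trend run: its start/end date and start/end value.
ranges = df.groupby(group).agg(
    start_date=('Date', 'first'),
    end_date=('Date', 'last'),
    start_val=('3tier1', 'first'),
    end_val=('3tier1', 'last'),
)
print(ranges)
```

On your full data the first two rows of `ranges` would correspond to the expected output: 2013-01-01 to 2013-01-18 (20 down to 16), then 2013-01-19 to 2013-02-15 (17 up to 28).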
