Pandas:在选定日期上方10行



我试图使用get_loc获取当前日期,然后从数据中返回当前日期以上的10行,但我一直收到一个Key Error。

这是我的数据表=>posting_df5:

Posting_date  rooms  Origin  Rooms booked         ADR       Revenue
0    2019-03-31      1       1             1  156.000000    156.000000
1    2019-04-01     13      13            13  160.720577   2089.367500
2    2019-04-02     15      15            15  167.409167   2511.137500
3    2019-04-03     21      21            21  166.967405   3506.315500
4    2019-04-04     37      37            37  162.384909   6008.241643
5    2019-04-05     52      52            52  202.150721  10511.837476
6    2019-04-06     49      49            49  199.611887   9780.982476
7    2019-04-07     44      44            44  182.233171   8018.259527
8    2019-04-08     50      50            50  187.228192   9361.409623
9    2019-04-09     37      37            37  177.654422   6573.213623
10   2019-04-10     31      31            31  184.138208   5708.284456

我试着做以下事情:

idx = posting_df7.index.get_loc('2019-04-05')
posting_df7 = posting_df5.iloc[idx - 5 : idx + 5]

但我收到了以下错误:

indexer = self._get_level_indexer(key, level=level)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2939, in _get_level_indexer
code = level_index.get_loc(key)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 128, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '2019-04-05'

因此,我尝试在使用get_loc之前先对Posting_date进行索引,但效果不佳:

rooms  Origin  Rooms booked         ADR       Revenue
Posting_date                                                       
0  2019-03-31        1       1             1  156.000000    156.000000
1  2019-04-01       13      13            13  160.720577   2089.367500
2  2019-04-02       15      15            15  167.409167   2511.137500
3  2019-04-03       21      21            21  166.967405   3506.315500
4  2019-04-04       37      37            37  162.384909   6008.241643
5  2019-04-05       52      52            52  202.150721  10511.837476
6  2019-04-06       49      49            49  199.611887   9780.982476
7  2019-04-07       44      44            44  182.233171   8018.259527
8  2019-04-08       50      50            50  187.228192   9361.409623
9  2019-04-09       37      37            37  177.654422   6573.213623

然后我使用了相同的get_loc函数,但出现了相同的错误。如何在不需要日期的情况下选择行。

感谢

这里有一种不同的方法。。。

由于ilocget_loc可能很棘手,因此此解决方案使用布尔掩码返回相对于给定日期的行,然后使用head()函数返回所需的行数。

import pandas as pd
PATH = '/home/user/Desktop/so/room_rev.csv'
# Read in data from a CSV.
df = pd.read_csv(PATH)
# Convert the date column to a `datetime` format.
df['Posting_date'] = pd.to_datetime(df['Posting_date'], 
format='%Y-%m-%d')
# Sort based on date.
df.sort_values('Posting_date')

原始数据集:

Posting_date  rooms  Origin  Rooms booked         ADR       Revenue
0    2019-03-31      1       1             1  156.000000    156.000000
1    2019-04-01     13      13            13  160.720577   2089.367500
2    2019-04-02     15      15            15  167.409167   2511.137500
3    2019-04-03     21      21            21  166.967405   3506.315500
4    2019-04-04     37      37            37  162.384909   6008.241643
5    2019-04-05     52      52            52  202.150721  10511.837476
6    2019-04-06     49      49            49  199.611887   9780.982476
7    2019-04-07     44      44            44  182.233171   8018.259527
8    2019-04-08     50      50            50  187.228192   9361.409623
9    2019-04-09     37      37            37  177.654422   6573.213623
10   2019-04-10     31      31            31  184.138208   5708.284456

解决方案:

head()函数中的值替换为要返回的行数注意:逆还有一个tail()函数。

df[df['Posting_date'] > '2019-04-05'].head(3)

输出:

Posting_date  rooms  Origin  Rooms booked         ADR      Revenue
6   2019-04-06     49      49            49  199.611887  9780.982476
7   2019-04-07     44      44            44  182.233171  8018.259527
8   2019-04-08     50      50            50  187.228192  9361.409623

最新更新