我试图使用get_loc获取当前日期,然后从数据中返回当前日期以上的10行,但我一直收到一个Key Error。
这是我的数据表=>posting_df5:
Posting_date rooms Origin Rooms booked ADR Revenue
0 2019-03-31 1 1 1 156.000000 156.000000
1 2019-04-01 13 13 13 160.720577 2089.367500
2 2019-04-02 15 15 15 167.409167 2511.137500
3 2019-04-03 21 21 21 166.967405 3506.315500
4 2019-04-04 37 37 37 162.384909 6008.241643
5 2019-04-05 52 52 52 202.150721 10511.837476
6 2019-04-06 49 49 49 199.611887 9780.982476
7 2019-04-07 44 44 44 182.233171 8018.259527
8 2019-04-08 50 50 50 187.228192 9361.409623
9 2019-04-09 37 37 37 177.654422 6573.213623
10 2019-04-10 31 31 31 184.138208 5708.284456
我试着做以下事情:
idx = posting_df7.index.get_loc('2019-04-05')
posting_df7 = posting_df5.iloc[idx - 5 : idx + 5]
但我收到了以下错误:
indexer = self._get_level_indexer(key, level=level)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2939, in _get_level_indexer
code = level_index.get_loc(key)
File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 128, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '2019-04-05'
因此,我尝试在使用get_loc之前先对Posting_date进行索引,但效果不佳:
rooms Origin Rooms booked ADR Revenue
Posting_date
0 2019-03-31 1 1 1 156.000000 156.000000
1 2019-04-01 13 13 13 160.720577 2089.367500
2 2019-04-02 15 15 15 167.409167 2511.137500
3 2019-04-03 21 21 21 166.967405 3506.315500
4 2019-04-04 37 37 37 162.384909 6008.241643
5 2019-04-05 52 52 52 202.150721 10511.837476
6 2019-04-06 49 49 49 199.611887 9780.982476
7 2019-04-07 44 44 44 182.233171 8018.259527
8 2019-04-08 50 50 50 187.228192 9361.409623
9 2019-04-09 37 37 37 177.654422 6573.213623
然后我使用了相同的get_loc函数,但出现了相同的错误。如何在不需要日期的情况下选择行。
感谢
这里有一种不同的方法。。。
由于iloc
和get_loc
可能很棘手,因此此解决方案使用布尔掩码返回相对于给定日期的行,然后使用head()
函数返回所需的行数。
import pandas as pd
PATH = '/home/user/Desktop/so/room_rev.csv'
# Read in data from a CSV.
df = pd.read_csv(PATH)
# Convert the date column to a `datetime` format.
df['Posting_date'] = pd.to_datetime(df['Posting_date'],
format='%Y-%m-%d')
# Sort based on date.
df.sort_values('Posting_date')
原始数据集:
Posting_date rooms Origin Rooms booked ADR Revenue
0 2019-03-31 1 1 1 156.000000 156.000000
1 2019-04-01 13 13 13 160.720577 2089.367500
2 2019-04-02 15 15 15 167.409167 2511.137500
3 2019-04-03 21 21 21 166.967405 3506.315500
4 2019-04-04 37 37 37 162.384909 6008.241643
5 2019-04-05 52 52 52 202.150721 10511.837476
6 2019-04-06 49 49 49 199.611887 9780.982476
7 2019-04-07 44 44 44 182.233171 8018.259527
8 2019-04-08 50 50 50 187.228192 9361.409623
9 2019-04-09 37 37 37 177.654422 6573.213623
10 2019-04-10 31 31 31 184.138208 5708.284456
解决方案:
将head()
函数中的值替换为要返回的行数注意:逆还有一个tail()
函数。
df[df['Posting_date'] > '2019-04-05'].head(3)
输出:
Posting_date rooms Origin Rooms booked ADR Revenue
6 2019-04-06 49 49 49 199.611887 9780.982476
7 2019-04-07 44 44 44 182.233171 8018.259527
8 2019-04-08 50 50 50 187.228192 9361.409623