使用Pandas更新列中日期子集的年份



我有一个pandas数据框架,它由一年的日期列和与之配套的每日数据组成。我只想更新与1月份有关的行的年份。我可以选择数据帧的一月子集,并尝试根据这里给出的答案更改该子集的年份,但是当我试图通过添加偏移量来更新该子集的值时,我得到了一个错误。

设置:

import pandas as pd
df = pd.DataFrame({'Date': pd.date_range(start = "01-01-2023", end = "12-31-2023"), 'data': 25})

Select January子集:

df[df['Date'].dt.month == 1]

按预期运行:

Date    data
0   2023-01-01  25
1   2023-01-02  25
2   2023-01-03  25
3   2023-01-04  25
4   2023-01-05  25
5   2023-01-06  25
6   2023-01-07  25
7   2023-01-08  25
8   2023-01-09  25
9   2023-01-10  25
10  2023-01-11  25
11  2023-01-12  25
12  2023-01-13  25
13  2023-01-14  25
14  2023-01-15  25
15  2023-01-16  25
16  2023-01-17  25
17  2023-01-18  25
18  2023-01-19  25
19  2023-01-20  25
20  2023-01-21  25
21  2023-01-22  25
22  2023-01-23  25
23  2023-01-24  25
24  2023-01-25  25
25  2023-01-26  25
26  2023-01-27  25
27  2023-01-28  25
28  2023-01-29  25
29  2023-01-30  25
30  2023-01-31  25

尝试更改:

df[df['Date'].dt.month == 1] = df[df['Date'].dt.month == 1] + pd.offsets.DateOffset(years=1)
TypeError: Concatenation operation is not implemented for NumPy arrays, use np.concatenate() instead. Please do not rely on this error; it may not be given on all Python implementations.

我已经尝试了一些不同的变化,但似乎有问题改变子集数据帧数据。

您必须选择Date列(解决方案由@mozway增强,谢谢):

df.loc[df['Date'].dt.month == 1, 'Date'] += pd.offsets.DateOffset(years=1)
print(df)
# Output
Date  data
0   2024-01-01    25
1   2024-01-02    25
2   2024-01-03    25
3   2024-01-04    25
4   2024-01-05    25
..         ...   ...
360 2023-12-27    25
361 2023-12-28    25
362 2023-12-29    25
363 2023-12-30    25
364 2023-12-31    25
[365 rows x 2 columns]

最新更新