我有一个数据帧(df1(,其中有一个日期列,日期格式为dd/mm/yyyy。
我有具有相同结构的第二个数据帧(df2(,但是,有一些共享数据。我想把数据从df2添加到df1,因为数据在df1中的最近日期之后。
我的方法是在df1中找到maxdate,然后在df2子集中查找日期并附加到df1
maxdate = df.loc[pd.to_datetime(df['DATE'],dayfirst=True).idxmax(), 'DATE']
# in this instance it is 11/09/2022 (dd/mm/yyyy)
df3 = df2.loc[pd.to_datetime(df2['DATE']) > maxdate] #this is to by my subset to append to df1
以下的一些df1
DATE TIME X Y Z
3692 23/08/2022 16:55:00 734154.2872 9551189.353 2.845237e+03
3693 23/08/2022 16:55:00 734199.2516 9551070.666 2.842993e+03
3694 23/08/2022 05:02:00 734669.6130 9551361.865 2.845012e+03
3695 24/08/2022 17:25:00 734215.9910 9551068.295 2.842111e+03
3696 24/08/2022 17:25:00 734684.8444 9551383.618 2.846049e+03
3697 27/08/2022 17:20:00 734214.1851 9551061.242 2.841501e+03
3698 28/08/2022 17:00:00 734669.6130 9551361.865 2.845012e+03
3699 30/08/2022 05:25:00 734176.3412 9551168.550 2.844325e+03
3700 01/09/2022 17:18:00 734686.1061 9551385.420 2.846083e+03
3701 01/09/2022 17:18:00 734667.0922 9551358.264 2.844812e+03
3702 01/09/2022 17:18:00 734164.7047 9551178.039 2.844962e+03
3703 02/09/2022 17:16:00 734151.9079 9551185.951 2.845472e+03
3704 03/09/2022 17:15:00 734141.2542 9551197.062 2.844747e+03
3705 04/09/2022 17:08:00 734687.3678 9551387.222 2.846116e+03
3706 04/09/2022 17:08:00 734665.8319 9551356.464 2.844713e+03
3707 05/09/2022 05:08:00 734704.3326 9551376.581 2.842331e+03
3708 07/09/2022 16:58:00 734687.3678 9551387.222 2.846116e+03
3709 08/09/2022 16:55:00 734663.3109 9551352.864 2.844512e+03
3710 10/09/2022 17:03:00 734689.8913 9551390.826 2.846184e+03
3711 11/09/2022 17:13:00 734691.1530 9551392.628 9.551393e+06
以下的一些df2
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
然而,df3正在对包括在"0"之前的日期的数据帧进行子设置;maxdate";
我觉得这与我的日期格式有关。
感谢任何帮助。
您需要将值转换为pandas DateTime,否则比较将基于字符串值而不是日期,也不清楚样本最大日期11/09/2022
中的11是日期还是09是日期,如果11
是日期,您还需要将dayfirst=True
传递给pd.to_datetime
:
>>> maxdate=pd.to_datetime('11/09/2022')
# Timestamp('2022-11-09 00:00:00')
>>> df2 = df.loc[pd.to_datetime(df['DATE'], dayfirst=True) > maxdate]
以下是您添加到问题中的示例数据的执行:
# Getting the max date from first dataframe
max_date=pd.to_datetime(df1['DATE'],dayfirst=True).max()
max_date
Timestamp('2022-09-11 00:00:00')
# Filtering second dataframe based on maximum date
df2[pd.to_datetime(df2['DATE'], dayfirst=True)>max_date]
Empty DataFrame
Columns: [DATE, TIME, X, Y, Z]
Index: []
# Result is empty dataframe for the sample data cause no record matches condition
# Records for maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)==max_date]
DATE TIME X Y Z
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
# Records for dates older than the maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)<max_date]
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325