两个 Pandas 数据帧:根据日期,向数据帧添加值



我的第一个 Pandas 数据帧看起来(示例)如下 -Date值为datetimeTemp in F值为floatSignal值为string

Date  Temp in F   Signal    
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze  
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze  
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze   

我的第二个 Pandas 数据帧看起来(示例)如下 -Date值为datetimeMoisture值为浮点数。

Date     Moisture
1994-05-27      4.21232
1995-05-19      3.30000
1996-05-24      3.43227
1997-05-23      3.63333
1998-05-15      4.60000
1999-05-28      2.43240
2000-05-26      1.34237
2001-05-21      1.23430
2002-05-29      2.34343
2003-05-02      1.83433
2004-04-29      2.34341
2005-06-28      3.15373
2006-05-05      1.78565
2007-05-04      0.34533
2008-08-02      0.42267
2009-05-07      0.40000
2010-08-07      0.30000
2011-05-06      2.30000
2012-05-04      3.12300
2013-05-06      4.10200
2014-05-02      2.42000
2015-05-08      2.53300
2016-06-09      1.20000
2017-05-11      1.45000
2018-05-10      1.30000
2019-05-15      1.67230
2020-05-29      2.34000

现在,我想根据Signal所在的日期值将Moisture的值添加到第一个数据帧freeze。举个例子

  • 在第二个数据帧中:值 4.21232 记录于 1994-05-27。此值应添加到日期为 1994-10-03 的数据帧 1 行,因为它在湿度日期 (=1994-05-27) 之后,并且信号已冻结。
  • 在第二个数据帧中:值 1.67230 记录于 2019-05-15。此值应添加到日期为 2019-10-01 的数据帧 1 行,因为它在湿度日期 (=2019-05-15) 之后,并且信号已冻结。
  • 如果找到的中间日期没有值,则它应保持为空。

此过程应使用第二个数据帧的"水分"列的所有值完成,以便修改后的第一个数据帧应如下所示:

Date  Temp in F   Signal    Moisture
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     4.21232  
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze     1.30000
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze     1.67230
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze     2.34000

有人知道如何解决这个问题吗?

merge_asof()与连接所需数据的子集一起使用,然后重新添加数据。

df1 = pd.read_csv(io.StringIO("""         Date  Temp_in_F   Signal    
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze  
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze  
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze"""), sep="s+")
df1["Date"] = pd.to_datetime(df1["Date"],errors="coerce")
df1 = df1.dropna().reset_index(drop=True)
df2 = pd.read_csv(io.StringIO("""      Date     Moisture
1994-05-27      4.21232
1995-05-19      3.30000
1996-05-24      3.43227
1997-05-23      3.63333
1998-05-15      4.60000
1999-05-28      2.43240
2000-05-26      1.34237
2001-05-21      1.23430
2002-05-29      2.34343
2003-05-02      1.83433
2004-04-29      2.34341
2005-06-28      3.15373
2006-05-05      1.78565
2007-05-04      0.34533
2008-08-02      0.42267
2009-05-07      0.40000
2010-08-07      0.30000
2011-05-06      2.30000
2012-05-04      3.12300
2013-05-06      4.10200
2014-05-02      2.42000
2015-05-08      2.53300
2016-06-09      1.20000
2017-05-11      1.45000
2018-05-10      1.30000
2019-05-15      1.67230
2020-05-29      2.34000"""), sep="s+")
df2["Date"] = pd.to_datetime(df2["Date"])
# useful ...
mask = df1["Signal"]=="freeze"
# exclude defrost from merge_asof()
pd.concat([
pd.merge_asof(df1[mask].sort_values("Date"), df2.sort_values("Date"), on="Date")
, df1[~mask] # put defrost back in
]).sort_values("Date").reset_index(drop=True)

输出

Date Temp_in_F   Signal  Moisture
1990-10-01   2.23337   freeze       NaN
1991-07-31   2.99860  defrost       NaN
1991-10-01   3.12221   freeze       NaN
1992-07-31    3.2328  defrost       NaN
1992-10-01   4.21327   freeze       NaN
1993-07-29   2.23222  defrost       NaN
1993-10-01   1.53200   freeze       NaN
1994-07-29   2.15030  defrost       NaN
1994-10-03   1.22299   freeze   4.21232
2018-10-01   8.95045   freeze   1.30000
2019-07-31   9.32463  defrist       NaN
2019-10-01   9.34722   freeze   1.67230
2020-07-31  10.53200  defrost       NaN
2020-10-01  10.34000   freeze   2.34000

相关内容

  • 没有找到相关文章

最新更新