How to compare dates between two DataFrames in a statement with multiple logical conditions



I'm having trouble comparing dates between two DataFrames inside a statement with multiple logical conditions.

df1:

 EmailAddress     DateTimeCreated
    1@1        2019-02-12 20:47:00

df2:

 EmailAddress     DateTimeCreated
  1@1.com         2019-02-07 20:47:00
  2@2.com         2018-11-13 20:47:00
  3@3.com         2018-11-04 20:47:00

I want to do three things whenever there is a row in df1:

 1. Check whether `EmailAddress` from df1 is present in df2.
 1a. If `EmailAddress` is present, compare `DateTimeCreated` in df1 to `DateTimeCreated` in df2.
 2. If `DateTimeCreated` in df1 is greater than today - 90 days, append df1 to df2.

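In simpler terms:

I want to check whether the email address exists in df2 and, if it does, look at DateTimeCreated in df2 to see whether more than 90 days have passed since that person last responded. If more than 90 days have passed, append the row from df1 to df2.

My logic is appending everything, and I'm not sure what I'm doing wrong. Here it is: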

import pandas as pd
from datetime import datetime, timedelta
df2.append(df2.loc[df2.EmailAddress.isin(df1.EmailAddress)&(df2.DateTimeCreated.ge(datetime.today() - timedelta(90)))])

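What am I getting wrong with the dates?

Edit:

In the example above, comparing the two DataFrames, the row from df1 would not be appended, because DateTimeCreated falls within the last 90 days (today() - 90 days).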
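The date condition in the expression above can be checked in isolation. The sketch below uses the df2 sample data and a fixed reference date, which is an assumption made so that the result does not depend on when the code runs. It shows that `.ge(cutoff)` marks rows created within the last 90 days, while `.lt(cutoff)` marks rows older than 90 days. Note also that the original expression selects rows from df2 (not df1) and that `append` returns a new DataFrame rather than modifying df2 in place.

```python
import pandas as pd

# Fixed reference date (an assumption) instead of datetime.today(),
# so the masks below are reproducible.
today = pd.Timestamp("2019-02-13")
cutoff = today - pd.Timedelta(days=90)  # 2018-11-15 00:00:00

df2 = pd.DataFrame({
    "EmailAddress": ["1@1.com", "2@2.com", "3@3.com"],
    "DateTimeCreated": pd.to_datetime([
        "2019-02-07 20:47:00",
        "2018-11-13 20:47:00",
        "2018-11-04 20:47:00",
    ]),
})

# True where the row was created on or after the cutoff (within the last 90 days).
recent = df2["DateTimeCreated"].ge(cutoff)
# True where the row was created before the cutoff (more than 90 days ago).
expired = df2["DateTimeCreated"].lt(cutoff)

print(recent.tolist())   # [True, False, False]
print(expired.tolist())  # [False, True, True]
```

If the goal is "more than 90 days since the last response", the `expired` mask is the one that matches that wording.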

See the inline comments for an explanation. Note that you need to rename the df1 columns to match the df2 columns for this solution to work.

import pandas as pd
from datetime import timedelta, datetime
df1 = pd.DataFrame({'EmailAddress':['2@2.com'], 'DateTimeCreated':[datetime(2019,2,12,20,47,0)]})
df2 = pd.DataFrame({'EmailAddress':['1@1.com', '2@2.com', '3@3.com'], 
    'DateTimeCreated':[
        datetime(2019,2,7,20,47,0), 
        datetime(2018,11,13,20,47,0), 
        datetime(2019,11,4,20,47,0)]})
# Get all expired rows
df3 = df2.loc[datetime.now() - df2['DateTimeCreated'] > timedelta(days=90)]
# Update it with the timestamp from df1
df3 = df3.set_index('EmailAddress').join(df1.set_index('EmailAddress'), how='inner', rsuffix='_r')
df3.drop('DateTimeCreated', axis=1, inplace=True)
df3.columns = ['DateTimeCreated']
# Patch df2 with the latest timestamp
df2 = df3.combine_first(df2.set_index('EmailAddress')).reset_index()
# Patch again for rows in df1 that are not in df2
df1 = df1.loc[~df1['EmailAddress'].isin(df2['EmailAddress'])]
df2 = pd.concat([df2, df1])
>>> df2
  EmailAddress     DateTimeCreated
0      1@1.com 2019-02-07 20:47:00
1      2@2.com 2019-02-12 20:47:00
2      3@3.com 2019-11-04 20:47:00
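For comparison, here is a more compact sketch of the rule as the question states it "in simpler terms": append a df1 row only when its email already exists in df2 and the latest df2 timestamp for that email is more than 90 days old. Unlike the solution above, it appends instead of overwriting the old timestamp. The function name `append_if_expired` and the `now` parameter are hypothetical, introduced only for illustration, and the reference date is fixed so the result is reproducible.

```python
import pandas as pd

def append_if_expired(df1, df2, now, days=90):
    """Append df1 rows whose email was last seen in df2 more than `days` days before `now`."""
    cutoff = now - pd.Timedelta(days=days)
    # Latest timestamp recorded in df2 for each email address.
    last_seen = df2.groupby("EmailAddress")["DateTimeCreated"].max()
    # Look up that timestamp for every df1 row (NaT when the email is absent from df2).
    previous = df1["EmailAddress"].map(last_seen)
    # Comparisons against NaT are False, so emails unknown to df2 are not appended.
    to_append = df1[previous < cutoff]
    return pd.concat([df2, to_append], ignore_index=True)

df1 = pd.DataFrame({
    "EmailAddress": ["2@2.com", "1@1.com"],
    "DateTimeCreated": pd.to_datetime(["2019-02-12 20:47:00", "2019-02-12 20:47:00"]),
})
df2 = pd.DataFrame({
    "EmailAddress": ["1@1.com", "2@2.com", "3@3.com"],
    "DateTimeCreated": pd.to_datetime([
        "2019-02-07 20:47:00",
        "2018-11-13 20:47:00",
        "2018-11-04 20:47:00",
    ]),
})

# Assumed reference date; in real use this would be pd.Timestamp.now().
result = append_if_expired(df1, df2, now=pd.Timestamp("2019-02-13"))
print(result)
```

With this sample data only the 2@2.com row is appended, because 1@1.com last responded within the previous 90 days.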

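Try this:

  1. Left join df1 and df2 on matching email addresses (condition 1): `combined_df = df1.`
  2. Compute the gap between df1's DateTimeCreated and today: `combined_df['gap'] = pd.Timestamp.today() - combined_df.DateTimeCreated_df1`
  3. Build a mask of the rows whose gap exceeds 90 days: `mask = combined_df.gap > pd.Timedelta(days=90)`
  4. `pd.concat([df2, df1[mask]])`

Note: I think you may only need combined_df; the append in step 4 is likely to produce duplicated or messy data. Either way, you can use steps 1 to 4 or only steps 1 to 3.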
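The four steps can be sketched end to end as follows. This is one possible reading, not the answerer's exact code: the join is written with `DataFrame.merge`, the `_df1`/`_df2` suffixes are chosen to match the column name used in step 2, the second df1 row and the reference date are made-up sample values, and df2 is assumed to hold each email address only once so that the joined frame keeps df1's row order.

```python
import pandas as pd

# Fixed reference date (an assumption) in place of pd.Timestamp.today().
today = pd.Timestamp("2019-06-01")

df1 = pd.DataFrame({
    "EmailAddress": ["1@1.com", "2@2.com"],
    "DateTimeCreated": pd.to_datetime(["2019-02-12 20:47:00", "2019-05-20 08:00:00"]),
})
df2 = pd.DataFrame({
    "EmailAddress": ["1@1.com", "2@2.com", "3@3.com"],
    "DateTimeCreated": pd.to_datetime([
        "2019-02-07 20:47:00",
        "2018-11-13 20:47:00",
        "2018-11-04 20:47:00",
    ]),
})

# Step 1: left join df1 with df2 on the email address.
combined_df = df1.merge(df2, on="EmailAddress", how="left", suffixes=("_df1", "_df2"))

# Step 2: gap between the reference date and df1's timestamp.
combined_df["gap"] = today - combined_df["DateTimeCreated_df1"]

# Step 3: rows whose gap exceeds 90 days.
mask = combined_df["gap"] > pd.Timedelta(days=90)

# Step 4 (optional): append the matching df1 rows to df2.
# mask has one entry per df1 row because df2's emails are unique.
appended = pd.concat([df2, df1[mask.to_numpy()]], ignore_index=True)

print(combined_df)
print(appended)
```

After step 4, 1@1.com appears twice in `appended`, which is the duplication the note above warns about; stopping after step 3 and working with `combined_df` avoids it.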
