In[df['create_date'] = pd.to_datetime(df.create_date)
df['second_date'] = pd.to_datetime(df.second_date)
df
study_name indicator create_date second_date
0 science A 2022-02-25 2022-01-01
1 science [A/C] 2022-02-25 2022-04-10
2 math C 2022-03-02 2022-01-01
3 math B/C 2022-03-02 2022-04-10
4 entertainment [E 2021-09-01 NaT
5 technology D 2022-01-03 2022-01-01
6 technology A/B/C 2022-01-03 2022-02-20
7 technology A, B] 2022-02-20 2022-04-10
8 social C 2021-10-25 NaT
当create_date字段的值(日期)介于from_date和to_date之间以及当它不是,但发生错误时,我写了一个if语句。ValueError: Series的真值是不明确的。使用a.empty a.bool (), a.item (), a.any()或所有()。
from_date = '2022-01-01'
to_date = '2022-04-10'
def create_date_in_period(x):
if (x >= from_date) & (x <= to_date):
interval = abs(df['second_date'] - df['create_date']).dt.days
return interval.div(99)
else:
return np.nan
df['INTERVAL'] = df.apply(create_date_in_period(df['create_date']), axis=1)
df
study_name indicator create_date second_date INVERVAL
0 science A 2022-02-25 2022-01-01 0.55
1 science [A/C] 2022-02-25 2022-04-10 0.44
2 math C 2022-03-02 2022-01-01 0.6
3 math B/C 2022-03-02 2022-04-10 0.4
4 entertainment [E 2021-09-01 NaT NaT
5 technology D 2022-01-03 2022-01-01 0.02
6 technology A/B/C 2022-01-03 2022-02-20 0.48
7 technology A, B] 2022-02-20 2022-04-10 0.49
8 social C 2021-10-25 NaT NaT
我用谷歌搜索解决这个错误,我找到的解决方案是"&"或"|";而不是and and or据说可以通过写下
来解决可以使用
df['INTERVAL2'] = np.where(df['create_date'].between(from_date, to_date), df['second_date'].sub(df['create_date']).dt.days.div(99), np.nan)
修改代码,
- 您应该将行而不是列传递给
DataFrame.apply
from_date
和to_date
应该是日期时间类型
from_date = pd.to_datetime('2022-01-01')
to_date = pd.to_datetime('2022-04-10')
def create_date_in_period(row):
x = row['create_date']
if (x >= from_date) & (x <= to_date):
interval = abs(row['second_date'] - row['create_date']).days
return interval / 99
else:
return np.nan
df['INTERVAL'] = df.apply(lambda row: create_date_in_period(row), axis=1)
# or
df['INTERVAL'] = df.apply(create_date_in_period, axis=1)
print(df)
study_name indicator create_date second_date INTERVAL
0 science A 2022-02-25 2022-01-01 0.555556
1 science [A/C] 2022-02-25 2022-04-10 0.444444
2 math C 2022-03-02 2022-01-01 0.606061
3 math B/C 2022-03-02 2022-04-10 0.393939
4 entertainment [E 2021-09-01 NaT NaN
5 technology D 2022-01-03 2022-01-01 0.020202
6 technology A/B/C 2022-01-03 2022-02-20 0.484848
7 technology A, B] 2022-02-20 2022-04-10 0.494949
8 social C 2021-10-25 NaT NaN
不要使用loop/apply对于这种任务,这是低效的。
使用矢量代码,借助between
:
from_date = '2022-01-01'
to_date = '2022-04-10'
df['INTERVAL'] = (df['create_date']
.sub(df['second_date'])
.abs()
.dt.days
.div(99)
.where(df['create_date'].between(from_date, to_date))
)
print(df)
输出:
study_name indicator create_date second_date INTERVAL
0 science A 2022-02-25 2022-01-01 0.555556
1 science [A/C] 2022-02-25 2022-04-10 0.444444
2 math C 2022-03-02 2022-01-01 0.606061
3 math B/C 2022-03-02 2022-04-10 0.393939
4 entertainment [E 2021-09-01 NaT NaN
5 technology D 2022-01-03 2022-01-01 0.020202
6 technology A/B/C 2022-01-03 2022-02-20 0.484848
7 technology A, B] 2022-02-20 2022-04-10 0.494949
8 social C 2021-10-25 NaT NaN