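I have a dataset containing customers' dates of birth, and I would like to convert that variable to age in years, rounded to two or three decimal places. I have worked out how to convert the whole column to timestamps.
One problem is that I don't know how old the data is, but it was posted to a website on April 4, 2019, so I am using that date as "today" when computing the time delta.
When I subtract the two dates, the difference comes out in days.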
Here is what I have so far (thanks in advance for any help).
The data starts out as DOB strings in day-month-year format, e.g. 30-12-1993.
```python
import pandas as pd
from datetime import datetime

## Making sure all observations are in the same format
training_df['DATE_OF_BIRTH'] = pd.to_datetime(training_df['DATE_OF_BIRTH'])

## Checking the format of an individual DOB
training_df['DATE_OF_BIRTH'][0]
# -> Timestamp('1984-01-01 00:00:00')

## Setting "today" as 4-4-2019
data_time_reference = datetime(2019, 4, 4)
data_time_reference
# -> datetime.datetime(2019, 4, 4, 0, 0)

## Subtracting
data_time_reference - training_df['DATE_OF_BIRTH'][0]
```
The output is `Timedelta('12877 days 00:00:00')`, when I need it to be 35.26 (12877 divided by 365.25).
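A minimal sketch of one way to get fractional years straight from the `Timedelta`, assuming (as in the question) that a year counts as 365.25 days. Dividing a `Timedelta` by another `Timedelta` returns a plain float, and the same division works element-wise on a datetime column. `YEAR`, `delta`, `age`, `dob`, and `ages` are illustrative names, and the sample dates are taken from the values shown in this thread.

```python
from datetime import datetime

import pandas as pd

# Reference "today": the date the data was published (from the question)
data_time_reference = datetime(2019, 4, 4)

# Assumption: one year is treated as 365.25 days on average (covers leap years)
YEAR = pd.Timedelta(days=365.25)

# Scalar case: a single date of birth gives a single Timedelta
delta = data_time_reference - pd.Timestamp("1984-01-01")
age = round(delta / YEAR, 2)  # Timedelta / Timedelta -> float
print(age)

# Column case: the same division works element-wise on a datetime Series
dob = pd.Series(pd.to_datetime(["1984-01-01", "1993-12-30", "2000-04-14"]))
ages = ((data_time_reference - dob) / YEAR).round(2)
print(ages.tolist())
```

Because the dates carry no time-of-day component, this gives the same numbers as dividing the day count by 365.25.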
The data is on Kaggle.com: https://www.kaggle.com/avikpaul4u/vehicle-loan-default-prediction
Consider the following DataFrame:
```
  DATE_OF_BIRTH
0    01-01-1984
1    30-12-1993
2    02-12-1997
3    04-07-1963
4    14-04-2000
```
```python
import pandas as pd
from datetime import datetime

# Convert the values in the dates column to datetime objects
df['DATE_OF_BIRTH'] = pd.to_datetime(df['DATE_OF_BIRTH'])
# Set the reference date to subtract from
data_time_reference = datetime(2019, 4, 4)
# Get the number of days (integer) after subtracting from the reference date
df['days_int'] = pd.to_numeric((data_time_reference - df['DATE_OF_BIRTH']).dt.days, downcast='integer')
print(df)
```
Now it looks like this:
```
  DATE_OF_BIRTH  days_int
0    1984-01-01     12877
1    1993-12-30      9226
2    1997-02-12      8086
3    1963-04-07     20451
4    2000-04-14      6929
```
Then divide the `days_int` column by 365.25 and round to 2 decimal places:
```python
df['result'] = (df['days_int'] / 365.25).round(2)
```
Final output:
```
  DATE_OF_BIRTH  days_int  result
0    1984-01-01     12877   35.26
1    1993-12-30      9226   25.26
2    1997-02-12      8086   22.14
3    1963-04-07     20451   55.99
4    2000-04-14      6929   18.97
```
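One caveat about the parsing step: the example strings are day-month-year, but `pd.to_datetime` reads ambiguous strings month-first by default, which is why `02-12-1997` shows up as `1997-02-12` and `04-07-1963` as `1963-04-07` in the output above. In pandas 2.x, where the format is inferred from the first element, the same mixed column may raise a `ValueError` instead. A minimal sketch that passes an explicit `format`, assuming every value really is `dd-mm-yyyy`:

```python
from datetime import datetime

import pandas as pd

# The example column from this answer, stored as dd-mm-yyyy strings
df = pd.DataFrame({"DATE_OF_BIRTH": ["01-01-1984", "30-12-1993", "02-12-1997",
                                     "04-07-1963", "14-04-2000"]})

# Assumption: every value is day-month-year, so state the format explicitly
# instead of letting pandas guess (its default reading is month-first)
df["DATE_OF_BIRTH"] = pd.to_datetime(df["DATE_OF_BIRTH"], format="%d-%m-%Y")

# Same reference date and 365.25-day year as above
data_time_reference = datetime(2019, 4, 4)
df["days_int"] = (data_time_reference - df["DATE_OF_BIRTH"]).dt.days
df["result"] = (df["days_int"] / 365.25).round(2)
print(df)
```

With this change the `02-12-1997` and `04-07-1963` rows are read as 2 December 1997 and 4 July 1963, so their `days_int` and `result` values differ from the table above.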