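I want to calculate age from date of birth in my pandas DataFrame. However, some of the dates in the column are NaN, and because they are in a different format they cause errors. Here is my code: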
import pandas as pd
from datetime import datetime, date

dob = {'DOB': ['11/29/1986', 'NaN', '02/23/2006']}
# Creating dataframe
df33 = pd.DataFrame(data = dob)
# This function converts given date to age
def age(born):
    born = datetime.strptime(born, "%m/%d/%Y").date()
    today = date.today()
    return today.year - born.year - ((today.month,
                                      today.day) < (born.month,
                                                    born.day))
df33['Age'] = df33['DOB'].apply(age)
display(df33)
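How should I modify the code so that it skips the NaN values and still calculates the age for the other rows? The rows with NaN can simply be left as NaN. Any help or suggestions would be greatly appreciated!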
You can make the age function leave such rows unchanged by adding exception handling to it.
import pandas as pd
from datetime import datetime, date

# added pd.NaT to posted data
dob = {'DOB': ['11/29/1986', 'NaN', pd.NaT, '02/23/2006']}
# Creating dataframe
df33 = pd.DataFrame(data = dob)

def age(born):
    try:
        born = datetime.strptime(born, "%m/%d/%Y").date()
        today = date.today()
        return today.year - born.year - ((today.month,
                                          today.day) < (born.month,
                                                        born.day))
    except (ValueError, TypeError):
        return born  # leave unchanged ('NaN' string or NaT)

df33['Age'] = df33['DOB'].apply(age)
display(df33)
Output (the ages depend on the date the code is run):
DOB Age
0 11/29/1986 35
1 NaN NaN
2 NaT NaT
3 02/23/2006 16
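Why the except clause lists both exception types: datetime.strptime raises ValueError for the string 'NaN' (it does not match the format) and TypeError for pd.NaT or a float NaN (they are not strings at all). A small check, using a hypothetical helper parse_error that only reports which exception is raised:

```python
from datetime import datetime

import pandas as pd


def parse_error(value):
    """Hypothetical helper: name of the exception strptime raises, or None."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None


print(parse_error("NaN"))         # string that does not match the format
print(parse_error(pd.NaT))        # not a str, so strptime rejects the type
print(parse_error("11/29/1986"))  # parses fine, so None
```

If only ValueError were caught, a real missing value (NaT or NaN) in the column would still stop the apply call.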
This can be done with relativedelta, without needing a separate function.
Install the module:
pip install python-dateutil
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
import numpy as np
dob = {'DOB': ['11/29/1986', np.nan, '02/23/2006']}
# Creating dataframe
df33 = pd.DataFrame(data=dob)
df33["DOB"] = pd.to_datetime(df33["DOB"])
df33["Age"] = df33.apply(lambda x: relativedelta(datetime.now().date(), x['DOB']).years if x.notnull().all() else pd.NaT, axis=1)
print(df33)
DOB Age
0 1986-11-29 35.0
1 NaT NaT
2 2006-02-23 16.0
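A possible variant of the same relativedelta idea, sketched under the assumption that a purely numeric Age column is preferred: apply the calculation to the DOB column only and return np.nan for missing dates instead of pd.NaT (NaT is a missing datetime, not a missing number). The helper name age_from_dob and the fixed reference date are illustrative only; in practice you would omit today and let it default to date.today().

```python
from datetime import date

import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta


def age_from_dob(dob, today=None):
    """Hypothetical helper: whole years between dob and today, NaN if dob is missing."""
    if pd.isna(dob):
        return np.nan
    today = today or date.today()
    # relativedelta splits the difference into years/months/days; keep the years
    return relativedelta(today, dob.date()).years


df = pd.DataFrame({"DOB": ["11/29/1986", np.nan, "02/23/2006"]})
df["DOB"] = pd.to_datetime(df["DOB"], format="%m/%d/%Y")
# Fixed reference date so the example is reproducible (assumption)
df["Age"] = df["DOB"].apply(age_from_dob, today=date(2022, 6, 1))
print(df)
```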
Note: the answer below only addresses your immediate problem. I recommend using a library such as relativedelta to calculate ages correctly.
"NaN" is not actually np.nan (it is just a string), so you should change your dob dictionary to:
dob = {'DOB': ['11/29/1986', pd.NaT, '02/23/2006']}
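For datetime data, it is better to use pandas' NaT ("Not a Time") value to represent a missing date.
You can then convert the column to pandas datetimes with pd.to_datetime and do the rest.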
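A minimal sketch of that route, assuming every valid date is in month/day/year format: pd.to_datetime with errors="coerce" turns the "NaN" string, np.nan, or anything else unparseable into NaT, and the age can then be computed column-wise without apply. The fixed reference date is only there to make the example reproducible; pd.Timestamp.today() would be the usual choice.

```python
import numpy as np
import pandas as pd

df33 = pd.DataFrame({"DOB": ["11/29/1986", "NaN", np.nan, "02/23/2006"]})

# Unparseable entries ('NaN' string, np.nan) become NaT instead of raising
born = pd.to_datetime(df33["DOB"], format="%m/%d/%Y", errors="coerce")

today = pd.Timestamp("2022-06-01")  # fixed date for a reproducible example

# True where this year's birthday has not happened yet (False for NaT rows)
not_yet = (born.dt.month > today.month) | (
    (born.dt.month == today.month) & (born.dt.day > today.day)
)
# NaT rows have a NaN year, so their age comes out as NaN
df33["Age"] = today.year - born.dt.year - not_yet.astype(int)
print(df33)
```

Because the arithmetic runs on whole columns, this also avoids calling a Python function once per row.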
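However, a quick fix that does not require modifying the dob dictionary is to include this check at the beginning of the age function: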
if born == 'NaN':
    return 'NaN'
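For reference, this is how the question's age function might look with that check placed at the top (a sketch; print is used instead of display so it also runs outside Jupyter):

```python
from datetime import date, datetime

import pandas as pd


def age(born):
    # Quick fix: pass the placeholder string 'NaN' through unchanged
    if born == 'NaN':
        return 'NaN'
    born = datetime.strptime(born, "%m/%d/%Y").date()
    today = date.today()
    # Subtract 1 if this year's birthday has not happened yet
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


df33 = pd.DataFrame({'DOB': ['11/29/1986', 'NaN', '02/23/2006']})
df33['Age'] = df33['DOB'].apply(age)
print(df33)
```

This only catches the literal string 'NaN'; a real missing value such as np.nan or pd.NaT would still reach strptime and raise a TypeError.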