I have a DataFrame in Python Pandas like the one below:
col1
-----------
60100412345
70111243335
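I am trying to create a column "AGE" based on "col1", because:
- the first two digits are the year
- the next two digits are the month
- the next two digits are the day
So 60100412345 means year=1960, month=10, day=04.
I used the code below to calculate the age: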
today_date= pd.Timestamp(year=2021, month=6, day=30)
df["AGE"] = (today_date - pd.to_datetime(df.col1.str[:6], format = '%y%m%d')) / np.timedelta64(1, 'Y')
df["AGE"] = df.AGE.astype("int")
But I get an error: ValueError: unconverted data remains: 28
How can I fix this error? Or do you have another idea for calculating the age from the values in col1?
Next time, please provide a snippet like this one, so that others can recreate your data faster:
df = pd.DataFrame(
columns=['col1'],
data = [['60100412345'],['70111243335']]
)
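Separately from the error, note one pitfall in the original `format='%y%m%d'` approach: `%y` follows Python's two-digit year pivot, where 00 to 68 map to 2000 to 2068 and 69 to 99 map to 1969 to 1999. So `601004` is read as 2060, not 1960. A minimal stdlib check:

```python
from datetime import datetime

# %y pivots two-digit years: 00-68 -> 2000-2068, 69-99 -> 1969-1999
parsed = datetime.strptime("601004", "%y%m%d")
print(parsed.date())  # 2060-10-04, not 1960-10-04

# "70" is past the pivot, so it lands in the 1900s
print(datetime.strptime("701112", "%y%m%d").date())  # 1970-11-12
```

This is why the answer below decodes the century itself instead of relying on `%y`.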
The function below is based on python-stdnum's decoder for Polish PESEL numbers: https://github.com/arthurdejong/python-stdnum/blob/master/stdnum/pl/pesel.py
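import pandas as pd

def get_birth_date(number):
    """Decode the birth date from the first six digits of a PESEL-style string.

    The century is encoded in the month field: months 01-12 mean 1900s,
    21-32 mean 2000s, 41-52 mean 2100s, 61-72 mean 2200s, 81-92 mean 1800s.
    """
    year = int(number[0:2])
    month = int(number[2:4])
    day = int(number[4:6])
    # month // 20 selects the century offset
    year += {
        0: 1900,
        1: 2000,
        2: 2100,
        3: 2200,
        4: 1800,
    }[month // 20]
    # remove the century offset to recover the real month (1-12)
    month = month % 20
    return pd.Timestamp(year, month, day)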
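This also plausibly explains the "unconverted data remains" error, assuming `col1` holds PESEL numbers as the linked module suggests: for anyone born in 2000 or later, 20 is added to the month, so `%m` cannot consume the month field and `strptime` stops before the end of the string. The sketch below is a hypothetical stdlib-only variant of `get_birth_date` (the name `birth_date_from_pesel` and the number `02211012345` are made up for illustration) that shows both the failure and the decoding:

```python
from datetime import date, datetime

# Century offsets keyed by month // 20 (same table as get_birth_date above)
CENTURY = {0: 1900, 1: 2000, 2: 2100, 3: 2200, 4: 1800}

def birth_date_from_pesel(number: str) -> date:
    """Hypothetical stdlib-only variant of get_birth_date returning datetime.date."""
    year, month, day = int(number[0:2]), int(number[2:4]), int(number[4:6])
    return date(year + CENTURY[month // 20], month % 20, day)

# Made-up number for someone born 2002-01-10: month field is 1 + 20 = 21
number = "02211012345"
print(birth_date_from_pesel(number))  # 2002-01-10

try:
    datetime.strptime(number[:6], "%y%m%d")
except ValueError as exc:
    # %m cannot parse "21", so strptime leaves characters unconsumed
    print("strptime fails:", exc)
```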
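today_date = pd.Timestamp(year=2021, month=6, day=30)
# np.timedelta64(1, 'Y') may be rejected by newer pandas versions; 365.2425 days
# (an average Gregorian year) is what older pandas used for one 'Y'
df["AGE"] = (today_date - df.col1.apply(get_birth_date)) / pd.Timedelta(days=365.2425)
df["AGE"] = df.AGE.astype("int")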
df
col1 AGE
0 60100412345 60
1 70111243335 50
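Dividing by an average year length gives an approximate age, which can come out one year short on or right around a birthday. If exact completed years are wanted, the sketch below is an alternative I am swapping in (not part of the stdnum code): it builds the birth dates without `apply` by passing year/month/day columns to `pd.to_datetime`, then subtracts one year wherever the birthday has not yet occurred in the reference year. It assumes the same PESEL century encoding as `get_birth_date`.

```python
import pandas as pd

df = pd.DataFrame(columns=["col1"], data=[["60100412345"], ["70111243335"]])
today = pd.Timestamp(year=2021, month=6, day=30)

# Split the first six digits into numeric year / month / day fields
yy = df.col1.str[0:2].astype(int)
mm = df.col1.str[2:4].astype(int)
dd = df.col1.str[4:6].astype(int)

# Same century table as get_birth_date: month // 20 selects the offset
century = (mm // 20).map({0: 1900, 1: 2000, 2: 2100, 3: 2200, 4: 1800})
birth = pd.to_datetime(
    pd.DataFrame({"year": yy + century, "month": mm % 20, "day": dd})
)

# Completed years: subtract 1 where this year's birthday is still ahead
not_yet = (birth.dt.month > today.month) | (
    (birth.dt.month == today.month) & (birth.dt.day > today.day)
)
df["AGE"] = today.year - birth.dt.year - not_yet.astype(int)
print(df)
```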