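I have a date column in mm/dd/yyyy format in a pandas DataFrame, and I want to convert it to integers in yyyymmdd format. Is this possible using datetime?
You can either manipulate the strings directly and cast to integer (more efficient), or convert to datetime, format as a string, and cast to integer (arguably more convenient and readable).
For example: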
import pandas as pd
# dummy data:
df = pd.DataFrame({'date': ['10/11/2012', '11/12/2013']}) # mm/dd/yyyy
# working with strings...
tmp = df['date'].str.split('/')
df['date_int'] = (tmp.str[2]+tmp.str[0]+tmp.str[1]).astype(int)
# working with datetime...
df['date_int'] = pd.to_datetime(df['date'], format='%m/%d/%Y').dt.strftime('%Y%m%d').astype(int)
Output in both cases:
0 20121011
1 20131112
Name: date_int, dtype: int32
You can compute year * 10000 + month * 100 + day:
df = pd.DataFrame({'dt': pd.date_range('2021-01-01', '2021-01-05')})
df['dt_int'] = df['dt'].dt.year * 10000 + df['dt'].dt.month * 100 + df['dt'].dt.day
df
Output:
          dt    dt_int
0 2021-01-01  20210101
1 2021-01-02  20210102
2 2021-01-03  20210103
3 2021-01-04  20210104
4 2021-01-05  20210105
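To tie this back to the original question, where the column holds mm/dd/yyyy strings rather than datetimes, here is a minimal sketch that parses with an explicit format and then applies the same arithmetic. The helper name to_yyyymmdd is hypothetical and not part of pandas.

```python
import pandas as pd

def to_yyyymmdd(dates: pd.Series) -> pd.Series:
    """Convert mm/dd/yyyy strings to yyyymmdd integers (hypothetical helper)."""
    # An explicit format avoids ambiguity between mm/dd and dd/mm.
    parsed = pd.to_datetime(dates, format='%m/%d/%Y')
    # Year fills the leading digits, month the middle two, day the last two.
    return parsed.dt.year * 10000 + parsed.dt.month * 100 + parsed.dt.day

df = pd.DataFrame({'date': ['10/11/2012', '11/12/2013']})
df['date_int'] = to_yyyymmdd(df['date'])
print(df['date_int'].tolist())
```

Because month and day are each multiplied into their own two-digit slot, single-digit months and days are zero-padded automatically, with no string formatting needed.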
Update
If the column values are strings in the known format mm/dd/yyyy, you can concatenate substrings and convert to integer:
df = pd.DataFrame({'dt': ['10/11/2012', '11/12/2013']})
df['dt_int'] = (
    df['dt'].str[6:] +
    df['dt'].str[0:2] +
    df['dt'].str[3:5]).astype(int)
Compared with the version that uses split, this saves some time:
%%time
df['dt_int'] = (
    df['dt'].str[6:] +
    df['dt'].str[0:2] +
    df['dt'].str[3:5]).astype(int)
CPU times: user 7.08 ms, sys: 646 µs, total: 7.72 ms
Wall time: 6.95 ms
%%time
tmp = df['dt'].str.split('/')
df['dt_int'] = (tmp.str[2]+tmp.str[0]+tmp.str[1]).astype(int)
CPU times: user 16.5 ms, sys: 0 ns, total: 16.5 ms
Wall time: 15.5 ms
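If you want to reproduce the comparison outside a notebook, the following is a rough sketch using timeit. The 10,000-row test frame and the function names by_slice, by_split and by_datetime are assumptions made for illustration. Timings will vary by machine and pandas version, so treat the numbers as relative only.

```python
import timeit
import pandas as pd

# Assumed test data: 10,000 mm/dd/yyyy strings (illustrative size only).
dates = pd.date_range('2000-01-01', periods=10_000).strftime('%m/%d/%Y')
df = pd.DataFrame({'dt': dates})

def by_slice(s):
    # Fixed positions: yyyy = [6:], mm = [0:2], dd = [3:5]
    return (s.str[6:] + s.str[0:2] + s.str[3:5]).astype(int)

def by_split(s):
    tmp = s.str.split('/')
    return (tmp.str[2] + tmp.str[0] + tmp.str[1]).astype(int)

def by_datetime(s):
    return pd.to_datetime(s, format='%m/%d/%Y').dt.strftime('%Y%m%d').astype(int)

# All three approaches should agree before their timings are compared.
assert by_slice(df['dt']).equals(by_split(df['dt']))
assert by_slice(df['dt']).equals(by_datetime(df['dt']))

for fn in (by_slice, by_split, by_datetime):
    seconds = timeit.timeit(lambda: fn(df['dt']), number=10)
    print(f'{fn.__name__}: {seconds / 10 * 1000:.2f} ms per run')
```

Note that the slicing version relies on every value being zero-padded to exactly mm/dd/yyyy. With values like 1/2/2021, the split or datetime version is the safer choice.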