How to add the century (1900 or 2000) to two-digit years in a pandas column



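This should be simple: I have a column in a pandas DataFrame. Its values run from 91 to 99 (for 1991 to 1999), and years in the current century are stored as 00 to 17.

I currently use this rather long code to add 1900 to the values from the last century and 2000 to the values from this century.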

df['year2'] = df.year
df.loc[df.year>20, 'year2']=df.loc[df.year>20, 'year']+1900
df.loc[df.year<20, 'year2']=df.loc[df.year<20, 'year']+2000
df['year']=df['year2']
df = df.drop(columns=['year2'])

I'm sure this can be done more efficiently.

Use numpy.where:

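import numpy as np
import pandas as pd

df = pd.DataFrame({
    'year':[91,99,1,15,17,93],
    'A':[7,8,9,4,2,3],
})
# values above 20 belong to the 1900s, the rest to the 2000s
df['year1'] = np.where(df['year']>20, df['year']+1900, df['year']+2000)
print (df)
   year  A  year1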
0    91  7   1991
1    99  8   1999
2     1  9   2001
3    15  4   2015
4    17  2   2017
5    93  3   1993

If the column holds strings:

y = df['year'].astype(int)
df['year1'] = np.where(y>20, y+1900, y+2000)
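The cutoff of 20 is a choice that happens to fit this data (91 to 99 and 00 to 17). A minimal sketch that wraps the same np.where idea in a helper with an explicit cutoff; `add_century` and its `pivot` parameter are hypothetical names introduced here for illustration, not part of pandas or NumPy:

```python
import numpy as np
import pandas as pd


def add_century(years: pd.Series, pivot: int = 20) -> pd.Series:
    """Hypothetical helper: map two-digit years to four-digit years.

    Values above `pivot` are treated as 19xx, values at or below
    `pivot` as 20xx. Works for integer columns and for string
    columns such as '05'.
    """
    y = years.astype(int)
    # np.where returns a plain array, so wrap it to keep the original index.
    return pd.Series(np.where(y > pivot, y + 1900, y + 2000), index=years.index)


df = pd.DataFrame({'year': [91, 99, 1, 15, 17, 93]})
df['year1'] = add_century(df['year'])
print(df['year1'].tolist())
```

Unlike the separate `>20` and `<20` masks in the question, a single condition also covers a value of exactly 20, which here would fall into the 2000s.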

Performance

np.random.seed(123)
N = 1000
df = pd.DataFrame({
    'year':np.random.randint(1, 99, size=N),
})

In [55]: %timeit df['year1'] = np.where(df['year']>20, df['year']+1900, df['year']+2000)
615 µs ± 79.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [58]: %timeit df['year2'] = pd.to_datetime(df['year'].astype(str).str.zfill(2), format='%y').dt.year
3.49 ms ± 31.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Performance for a string column

N = 1000
df = pd.DataFrame({
    'year':np.random.randint(1, 99, size=N),
})
df['year'] = df['year'].astype(str).str.zfill(2)
print (df.head())
  year
0   36
1   55
2   39
3   05
4   55

In [80]: %%timeit
...: y = df['year'].astype(int)
...: df['year1'] = np.where(y>20, y+1900, y+2000)
...: 
761 µs ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [81]: %%timeit
...: df['year2'] = pd.to_datetime(df['year'], format='%y').dt.year
...: 
2.33 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
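The timings above use IPython's %timeit magic. A rough equivalent for a plain Python script, using the standard-library timeit module, might look like the sketch below. The function names and the repeat counts are arbitrary choices made here, and absolute numbers will differ by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000
df = pd.DataFrame({'year': np.random.randint(1, 99, size=N)})


def with_where():
    # integer arithmetic with a fixed cutoff of 20
    return np.where(df['year'] > 20, df['year'] + 1900, df['year'] + 2000)


def with_to_datetime():
    # zero-pad to two characters, then parse as a two-digit year
    return pd.to_datetime(df['year'].astype(str).str.zfill(2), format='%y').dt.year


for fn in (with_where, with_to_datetime):
    # Best of 5 repeats, 100 calls each, reported per call in microseconds.
    best = min(timeit.repeat(fn, number=100, repeat=5)) / 100
    print(f'{fn.__name__}: {best * 1e6:.0f} us per call')
```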

pandas.to_datetime will handle this.

import pandas as pd
df = pd.DataFrame({'year':['91', '95', '05', '99', '13', '17']})
df['year2'] = pd.to_datetime(df['year'], format='%y').dt.year
print(df['year2'])

Output:

0    1991
1    1995
2    2005
3    1999
4    2013
5    2017
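Note that `%y` applies its own fixed cutoff rather than the 20 used in the numpy.where answer: following the POSIX convention described in Python's strptime documentation, 69 to 99 map to 1969 to 1999 and 00 to 68 map to 2000 to 2068. For the data in the question both approaches agree, but they diverge for values from 21 to 68. A small sketch comparing the two, with variable names chosen here for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(['17', '20', '45', '68', '69', '99'])

# %y: 00-68 -> 2000-2068, 69-99 -> 1969-1999
via_datetime = pd.to_datetime(s, format='%y').dt.year

# np.where with the cutoff from the other answer: above 20 -> 1900s
y = s.astype(int)
via_where = pd.Series(np.where(y > 20, y + 1900, y + 2000), index=s.index)

print(pd.DataFrame({'year': s, 'to_datetime': via_datetime, 'where_20': via_where}))
```

Here '45' and '68' come out as 2045 and 2068 with to_datetime but as 1945 and 1968 with the cutoff of 20, so the right choice depends on which range of years the column can actually contain.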
