I have a dataset that looks like this:
NEW QUARTERLY ESTIMATES - After rebasing GDP (seasonally adjusted)
2009 1314844
Unnamed: 3 1326084
Unnamed: 4 1348808
Unnamed: 5 1371285
2010 1414539
Unnamed: 7 1438482
Unnamed: 8 1449582
Unnamed: 9 1490081
2011 1501464
Unnamed: 11 1512220
Unnamed: 12 1527277
Unnamed: 13 1548587
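I want to label the unnamed rows with their year. For example, Unnamed: 3 through Unnamed: 5 should be 2009, and so on. How can I do that?
The only approach I can think of is building a dictionary and using the pandas rename method. Is there a scalable way to handle this?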
Try this:
# Non-year labels become NaN; ffill then copies the previous year down.
df['your_col_name'] = pd.to_numeric(
    df['your_col_name'], errors='coerce').ffill().astype('int64')
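To see that approach end to end, here is a minimal sketch on a small made-up frame. The column names `period` and `gdp` are placeholders rather than the real headers, and it assumes the first row holds a year so that nothing is left as NaN before the integer cast.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; column names are assumptions.
df = pd.DataFrame({
    "period": ["2009", "Unnamed: 3", "Unnamed: 4", "Unnamed: 5", "2010", "Unnamed: 7"],
    "gdp": [1314844, 1326084, 1348808, 1371285, 1414539, 1438482],
})

# Labels that cannot be parsed as numbers become NaN,
# and ffill copies the last seen year down over them.
years = pd.to_numeric(df["period"], errors="coerce").ffill()

# Cast back to integers; this only works because no NaN remains after ffill.
df["period"] = years.astype("int64")

print(df["period"].tolist())
```

If the column could start with an unnamed row, the cast would fail on the leading NaN, so it may be safer to drop or handle those rows first.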
Use Series.str.contains to test for four-digit values:
m = df['NEW QUARTERLY ESTIMATES - After rebasing'].str.contains(r'\d{4}')
Or test for rows that contain Unnamed and invert the mask with ~:
m = ~df['NEW QUARTERLY ESTIMATES - After rebasing'].str.contains('Unnamed')
Then pass the mask to Series.where and forward-fill the missing values:
df['NEW QUARTERLY ESTIMATES - After rebasing'] = df['NEW QUARTERLY ESTIMATES - After rebasing'].where(m).ffill()
print(df)
NEW QUARTERLY ESTIMATES - After rebasing GDP (seasonally adjusted)
0 2009 1314844
1 2009 1326084
2 2009 1348808
3 2009 1371285
4 2010 1414539
5 2010 1438482
6 2010 1449582
7 2010 1490081
8 2011 1501464
9 2011 1512220
10 2011 1527277
11 2011 1548587
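For anyone who wants to run the mask approach on its own, here is a small self-contained sketch. The column names `period` and `gdp` are placeholders, not the real headers, and the pattern is anchored with `^` and `$` (my assumption, not part of the answer above) so that only values made up of exactly four digits count as a year.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; column names are assumptions.
df = pd.DataFrame({
    "period": ["2009", "Unnamed: 3", "Unnamed: 4", "2010", "Unnamed: 7"],
    "gdp": [1314844, 1326084, 1348808, 1414539, 1438482],
})

# True where the label is exactly four digits, False for "Unnamed: N".
is_year = df["period"].str.contains(r"^\d{4}$")

# where() keeps the years and blanks out everything else,
# then ffill() copies each year down to the rows below it.
df["period"] = df["period"].where(is_year).ffill()

print(df["period"].tolist())
```

The values stay as strings here; add a cast such as `.astype(int)` if numeric years are needed.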