I have a dataframe:
ID 2000-01 2000-02 2000-03 2001-01 2001-02 val
1 2847 2861 2875 2890 2904 94717
2 1338 1343 1348 1353 1358 70105
3 3301 3311 3321 3331 3341 60307
4 1425 1422 1419 1416 1413 79888
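I want to add a new row to the table that holds the difference between the current year and the previous year, for example "2001-01" minus "2000-01".
Desired output: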
ID 2000-01 2000-02 2000-03 2001-01 2001-02 val
1 2847 2861 2875 2890 2904 94717
2 1338 1343 1348 1353 1358 70105
3 3301 3311 3321 3331 3341 60307
4 1425 1422 1419 1416 1413 79888
5 NaN NaN NaN -9 -9 NaN
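Judging from the numbers, the -9 entries in the new row appear to be computed from the last existing row (ID 4). The 2000 columns and val have no previous-year counterpart, so they stay NaN:

```latex
\begin{aligned}
\text{2001-01}: &\quad 1416 - 1425 = -9 \\
\text{2001-02}: &\quad 1413 - 1422 = -9
\end{aligned}
```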
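How can I select the previous year's column name without hard-coding the column headers?
The code below does what you asked. The "if" condition can be modified so that it detects columns containing a year more reliably. Currently, it only splits each column name on "-" and checks whether the result has length 2.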
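One way to avoid hard-coding is to derive each previous-year name from the column name itself. The sketch below assumes every year column is named "YYYY-MM"; `previous_year_column` and `YEAR_MONTH` are hypothetical names chosen for illustration, not part of pandas. It uses a regular expression so that columns like "val" are skipped rather than split.

```python
import re

import pandas as pd

# Assumed naming convention: year columns look like "YYYY-MM".
YEAR_MONTH = re.compile(r"^(\d{4})-(\d{2})$")


def previous_year_column(col):
    """Hypothetical helper: return the same month one year earlier, or None."""
    match = YEAR_MONTH.match(str(col))
    if match is None:
        return None  # not a year column (e.g. "val")
    year, month = match.groups()
    return f"{int(year) - 1:04d}-{month}"


df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "2000-01": [2847, 1338, 3301, 1425],
        "2000-02": [2861, 1343, 3311, 1422],
        "2000-03": [2875, 1348, 3321, 1419],
        "2001-01": [2890, 1353, 3331, 1416],
        "2001-02": [2904, 1358, 3341, 1413],
        "val": [94717, 70105, 60307, 79888],
    }
).set_index("ID")

# Keep only the columns whose previous-year column actually exists.
pairs = {}
for col in df.columns:
    prev = previous_year_column(col)
    if prev is not None and prev in df.columns:
        pairs[col] = prev

print(pairs)  # {'2001-01': '2000-01', '2001-02': '2000-02'}
```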
import pandas as pd
import math
df=pd.DataFrame({"ID" :[ 1,2,3,4],
"2000-01":[2847,1338,3301,1425 ],
"2000-02":[2861,1343,3311,1422 ],
"2000-03":[2875,1348,3321,1419 ],
"2001-01":[2890,1353,3331,1416 ],
"2001-02":[2904,1358,3341,1413 ],
"val" :[94717,70105,60307,79888 ]})
#setting index
df=df.set_index("ID")
#creating a dictionary that maps each column to its previous-year column
ly_dict={}
#making a list of the columns
mylist=df.columns.copy()
#two lists for internal storage
myempty_list=[]
usable_cols=[]
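for item in mylist:
    #getting the year
    ha=item.split("-")
    if len(ha) == 2:
        ly=str(int(ha[0])-1)+"-"+ha[1]
        myempty_list.append(ly)
        usable_cols.append(item)
        #filling the last year dictionary
        ly_dict[item]=ly
#adding any missing previous-year columns (filled with NaN) so every lookup below succeeds
combined_list=list(dict.fromkeys(list(mylist)+myempty_list))
df=df.reindex(columns=combined_list)
#the new row gets the ID after the last existing one
prev_row_id=df.index[-1]
last_row_id=prev_row_id+1
df.loc[last_row_id] = [math.nan for item in range(df.shape[1])]
for item in usable_cols:
    #current-year value minus previous-year value, both taken from the last existing row
    df.loc[last_row_id,item]=df.loc[prev_row_id,item]-df.loc[prev_row_id,ly_dict[item]]
#dropping the helper columns and restoring the original column order
df=df.reindex(columns=mylist)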
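A shorter variant of the same idea, sketched under the same assumption that year columns are named "YYYY-MM": it never adds helper columns, and instead builds the new row as an all-NaN Series and fills only the columns whose previous-year column exists. Like the answer above, it takes both values from the last existing row, which matches the desired output.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "2000-01": [2847, 1338, 3301, 1425],
        "2000-02": [2861, 1343, 3311, 1422],
        "2000-03": [2875, 1348, 3321, 1419],
        "2001-01": [2890, 1353, 3331, 1416],
        "2001-02": [2904, 1358, 3341, 1413],
        "val": [94717, 70105, 60307, 79888],
    }
).set_index("ID")

# Map each "YYYY-MM" column to the same month one year earlier, if present.
pairs = {}
for col in df.columns:
    m = re.fullmatch(r"(\d{4})-(\d{2})", str(col))
    if m:
        prev = f"{int(m.group(1)) - 1:04d}-{m.group(2)}"
        if prev in df.columns:
            pairs[col] = prev

last = df.iloc[-1]  # the last existing row (ID 4)

# Start from an all-NaN row and fill only the columns that have a previous year.
diff_row = pd.Series(np.nan, index=df.columns)
cur_cols = list(pairs)
prev_cols = list(pairs.values())
diff_row[cur_cols] = last[cur_cols].to_numpy() - last[prev_cols].to_numpy()

# Append the row with the next ID; .loc aligns the Series on column names.
df.loc[df.index[-1] + 1] = diff_row
print(df)
```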