When assigning a new column to my DataFrame, I get this error. Here is my code:
def check_header(header, df):
    print("Header : ", header)
    for item in header:
        if not item in df.columns:
            df = df.assign(item)  # here I'm getting error
    return df[header]
I have already checked this post, but it does not apply to me, because my pandas version meets the requirement:
>>> import pandas as pd
>>> pd.__version__
'1.1.5'
What is wrong with my code? Please help me.
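As for the error itself: `DataFrame.assign` accepts only keyword arguments of the form `column_name=value`, so a positional call such as `df.assign(item)` raises a `TypeError`. This is likely the error in the question, although the message is not shown there. Below is a minimal sketch of the loop with that call fixed. It assumes the new columns should hold NaN, since the question does not say which value is wanted; the sample `df` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"test": ["mkt1", "mkt2", "mkt3"],
                   "test2": ["cty1", "cty2", "cty3"]})

def check_header(header, df):
    for item in header:
        if item not in df.columns:
            # assign() takes keyword arguments only, so unpack a one-item
            # dict to use the column name held in `item`.
            # NaN is an assumed placeholder value.
            df = df.assign(**{item: np.nan})
    return df[header]

result = check_header(["test", "test1", "test3"], df)

# The positional form used in the question fails with a TypeError.
try:
    df.assign("test1")
    positional_call_failed = False
except TypeError:
    positional_call_failed = True
```

The loop works, but it builds a new DataFrame for every missing column; the `reindex` approach below does the same in one call.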
If you need to add the columns from the header list that are missing from the DataFrame, with the new columns filled with NaN, use DataFrame.reindex:
import pandas as pd

df = pd.DataFrame(data={"test": ["mkt1", "mkt2", "mkt3"],
                        "test2": ["cty1", "cty2", "cty3"]})

def check_header(header, df):
    # Keep only the columns listed in header; missing ones are created as NaN
    return df.reindex(header, axis=1)

a = ['test', 'test1', 'test3']
print(check_header(a, df))
   test  test1  test3
0  mkt1    NaN    NaN
1  mkt2    NaN    NaN
2  mkt3    NaN    NaN
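Two properties of `reindex` are worth keeping in mind here: the result contains only the listed labels, so `test2` is dropped, and the original `df` is left unchanged. The same call can also be written with the `columns=` keyword instead of `axis=1`. A short sketch that checks this, where `by_axis` and `by_keyword` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"test": ["mkt1", "mkt2", "mkt3"],
                   "test2": ["cty1", "cty2", "cty3"]})
header = ["test", "test1", "test3"]

# Two equivalent spellings of a column reindex.
by_axis = df.reindex(header, axis=1)
by_keyword = df.reindex(columns=header)

print(by_axis.equals(by_keyword))  # both spellings give the same frame
print(list(df.columns))            # the original df still has test and test2
```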
If the new columns should all hold the same value, use the fill_value parameter:
def check_header(header, df):
    return df.reindex(header, axis=1, fill_value=0)

a = ['test', 'test1', 'test3']
print(check_header(a, df))
   test  test1  test3
0  mkt1      0      0
1  mkt2      0      0
2  mkt3      0      0
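Note that `fill_value` applies only to the cells `reindex` creates for new labels; NaN values already present in existing columns are left alone. A small sketch on a made-up frame with one missing value shows the difference:

```python
import numpy as np
import pandas as pd

# Illustrative frame: the existing column already contains one NaN.
df = pd.DataFrame({"test": ["mkt1", np.nan, "mkt3"]})

out = df.reindex(["test", "test1"], axis=1, fill_value=0)

print(out["test"].isna().sum())   # the pre-existing NaN is still there
print((out["test1"] == 0).all())  # the new column is filled with 0
```

If the existing NaN values should be replaced as well, a separate `fillna` call would be needed.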
If each new column needs a different value, use DataFrame.assign with a dictionary whose keys are the new column names:
import numpy as np

def check_header(header, df):
    # Names in header that are not yet columns of df
    diff = np.setdiff1d(header, df.columns)
    # Here each new column is simply filled with its own name
    d = dict(zip(diff, diff))
    print(d)  # {'test1': 'test1', 'test3': 'test3'}
    return df.assign(**d).reindex(header, axis=1)

a = ['test', 'test1', 'test3']
print(check_header(a, df))
   test  test1  test3
0  mkt1  test1  test3
1  mkt2  test1  test3
2  mkt3  test1  test3
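The example above fills each new column with its own name. If the values come from elsewhere, the same pattern works with any mapping from column name to value. A minimal sketch, assuming a hypothetical `defaults` dictionary that is not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({"test": ["mkt1", "mkt2", "mkt3"],
                   "test2": ["cty1", "cty2", "cty3"]})

# Hypothetical defaults: column name -> value to use if that column is missing.
defaults = {"test1": 0, "test3": "unknown"}

def check_header(header, df, defaults):
    # Collect a value for every header entry that is not yet a column.
    missing = {col: defaults[col] for col in header if col not in df.columns}
    # Add the missing columns, then select and order columns as in header.
    return df.assign(**missing).reindex(columns=header)

result = check_header(["test", "test1", "test3"], df, defaults)
print(result)
```

As written, a missing column without an entry in `defaults` raises a `KeyError`; using `defaults.get(col)` instead would fall back to `None`.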