I have the data frame below, and I want to create the new variables "profit_loss" and "margin" based on revenue and budget.
revenue budget
0 1513528810 150000000
1 378436354 150000000
2 295238201 110000000
3 2068178225 200000000
4 1506249360 190000000
I tried to create the new variables with the pandas assign() method, but I ran into the error below.
d.assign(profit_loss = (d['revenue'] - d['budget']),
         profit_loss_margin = (d['profit_loss'] * 100 / d['revenue']),
         financial_status = d['profit_loss'].apply(lambda num: 'Profit-Making' if num > 0 else 'Loss-Making'))
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898
   2899         if tolerance is not None:
KeyError: 'profit_loss'
However, the code below runs fine.
d.assign(profit_loss = (d['revenue'] - d['budget']))
Could you tell me what mistake I made in the earlier code?
You need a lambda to work with columns created in the same assign call, such as profit_loss here:
df = d.assign(profit_loss = (d['revenue'] - d['budget']),
              profit_loss_margin = lambda x: (x['profit_loss'] * 100 / x['revenue']),
              financial_status = lambda x: x['profit_loss'].apply(lambda num: 'Profit-Making' if num > 0 else 'Loss-Making'))
print(df)
revenue budget profit_loss profit_loss_margin financial_status
0 1513528810 150000000 1363528810 90.089386 Profit-Making
1 378436354 150000000 228436354 60.363216 Profit-Making
2 295238201 110000000 185238201 62.741949 Profit-Making
3 2068178225 200000000 1868178225 90.329654 Profit-Making
4 1506249360 190000000 1316249360 87.385887 Profit-Making
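For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample frame from the question and, as an assumption of mine rather than part of the answer above, swaps the row-wise apply for numpy.where. Referring to a column created earlier in the same assign call relies on keyword arguments being applied in order, which as far as I know needs Python 3.6+ and pandas 0.23 or later.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
d = pd.DataFrame({
    "revenue": [1513528810, 378436354, 295238201, 2068178225, 1506249360],
    "budget": [150000000, 150000000, 110000000, 200000000, 190000000],
})

# Each lambda receives the frame as it stands after the earlier keyword
# arguments have been applied, so 'profit_loss' already exists when the
# second and third lambdas run.
df = d.assign(
    profit_loss=lambda x: x["revenue"] - x["budget"],
    profit_loss_margin=lambda x: x["profit_loss"] * 100 / x["revenue"],
    # np.where is a vectorized stand-in for the row-wise apply
    # (my substitution, not the method used in the answer).
    financial_status=lambda x: np.where(
        x["profit_loss"] > 0, "Profit-Making", "Loss-Making"
    ),
)
print(df)
```

The original frame d is left unchanged; only df carries the three new columns.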
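You are assigning the column "profit_loss" and trying to use it in the same call to assign further variables. Python evaluates all arguments before calling the function. So when it evaluates d['profit_loss'] in the second and third arguments, the column does not exist yet, because assign has not been called. Note also that assign returns a new DataFrame instead of modifying d, so the result has to be stored. Try: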
d = d.assign(profit_loss = (d['revenue'] - d['budget']))
d = d.assign(profit_loss_margin = (d['profit_loss'] * 100 / d['revenue']),
             financial_status = d['profit_loss'].apply(lambda num: 'Profit-Making' if num > 0 else 'Loss-Making'))
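To see the evaluation order for yourself, the rough sketch below uses a two-row subset of the question's data (the names raised, step1 and step2 are hypothetical, chosen only for illustration). It shows that the KeyError comes from the plain d['profit_loss'] lookup while the arguments are being built, and that assign hands back a new frame without touching d.

```python
import pandas as pd

# Two rows taken from the question's data, enough to show the behaviour.
d = pd.DataFrame({
    "revenue": [378436354, 295238201],
    "budget": [150000000, 110000000],
})

# The argument expression d["profit_loss"] is evaluated before assign runs,
# so the KeyError is raised by the column lookup on d itself.
try:
    d.assign(
        profit_loss=d["revenue"] - d["budget"],
        profit_loss_margin=d["profit_loss"] * 100 / d["revenue"],
    )
    raised = False
except KeyError:
    raised = True

# assign returns a new DataFrame and leaves d untouched, so each result
# must be bound to a name before the next step can use the new column.
step1 = d.assign(profit_loss=d["revenue"] - d["budget"])
step2 = step1.assign(
    profit_loss_margin=step1["profit_loss"] * 100 / step1["revenue"]
)

print(raised)                      # whether the one-call version failed
print("profit_loss" in d.columns)  # whether d itself was modified
print(list(step2.columns))
```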