Handling PerformanceWarning when creating new columns with map



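Full error:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider using pd.concat instead.  To get a de-fragmented frame, use `newframe = frame.copy()`
  payouts[x] = ranking[x].map(prizes.set_index('Rank')['Payout'].to_dict())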

import pandas as pd

lineups = range(1, 5)
prizes = {'Rank': [1, 2, 3], 'Payout': [100, 50, 25]}
prizes = pd.DataFrame(prizes)
payouts = pd.DataFrame(lineups, columns=['Lineup'])
ranking = {'Lineup': [1, 2, 3, 4], 1: [1, 2, 3, 4], 2: [2, 1, 4, 3], 3: [4, 1, 2, 3], 4: [1, 3, 4, 2]}
ranking = pd.DataFrame(ranking)
# Each pass inserts one new column (1 through 3) into payouts
for x in range(1, 4):
    payouts[x] = ranking[x].map(prizes.set_index('Rank')['Payout'].to_dict())
payouts = payouts.fillna(-20)
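
For context, the small sample above is unlikely to trigger the warning by itself; it tends to appear once a frame has picked up many individually inserted columns. The following is a rough sketch under assumed conditions: the width `n_cols = 150` and the filler values are hypothetical, and whether the loop warns may depend on the pandas version. It contrasts column-by-column insertion with building the columns first and joining them once with pd.concat.

```python
import warnings

import numpy as np
import pandas as pd

n_cols = 150  # hypothetical width, large enough to fragment the frame

# Column-by-column insertion: may emit PerformanceWarning on wide frames.
looped = pd.DataFrame({'Lineup': range(4)})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    for x in range(n_cols):
        looped[x] = np.arange(4) + x
loop_warned = any(
    issubclass(w.category, pd.errors.PerformanceWarning) for w in caught
)

# Build every new column first, then join once.
new_cols = pd.DataFrame({x: np.arange(4) + x for x in range(n_cols)})
joined = pd.concat([pd.DataFrame({'Lineup': range(4)}), new_cols], axis=1)

print(loop_warned, joined.shape)
```

Both frames hold the same data; only the way the columns were added differs.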

Instead of looping, we can create a mapper, apply map to each column of ranking, and then concat the result with payouts:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat(
    [payouts,
     ranking[range(1, 5)].apply(lambda s: s.map(mapper)).fillna(-20)],
    axis=1
)
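
The fillna(-20) step matters because map returns NaN for any rank that has no entry in the mapper. A minimal sketch on a single column, with the mapper written out by hand for illustration:

```python
import pandas as pd

mapper = {1: 100, 2: 50, 3: 25}   # same Rank -> Payout mapping, typed out by hand
ranks = pd.Series([2, 1, 4, 3])   # rank 4 has no prize

mapped = ranks.map(mapper)        # rank 4 becomes NaN, so the dtype is float64
filled = mapped.fillna(-20)       # NaN replaced by the -20 fallback

print(mapped.tolist())
print(filled.tolist())
```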

Alternatively, use replace and then mask the values that exceed the highest prize Rank:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat(
    [payouts,
     ranking[range(1, 5)].replace(mapper)
         .mask(ranking.gt(prizes['Rank'].max()), -20)],
    axis=1
)
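
The mask is needed here because replace, unlike map, leaves values that are not in the mapper untouched. A small sketch on one column, again with a hand-written mapper for illustration:

```python
import pandas as pd

mapper = {1: 100, 2: 50, 3: 25}   # same Rank -> Payout mapping, typed out by hand
ranks = pd.Series([2, 1, 4, 3])   # rank 4 has no prize

replaced = ranks.replace(mapper)                    # rank 4 is left as 4
masked = replaced.mask(ranks.gt(max(mapper)), -20)  # ranks above 3 become -20

print(replaced.tolist())
print(masked.tolist())
```

The condition is computed from the original ranks, not from the replaced values, so a payout that happens to be larger than the top rank is not masked by mistake.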

Both produce the same payouts:

   Lineup    1    2    3    4
0       1  100   50  -20  100
1       2   50  100  100   25
2       3   25  -20   50  -20
3       4  -20   25   25   50
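
As a quick sanity check, the two approaches can be compared directly. This sketch rebuilds the sample frames and, to stay self-contained, starts from ranking[['Lineup']] instead of a separate payouts frame (an assumption of this example). Values are compared elementwise because the map version ends up with floats while the replace version keeps integers.

```python
import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2],
})
cols = [1, 2, 3, 4]
mapper = prizes.set_index('Rank')['Payout'].to_dict()

# Approach 1: map each column, then fill unmapped ranks.
via_map = pd.concat(
    [ranking[['Lineup']],
     ranking[cols].apply(lambda s: s.map(mapper)).fillna(-20)],
    axis=1,
)

# Approach 2: replace known ranks, then mask ranks above the top prize rank.
via_replace = pd.concat(
    [ranking[['Lineup']],
     ranking[cols].replace(mapper)
         .mask(ranking[cols].gt(prizes['Rank'].max()), -20)],
    axis=1,
)

# Elementwise comparison ignores the float vs. integer dtype difference.
same = bool(via_map.eq(via_replace).all().all())
print(same)
```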

*Note: in this example, ranking already contains all the information needed to build the DataFrame, so payouts does not have to be initialized first:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = ranking.copy()  # Create copy of ranking
cols = list(range(1, 5))
payouts[cols] = payouts[cols].apply(lambda s: s.map(mapper)).fillna(-20)

# Equivalent version using replace and mask
mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = ranking.copy()  # Create copy of ranking
cols = list(range(1, 5))
payouts[cols] = (
    payouts[cols].replace(mapper).mask(ranking.gt(prizes['Rank'].max()), -20)
)

DataFrames and imports:

import pandas as pd
prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
payouts = pd.DataFrame({'Lineup': range(1, 5)})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2]
})
