我正在尝试编写两个for循环,将返回不同输入的分数,并使用新分数创建一个新字段。第一个循环可以正常工作,但第二个循环永远不会返回正确的分数。
import pandas as pd
d = {'a':['foo','bar'], 'b':[1,3]}
df = pd.DataFrame(d)
score1 = df.loc[df['a'] == 'foo']
score2 = df.loc[df['a'] == 'bar']
for i in score1['b']:
if i < 3:
score1['c'] = 0
elif i <= 3 and i < 4:
score1['c'] = 1
elif i >= 4 and i < 5:
score1['c'] = 2
elif i >= 5 and i < 8:
score1['c'] = 3
elif i == 8:
score1['c'] = 4
for j in score2['b']:
if j < 2:
score2['c'] = 0
elif j <= 2 and i < 4:
score2['c'] = 1
elif j >= 4 and i < 6:
score2['c'] = 2
elif j >= 6 and i < 8:
score2['c'] = 3
elif j == 8:
score2['c'] = 4
print(score1)
print(score2)
当我运行脚本时,它返回以下内容:
print(score1)
a b c
0 foo 1 0
print(score2)
a b
1 bar 3
为什么score2不创建新字段"c"还是分数?
避免使用for
循环有条件地更新非Python列表的DataFrame列。使用Pandas和Numpy的矢量化方法,如numpy.select
它可以扩展到数百万行!请记住,这些数据科学工具的计算方式与一般使用的Python有很大不同:
# LIST OF BOOLEAN CONDITIONS
conds = [
score1['b'].lt(3), # EQUIVALENT TO < 3
score1['b'].between(3, 4, inclusive="left"), # EQUIVALENT TO >= 3 or < 4
score1['b'].between(4, 5, inclusive="left"), # EQUIVALENT TO >= 4 or < 5
score1['b'].between(5, 8, inclusive="left"), # EQUIVALENT TO >= 5 or < 8
score1['b'].eq(8) # EQUIVALENT TO == 8
]
# LIST OF VALUES
vals = [0, 1, 2, 3, 4]
# VECTORIZED ASSIGNMENT
score1['c'] = numpy.select(conds, vals, default=numpy.nan)
# LIST OF BOOLEAN CONDITIONS
conds = [
score2['b'].lt(2),
score2['b'].between(2, 4, inclusive="left"),
score2['b'].between(4, 6, inclusive="left"),
score2['b'].between(6, 8, inclusive="left"),
score2['b'].eq(8)
]
# LIST OF VALUES
vals = [0, 1, 2, 3, 4]
# VECTORIZED ASSIGNMENT
score2['c'] = numpy.select(conds, vals, default=numpy.nan)
在第二个for
循环的第一次迭代中,j
将在3中。使你的条件都不满足。
for j in score2['b']:
if j < 3:
score2['c'] = 0
elif j <= 3 and i < 5:
score2['c'] = 1
elif j >= 5 and i < 7:
score2['c'] = 2
elif j >= 7 and i < 9:
score2['c'] = 3
elif j == 9:
score2['c'] = 4