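I created the user-defined function below and tried to apply it to a DataFrame, but it fails with a TypeError saying that scoreq() is missing 3 required positional arguments ('ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'), raised at index ADVNTG_MARITAL_STAT: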
def scoreq(PCT_NO_OPEN_TRDLN, ADVTG_TRGT_INC, AGECD, PPXPRI):
    scoreq += -0.3657
    scoreq += (ADVNTG_MARITAL_STAT in ('2'))*-0.039
    scoreq += (ADVTG_TRGT_INC in ('7','6','5','4'))*0.1311
    scoreq += (AGECD in ('7','2'))*-0.1254
    scoreq += (PPXPRI in (-1))*-0.1786
    return scoreq
df_3Var['scoreq'] = df_3Var.apply(scoreq)
"TypeError: ("scoreq() missing 3 required positional arguments: 'ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'", 'occurred at index ADVNTG_MARITAL_STAT')"
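df_3Var:
   ADVNTG_MARITAL_STAT  ADVTG_TRGT_INC  AGECD  PPXPRI
0                    1               5      6      -1
1                    2               6      5      -1
2                    1               2      2      -1
3                    2               7      6     133
4                    2               1      3      75
The function passed to "apply" should accept a single row or a single column. Here is a working implementation.
Note that you should also:
- initialize scoreq
- treat the values as numbers rather than strings
- use "in" with a list rather than a parenthesized value: ('2') is just a string and (-1) is just an int, neither is a tuple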
def scoreq(row):
    scoreq = 0  # you need to initialize this variable.
    scoreq += -0.3657
    scoreq += (row["ADVNTG_MARITAL_STAT"] == 2)*-0.039
    scoreq += (row["ADVTG_TRGT_INC"] in [7,6,5,4])*0.1311
    scoreq += (row["AGECD"] in [7,2])*-0.1254
    scoreq += (row["PPXPRI"] == -1)*-0.1786
    return scoreq
df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)
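For anyone who wants to run the row-wise approach end to end, here is a minimal sketch that rebuilds the sample data from the question. It assumes the columns hold integers, and the name score_row is only illustrative (it avoids reusing scoreq for both the function and its local variable).

```python
import pandas as pd

# Sample data copied from the question; integer dtypes are an assumption.
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def score_row(row):
    """Compute the score for one row (a pandas Series indexed by column name)."""
    score = -0.3657
    score += (row["ADVNTG_MARITAL_STAT"] == 2) * -0.039
    score += (row["ADVTG_TRGT_INC"] in [7, 6, 5, 4]) * 0.1311
    score += (row["AGECD"] in [7, 2]) * -0.1254
    score += (row["PPXPRI"] == -1) * -0.1786
    return score

# axis=1 passes each row to score_row; the default axis=0 would pass each column,
# which is why the original call failed "at index ADVNTG_MARITAL_STAT".
df_3Var["scoreq"] = df_3Var.apply(score_row, axis=1)
print(df_3Var)
```

With axis=1 the function is called once per row, so the scoreq column ends up with one value per record.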
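You are using column names as the parameters of the scoreq function, but that is not how it works: the function receives ordinary arguments, and apply does not match them to columns by name.
You have two options: pass the whole row to scoreq, or pass only the relevant values: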
def scoreq(row):
    scoreq = row["...."]
    ...
    return scoreq
df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)
Or pass only the values directly:
df_3Var['scoreq'] = df_3Var.apply(lambda row: scoreq(row["..."], row["..."]), axis=1)
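As a rough illustration of the second option, the sketch below keeps a function that takes plain values and uses a lambda to pull them out of each row. The parameter names (marital_stat, trgt_inc, agecd, ppxpri) are hypothetical, and the three sample rows are taken from the question's data with integer dtypes assumed.

```python
import pandas as pd

# Three rows from the question's DataFrame; integer dtypes are an assumption.
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 1],
    "AGECD": [6, 5, 3],
    "PPXPRI": [-1, -1, 75],
})

def scoreq(marital_stat, trgt_inc, agecd, ppxpri):
    """Score from four plain values; the parameter names are hypothetical."""
    score = -0.3657
    score += (marital_stat == 2) * -0.039
    score += (trgt_inc in [7, 6, 5, 4]) * 0.1311
    score += (agecd in [7, 2]) * -0.1254
    score += (ppxpri == -1) * -0.1786
    return score

# The lambda receives each row (axis=1) and forwards only the values scoreq needs.
df_3Var["scoreq"] = df_3Var.apply(
    lambda row: scoreq(
        row["ADVNTG_MARITAL_STAT"],
        row["ADVTG_TRGT_INC"],
        row["AGECD"],
        row["PPXPRI"],
    ),
    axis=1,
)
print(df_3Var)
```

This keeps scoreq independent of pandas, so it can also be called with plain numbers.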
Also, you may want to treat the numbers in the scoreq function as numbers rather than strings, for example scoreq += (row["PPXPRI"] == -1)*-0.1786 rather than using in.
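To see why the original comparisons misbehave, here is a small plain-Python sketch. The helper membership_fails is made up for this illustration and is not part of either answer.

```python
# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = ('2')   # this is simply the string '2'
real_tuple = ('2',)   # this is a one-element tuple

print(type(not_a_tuple).__name__)  # str
print(type(real_tuple).__name__)   # tuple

def membership_fails(value, container):
    """Return True if `value in container` raises TypeError."""
    try:
        value in container
    except TypeError:
        return True
    return False

# (-1) is just the int -1, which is not iterable, so `in` raises TypeError.
print(membership_fails(-1, (-1)))  # True
# A list (or a real tuple) works as intended.
print(membership_fails(-1, [-1]))  # False

# A number never compares equal to a string, so numeric codes need numeric literals.
print(2 == '2')     # False
print(2 in [2, 7])  # True
```

For a single value, a plain == comparison is the simplest choice. For several values, use a list or a tuple written with commas.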