Trying to apply a function on a pandas DataFrame to calculate a score



I wrote the user-defined function below and tried to apply it to my DataFrame, but the "apply" call raises the TypeError shown after the code.

def scoreq(PCT_NO_OPEN_TRDLN, ADVTG_TRGT_INC, AGECD, PPXPRI):
        scoreq += -0.3657
        scoreq += (ADVNTG_MARITAL_STAT in ('2'))*-0.039
        scoreq += (ADVTG_TRGT_INC in ('7','6','5','4'))*0.1311
        scoreq += (AGECD in ('7','2'))*-0.1254
        scoreq += (PPXPRI in (-1))*-0.1786
        return scoreq
        
df_3Var['scoreq'] = df_3Var.apply(scoreq)
"TypeError: ("scoreq() missing 3 required positional arguments: 'ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'", 'occurred at index ADVNTG_MARITAL_STAT')"
 

df_3Var:
    ADVNTG_MARITAL_STAT   ADVTG_TRGT_INC    AGECD   PPXPRI
0                     1                5        6       -1
1                     2                6        5       -1
2                     1                2        2       -1
3                     2                7        6      133
4                     2                1        3       75

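The function passed to "apply" receives a single argument: a whole column (the default, axis=0) or a whole row (axis=1). Here is a working implementation.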

Note that you should also:

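  • initialize scoreq before adding to it
  • compare the values as numbers rather than strings, because the columns hold integers
  • test membership with a list such as [7,2] rather than ('2') or (-1): those are not tuples but just the string '2' and the integer -1, so "in" either does a substring test or fails (a one-element tuple needs a trailing comma, e.g. (2,))
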
    def scoreq(row):
        scoreq = 0 # you need to initialize this variable. 
        scoreq += -0.3657
        scoreq += (row["ADVNTG_MARITAL_STAT"] == 2)*-0.039
        scoreq += (row["ADVTG_TRGT_INC"] in [7,6,5,4])*0.1311
        scoreq += (row["AGECD"] in [7,2])*-0.1254
        scoreq += (row["PPXPRI"] == -1)*-0.1786
        return scoreq
            
    df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)
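To see why the parentheses in the original code misbehave, here is a small standalone check. It is only a sketch: plain Python ints (the hypothetical variables value and ppxpri) stand in for the integer values of a DataFrame row.

```python
# Plain ints stand in for the integer values stored in df_3Var.
value = 2
ppxpri = -1

# ('2') is only the string '2' wrapped in parentheses, so "in" becomes a
# substring test, and a substring test refuses an int on the left.
try:
    value in ('2')
    string_error = None
except TypeError as exc:
    string_error = str(exc)
print("in ('2') ->", string_error)

# (-1) is only the int -1, which is not a container, so "in" fails outright.
try:
    ppxpri in (-1)
    int_error = None
except TypeError as exc:
    int_error = str(exc)
print("in (-1) ->", int_error)

# A one-element tuple needs a trailing comma; a list avoids the pitfall.
print(value in (2,), value in [2], ppxpri in [-1])  # True True True
```

Note that ('7','6','5','4') in the original is a real tuple, but it holds strings, so an integer value never matches it; it fails silently instead of raising.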

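You are using the column names as the parameters of scoreq, but apply does not work that way: it does not match column names to parameter names. It calls the function with a single argument, one column at a time by default (hence "occurred at index ADVNTG_MARITAL_STAT" in the error), or one row at a time with axis=1.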
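A quick way to check what apply hands to the function is to return a description of the argument. This is an illustrative sketch on two columns of the sample data; describe and needs_two are hypothetical helpers, and the exact TypeError text may vary between pandas versions.

```python
import pandas as pd

# Two columns from the question's sample data.
df = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2],
    "ADVTG_TRGT_INC": [5, 6],
})

def describe(series):
    # apply always passes exactly one argument: a pandas Series.
    return f"{type(series).__name__} labelled {series.name}"

by_column = df.apply(describe)       # default axis=0: one call per column
by_row = df.apply(describe, axis=1)  # axis=1: one call per row
print(by_column.tolist())
print(by_row.tolist())

# A function expecting several positional arguments therefore gets only one.
def needs_two(a, b):
    return a + b

try:
    df.apply(needs_two)
    apply_error = None
except TypeError as exc:
    apply_error = str(exc)
print(apply_error)
```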

You have two options: pass the whole row to scoreq, or pass only the relevant values:

def scoreq(row):
        scoreq = row["...."]
        ...
        return scoreq
df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)

Or pass the values directly:

df_3Var['scoreq'] = df_3Var.apply(lambda row: scoreq(row["..."], row["..."]), axis=1)
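Filled in end to end, the second option might look like the sketch below. It rebuilds the sample data from the question and keeps the question's coefficients; the parameter names (marital_stat, trgt_inc, agecd, ppxpri) are illustrative choices, not anything apply requires.

```python
import pandas as pd

# Sample data from the question (all columns are integers).
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def scoreq(marital_stat, trgt_inc, agecd, ppxpri):
    """Score one record from four scalar values (parameter names are illustrative)."""
    score = -0.3657
    score += (marital_stat == 2) * -0.039
    score += (trgt_inc in [7, 6, 5, 4]) * 0.1311
    score += (agecd in [7, 2]) * -0.1254
    score += (ppxpri == -1) * -0.1786
    return score

# axis=1 makes apply pass one row (a Series) at a time to the lambda,
# which unpacks the relevant columns into scoreq's positional arguments.
df_3Var["scoreq"] = df_3Var.apply(
    lambda row: scoreq(row["ADVNTG_MARITAL_STAT"], row["ADVTG_TRGT_INC"],
                       row["AGECD"], row["PPXPRI"]),
    axis=1,
)
print(df_3Var)
```

Keeping scoreq free of DataFrame details means it can also be called directly, e.g. scoreq(2, 7, 6, 133).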

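Also, since the columns hold numbers, you probably want to compare them as numbers rather than strings inside scoreq, for example scoreq += (row["PPXPRI"] == -1)*-0.1786 instead of using in.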
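Comparing integers with strings does not raise an error; it just never matches, so a score term silently drops out. The sketch below shows this with a vectorized comparison on the PPXPRI values from the question, and assumes (hypothetically) that if a column were ever loaded as text, converting it once with pd.to_numeric is preferable to comparing against strings.

```python
import pandas as pd

df = pd.DataFrame({"PPXPRI": [-1, -1, 133, 75]})  # integers, as in the question

# Comparing an integer column with a string never matches; no error is raised.
as_string = df["PPXPRI"] == "-1"
# Comparing with the number gives the intended mask.
as_number = df["PPXPRI"] == -1
print(as_string.tolist())  # [False, False, False, False]
print(as_number.tolist())  # [True, True, False, False]

# If the column had been loaded as text, convert it once instead of
# comparing against strings everywhere.
text_col = pd.Series(["-1", "-1", "133", "75"])
converted = pd.to_numeric(text_col)
print((converted == -1).tolist())  # [True, True, False, False]
```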
