Trying to apply a function to a pandas DataFrame to compute a score

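I created the user-defined function shown below and tried to apply it to a DataFrame, but I get this error: "TypeError: ("scoreq() missing 3 required positional arguments: 'ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'", 'occurred at index ADVNTG_MARITAL_STAT')"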

def scoreq(PCT_NO_OPEN_TRDLN, ADVTG_TRGT_INC, AGECD, PPXPRI):
        scoreq += -0.3657
        scoreq += (ADVNTG_MARITAL_STAT in ('2'))*-0.039
        scoreq += (ADVTG_TRGT_INC in ('7','6','5','4'))*0.1311
        scoreq += (AGECD in ('7','2'))*-0.1254
        scoreq += (PPXPRI in (-1))*-0.1786
        return scoreq
        
df_3Var['scoreq'] = df_3Var.apply(scoreq)
"TypeError: ("scoreq() missing 3 required positional arguments: 'ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'", 'occurred at index ADVNTG_MARITAL_STAT')"
 

df_3Var:- 
    ADVNTG_MARITAL_STAT   ADVTG_TRGT_INC    AGECD   PPXPRI
0                     1                5        6       -1
1                     2                6        5       -1
2                     1                2        2       -1
3                     2                7        6      133
4                     2                1        3       75

The function called by "apply" should accept a single row or a single column. Here is a working implementation.

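Note that you should also:

- initialize scoreq
- treat the values as numbers rather than strings
- use "in" with a list (or a real tuple): ('2') is just the string '2' and (-1) is just the integer -1, so neither is a tuple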

    def scoreq(row):
        scoreq = 0 # you need to initialize this variable. 
        scoreq += -0.3657
        scoreq += (row["ADVNTG_MARITAL_STAT"] == 2)*-0.039
        scoreq += (row["ADVTG_TRGT_INC"] in [7,6,5,4])*0.1311
        scoreq += (row["AGECD"] in [7,2])*-0.1254
        scoreq += (row["PPXPRI"] == -1)*-0.1786
        return scoreq
            
    df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)
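To see why the original call failed, here is a small sketch of what "apply" hands to the function along each axis. The frame is rebuilt from the sample data in the question, and the describe helper is hypothetical: it only reports the label and length of each Series it receives.

```python
import pandas as pd

# Sample data copied from the question.
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def describe(series):
    # Hypothetical helper: label and length of the Series that apply passes in.
    return f"{series.name}:{len(series)}"

by_column = df_3Var.apply(describe)       # default axis=0: one call per column
by_row = df_3Var.apply(describe, axis=1)  # axis=1: one call per row

print(by_column)
print(by_row)
```

With the default axis=0 the function is called once per column, which matches the 'occurred at index ADVNTG_MARITAL_STAT' part of the error: the whole first column was passed as the only argument. With axis=1 it is called once per row, and each row is a Series indexed by column name.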

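You are using column names as the parameters of the scoreq function, but that is not how it works. The function receives ordinary arguments, and "apply" passes it a single one: a whole row or a whole column.

You have two options: pass the whole row to scoreq, or pass only the relevant values: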

def scoreq(row):
        scoreq = row["...."]
        ...
        return scoreq
df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)

Or pass just the values directly:

df_3Var['scoreq'] = df_3Var.apply(lambda row: scoreq(row["..."], row["..."]), axis=1)
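As a concrete sketch of the second option, assuming the sample data from the question and a hypothetical four-argument function named scoreq_values (the name and parameter names are made up for illustration, the coefficients are the ones from the question):

```python
import pandas as pd

# Sample data copied from the question.
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def scoreq_values(marital_stat, trgt_inc, agecd, ppxpri):
    # Hypothetical variant that takes plain values instead of a row.
    score = -0.3657
    score += (marital_stat == 2) * -0.039
    score += (trgt_inc in [7, 6, 5, 4]) * 0.1311
    score += (agecd in [7, 2]) * -0.1254
    score += (ppxpri == -1) * -0.1786
    return score

# axis=1 makes apply call the lambda once per row.
df_3Var["scoreq"] = df_3Var.apply(
    lambda row: scoreq_values(
        row["ADVNTG_MARITAL_STAT"],
        row["ADVTG_TRGT_INC"],
        row["AGECD"],
        row["PPXPRI"],
    ),
    axis=1,
)
print(df_3Var)
```

The lambda unpacks the columns by name, so the function itself stays independent of the DataFrame layout.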

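Also, you probably want to treat the numbers in the scoreq function as numbers rather than strings, and compare single values with == instead of "in": for example, scoreq += (row["PPXPRI"] == -1)*-0.1786.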
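A short sketch of why the original membership tests misbehave, using plain Python only. The variable names are made up for illustration.

```python
marital_codes = ('2')  # parentheses alone do not make a tuple: this is the str '2'
ppxpri_codes = (-1)    # likewise, this is just the int -1
one_tuple = (-1,)      # a trailing comma makes a one-element tuple

raised_type_error = False
try:
    -1 in ppxpri_codes  # membership test against an int is not allowed
except TypeError:
    raised_type_error = True

in_real_tuple = -1 in one_tuple  # works: -1 is an element of the tuple
number_vs_string = 2 in ['2']    # the int 2 is not equal to the str '2'

ppxpri = -1
equality_check = ppxpri == -1    # simplest test for a single value

print(type(marital_codes).__name__, type(ppxpri_codes).__name__)
print(raised_type_error, in_real_tuple, number_vs_string, equality_check)
```

For a single value, == is the clearest choice. For several values, a list or a tuple with more than one element works with "in".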
