Returning a local variable shows an error message



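I don't think this is a new problem, but I find it strange that this message is shown: "local variable 'df_ret' referenced before assignment". Here is my function for rebalancing an imbalanced dataset: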

def down_sample(df, target, positive_label, negative_label):
    positives = df.filter(df[target] == positive_label)
    negatives = df.filter(df[target] == negative_label)
    num_positives = positives.count()
    num_negatives = negatives.count()
    if (num_positives > num_negatives): # down_sample positives
        sampled_df = positives.sample(withReplacement=False,
                                      fraction=num_negatives/num_positives,
                                      seed=SEED)
        df_ret = sampled_df.union(negatives)
    return df_ret

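The error message "local variable 'df_ret' referenced before assignment" is accurate here: the function ran and the if condition num_positives > num_negatives was not true, so the code in the if block never ran and df_ret was never assigned (never declared or initialized).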
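The same behavior can be reproduced without Spark. The following is a minimal sketch; the function name pick and its flag parameter are made up for illustration and are not part of the original code:

```python
def pick(flag):
    # 'result' is only bound when the if-branch runs.
    if flag:
        result = "assigned"
    # When flag is False, 'result' was never bound, so this line raises
    # UnboundLocalError (the exact message wording varies by Python version).
    return result


try:
    pick(False)
    message = None
except UnboundLocalError as exc:
    message = str(exc)

print(message)
```

Calling pick(True) works, because the assignment runs before the return.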

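There are a few patterns you can use to fix this, depending on what the function's callers expect:

  • If the if condition is not met, raise an exception in the function and let the caller catch it

  • Initialize df_ret before the if block, so that the function returns a default value when the if condition is not met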
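Both patterns can be sketched roughly as follows. To keep the example runnable without Spark, plain Python lists stand in for the DataFrames, and slicing stands in for sample(); the function names down_sample_strict and down_sample_default are hypothetical, and returning the unchanged data as the default is only one possible choice:

```python
def down_sample_strict(positives, negatives):
    """Pattern 1: raise if there is nothing to down-sample."""
    if len(positives) > len(negatives):
        # Slicing is a stand-in for random sampling of the positives.
        return positives[:len(negatives)] + negatives
    raise ValueError("positives do not outnumber negatives")


def down_sample_default(positives, negatives):
    """Pattern 2: bind the result before the if-block."""
    result = positives + negatives  # assumed default: data unchanged
    if len(positives) > len(negatives):
        result = positives[:len(negatives)] + negatives
    return result


# Caller side for pattern 1: handle the exception explicitly.
try:
    balanced = down_sample_strict([1], [0, 0])
except ValueError:
    balanced = [1] + [0, 0]
```

Which one fits better depends on whether the caller treats "nothing to down-sample" as an error or as a normal case.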

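Good answer from gladiesgoodluck. I'd also add a quick fix: indent the return statement one level further, so that it only executes when the if condition is met. Without the optional final return, the function returns None whenever the condition is not met. Your code would become: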

def down_sample(df, target, positive_label, negative_label):
    positives = df.filter(df[target] == positive_label)
    negatives = df.filter(df[target] == negative_label)
    num_positives = positives.count()
    num_negatives = negatives.count()
    if (num_positives > num_negatives): # down_sample positives
        sampled_df = positives.sample(withReplacement=False,
                                      fraction=num_negatives/num_positives,
                                      seed=SEED)
        df_ret = sampled_df.union(negatives)
        return df_ret
    return something_else  # OPTIONAL
