PySpark: "Column is not iterable" error when using withColumn



Why do I get a "Column is not iterable" error when using PySpark?

cost_allocation_df = cost_allocation_df.withColumn(
    'resource_tags_user_engagement',
    f.when(
        (f.col('line_item_usage_account_id') == '123456789101', '1098765432101') &
        (f.col('resource_tags_user_engagement') == '') |
        (f.col('resource_tags_user_engagement').isNull()) |
        (f.col('resource_tags_user_engagement').rlike('^[a-zA-Z]')),
        '10546656565'
    ).otherwise(f.col('resource_tags_user_engagement'))
)

The comparison inside your when() puts two values on the right-hand side of a single ==, so Python builds a tuple instead of a boolean Column, and the & / | chain that follows fails with "Column is not iterable". Write one comparison per value, and wrap each literal with lit() so it is treated as a Column.

Try changing your code to:

from pyspark.sql import functions as f

cost_allocation_df = cost_allocation_df.withColumn(
    'resource_tags_user_engagement',
    f.when(
        # (account id is one of the two values AND the tag is empty) ...
        ((f.col('line_item_usage_account_id') == f.lit('123456789101')) |
         (f.col('line_item_usage_account_id') == f.lit('1098765432101'))) &
        (f.col('resource_tags_user_engagement') == f.lit('')) |
        # ... OR the tag is null OR the tag starts with a letter
        (f.col('resource_tags_user_engagement').isNull()) |
        (f.col('resource_tags_user_engagement').rlike('^[a-zA-Z]')),
        '10546656565'
    ).otherwise(f.col('resource_tags_user_engagement'))
)
