Filter where a value is in a column of another DataFrame



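I'm trying to create a new column in a DataFrame that is "true" if the value of another column appears in a column of another DataFrame. I tried the following, but I think the isin() syntax is wrong because I'm passing it a DataFrame with a single column.

customers: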

customer_id     name
1     John
2     Mary
3     Jane
4     Jack
5     Emma

customer_referred_customer:

from    to
1     3
2     4

Desired result:

customer_id     name    is_referral
1     John          false
2     Mary          false
3     Jane           true
4     Jack           true
5     Emma          false

Attempt:

customers.withColumn(
    "is_referral",
    F.when(
        F.col("customer_id").isin(
            customer_referred_customer.select("to")
        ),
        F.lit("true"),
    ).otherwise(F.lit("false")),
)

How can I fix this?

I would do it like this:

from pyspark.sql import functions as F

# Use ["to"] rather than .to: newer PySpark versions define a DataFrame.to() method.
(
    customers.join(
        customer_referred_customer,
        customers["customer_id"] == customer_referred_customer["to"],
        "left",
    )
    .withColumn(
        "is_referral",
        F.when(customer_referred_customer["to"].isNull(), F.lit("false")).otherwise(F.lit("true")),
    )
    .select(customers["customer_id"], customers["name"], "is_referral")
)

Use a semi join and an anti join. You didn't provide data, so I can't test this, but the idea of the code is:

# Use ["to"] rather than .to: newer PySpark versions define a DataFrame.to() method.
customers = customers.join(
    customer_referred_customer,
    customers["customer_id"] == customer_referred_customer["to"],
    'left_semi',
).withColumn(
    'is_referral',
    F.lit('true'),
).unionAll(
    customers.join(
        customer_referred_customer,
        customers["customer_id"] == customer_referred_customer["to"],
        'left_anti',
    ).withColumn(
        'is_referral',
        F.lit('false'),
    )
)

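Create a list from the column to check against and use .isin() (here df and df1 stand for customers and customer_referred_customer):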

df.withColumn('is_referral', df.customer_id.isin(df1.select("to").rdd.flatMap(list).collect())).show()

+-----------+----+-----------+
|customer_id|name|is_referral|
+-----------+----+-----------+
|          1|John|      false|
|          2|Mary|      false|
|          3|Jane|       true|
|          4|Jack|       true|
|          5|Emma|      false|
+-----------+----+-----------+

Use a full outer join, then derive the new column is_referral with isNotNull().

Check the code below.

from pyspark.sql import functions as F

(
    customers
    .join(customer_referred_customer, customers["customer_id"] == customer_referred_customer["to"], "full")
    .withColumn("is_referral", F.col("to").isNotNull())
    .select("customer_id", "name", "is_referral")
    .orderBy(F.col("customer_id").asc())
    .show(truncate=False)
)
+-----------+----+-----------+
|customer_id|name|is_referral|
+-----------+----+-----------+
|1          |John|false      |
|2          |Mary|false      |
|3          |Jane|true       |
|4          |Jack|true       |
|5          |Emma|false      |
+-----------+----+-----------+
