How to use x != null in Scala



The DataFrame df01 looks like this:

scala> df01.show
+--------------------+----+-----+
|          session_id|  材质|count|
+--------------------+----+-----+
|       360098626|120|  金属|    2|
|866693025201992-0...|  布艺|    2|
|        648401717|33|  其它|    1|
|b2df486d906403886...| ABS|    1|
|14962864822301789...|  金属|    2|
|        960455526|12|  金属|    1|
|14886198008411946...| PVC|    1|
|860410037295987-6...|  金属|    1|
|c267e7e20c6742e6d...| ABS|    1|
|862788039750580-1...| ABS|    2|
|85995192767403132...| ABS|    1|
|862681034959357-2...| ABS|    1|
|52f4754fe212caf9d...|  其它|    1|
| 51289594708875916|6|null|    1|
|        741995028|24|null|    1|
|        2099986503|5|  金属|    1|
|14965600686729437...|null|    1|
|15098023912712771...| ABS|    2|
|a28fe88a99e3983c6...|  金属|    2|
|         703270023|2|null|    1|
+--------------------+----+-----+
only showing top 20 rows
scala> df01.schema
res58: org.apache.spark.sql.types.StructType = StructType(StructField(session_id,StringType,true), StructField(材质,StringType,true), StructField(count,LongType,false))

What I want: whenever the 材质 column is null, count should be 1. My code is below:

val e = "材质"

Type 1: attr != null

 val df02 = df01.map{x=>
        val session_id = x(0).toString()
        val attr = x(1).toString()
        var cnt = 1
        if(attr!=null){cnt = x(2).toString().toInt}        
        (session_id,attr,cnt)
       }.toDF("session_id",e,"cnt")

Type 2: attr != "null"

val df02 = df01.map{x=>
    val session_id = x(0).toString()
    val attr = x(1).toString()
    var cnt = 1
    if(attr!="null"){cnt = x(2).toString().toInt}        
    (session_id,attr,cnt)
   }.toDF("session_id",e,"cnt")  

Type 3: x(1) != null

val df02 = df01.map{x=>
    val session_id = x(0).toString()
    val attr = x(1).toString()
    var cnt = 1
    if(x(1)!=null){cnt = x(2).toString().toInt}        
    (session_id,attr,cnt)
   }.toDF("session_id",e,"cnt")

Type 4: x(1) != "null"

val df02 = df01.map{x=>
    val session_id = x(0).toString()
    val attr = x(1).toString()
    var cnt = 1
    if(x(1)!="null"){cnt = x(2).toString().toInt}        
    (session_id,attr,cnt)
   }.toDF("session_id",e,"cnt")

All of the types above fail with "Caused by: java.lang.NullPointerException". How can I make this work?

@psidom's comment is correct:

import org.apache.spark.sql.functions.{col, when}
df01.withColumn("count", when(col(e).isNull, 1).otherwise(col("count")))

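When the "材质" column contains a null value, x(1).toString throws a NullPointerException. That call runs before any of the null checks in the four types above, which is why they all fail the same way.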
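
To make the failure concrete, here is a minimal sketch in plain Scala with no Spark dependency. It assumes a hypothetical FakeRow type (an Array[Any]) standing in for a Spark Row, and a hypothetical helper named fix; neither comes from the question. It shows that calling toString on a null field throws, and that wrapping the field in Option before touching it avoids the exception.

```scala
import scala.util.Try

object NullSafeCount {
  // Hypothetical stand-in for a Spark Row (an assumption, not Spark's API):
  // positional fields typed as Any, where field 1 (the 材质 value) may be null.
  type FakeRow = Array[Any]

  // Hypothetical helper: null-safe version of the map body from the question.
  def fix(x: FakeRow): (String, Option[String], Int) = {
    val sessionId = x(0).toString
    // Option(...) turns null into None instead of dereferencing it.
    val attr = Option(x(1)).map(_.toString)
    // Null attribute: count becomes 1; otherwise keep the original count.
    val cnt = if (attr.isEmpty) 1 else x(2).toString.toInt
    (sessionId, attr, cnt)
  }

  def main(args: Array[String]): Unit = {
    val withNull: FakeRow = Array[Any]("741995028|24", null, 5L)
    val withValue: FakeRow = Array[Any]("2099986503|5", "金属", 2L)

    // This is what the question's code does first: toString on a null field.
    println(Try(withNull(1).toString).isFailure)

    println(fix(withNull))
    println(fix(withValue))
  }
}
```

In real Spark code the same guard would be applied to the Row field before calling toString on it; the column-expression approach from the comment avoids the row-level code entirely.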

I think the answer in @psidom's comment is the right one.
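
For anyone who wants to try that approach outside the shell, the following is a rough sketch that assumes Spark SQL is on the classpath and runs in local mode. The helper name resetNullCounts and the two-row sample DataFrame are made up for illustration; only the withColumn / when / otherwise expression comes from the comment. The literal is written as 1L here so that it matches the LongType of the count column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ResetNullCounts {
  // Hypothetical helper: set "count" to 1 wherever attrCol is null,
  // and leave it unchanged everywhere else.
  def resetNullCounts(df: DataFrame, attrCol: String): DataFrame =
    df.withColumn("count", when(col(attrCol).isNull, 1L).otherwise(col("count")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reset-null-counts")
      .getOrCreate()
    import spark.implicits._

    val e = "材质"
    // Made-up sample rows; None becomes a null in the 材质 column.
    val sample = Seq(
      ("s1", Option("金属"), 2L),
      ("s2", Option.empty[String], 5L)
    ).toDF("session_id", e, "count")

    resetNullCounts(sample, e).show()
    spark.stop()
  }
}
```

Because the null test is a column expression evaluated by Spark, no row field is ever dereferenced in user code, so the NullPointerException cannot occur.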
