The DataFrame df01 looks like this:
scala> df01.show
+--------------------+----+-----+
| session_id| 材质|count|
+--------------------+----+-----+
| 360098626|120| 金属| 2|
|866693025201992-0...| 布艺| 2|
| 648401717|33| 其它| 1|
|b2df486d906403886...| ABS| 1|
|14962864822301789...| 金属| 2|
| 960455526|12| 金属| 1|
|14886198008411946...| PVC| 1|
|860410037295987-6...| 金属| 1|
|c267e7e20c6742e6d...| ABS| 1|
|862788039750580-1...| ABS| 2|
|85995192767403132...| ABS| 1|
|862681034959357-2...| ABS| 1|
|52f4754fe212caf9d...| 其它| 1|
| 51289594708875916|6|null| 1|
| 741995028|24|null| 1|
| 2099986503|5| 金属| 1|
|14965600686729437...|null| 1|
|15098023912712771...| ABS| 2|
|a28fe88a99e3983c6...| 金属| 2|
| 703270023|2|null| 1|
+--------------------+----+-----+
only showing top 20 rows
scala> df01.schema
res58: org.apache.spark.sql.types.StructType = StructType(StructField(session_id,StringType,true), StructField(材质,StringType,true), StructField(count,LongType,false))
What I want to do is: wherever the 材质 column is null, set count to 1. My code is as follows:
val e = "材质"
Attempt 1: attr != null
val df02 = df01.map{x=>
val session_id = x(0).toString()
val attr = x(1).toString()
var cnt = 1
if(attr!=null){cnt = x(2).toString().toInt}
(session_id,attr,cnt)
}.toDF("session_id",e,"cnt")
类型2:attr!=" null"
val df02 = df01.map{x=>
val session_id = x(0).toString()
val attr = x(1).toString()
var cnt = 1
if(attr!="null"){cnt = x(2).toString().toInt}
(session_id,attr,cnt)
}.toDF("session_id",e,"cnt")
Attempt 3: x(1) != null
val df02 = df01.map{x=>
val session_id = x(0).toString()
val attr = x(1).toString()
var cnt = 1
if(x(1)!=null){cnt = x(2).toString().toInt}
(session_id,attr,cnt)
}.toDF("session_id",e,"cnt")
类型4:x(1)!=" null"
val df02 = df01.map{x=>
val session_id = x(0).toString()
val attr = x(1).toString()
var cnt = 1
if(x(1)!="null"){cnt = x(2).toString().toInt}
(session_id,attr,cnt)
}.toDF("session_id",e,"cnt")
All of the attempts above fail with "Caused by: java.lang.NullPointerException". How can I make this work correctly?
@Psidom's comment is correct:
import org.apache.spark.sql.functions.{col, when}
df01.withColumn("count", when(col(e).isNull, 1).otherwise(col("count")))
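This works because when(col(e).isNull, 1) is evaluated as a Spark SQL expression on the column itself, so the null test happens inside the engine and no JVM null is ever dereferenced; otherwise(col("count")) keeps the existing count for the non-null rows.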
当"材质"列具有null值时,x(1).toString中将有nullPointException。
我认为@psidom评论中的答案是正确的。
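For completeness, the map-based version can also be fixed by testing for null before calling toString. A minimal sketch (it assumes, like the original code, that spark.implicits._ is in scope so the tuple encoder and toDF are available):

val df02 = df01.map { x =>
  val session_id = x(0).toString
  // Test the raw value with Row.isNullAt BEFORE calling toString;
  // x(1).toString on a null is exactly what threw the NullPointerException.
  val isNull = x.isNullAt(1)
  val attr = if (isNull) null else x(1).toString  // keep the null; the String encoder accepts it
  val cnt = if (isNull) 1 else x(2).toString.toInt
  (session_id, attr, cnt)
}.toDF("session_id", e, "cnt")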