Significance of nullable in StructField in a Spark DataFrame



What is the significance of nullable?

case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean = true,
    metadata: Metadata = Metadata.empty) {

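From the documentation:

StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate whether values of this field can be null.

Is it only an indication? I don't see it enforcing non-null values (or am I missing something?).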

Program:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Assumes `spark` is an existing SparkSession, as in spark-shell.

// Column specs in the form name:type:nullable
val cols = "firstName:String:false,middlename:String:true,lastName:String:false,zipCode:String:false,sex:String:false,salary:Int:true"

// Build a StructField from one "name:type:nullable" spec.
def inferType(field: String): StructField = {
  val splits = field.split(":")
  val colName = splits(0)
  val nullable = splits(2).toBoolean
  val dataType = splits(1).toUpperCase() match {
    case "INT" => IntegerType
    case "DOUBLE" => DoubleType
    case "STRING" => StringType
    case _ => StringType // unknown types fall back to string
  }
  StructField(colName, dataType, nullable)
}

val schema: StructType = StructType(cols
  .split(",")
  .map(col => inferType(col)))

// Note: the blank values below are empty strings (""), not null.
val simpleData = Seq(
  Row("Soumya", "", "Kole", "36636", "M", -1),
  Row("Foo", "Bar", "", "", "", 9000)
)

val rdd = spark.sparkContext.parallelize(simpleData)
val df = spark.createDataFrame(rdd, schema)
df.printSchema()
df.show()
Output:

root
|-- firstName: string (nullable = false)
|-- middlename: string (nullable = true)
|-- lastName: string (nullable = false)
|-- zipCode: string (nullable = false)
|-- sex: string (nullable = false)
|-- salary: integer (nullable = true)
+---------+----------+--------+-------+---+------+
|firstName|middlename|lastName|zipCode|sex|salary|
+---------+----------+--------+-------+---+------+
|   Soumya|          |    Kole|  36636|  M|    -1|
|      Foo|       Bar|        |       |   |  9000|
+---------+----------+--------+-------+---+------+

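The blank cells are empty strings, not NULLs, and the two are different. The sample data therefore never puts a null into a column declared as non-nullable.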
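To make the distinction concrete, here is a minimal sketch, assuming a local SparkSession and a single hypothetical column `middlename`. It counts real nulls and empty strings separately, then repeats the load with the column declared non-nullable. Whether a real null is rejected there may depend on the Spark version and on how the DataFrame is built, so the sketch only reports the outcome instead of assuming one. The object name `NullVsEmpty` and the helper `counts` are illustrative names, not part of the original program.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import scala.util.Try

object NullVsEmpty {
  // One empty string and one real null.
  private val rows = Seq(Row(""), Row(null))

  // Returns (number of real nulls, number of empty strings) in a nullable column.
  def counts(spark: SparkSession): (Long, Long) = {
    val schema = StructType(Seq(StructField("middlename", StringType, nullable = true)))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    val nullCount = df.filter(col("middlename").isNull).count()
    val emptyCount = df.filter(col("middlename") === "").count()
    (nullCount, emptyCount)
  }

  // Loads the same rows with the column declared non-nullable and reports
  // whether materializing the data succeeded. No outcome is assumed here.
  def strictLoadSucceeds(spark: SparkSession): Boolean = {
    val schema = StructType(Seq(StructField("middlename", StringType, nullable = false)))
    Try(spark.createDataFrame(spark.sparkContext.parallelize(rows), schema).collect()).isSuccess
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[1]").appName("null-vs-empty").getOrCreate()
    val (nullCount, emptyCount) = counts(spark)
    println(s"nulls=$nullCount, empty strings=$emptyCount")
    println(s"load with a real null in a non-nullable column succeeded: ${strictLoadSucceeds(spark)}")
    spark.stop()
  }
}
```

Running the same check against the question's data would report zero nulls in every column, which is why the non-nullable declarations were never put to the test.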
