How do I query a column whose name contains "$"?



In Spark SQL, I can use

val spark = SparkSession
      .builder()
      .appName("SparkSessionZipsExample")
      .master("local")
      .config("spark.sql.warehouse.dir", "warehouseLocation-value")
      .getOrCreate()
val df = spark.read.json("source/myRecords.json")
df.createOrReplaceTempView("shipment")
val sqlDF = spark.sql("SELECT * FROM shipment")

从" myrecords.json"中获取数据,此JSON文件的结构是:

df.printSchema()
root
 |-- _id: struct (nullable = true)
 |    |-- $oid: string (nullable = true)
 |-- container: struct (nullable = true)
 |    |-- barcode: string (nullable = true)
 |    |-- code: string (nullable = true)

I can select specific columns from this JSON, for example:

val sqlDF = spark.sql("SELECT container.barcode, container.code FROM shipment")

But how can I get `_id.$oid` from this JSON file? I tried `"SELECT id.$oid FROM shipment_log"`, but it does not work at all. The error message is:

 error: invalid escape character

Can anyone tell me how to get `_id.$oid`?

Backticks are your friend:

spark.read.json(sc.parallelize(Seq(
  """{"_id": {"$oid": "foo"}}""")
)).createOrReplaceTempView("df")
spark.sql("SELECT _id.`$oid` FROM df").show
+----+
|$oid|
+----+
| foo|
+----+
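The same backtick quoting also works through `selectExpr`, if you'd rather stay in the DataFrame API while reusing SQL field syntax. A minimal self-contained sketch (the session and view are recreated here so the snippet runs on its own):

```scala
import org.apache.spark.sql.SparkSession

object BacktickSelectExpr extends App {
  // Local session just for this demo.
  val spark = SparkSession.builder()
    .master("local")
    .appName("BacktickSelectExpr")
    .getOrCreate()
  import spark.implicits._

  // Recreate the same one-record view used above.
  spark.read.json(Seq("""{"_id": {"$oid": "foo"}}""").toDS)
    .createOrReplaceTempView("df")

  // Backticks quote the "$oid" field inside the SQL expression.
  spark.table("df").selectExpr("_id.`$oid`").show()

  spark.stop()
}
```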

The same works with the DataFrame API:

spark.table("df").select($"_id".getItem("$oid")).show
+--------+
|_id.$oid|
+--------+
|     foo|
+--------+

spark.table("df").select($"_id.$$oid")
+--------+
|_id.$oid|
+--------+
|     foo|
+--------+
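If you'd rather avoid the `$` string interpolator (and its `$$` escape) entirely, `org.apache.spark.sql.functions.col` combined with `getField` takes the nested field name as a plain string, so `"$oid"` needs no escaping at all. A minimal sketch against the same view:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object GetFieldExample extends App {
  // Local session just for this demo.
  val spark = SparkSession.builder()
    .master("local")
    .appName("GetFieldExample")
    .getOrCreate()
  import spark.implicits._

  // Recreate the same one-record view used above.
  spark.read.json(Seq("""{"_id": {"$oid": "foo"}}""").toDS)
    .createOrReplaceTempView("df")

  // getField receives "$oid" as an ordinary string literal,
  // so there is nothing for the parser or interpolator to escape.
  spark.table("df").select(col("_id").getField("$oid")).show()

  spark.stop()
}
```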
