Error when querying a Hive table with a map data type in Spark SQL, but the same query works in HiveQL



I have a Hive table with the following structure:

+---------------+--------------+----------------------+
| column_value  | metric_name  |         key          |
+---------------+--------------+----------------------+
| A37B          | Mean         | {0:"202006",1:"1"}  |
| ACCOUNT_ID    | Mean         | {0:"202006",1:"2"}  |
| ANB_200       | Mean         | {0:"202006",1:"3"}  |
| ANB_201       | Mean         | {0:"202006",1:"4"}  |
| AS82_RE       | Mean         | {0:"202006",1:"5"}  |
| ATTR001       | Mean         | {0:"202007",1:"2"}  |
| ATTR001_RE    | Mean         | {0:"202007",1:"3"}  |
| ATTR002       | Mean         | {0:"202007",1:"4"}  |
| ATTR002_RE    | Mean         | {0:"202007",1:"5"}  |
| ATTR003       | Mean         | {0:"202008",1:"3"}  |
| ATTR004       | Mean         | {0:"202008",1:"4"}  |
| ATTR005       | Mean         | {0:"202008",1:"5"}  |
| ATTR006       | Mean         | {0:"202009",1:"4"}  |
| ATTR006       | Mean         | {0:"202009",1:"5"}  |
+---------------+--------------+----------------------+

I need to write a Spark SQL query that filters on the key column with a NOT IN condition, matching both map keys at the same time.
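For reference, a table of this shape can be declared roughly as sketched below. The table name your_data comes from the queries further down and the column names and map<int,string> type come from the output above; the rest of the DDL is an assumption.

-- hypothetical DDL; only the names and the map<int,string> type come from the question
CREATE TABLE your_data (
  column_value STRING,
  metric_name  STRING,
  key          MAP<INT, STRING>
)
STORED AS ORC;  -- the storage format is an assumption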

The following query runs fine in HiveQL through Beeline:

select * from your_data where key[0] between  '202006' and '202009' and key NOT IN ( map(0,"202009",1,"5") );

But when I try the same query in Spark SQL, I get this error:

cannot resolve: map<int,string>, due to data type mismatch
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:115)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:107)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
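The underlying issue is that Spark SQL does not support equality (and therefore IN / NOT IN) on map columns, whereas Hive does allow comparing whole maps, which is why the same predicate passes in Beeline. As a minimal sketch (not taken from the question), even a literal-only comparison should be rejected by Spark's analyzer with a similar data type mismatch message:

select map(0, '202009', 1, '5') in (map(0, '202009', 1, '5'));  -- fails analysis in Spark SQL; maps are not comparable there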

Please help!

I found the answer in a different question that had been asked earlier. The following query works fine:

select * from your_data where key[0] between 202006 and 202009 and NOT (key[0]="202009" and key[1]="5" );
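If more than one key combination ever has to be excluded, the same rewrite can simply be chained. This is only a sketch; the second excluded pair ('202008', '4') is made up for illustration:

select * from your_data
where key[0] between '202006' and '202009'
  and not (key[0] = '202009' and key[1] = '5')   -- pair from the original NOT IN
  and not (key[0] = '202008' and key[1] = '4');  -- hypothetical additional pair to exclude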

Latest update