我有如下结构的配置单元表
+---------------+--------------+----------------------+
| column_value | metric_name | key |
+---------------+--------------+----------------------+
| A37B | Mean | {0:"202006",1:"1"} |
| ACCOUNT_ID | Mean | {0:"202006",1:"2"} |
| ANB_200 | Mean | {0:"202006",1:"3"} |
| ANB_201 | Mean | {0:"202006",1:"4"} |
| AS82_RE | Mean | {0:"202006",1:"5"} |
| ATTR001 | Mean | {0:"202007",1:"2"} |
| ATTR001_RE | Mean | {0:"202007",1:"3"} |
| ATTR002 | Mean | {0:"202007",1:"4"} |
| ATTR002_RE | Mean | {0:"202007",1:"5"} |
| ATTR003 | Mean | {0:"202008",1:"3"} |
| ATTR004 | Mean | {0:"202008",1:"4"} |
| ATTR005 | Mean | {0:"202008",1:"5"} |
| ATTR006 | Mean | {0:"202009",1:"4"} |
| ATTR006 | Mean | {0:"202009",1:"5"} |
我需要编写一个spark-sql查询,以基于具有NOT IN条件的Key列进行过滤,同时混合两个键。
以下查询在Beeline 中的HiveQL中运行良好
select * from your_data where key[0] between '202006' and '202009' and key NOT IN ( map(0,"202009",1,"5") );
但是当我在Spark SQL中尝试相同的查询时。我收到错误
由于数据类型不匹配,无法解析:map<int,字符串>网址:org.apache.spark.sqlcatalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42(网址:org.apache.spark.sqlcatalyst.analysis.CheckAnalysis$$anonfun$CheckAnalysis$1$$anonfon$apply$3.applyOrElse(CheckAnalysis.scala:115(网址:org.apache.spark.sqlcatalyst.analysis.CheckAnalysis$$anonfun$CheckAnalysis$1$$anonfon$apply$3.applyOrElse(CheckAnalysis.scala:107(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278(网址:org.apache.spark.sqlcatalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70(网址:org.apache.spark.sqlcatalyst.trees.TreeNode.transformUp(TreeNode.scala:277(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326(网址:org.apache.spark.sqlcatalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187(网址:org.apache.spark.sqlcatalyst.trees.TreeNode.mapChildren(TreeNode.scala:324(网址:org.apache.spark.sqlcatalyst.trees.TreeNode.transformUp(TreeNode.scala:275(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275(网址:org.apache.spark.sqlcatalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326(
请帮忙!
我从之前提出的不同问题中得到了答案。此查询运行良好
select * from your_data where key[0] between 202006 and 202009 and NOT (key[0]="202009" and key[1]="5" );