How to set the hive.serialization.extend.nesting.levels property to true in Spark

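I am trying to convert JSON to a DataFrame, register a temporary table, and run some queries against it. However, I get an org.apache.hadoop.hive.serde2.SerDeException because the JSON has more than 7 levels of nesting. I tried setting the property to true with hiveContext.sql("hive.serialization.extend.nesting.levels","true") but I still hit the same problem. I am using Spark 1.6.1. Any help resolving this would be appreciated.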

Adding the log:
ERROR log: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Number of levels of nesting supported for LazySimpleSerde is 7 Unable to work with level 9. Use hive.serialization.extend.nesting.levels serde property for tables using LazySimpleSerde. org.apache.hadoop.hive.serde2.SerDeException: Number of levels of nesting supported for LazySimpleSerde is 7 Unable to work with level 9. Use hive.serialization.extend.nesting.levels serde property for tables using LazySimpleSerde.

Thanks

Suppose the external table is defined as follows:

create external table t1
(
 a int,
 b double,
 c array<struct<
          k1:struct<
                     p1:struct<
                              r1:struct<
                                        h1:struct<
                                                  s1:array<struct<
                                                                  j1:struct<
                                                                            x1:int
                                                                           >
                                                        >>
                                              >
                                     >
                            >
                    >
         >>
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES ( "mapping.time_stamp" = "timestamp" ) 
 LOCATION '/user/user1/staging/data/populationdata'
  ;

Assume the data contains more than 7 levels of nesting.

Then, in the next step, flatten the table, setting the serde property on the new table:

 -- t1_flat is a placeholder name: the target table must not reuse the name of the source table t1
 create table t1_flat
 ROW FORMAT SERDE   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
 WITH SERDEPROPERTIES ( 'hive.serialization.extend.nesting.levels'='true' )
 as
 select
   a, 
   b, 
   c1.k1
 from 
   t1
 lateral view explode(c) subview as c1
 ;
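
If the LazySimpleSerDe table already exists, the same property can probably be set on it afterwards instead of recreating it. This is only a sketch: my_nested_table is a made-up name, and I have not checked that Spark 1.6.1 picks up the change for a table that was already created.

 -- my_nested_table is a hypothetical, already existing table that uses LazySimpleSerDe
 ALTER TABLE my_nested_table
   SET SERDEPROPERTIES ( 'hive.serialization.extend.nesting.levels'='true' )
 ;

 -- the serde properties should then appear in the storage section of the output
 DESCRIBE FORMATTED my_nested_table
 ;

Either way, the property goes on the table as a serde property, which is what the error message asks for, rather than being set as a session setting.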
