Filter Array in Hive



Apache hive表的列定义如下:

myvars:array<struct<index:bigint,value:string>>

对应数据的示例如下:

"myvars":[
  {"index":2,"value":"value1"}
  , {"index":1,"value":"value2"}
  , {"index":2,"value":"value3"}
]

这个数组如何被筛选到所有"index"==2的元素

在JavaScript中,我将做如下操作:
myvars.filter(function(d){return d.index==2;})

如何实现相同的结果与Apache Hive QL,最好没有横向视图?

hive中有一组Collection函数:

 Collection
    array_contains(Array<T> a, val)
    array<K.V> map_keys(Map<K.V> a)
    array<K.V> map_values(Map<K.V> a)
    size(Map<K.V>|Array<T> a)
    sort_array(Array<T> a)

在查询中使用

...
WHERE
array_contains(myvars,2) 

我想如果你想提取索引为2的所有值,你想要这样的东西:

SELECT DISTINCT value
FROM mytable 
LATERAL VIEW EXPLODE(myvars) exploded_myvars AS idx, value
WHERE idx = 2;

如果数据类型为array<map<string,string>>,则为

SELECT DISTINCT mv["value"]
FROM mytable 
LATERAL VIEW EXPLODE(myvars) exploded_myvars AS mv
WHERE mv["index"] = 2;

最新更新