将查询从SQL移动到SparkSQL



我正在尝试将SQL查询迁移到SparkSQL查询。我的主要问题是string_split函数,我试图用explodesplit组合来代替它。仍然给我一些错误。。

原始查询如下所示:

select *
from t1
where exists (
select *
from t2
where exists (
select *
from string_split(t2.concat_id, ',')
where t1.id = value
) and exists (
select *
from string_split(t2.concat_name, ',')
where t1.name = value
)
)

我把它移到了这样的地方:

SELECT *
FROM t1
WHERE exists ( 
SELECT * FROM t2 
WHERE exists ( SELECT * FROM (SELECT explode(split(t2.concat_id,',')) as value)
WHERE t1.id = value
) AND exists (
SELECT * FROM (SELECT explode(split(t2.concat_name,',')) as value)
WHERE t1.name = value
)
)

我目前的错误是:

AnalysisException:无法解析给定输入列的"t1.id":[__auto_generated_subquery_name.value];

表1示例:

| ID   | NAME   |
| RO1  | Arnold |
| RO2  | Ed     |
| RO3  | Sal    |
| RO4  | Teus   |
| RO11 | Haus  |

表2示例:

| Concat_ID  | Concat_Name |
| RO3,RO4    | Teus,Port   |
| RO10, RO15 | Rar,Tar     |
| RO2, RO6   | Ed, Kev     |

您可以使用exists函数直接检查数组中的值,而无需将其分解:

SELECT *
FROM t1
WHERE exists ( 
SELECT * FROM t2 
WHERE exists(split(t2.concat_id, ','), x -> x = t1.id) 
AND exists(split(t2.concat_name, ','), x -> x = t1.name)
)

另一种方法是,如果您分解拆分的字符串,则使用LATERAL VIEW

SELECT *
FROM t1
WHERE exists ( 
SELECT * FROM t2 
LATERAL VIEW EXPLODE(split(t2.concat_id, ',')) AS c_id
LATERAL VIEW EXPLODE(split(t2.concat_name,',')) AS c_name
WHERE  t1.id = c_id AND t1.name = c_name
)

创建数据:

create table t2 (ids string, names string);
insert into t2 values ("john,smith", "1,2");
insert into t2 values ("jane,doe", "3,4");
create table t1 (id string, name string);
insert into t1 values ("john", "1");

将列分解为行的笛卡尔乘积,并连接t1。(横向视图有助于将列更改为行,这是这里的神奇之处(

with t_two as 
(
select * from t2 
LATERAL VIEW explode(split(t2.ids,',')) as my_ids 
LATERAL VIEW explode(split(t2.names,',')) as my_names
)
select t1.* 
from 
t1 join t_two 
on 
t_two.my_names = t1.name 
and 
t_two.my_ids = t1.id ;

最新更新