PostgreSQL:在比较嵌套数组时识别缺少的值



我在两列中有以下嵌套数据:

Categories_A    Categories_B
{"A"}           {"B","F","C"}
{"B","A"}       {"Z","B","F"}
{"B","F"}       {"A","E","R"}

我想在一个新的列中返回categories_B中与categories_a相比缺失的一个或多个类别-categories_a中的内容不在categories-B中。理想情况下:

Categories_A    Categories_B      Missing_Category
{"A"}           {"B","F","C"}     {"A"}
{"B","A"}       {"Z","B","F"}     {"A"}
{"B","F"}       {"A","E","R"}     {"B","F"}

到目前为止,如果列之间完全匹配,我设法返回:

select Categories_A, Categories_B,
case
when Categories_A = Categories_B then 'TRUE'
else 'FALSE'
end is_a_match
from facts_themes
Categories_A    Categories_B      is_a_match
{"A"}           {"B","F","C"}     FALSE
{"B","A"}       {"Z","B","F"}     FALSE
{"B","F"}       {"A","E","R"}     FALSE

没有内置的函数,但很容易编写自己的:

create function array_except(p_one anyarray, p_two anyarray)
returns anyarray
as
$$
select array_agg(e)
from (
select e
from unnest(p_one) as p1(e)
except
select e
from unnest(p_two) as p2(e)
) x
$$
language sql
immutable
;

然后你可以这样使用它:

select categories_a, categories_b, 
array_except(categories_a, categories_b) as missing_categories
from facts_themes

在线示例

虽然不能涉及重复,但使用更快的EXCEPT ALL:

SELECT *, ARRAY(SELECT unnest(categories_a) EXCEPT ALL
SELECT unnest(categories_b)) AS missing
FROM   facts_themes;

db<gt;小提琴这里

如果可能涉及到重复,您首先必须定义所需的行为。关于EXCEPT ALL:

  • 在PostgreSQL中使用EXCEPT子句

还要注意,EXCEPT将NULL视为另一个值(而大多数数组运算符则不这样做(。

最新更新