我在两列中有以下嵌套数据:
Categories_A Categories_B
{"A"} {"B","F","C"}
{"B","A"} {"Z","B","F"}
{"B","F"} {"A","E","R"}
我想在一个新的列中返回categories_B中与categories_a相比缺失的一个或多个类别-categories_a中的内容不在categories-B中。理想情况下:
Categories_A Categories_B Missing_Category
{"A"} {"B","F","C"} {"A"}
{"B","A"} {"Z","B","F"} {"A"}
{"B","F"} {"A","E","R"} {"B","F"}
到目前为止,如果列之间完全匹配,我设法返回:
select Categories_A, Categories_B,
case
when Categories_A = Categories_B then 'TRUE'
else 'FALSE'
end is_a_match
from facts_themes
Categories_A Categories_B is_a_match
{"A"} {"B","F","C"} FALSE
{"B","A"} {"Z","B","F"} FALSE
{"B","F"} {"A","E","R"} FALSE
没有内置的函数,但很容易编写自己的:
create function array_except(p_one anyarray, p_two anyarray)
returns anyarray
as
$$
select array_agg(e)
from (
select e
from unnest(p_one) as p1(e)
except
select e
from unnest(p_two) as p2(e)
) x
$$
language sql
immutable
;
然后你可以这样使用它:
select categories_a, categories_b,
array_except(categories_a, categories_b) as missing_categories
from facts_themes
在线示例
虽然不能涉及重复,但使用更快的EXCEPT ALL
:
SELECT *, ARRAY(SELECT unnest(categories_a) EXCEPT ALL
SELECT unnest(categories_b)) AS missing
FROM facts_themes;
db<gt;小提琴这里
如果可能涉及到重复,您首先必须定义所需的行为。关于EXCEPT ALL
:
- 在PostgreSQL中使用EXCEPT子句
还要注意,EXCEPT
将NULL视为另一个值(而大多数数组运算符则不这样做(。