Find the most frequent value per group in a table column

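I need to find the most common value of object_of_search for each ethnicity. How can I do that? Subqueries and correlated subqueries in the SELECT clause are not allowed. Something like this: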

mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"

But this does not aggregate, and it returns many rows for each ethnicity and object_of_search:

officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
ethnicity3                |                 2 |              100 | Fireworks
ethnicity1                |                 5 |               60 | Firearms
ethnicity3                |                 2 |              100 | Firearms
ethnicity1                |                 5 |               60 | Cat
ethnicity1                |                 5 |               60 | Dog
ethnicity2                |                 3 | 66.6666666666667 | Firearms
ethnicity1                |                 5 |               60 | Psychoactive substances
ethnicity1                |                 5 |               60 | Fireworks

It should look like this:

officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
ethnicity3                |                 2 |              100 | Fireworks
ethnicity1                |                 5 |               60 | Firearms

Table on fiddle.

Query:

SELECT DISTINCT
stopAndSearches.officer_defined_ethnicity,
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;

Table:

CREATE TABLE IF NOT EXISTS stopAndSearches(
"sas_id" bigserial PRIMARY KEY,
"officer_defined_ethnicity" VARCHAR(255),
"object_of_search" VARCHAR(255),
"outcome" VARCHAR(255)
);

Update: Fiddle

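This should address the specific "which object per ethnicity" question.

Note that this does not handle ties in the count. That was not part of the question/request.

Adjust your SQL to include this logic to provide that detail: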

WITH cte AS (
SELECT officer_defined_ethnicity
, object_of_search
, COUNT(*) AS n
, ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT * FROM cte
WHERE rn = 1
;

Result:

officer_defined_ethnicity	object_of_search	n	rn
ethnicity1	Cat	1	1
ethnicity2	Stolen goods	2	1
ethnicity3	Fireworks	1	1
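
Because ROW_NUMBER() numbers tied rows in an unspecified order, the row kept with rn = 1 is arbitrary whenever two objects share the top count within an ethnicity. If ties matter, one possible variation (a sketch, not part of the original answer) is to swap in RANK(), which gives every tied row the same rank so all of them are returned. The alias rnk is only an illustrative name.

```sql
-- Sketch: same aggregation as the CTE above, but RANK() keeps every
-- object_of_search that is tied for the highest count per ethnicity.
WITH cte AS (
    SELECT officer_defined_ethnicity
         , object_of_search
         , COUNT(*) AS n
         , RANK() OVER (PARTITION BY officer_defined_ethnicity
                        ORDER BY COUNT(*) DESC) AS rnk  -- hypothetical alias
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT officer_defined_ethnicity, object_of_search, n
FROM cte
WHERE rnk = 1
ORDER BY officer_defined_ethnicity, object_of_search;
```

With this variant an ethnicity can appear in several rows, one per tied object, so it fits only when the caller can accept more than one "most frequent" value.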

SELECT DISTINCT ON (1)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY 1, 2
ORDER  BY 1, 3 DESC, 2;

Or, more explicitly:

SELECT DISTINCT ON (officer_defined_ethnicity)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;

officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
ethnicity1                | Cat              | 1
ethnicity2                | Stolen goods     | 2
ethnicity3                | Firearms         | 1

db<>fiddle here

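Since DISTINCT ON is applied after GROUP BY, we only need a single query level.

Aggregate counts per (officer_defined_ethnicity, object_of_search) with GROUP BY.
Pick the row with the highest count per officer_defined_ethnicity with DISTINCT ON.

I added object_of_search as a third ORDER BY item to act as a tiebreaker and produce a deterministic result:
In case of a tie, the first object_of_search in alphabetical sort order is picked.
Adapt to your needs.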
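
To get the full result the question asks for (search count, arrest rate, and most frequent object) in the same single query level, a sketch along these lines may work in PostgreSQL 9.4 or later. It assumes the stop_and_searches table name used above and the columns from the question's CREATE TABLE; the window name w is only illustrative. Window functions are evaluated over the grouped rows before DISTINCT ON picks one row per ethnicity, so the per-ethnicity totals survive.

```sql
-- Sketch 1: DISTINCT ON plus window functions over the aggregated counts.
SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity
     , sum(count(*)) OVER w AS "Sas for ethnicity"
     , 100.0 * sum(count(*) FILTER (WHERE outcome = 'Arrest')) OVER w
             / sum(count(*)) OVER w AS "Arrest rate"
     , object_of_search AS "Most frequent object of search"
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
WINDOW w AS (PARTITION BY officer_defined_ethnicity)  -- hypothetical window name
ORDER  BY officer_defined_ethnicity, count(*) DESC, object_of_search;

-- Sketch 2: the mode() attempt from the question, grouped by ethnicity only.
-- Grouping by sas_id as well makes every group a single row, which is why
-- the original query did not aggregate. mode() picks an arbitrary value
-- among equally frequent ones, so ties are not resolved deterministically.
SELECT officer_defined_ethnicity
     , count(*) AS "Sas for ethnicity"
     , 100.0 * count(*) FILTER (WHERE outcome = 'Arrest') / count(*) AS "Arrest rate"
     , mode() WITHIN GROUP (ORDER BY object_of_search) AS "Most frequent object of search"
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity;
```

The first form keeps the alphabetical tiebreaker described above; the second is shorter but leaves the choice among tied values to the database.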

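See:

Select first row in each GROUP BY group?
Best way to get result count before LIMIT was applied

Simpler and typically faster than row_number():

Select first row in each GROUP BY group?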
