Find the most frequent value in a table column for each group



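I need to find the most common value of object_of_search for each ethnicity. How can I do that? Subqueries in the SELECT clause and correlated subqueries are not allowed. Something like this: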

mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"

But this doesn't aggregate, and it returns many rows for each ethnicity and object_of_search:

officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
ethnicity3                |                 2 |              100 | Fireworks
ethnicity1                |                 5 |               60 | Firearms
ethnicity3                |                 2 |              100 | Firearms
ethnicity1                |                 5 |               60 | Cat
ethnicity1                |                 5 |               60 | Dog
ethnicity2                |                 3 | 66.6666666666667 | Firearms
ethnicity1                |                 5 |               60 | Psychoactive substances
ethnicity1                |                 5 |               60 | Fireworks

It should look like this:

officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
ethnicity3                |                 2 |              100 | Fireworks
ethnicity1                |                 5 |               60 | Firearms

Table on fiddle.


Query:

SELECT DISTINCT
stopAndSearches.officer_defined_ethnicity,
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;

Table:

CREATE TABLE IF NOT EXISTS stopAndSearches(
"sas_id" bigserial PRIMARY KEY,
"officer_defined_ethnicity" VARCHAR(255),
"object_of_search" VARCHAR(255),
"outcome" VARCHAR(255)
);

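Updated: Fiddle

This should address the specific "which object for each ethnicity" question.

Note that this doesn't handle ties in the counts. That wasn't part of the question/request.

Adjust your SQL to include this logic, to provide that detail: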

WITH cte AS (
SELECT officer_defined_ethnicity
, object_of_search
, COUNT(*) AS n
, ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT * FROM cte
WHERE rn = 1
;

Result:

 officer_defined_ethnicity | object_of_search | n | rn
---------------------------+------------------+---+----
 ethnicity1                |                  | 1 |  1
 ethnicity2                | Stolen goods     | 2 |  1
 ethnicity3                | Fireworks        | 1 |  1

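If ties do matter, one possible variation (a sketch only, not part of the query above) swaps ROW_NUMBER() for RANK(). Equal counts then share the same rank, so every object_of_search tied for the top count within an ethnicity is returned. The alias rnk is just an illustrative name:

```sql
WITH cte AS (
   SELECT officer_defined_ethnicity
        , object_of_search
        , COUNT(*) AS n
          -- RANK() gives equal counts the same rank, so all tied top values get 1
        , RANK() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rnk
   FROM stopAndSearches
   GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT officer_defined_ethnicity, object_of_search, n
FROM cte
WHERE rnk = 1
ORDER BY officer_defined_ethnicity, object_of_search;
```

An ethnicity whose top count is shared by several objects then yields one row per tied object instead of a single arbitrary one.
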
SELECT DISTINCT ON (1)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY 1, 2
ORDER  BY 1, 3 DESC, 2;

Or, more explicitly:

SELECT DISTINCT ON (officer_defined_ethnicity)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;
officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
ethnicity1                | Cat              | 1
ethnicity2                | Stolen goods     | 2
ethnicity3                | Firearms         | 1

db<>fiddle here

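Since DISTINCT ON is applied after GROUP BY, we only need a single query level.

  1. Aggregate counts per (officer_defined_ethnicity, object_of_search) with GROUP BY.
  2. Pick the row with the highest count per officer_defined_ethnicity with DISTINCT ON.

I added object_of_search as the third ORDER BY item to act as a tiebreaker and produce a deterministic result: in case of a tie, the first object_of_search in alphabetical sort order is picked. Adapt to your needs.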
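
For the full result asked for in the question, the same single-level idea might also work with plain aggregates, grouping by officer_defined_ethnicity only (the question's query additionally grouped by sas_id, which produces one group per row). A minimal sketch, assuming PostgreSQL 9.4 or later for FILTER and mode(), and the table name used in the queries above. Unlike the DISTINCT ON query, mode() gives no control over which value wins among equally frequent ones:

```sql
SELECT officer_defined_ethnicity
     , count(*) AS "Sas for ethnicity"
       -- 100.0 forces numeric division instead of integer division
     , 100.0 * count(*) FILTER (WHERE outcome = 'Arrest') / count(*) AS "Arrest rate"
       -- ordered-set aggregate: most frequent object_of_search per group
     , mode() WITHIN GROUP (ORDER BY object_of_search) AS "Most frequent object of search"
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity;
```

This returns one row per ethnicity without window functions or DISTINCT.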

Related:

  • Select first row in each GROUP BY group?
  • Best way to get result count before LIMIT was applied

Compared to row_number(): simpler, and typically faster:

  • Select first row in each GROUP BY group?