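I need to find the most common value of object_of_search for each ethnicity. How can I do that? Subqueries in the SELECT clause and correlated subqueries are not allowed. Something like this: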
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
But this does not aggregate and returns many rows for each ethnicity and object_of_search:
officer_defined_ethnicity | Sas for ethnicity | Arrest rate | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2 | 3 | 66.6666666666667 | Stolen goods
ethnicity3 | 2 | 100 | Fireworks
ethnicity1 | 5 | 60 | Firearms
ethnicity3 | 2 | 100 | Firearms
ethnicity1 | 5 | 60 | Cat
ethnicity1 | 5 | 60 | Dog
ethnicity2 | 3 | 66.6666666666667 | Firearms
ethnicity1 | 5 | 60 | Psychoactive substances
ethnicity1 | 5 | 60 | Fireworks
It should look like this:
officer_defined_ethnicity | Sas for ethnicity | Arrest rate | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2 | 3 | 66.6666666666667 | Stolen goods
ethnicity3 | 2 | 100 | Fireworks
ethnicity1 | 5 | 60 | Firearms
Table on fiddle.
Query:
SELECT DISTINCT
stopAndSearches.officer_defined_ethnicity,
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;
Table:
CREATE TABLE IF NOT EXISTS stopAndSearches(
"sas_id" bigserial PRIMARY KEY,
"officer_defined_ethnicity" VARCHAR(255),
"object_of_search" VARCHAR(255),
"outcome" VARCHAR(255)
);
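Update: Fiddle
This should address the specific "which object for each ethnicity" question.
Note that this does not handle ties in the counts. That was not part of the question or request.
Adjust your SQL to include this logic to provide that detail: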
WITH cte AS (
SELECT officer_defined_ethnicity
, object_of_search
, COUNT(*) AS n
, ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT * FROM cte
WHERE rn = 1
;
Result:
officer_defined_ethnicity | object_of_search | n | rn |
---|---|---|---|
ethnicity1 | Cat | 1 | 1 |
ethnicity2 | Stolen goods | 2 | 1 |
ethnicity3 | Fireworks | 1 | 1 |
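If ties do matter, one possible variation (a sketch, not part of the original request) is to swap ROW_NUMBER() for RANK(), so that every object_of_search sharing the highest count for an ethnicity is returned. It assumes the same stopAndSearches table as above; the column alias rnk is just an illustrative name.

```sql
-- Sketch: keep every object_of_search that ties for the highest count.
-- Assumes the stopAndSearches table from the question.
WITH cte AS (
    SELECT officer_defined_ethnicity
         , object_of_search
         , COUNT(*) AS n
           -- RANK() assigns the same rank to equal counts, unlike ROW_NUMBER()
         , RANK() OVER (PARTITION BY officer_defined_ethnicity
                        ORDER BY COUNT(*) DESC) AS rnk
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT officer_defined_ethnicity, object_of_search, n
FROM cte
WHERE rnk = 1
ORDER BY officer_defined_ethnicity, object_of_search;
```

With this version an ethnicity can appear on several rows, one per tied object, so it no longer guarantees exactly one row per ethnicity.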
SELECT DISTINCT ON (1)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stop_and_searches
GROUP BY 1, 2
ORDER BY 1, 3 DESC, 2;
Or, more explicitly:
SELECT DISTINCT ON (officer_defined_ethnicity)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stop_and_searches
GROUP BY officer_defined_ethnicity, object_of_search
ORDER BY officer_defined_ethnicity, ct DESC, object_of_search;
officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
ethnicity1 | Cat | 1
ethnicity2 | Stolen goods | 2
ethnicity3 | Firearms | 1
db<>fiddle here
Since DISTINCT ON is applied after GROUP BY, we only need a single query level.
- Aggregate counts per (officer_defined_ethnicity, object_of_search) with GROUP BY.
- Pick the row with the highest count per officer_defined_ethnicity with DISTINCT ON.
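I added object_of_search as a third ORDER BY item to act as a tiebreaker and produce a deterministic result: in case of a tie, the first object_of_search in alphabetical sort order is picked.
Adapt to your needs.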
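If an arbitrary pick among tied values is acceptable, the full report from the question might also be produced with plain aggregation and no window functions. The following is a sketch under that assumption: it groups by officer_defined_ethnicity only (not by sas_id), uses the FILTER clause and mode(), both of which need PostgreSQL 9.4 or later, and uses the table name stop_and_searches as in the queries above.

```sql
-- Sketch: one row per ethnicity using plain aggregates.
-- Table name stop_and_searches as in the queries above; adjust to your schema.
SELECT officer_defined_ethnicity
     , count(*) AS "Sas for ethnicity"
       -- 100.0 forces numeric division instead of integer division
     , 100.0 * count(*) FILTER (WHERE outcome = 'Arrest') / count(*) AS "Arrest rate"
       -- mode() picks one value arbitrarily when several are equally frequent
     , mode() WITHIN GROUP (ORDER BY object_of_search) AS "Most frequent object of search"
FROM stop_and_searches
GROUP BY officer_defined_ethnicity;
```

Unlike the DISTINCT ON query, this does not promise which value wins a tie, so the two approaches can disagree for ethnicities where several objects share the top count.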
Related:
- Select first row in each GROUP BY group?
- Best way to get result count before LIMIT was applied
Compared with row_number():
- Select first row in each GROUP BY group?