Finding whether a value already exists in a partition (SQL)



I have the following data:

term    period  id
7       201910  A2C5
7       201911  A2C5
7       201912  A2C5
7       202001  A2C5
7       202002  A2C5
7       202003  A2C5
7       202004  A2C5

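I need to find the percentage of new ids for each term and period. In the table above, id A2C5 first appears in term 7, period 201910, so that row should show 100%. All the other rows should show 0%, because the id already existed in an earlier period.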

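This solution is a bit involved, but it should get you where you need to go. It first finds the earliest term and period for each ID, then divides the number of IDs that first appear in each term and period by the total number of IDs in that same term and period.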

-- Find the earliest instance of each ID
WITH earliest_instances AS (
    SELECT DISTINCT ON (id) *
    FROM your_table
    ORDER BY id, term, period  -- you may need to adjust this order
),
-- For each term and period, count how many IDs showed up there first
earliest_counts AS (
    SELECT term, period, count(*)
    FROM earliest_instances
    GROUP BY term, period
),
-- For each term and period, count how many IDs there are in total
total_counts AS (
    SELECT term, period, count(*)
    FROM your_table
    GROUP BY term, period
)
-- Put those together to make a percentage of new IDs
SELECT
    total_counts.term,
    total_counts.period,
    100 * COALESCE(earliest_counts.count, 0) / total_counts.count AS new_ids_percentage
FROM total_counts
LEFT JOIN earliest_counts ON (
    total_counts.term = earliest_counts.term
    AND total_counts.period = earliest_counts.period
)
ORDER BY term, period;
Result:

 term | period | new_ids_percentage
------+--------+--------------------
 7    | 201910 |                100
 7    | 201911 |                  0
 7    | 201912 |                  0
 7    | 202001 |                  0
 7    | 202002 |                  0
 7    | 202003 |                  0
 7    | 202004 |                  0
(7 rows)
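
Since the question mentions partitions, here is a rough sketch of the same idea using a window function instead of DISTINCT ON (which is PostgreSQL-specific). It numbers each id's rows with ROW_NUMBER() OVER (PARTITION BY id ORDER BY term, period) and treats row number 1 as the first appearance. To make it runnable on its own, it is wrapped in Python's sqlite3 module rather than PostgreSQL, and a few things are assumptions: period is stored as an integer, the id B7K1 is a made-up extra row added so that one period is neither 0% nor 100%, and the function name new_id_percentages is only illustrative. It also multiplies by 100.0 so a fractional percentage is not truncated by integer division.

```python
import sqlite3

# Sample rows from the question (term, period, id).
rows = [(7, p, "A2C5") for p in
        (201910, 201911, 201912, 202001, 202002, 202003, 202004)]
# Hypothetical extra id, not in the original data: first seen in 201912.
rows += [(7, 201912, "B7K1"), (7, 202001, "B7K1")]

QUERY = """
WITH ranked AS (
    -- seen = 1 marks the first (term, period) in which each id appears
    SELECT term, period, id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY term, period) AS seen
    FROM your_table
)
SELECT term,
       period,
       ROUND(100.0 * SUM(CASE WHEN seen = 1 THEN 1 ELSE 0 END) / COUNT(*), 1)
           AS new_ids_percentage
FROM ranked
GROUP BY term, period
ORDER BY term, period
"""


def new_id_percentages(data):
    """Load the rows into an in-memory table and return (term, period, pct)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (term INTEGER, period INTEGER, id TEXT)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", data)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for term, period, pct in new_id_percentages(rows):
    print(term, period, pct)
```

The query inside QUERY should also work in PostgreSQL more or less as written, since ROW_NUMBER() and PARTITION BY are available there, but I have only sketched it against SQLite here. Window functions need SQLite 3.25 or newer.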
