I have the following data:
term period id
7 201910 A2C5
7 201911 A2C5
7 201912 A2C5
7 202001 A2C5
7 202002 A2C5
7 202003 A2C5
7 202004 A2C5
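I need to find the percentage of new ids per term per period. In the table above, id A2C5 first appears in period 201910 of term 7, so that row should show 100%. All the other rows should show 0%, because the id already existed by then.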
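This solution is a little complex, but it should get you where you need to be. It first finds the earliest term and period for each ID, counts those first appearances per term and period, and then joins that to the count of all IDs in the same term and period.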
-- Find the earliest instance of each ID (DISTINCT ON is PostgreSQL-specific)
WITH earliest_instances AS (
    SELECT DISTINCT ON (id) *
    FROM your_table
    ORDER BY id, term, period -- you may need to adjust this order
),
-- For each term and period, count how many IDs showed up there first
earliest_counts AS (
    SELECT term, period, count(*)
    FROM earliest_instances
    GROUP BY term, period
),
-- For each term and period, count how many IDs there are in total
total_counts AS (
    SELECT term, period, count(*)
    FROM your_table
    GROUP BY term, period
)
-- Put those together to make a percentage of new IDs.
-- Both counts are integers, so the division truncates to a whole percent;
-- use 100.0 instead of 100 if you need fractional percentages.
SELECT
    total_counts.term,
    total_counts.period,
    100 * COALESCE(earliest_counts.count, 0) / total_counts.count AS new_ids_percentage
FROM total_counts
LEFT JOIN earliest_counts ON (
    total_counts.term = earliest_counts.term
    AND total_counts.period = earliest_counts.period
)
ORDER BY term, period;
term | period | new_ids_percentage
------+--------+--------------------
7 | 201910 | 100
7 | 201911 | 0
7 | 201912 | 0
7 | 202001 | 0
7 | 202002 | 0
7 | 202003 | 0
7 | 202004 | 0
(7 rows)
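
If you are not on PostgreSQL, the same steps can be sketched in a more portable form. What follows is a minimal sketch, not the query above: it assumes SQLite through Python's built-in sqlite3 module, swaps DISTINCT ON for a NOT EXISTS subquery, and multiplies by 100.0 so the division is not truncated. The helper name new_id_percentages, the aliases new_ids and all_ids, and the second id B7D1 are made up for illustration and do not come from the question.

```python
import sqlite3

# Portable variant of the query: NOT EXISTS stands in for PostgreSQL's
# DISTINCT ON, and 100.0 forces floating-point division.
QUERY = """
WITH earliest_instances AS (
    -- Keep a row only if no earlier (term, period) exists for the same id
    SELECT DISTINCT t.id, t.term, t.period
    FROM your_table AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM your_table AS e
        WHERE e.id = t.id
          AND (e.term < t.term
               OR (e.term = t.term AND e.period < t.period))
    )
),
earliest_counts AS (
    SELECT term, period, count(*) AS new_ids
    FROM earliest_instances
    GROUP BY term, period
),
total_counts AS (
    SELECT term, period, count(*) AS all_ids
    FROM your_table
    GROUP BY term, period
)
SELECT tc.term,
       tc.period,
       ROUND(100.0 * COALESCE(ec.new_ids, 0) / tc.all_ids, 1)
           AS new_ids_percentage
FROM total_counts AS tc
LEFT JOIN earliest_counts AS ec
       ON tc.term = ec.term AND tc.period = ec.period
ORDER BY tc.term, tc.period
"""


def new_id_percentages(rows):
    """Load (term, period, id) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (term INTEGER, period INTEGER, id TEXT)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Hypothetical sample: A2C5 is from the question, B7D1 is invented here.
SAMPLE_ROWS = [
    (7, 201910, "A2C5"),
    (7, 201911, "A2C5"),
    (7, 201911, "B7D1"),
    (7, 201912, "A2C5"),
    (7, 201912, "B7D1"),
]

if __name__ == "__main__":
    for term, period, pct in new_id_percentages(SAMPLE_ROWS):
        print(term, period, pct)
```

With this sample, period 201911 has two ids of which one (B7D1) is new, so it comes out as 50.0 rather than 0 or 100. As in the original query, an id counts as new only at its first appearance overall, ordered by term and then period; if "new" should restart in every term, the comparison inside NOT EXISTS would need to be limited to rows of the same term.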