SQL: assign IDs partitioned by column values

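I want to assign a unique ID to groups of rows that meet certain criteria. In the example below, I want to assign a unique ID based on date and hardware.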

Example:

date, hardware, color
1990, 8989, blue
1990, 8989, yellow
1991, 8989, blue
1991, 3333, blue
1991, 8989, black

Expected result:

date, hardware, color, ID
1990, 8989, blue, 1
1990, 8989, yellow, 1
1991, 8989, blue, 2
1991, 3333, blue, 3
1991, 8989, black, 2
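In the sample, the expected IDs number each (date, hardware) pair in the order it first appears. The Python sketch below only pins down that requirement. assign_group_ids is a hypothetical helper. BigQuery tables have no guaranteed row order, so a SQL query needs an explicit ORDER BY and may number the groups differently.

```python
# Sketch only: give each distinct (date, hardware) pair an ID in order of first
# appearance. assign_group_ids is a hypothetical helper; rows come from the question.
def assign_group_ids(rows):
    ids = {}  # (date, hardware) -> ID
    result = []
    for date, hardware, color in rows:
        key = (date, hardware)
        if key not in ids:
            ids[key] = len(ids) + 1  # next unused ID
        result.append((date, hardware, color, ids[key]))
    return result


rows = [
    (1990, 8989, "blue"),
    (1990, 8989, "yellow"),
    (1991, 8989, "blue"),
    (1991, 3333, "blue"),
    (1991, 8989, "black"),
]

for row in assign_group_ids(rows):
    print(row)
```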

How can I achieve this result in BigQuery?

You can use DENSE_RANK. Rows with the same date and hardware receive the same value, and the values have no gaps:

select t.*, dense_rank() over (order by date, hardware) as id
from table_name t;
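To try this query without BigQuery, here is a minimal sketch that runs the same DENSE_RANK query against an in-memory SQLite database from Python. Assumptions: SQLite 3.25 or later (needed for window functions) stands in for BigQuery, and my_table is a hypothetical table name.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for BigQuery; my_table is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date INTEGER, hardware INTEGER, color TEXT)")
sample = [
    (1990, 8989, "blue"),
    (1990, 8989, "yellow"),
    (1991, 8989, "blue"),
    (1991, 3333, "blue"),
    (1991, 8989, "black"),
]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", sample)

# Rows that tie on (date, hardware) get the same rank; ranks have no gaps.
# The outer ORDER BY only makes the printed output deterministic.
rows = conn.execute(
    """
    SELECT t.*, DENSE_RANK() OVER (ORDER BY date, hardware) AS id
    FROM my_table t
    ORDER BY date, hardware, color
    """
).fetchall()

for row in rows:
    print(row)
```

Both (1990, 8989) rows get 1 and both (1991, 8989) rows get 3, so the grouping matches the expected result. Only the numbers differ: DENSE_RANK numbers groups in ORDER BY order, which puts (1991, 3333) second.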


I would number the distinct (date, hardware) pairs first and then join the numbers back to the table:

with
x as (
  select distinct date, hardware from my_table
),
y as (
  select 
    date, 
    hardware, 
    row_number() over(order by date, hardware) as rn
  from x
)
select
  t.*, y.rn
from my_table t
join y on y.date = t.date and y.hardware = t.hardware
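A local sketch of this distinct, number, and join-back approach follows, again with SQLite 3.25+ from Python standing in for BigQuery (an assumption; my_table is a hypothetical table name). The query body mirrors the one above, plus an ORDER BY so the output is deterministic.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for BigQuery; my_table is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date INTEGER, hardware INTEGER, color TEXT)")
sample = [
    (1990, 8989, "blue"),
    (1990, 8989, "yellow"),
    (1991, 8989, "blue"),
    (1991, 3333, "blue"),
    (1991, 8989, "black"),
]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", sample)

rows = conn.execute(
    """
    WITH
    x AS (
      SELECT DISTINCT date, hardware FROM my_table      -- one row per group
    ),
    y AS (
      SELECT date, hardware,
             ROW_NUMBER() OVER (ORDER BY date, hardware) AS rn  -- number the groups
      FROM x
    )
    SELECT t.*, y.rn
    FROM my_table t
    JOIN y ON y.date = t.date AND y.hardware = t.hardware       -- copy numbers back
    ORDER BY t.date, t.hardware, t.color
    """
).fetchall()

for row in rows:
    print(row)
```

The IDs come out the same as with DENSE_RANK. The window function now runs only over the distinct pairs, usually far fewer rows than the full table, although it is still a single window with no PARTITION BY.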

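In BigQuery, window functions without a PARTITION BY clause can cause problems on large data: every row has to be processed in a single partition, so the query may run out of resources.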

Another approach is to assign the id from a hash of the grouping columns:

select t.*, farm_fingerprint(concat(cast(date as string), '|', cast(hardware as string))) as id
from table_name t;
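The hashing idea can be sketched in Python. Python's standard library has no FarmHash, so this sketch swaps in BLAKE2b truncated to 8 bytes. Its values will not match BigQuery's FARM_FINGERPRINT, but like it, the sketch returns a signed 64-bit integer. fingerprint64 and group_id are hypothetical helpers.

```python
import hashlib


# Stand-in for BigQuery's FARM_FINGERPRINT (assumption: BLAKE2b replaces FarmHash,
# so the numbers differ from BigQuery's). Returns a signed 64-bit integer.
def fingerprint64(s: str) -> int:
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def group_id(date: int, hardware: int) -> int:
    # Same (date, hardware) -> same ID. The '|' separator keeps e.g. (1, 23)
    # and (12, 3) from both becoming the string "123".
    return fingerprint64(f"{date}|{hardware}")


rows = [
    (1990, 8989, "blue"),
    (1990, 8989, "yellow"),
    (1991, 8989, "blue"),
    (1991, 3333, "blue"),
    (1991, 8989, "black"),
]

ids = [group_id(d, h) for d, h, _ in rows]
print(ids)
```

Each ID depends only on the row's own values, so no window or global sort is needed. That is why this approach avoids the resource problem. The trade-off is large, non-consecutive IDs and a very small chance of a 64-bit collision.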

The ids are not as elegant. But if your query fails for lack of resources, elegant ids are little consolation.
