How to create unique IDs starting from a specific number in R



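I am trying to create a new column in R containing a unique ID per client ID. The IDs should count up by one for each client, starting from a specific number (1102535).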

Here is my current data: [image]

Expected output: [image]

It is generally better to post a question with a reproducible data sample rather than a screenshot. You can see some details here.

But to answer your question, the following code, which uses the dplyr package, should work:

# LOAD PACKAGE
library(dplyr)
# CREATE SAMPLE DATA (ONE LINE PER CLIENT)
df <- tribble(
  ~Client, ~Timepoint, ~Status,
  100001, 111, "Positive", 100001, 222, "Positive", 100001, 111, "Positive",
  100002, 333, "Negative", 100002, 333, "Negative", 100002, 444, "Negative", 100002, 444, "Positive",
  100004, 555, "Positive", 100004, 555, "Negative", 100004, 666, "Positive", 100004, 666, "Positive",
  100005, 777, "Negative", 100005, 777, "Positive", 100005, 777, "Positive",
  100006, 888, "Negative", 100006, 999, "Negative"
)
# ADD ROW NUMBERS TO EACH DISTINCT CLIENT (PLUS YOUR CHOICE OF STARTING NUMBER)
# JOIN TO ORIGINAL DF
df |>
  distinct(Client) |>
  # .before must name the existing column exactly: Client, not client
  mutate(ID = row_number() + 1102534, .before = Client) |>
  inner_join(df, by = "Client")

This should produce the following:

# A tibble: 16 × 4
        ID Client Timepoint Status
     <dbl>  <dbl>     <dbl> <chr>
 1 1102535 100001       111 Positive
 2 1102535 100001       222 Positive
 3 1102535 100001       111 Positive
 4 1102536 100002       333 Negative
 5 1102536 100002       333 Negative
 6 1102536 100002       444 Negative
 7 1102536 100002       444 Positive
 8 1102537 100004       555 Positive
 9 1102537 100004       555 Negative
10 1102537 100004       666 Positive
11 1102537 100004       666 Positive
12 1102538 100005       777 Negative
13 1102538 100005       777 Positive
14 1102538 100005       777 Positive
15 1102539 100006       888 Negative
16 1102539 100006       999 Negative
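The distinct-then-join approach numbers clients in order of first appearance. As a minimal sketch of the same idea in base R, assuming a small data frame shaped like the one above (the `start_num` variable is introduced here only for illustration), `match()` against `unique()` gives each client its position directly:

```r
# Small sample mirroring the first few clients above
df <- data.frame(
  Client    = c(100001, 100001, 100002, 100004, 100004),
  Timepoint = c(111, 222, 333, 555, 666)
)

start_num <- 1102535  # illustrative name for the first ID to hand out

# match() returns each client's position among the unique clients,
# in order of first appearance, so the first client gets start_num
df$ID <- match(df$Client, unique(df$Client)) + start_num - 1

print(df)
```

This avoids the join entirely, which may be convenient when dplyr is not otherwise needed.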

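We can use cur_group_id() to assign a group ID to each client and then shift the numbering by the starting number. You have to subtract 1 because cur_group_id() starts at 1.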

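library(tidyverse)
start_num <- 1102535
# df is the data frame defined under "Data"
df %>%
  group_by(Client) %>%
  # cur_group_id() is 1 for the first group, so subtract 1 to start at start_num
  mutate(ID = cur_group_id() + start_num - 1) %>%
  ungroup()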

Output

   Client  Time      ID
    <dbl> <dbl>   <dbl>
1 1000001  69.0 1102535
2 1000001  39.0 1102535
3 1000001  77.2 1102535
4 1000002  50.3 1102536
5 1000002  72.0 1102536
6 1000003  99.2 1102537

Data

df <- structure(list(Client = c(1000001, 1000001, 1000001, 1000002, 
1000002, 1000003), Time = c(69.0152618191205, 39.02626810316, 
77.2143005798571, 50.2722249664366, 72.0442323181778, 99.1987033882178
)), class = "data.frame", row.names = c(NA, -6L))
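Because group numbers follow the sorted order of Client, the same arithmetic can also be sketched without grouping by using dplyr's `dense_rank()`. This is an alternative rather than what the answer above uses, and the `Time` values below are rounded from the data above for brevity:

```r
library(dplyr)

# Same clients as the data above, with Time rounded for brevity
df <- data.frame(
  Client = c(1000001, 1000001, 1000001, 1000002, 1000002, 1000003),
  Time   = c(69.0, 39.0, 77.2, 50.3, 72.0, 99.2)
)
start_num <- 1102535

# dense_rank() ranks the distinct Client values 1, 2, 3, ... with no gaps,
# so subtracting 1 makes the smallest Client receive start_num
result <- df %>%
  mutate(ID = dense_rank(Client) + start_num - 1)

print(result)
```

Since no `group_by()` is involved, the result is not a grouped data frame and needs no `ungroup()`.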

data.table solution

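library(data.table)
setDT(df)  # convert df to a data.table by reference so := can be used
# := adds the ID column in place; interaction() codes each distinct Client as 1, 2, 3, ...
df[, ID := as.numeric(interaction(Client, drop = TRUE)) + 1102534]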

If needed, interaction() can create unique IDs based on multiple variables. Here, you only need Client.
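To illustrate the multi-variable case, here is a minimal base R sketch. The `Site` column is hypothetical and not part of the original question; it stands in for any second key:

```r
# Sample data with a hypothetical second key column, Site
df <- data.frame(
  Client = c(1000001, 1000001, 1000002, 1000002, 1000003),
  Site   = c("A", "B", "A", "A", "B")
)

# interaction() builds one factor level per Client/Site combination;
# drop = TRUE discards combinations that never occur, so the integer
# codes are consecutive and the IDs start at 1102535
df$ID <- as.numeric(interaction(df$Client, df$Site, drop = TRUE)) + 1102534

print(df)
```

Rows that share both Client and Site receive the same ID, while any difference in either column produces a different ID. The order in which IDs are assigned follows the factor level order, not the order of appearance.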
