Create a new order for existing column values without reordering rows in a data frame - R

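I have cluster labels from running kmeans separately on different ids (example below). The problem is that, although every id has 3 clusters, the order of the kmeans cluster codes is not consistent across ids.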

reprex = data.frame(id = rep(1:2, each = 4), 
v1 = rep(seq(1:4), 2),
cluster = c(2,2,1,3,3,1,2,2))
reprex
  id v1 cluster
1  1  1       2
2  1  2       2
3  1  3       1
4  1  4       3
5  2  1       3
6  2  2       1
7  2  3       2
8  2  4       2

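What I want is for the cluster variable to always start at 1 within each id. Note that I do not want to reorder the data frame by cluster; the row order must stay the same. So the desired result is: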

reprex_desired<- data.frame(id = rep(1:2, each = 4), 
v1 = rep(seq(1:4), 2),
cluster = c(2,2,1,3,3,1,2,2),
what_iWant = c(1,1,2,3,1,2,3,3))
reprex_desired
  id v1 cluster what_iWant
1  1  1       2          1
2  1  2       2          1
3  1  3       1          2
4  1  4       3          3
5  2  1       3          1
6  2  2       1          2
7  2  3       2          3
8  2  4       2          3

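We can use match after grouping by 'id'. Within each group, unique(cluster) lists the labels in order of first appearance, and match returns each label's position in that list: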

library(dplyr)
reprex <- reprex %>%
  group_by(id) %>% 
  mutate(what_IWant = match(cluster, unique(cluster))) %>%
  ungroup()

Output:

reprex
# A tibble: 8 × 4
     id    v1 cluster what_IWant
  <int> <int>   <dbl>      <int>
1     1     1       2          1
2     1     2       2          1
3     1     3       1          2
4     1     4       3          3
5     2     1       3          1
6     2     2       1          2
7     2     3       2          3
8     2     4       2          3
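
To see why this works, it can help to look at the labels of a single id in isolation. The following is a minimal sketch in base R; the names `labels`, `first_seen` and `relabelled` are purely illustrative, and `ave()` is used here only as a base-R stand-in for the group_by/mutate step, not as part of the answer above.

```r
# Labels for id 1, in their original row order
labels <- c(2, 2, 1, 3)

# unique() keeps the labels in order of first appearance: 2 1 3
first_seen <- unique(labels)

# match() returns each label's position in that list: 1 1 2 3
match(labels, first_seen)

# Same idea per id without dplyr: ave() applies the function within each id
# and puts the results back in the original row order
reprex <- data.frame(id = rep(1:2, each = 4),
                     v1 = rep(1:4, 2),
                     cluster = c(2, 2, 1, 3, 3, 1, 2, 2))
reprex$relabelled <- ave(reprex$cluster, reprex$id,
                         FUN = function(x) match(x, unique(x)))
```

Because nothing is sorted at any point, the rows stay exactly where they were; only the new column is added.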

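Here is a version that combines cumsum with lag. It counts label changes between consecutive rows, so it assumes each cluster label forms one contiguous run within an id, as in the example: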

library(dplyr)
reprex %>% 
  group_by(id) %>% 
  # increment the counter each time the label differs from the previous row
  mutate(what_i_want = cumsum(cluster != lag(cluster, default = first(cluster))) + 1)

     id    v1 cluster what_i_want
  <int> <int>   <dbl>       <dbl>
1     1     1       2           1
2     1     2       2           1
3     1     3       1           2
4     1     4       3           3
5     2     1       3           1
6     2     2       1           2
7     2     3       2           3
8     2     4       2           3
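
The two answers agree on the example data, but they are not equivalent in general. The sketch below, written in base R with illustrative names (`x`, `by_first_seen`, `by_run`), shows what happens when a label reappears after a different one. The shifted comparison is assumed to be a base-R equivalent of the lag-based expression.

```r
# Label 2 reappears after label 1
x <- c(2, 1, 2)

# Order of first appearance: the repeated label gets its earlier code back
by_first_seen <- match(x, unique(x))   # 1 2 1

# Run counter: compare each element with the previous one and count changes,
# mirroring cumsum(x != lag(x, default = first(x))) + 1
by_run <- cumsum(c(FALSE, x[-1] != x[-length(x)])) + 1   # 1 2 3
```

If the same cluster can occur in separate stretches within an id, the match version is likely the one that gives consistent codes.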
