R: Collapse by ID, preferring certain character values over others

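I want to collapse microdata that has multiple observations per ID at different points in time. Usually an ID has the same birth country throughout, but sometimes it changes. I want to reduce my data to one observation per ID and choose the country in a way that avoids two particular countries (for example Canada and Germany) whenever the ID also has another country. For example, if an ID has observations with birth countries Canada and US, I want to pick US. If a person has Italy and Germany, I want to pick, say, Italy. If there is only one country, that one should be kept.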

My data:

ID  birth_country
1   US
1   Canada
1   Canada
1   Canada
2   Germany
2   Italy
2   Germany
3   Canada
3   Canada
3   Canada

It should look like this:

ID  birth_country
1   US
2   Italy
3   Canada

I tried dplyr with group_by, but could not find a suitable way to select by character value.

Use a preference vector that lists the countries in order of preference (most preferred first):

df <- data.frame(ID = rep.int(1:3, c(4, 3, 3)), birth_country = c("US", "Canada", "Italy", "Germany")[c(1,2,2,2,4,3,4,2,2,2)])
preference <- c("US", "Italy", "Canada", "Germany")
library(dplyr)
df %>%
  group_by(ID) %>%
  summarize(birth_country = preference[min(match(birth_country, preference))])
#> # A tibble: 3 x 2
#>      ID birth_country
#>   <int> <chr>        
#> 1     1 US           
#> 2     2 Italy        
#> 3     3 Canada

Or with data.table:

library(data.table)
setDT(df)[, .(birth_country = preference[min(match(birth_country, preference))]), by = "ID"]
#>    ID birth_country
#> 1:  1            US
#> 2:  2         Italy
#> 3:  3        Canada
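
The match/min idiom used in both snippets above can be unpacked in base R. The following is only a sketch under assumptions: pick_country is a hypothetical helper name, not part of either package, and returning NA for countries missing from preference is one possible design choice rather than something the original answer specifies.

```r
# Sample data from the question
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  birth_country = c("US", "Canada", "Canada", "Canada",
                    "Germany", "Italy", "Germany",
                    "Canada", "Canada", "Canada"),
  stringsAsFactors = FALSE
)

# Earlier entries are preferred over later ones
preference <- c("US", "Italy", "Canada", "Germany")

# match() returns each country's position in `preference`, i.e. its rank;
# countries that are not listed get NA
ranks <- match(df$birth_country, preference)

# Hypothetical helper: pick the best-ranked country among one ID's ranks.
# Unlisted countries (NA ranks) are ignored; if none is listed, return NA.
pick_country <- function(r) {
  r <- r[!is.na(r)]
  if (length(r) == 0) NA_character_ else preference[min(r)]
}

# Apply the helper to the ranks of each ID
result <- tapply(ranks, df$ID, pick_country)
result
```

Without the NA filtering, a single country that is missing from preference would make min() return NA for the whole ID, so it may be worth checking that preference covers every value in birth_country.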

df %>%
  group_by(ID) %>%
  summarize(birth_country = first(birth_country))

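That should do it, provided the rows within each ID are already ordered so that the preferred country comes first: first() simply takes the first row of each group, so with the sample data as given, ID 2 would return Germany. You can use last or nth instead of first (see ?summarise).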
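
One way to make first() respect a preference order is to sort the rows by preference rank before summarizing. This is a minimal sketch, assuming the same preference vector as in the other answer; it is not necessarily what the author of this answer had in mind.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  birth_country = c("US", "Canada", "Canada", "Canada",
                    "Germany", "Italy", "Germany",
                    "Canada", "Canada", "Canada"),
  stringsAsFactors = FALSE
)

# Assumed preference order, most preferred first
preference <- c("US", "Italy", "Canada", "Germany")

result <- df %>%
  # Sort by ID, then by each country's rank in `preference`;
  # arrange() places NA ranks (unlisted countries) last
  arrange(ID, match(birth_country, preference)) %>%
  group_by(ID) %>%
  # After sorting, the first row per ID holds the preferred country
  summarize(birth_country = first(birth_country))

result
```

Because unlisted countries sort last, they are only chosen when an ID has no listed country at all.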
