R - Creating a character variable with data.table



Suppose we have the following data.table:

library(data.table)

x_dt <- data.table(
  sexn    = c(1, 0, 0, 1, NA, 1, NA),
  country = c("CHN", "JPN", "BGR", "AUT", " ", "TWN", " "),
  age     = c(35, NA, 40, NA, 70, 18, 36)
)

I am trying to create a variable asia_region that is 1 when country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"), 0 when country is non-missing but not in that list, and NA when country is missing.

The following code fills in 0 when country is missing.

result <- x_dt[, asia_region := ifelse(country %chin% c("CHN", "JPN", "KOR",  "SGP", "TWN"),1 , 0)]
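
To see why the blanks end up as 0, it may help to look at the membership test in isolation. The sketch below uses a shortened country vector, and the names asia, in_asia, and flag are introduced here purely for illustration.

```r
library(data.table)

# `asia`, `in_asia` and `flag` are illustrative names, not part of the question
asia <- c("CHN", "JPN", "KOR", "SGP", "TWN")
country <- c("CHN", "BGR", " ")

# %chin% returns FALSE (never NA) for a string that is not in the table,
# so the blank entry looks the same as a non-Asian country
in_asia <- country %chin% asia
print(in_asia)

# ifelse() therefore takes the "no" branch for the blank entry
flag <- ifelse(in_asia, 1, 0)
print(flag)
```

Because the blank is an ordinary string rather than NA, it has to be handled as a separate condition.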

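We can coerce the logical result directly to binary with as.integer or unary +, then specify a logical condition in i and assign (:=) NA to the corresponding elements of 'asia_region' where 'country' is blank (" ").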

x_dt[,  asia_region := +(country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"))]
x_dt[trimws(country) == "", asia_region := NA_integer_]

Output:

> x_dt
   sexn country age asia_region
1:    1     CHN  35           1
2:    0     JPN  NA           1
3:    0     BGR  40           0
4:    1     AUT  NA           0
5:   NA          70          NA
6:    1     TWN  18           1
7:   NA          36          NA
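
As a small check of the coercion step, unary + applied to a logical vector should give the same integer result as as.integer. The vector is_asia below is a made-up stand-in for the result of the %chin% test.

```r
# `is_asia` is a hypothetical logical vector standing in for `country %chin% ...`
is_asia <- c(TRUE, FALSE, NA)

plus_form <- +is_asia            # unary plus coerces logical to integer
int_form  <- as.integer(is_asia) # the explicit equivalent

print(plus_form)
print(identical(plus_form, int_form))
```

The as.integer form is longer but arguably easier to read for someone unfamiliar with the + idiom.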

Or, if we need ifelse/fifelse (if/else won't work here because it is not vectorized, i.e. it expects a condition of length 1):

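# 1L/0L keep the inner result integer so it matches NA_integer_;
# fifelse expects 'yes' and 'no' to be of the same type
x_dt[, asia_region := fifelse(trimws(country) == "", NA_integer_,
  fifelse(country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"), 1L, 0L))]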
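
As an alternative to nesting, data.table also provides fcase, which evaluates conditions in order. This is only a sketch of a different technique, not the method used in the answer above. It rebuilds just the country column, and the name asia is introduced for illustration.

```r
library(data.table)

# Only the column needed for this illustration
x_dt <- data.table(country = c("CHN", "JPN", "BGR", "AUT", " ", "TWN", " "))

# `asia` is an illustrative helper name
asia <- c("CHN", "JPN", "KOR", "SGP", "TWN")

# Conditions are checked top to bottom; the first TRUE one wins.
# All return values are integer so the types agree.
x_dt[, asia_region := fcase(
  trimws(country) == "", NA_integer_,
  country %chin% asia,   1L,
  default = 0L
)]
print(x_dt)
```

Putting the blank check first lets missing countries become NA before the membership test is considered.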

How about a dplyr solution? For ease of reference, I put the countries in a vector:

asia_countries <-  c("CHN", "JPN", "KOR",  "SGP", "TWN")
x_dt |>
  dplyr::mutate(asia_region = ifelse(country %in% asia_countries, 1, 0)) |>
  dplyr::mutate(asia_region = ifelse(country == " ", NA, asia_region))
