R function problem with NA: the condition has length > 1 and only the first element will be used

I have the following function, which returns an age from an ID:

giveAge = function(id){
# start returns the place where any number starts in the id string
start = regexpr(id, pattern ="[0-9]")[[1]] 
# age returns the age by using the year the id was born 
age  = ifelse(substr(id,start,start) == 0,
lubridate::year(Sys.Date()) - (2000 + as.numeric(substr(id,start,start + 1))),
lubridate::year(Sys.Date()) - (1900 + as.numeric(substr(id,start,start + 1)))
)
return(age)
}

For example, suppose we have a vector of four ids in which the third one is missing (AAHG was born in 1975, FFCH in 1991, and CUM in 1955).

IDs = c("AAHG7511083A8", "FFCH9108017U2", NA, "CUM550117112")

Using giveAge on IDs gives

> giveAge(IDs)
[1] 46 30 NA 66

Everything is fine here, but when the missing value comes first in the vector

IDs2 = c(NA, "AAHG7511083A8", "FFCH9108017U2", "CUM550117112")

and I apply giveAge to IDs2, I get

> giveAge(IDs2)
[1] NA NA NA NA

I tried to fix this by returning an arbitrary number when the value is NA, but I get a warning and the function is not applied to the whole vector:

giveAge2 = function(id){
if(!is.na(id)){
start = regexpr(id, pattern ="[0-9]")[[1]] 

age  = ifelse(substr(id,start,start) == 0,
lubridate::year(Sys.Date()) - (2000 + as.numeric(substr(id,start,start + 1))),
lubridate::year(Sys.Date()) - (1900 + as.numeric(substr(id,start,start + 1)))
)
return(age)
} else {
return(28)  
}
}
> giveAge2(IDs2)
[1] 28
Warning message:
In if (!is.na(id)) { :
the condition has length > 1 and only the first element will be used

How can I fix this?

Thanks.

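1) The giveAge code in the question computes start from the first element of the input only, so if that element is NA, every result is NA. If you remove the [[1]], giveAge from the question will work.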

(giveAge2 has the problem above, and in addition it passes a vector to an if statement, which requires a scalar condition.)
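As a minimal sketch of point 1, the function below drops the [[1]] so that regexpr() yields one start position per element. The name giveAgeFixed and the today argument are assumptions added here for illustration (today only makes the result reproducible), and base format() stands in for lubridate::year().

```r
# Hypothetical variant of giveAge from the question, with [[1]] removed.
# `today` is an illustrative extra argument, not part of the original code.
giveAgeFixed <- function(id, today = Sys.Date()) {
  # regexpr() is vectorised: one position per element, NA for NA input
  start <- as.integer(regexpr("[0-9]", id))
  two_digits <- as.numeric(substr(id, start, start + 1))
  current_year <- as.numeric(format(today, "%Y"))
  # A leading "0" is read as 20xx, anything else as 19xx
  ifelse(substr(id, start, start) == "0",
         current_year - (2000 + two_digits),
         current_year - (1900 + two_digits))
}

IDs2 <- c(NA, "AAHG7511083A8", "FFCH9108017U2", "CUM550117112")
res <- giveAgeFixed(IDs2, today = as.Date("2021-06-01"))
res  # NA 46 30 66
```

Because every step works element by element, the position of the NA in the input no longer affects the other results.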

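2) Alternatively, try the following, which also removes the dependency on the package. It trims the non-digits from the left-hand side of each string, takes the first two digits of what remains, and converts them to numeric, giving the two-digit year yy. It then converts yy to a four-digit year and subtracts that from the current year.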

giveAge3 <- function(id, today = Sys.Date(), cutoff = 10) {
yy <- as.numeric(substr(trimws(id, "left", "\\D"), 1, 2))
year <- yy + 1900 + 100 * (yy < cutoff)
as.numeric(format(today, "%Y")) - year
}
giveAge3(IDs)
## [1] 46 30 NA 66
giveAge3(IDs2)
## [1] NA 46 30 66
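The individual steps of giveAge3 can be traced on a small input, as sketched below. The ID "XX0503011A1" is a made-up value used only to show the cutoff rule, and the whitespace argument of trimws() is assumed to be available (R 3.6.0 or later).

```r
# Illustrative input; "XX0503011A1" is a hypothetical ID, not from the question
ids <- c("AAHG7511083A8", NA, "XX0503011A1")
cutoff <- 10

# Step 1: strip leading non-digits, keep the first two characters
digits <- substr(trimws(ids, "left", "\\D"), 1, 2)
digits  # "75" NA "05"

# Step 2: two-digit year as a number
yy <- as.numeric(digits)

# Step 3: values below the cutoff are read as 20xx, the rest as 19xx
year <- yy + 1900 + 100 * (yy < cutoff)
year  # 1975 NA 2005
```

NA propagates through every step, so a missing ID yields a missing year without any special handling.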

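Inside the function, use is.na to create a logical index. Then use that index to extract from the input vector and to assign into the return value.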

giveAge <- function(id){
# start returns the place where any number starts in the id string
i_na <- is.na(id)
age <- rep(NA_real_, length(id))
# no [[1]] here: keep one start position per non-missing id
start <- regexpr(id[!i_na], pattern ="[0-9]")
# age returns the age by using the year the id was born 
age[!i_na] <- ifelse(substr(id[!i_na],start,start) == 0,
lubridate::year(Sys.Date()) - (2000 + as.numeric(substr(id[!i_na],start,start + 1))),
lubridate::year(Sys.Date()) - (1900 + as.numeric(substr(id[!i_na],start,start + 1)))
)
age
}
IDs = c("AAHG7511083A8", "FFCH9108017U2", NA, "CUM550117112")
IDs2 = c(NA, "AAHG7511083A8", "FFCH9108017U2", "CUM550117112")
giveAge(IDs)
#[1] 46 30 NA 66
giveAge(IDs2)
#[1] NA 46 30 66
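The logical-index pattern on its own can be sketched as follows. The vector x and the use of nchar() are placeholders chosen for illustration; any vectorised computation could take their place.

```r
# Placeholder input and computation, only to show the indexing pattern
x <- c(NA, "a1", "bb22")

i_na <- is.na(x)                       # TRUE where the input is missing
out <- rep(NA_integer_, length(x))     # result starts out all NA
out[!i_na] <- nchar(x[!i_na])          # compute only on non-missing values
out  # NA 2 4
```

The result keeps the length and order of the input, with NA left in the positions that were missing.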

is.na(id) returns TRUE or FALSE for each value in id. Since id in your example is c(NA, "AAHG7511083A8", "FFCH9108017U2", "CUM550117112"), the output of is.na(id) will be TRUE, FALSE, FALSE, FALSE.

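However, if() expects a single value (one TRUE or FALSE); the remaining elements are not checked. If the problem only occurs "when the missing value comes first in the vector", you could simply use if(!is.na(id[1])) to check whether the first value is NA.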
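For an element-by-element decision, ifelse() is the vectorised counterpart of if(). The sketch below reuses the placeholder value 28 from the question; the vector ages is illustrative. Note that from R 4.2.0 onward a condition of length greater than one in if() is an error rather than a warning.

```r
# Illustrative ages with a missing value in first position
ages <- c(NA, 46, 30, 66)

# ifelse() tests every element; if() would look at a single value only
filled <- ifelse(is.na(ages), 28, ages)
filled  # 28 46 30 66

# When if() is really wanted, reduce the condition to one value first
has_missing <- anyNA(ages)
if (has_missing) message("at least one age is missing")
```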
