Splitting a character vector of names into two vectors (based on case)



I want to split a vector of names:

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane") 

into two vectors:

last <- c("DOE", "VAN DYKE", "SMITH") 

first <- c("John", "Dick", "Mary Jane")

Any help would be greatly appreciated. Thanks.

This should work:

# Define a pattern that only matches words composed entirely of capital letters
pat <- paste("^[", paste(LETTERS, collapse=""), "]*$", sep="")
# [1] "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane") 
splitNames <- strsplit(names, " ")
# LAST NAMES: (Extract and paste together words matching 'pat')
sapply(splitNames, 
       function(X) paste(grep(pat, X, value=TRUE), collapse=" "))
# [1] "DOE"      "VAN DYKE" "SMITH" 
# First Names: (Extract and paste together words NOT matching 'pat')
sapply(splitNames, 
       function(X) paste(grep(pat, X, value=TRUE, invert=TRUE), collapse=" "))
# [1] "John"      "Dick"      "Mary Jane"

To match all uppercase letters, you could alternatively use the character class [:upper:], like this:

pat <- "^[[:upper:]]*$"

That said, the documentation at ?regexp seems to mildly warn against doing so, on the grounds of reduced portability.
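As a quick sanity check, and assuming plain ASCII input, the two patterns should classify words the same way. The names `tokens`, `pat_letters`, and `pat_upper` below are only illustrative:

```r
# Illustrative tokens taken from the example names (ASCII only)
tokens <- c("DOE", "John", "VAN", "DYKE", "Mary")

# Pattern built from LETTERS, as in the answer above
pat_letters <- paste("^[", paste(LETTERS, collapse = ""), "]*$", sep = "")
# Pattern using the POSIX character class
pat_upper <- "^[[:upper:]]*$"

# TRUE where a token consists only of capital letters
is_caps_letters <- grepl(pat_letters, tokens)
is_caps_upper   <- grepl(pat_upper, tokens)

is_caps_letters
# [1]  TRUE FALSE  TRUE  TRUE FALSE
identical(is_caps_letters, is_caps_upper)
# [1] TRUE
```

With non-ASCII letters the two patterns can disagree, because [:upper:] depends on the locale while the explicit letter list does not.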

Here is one way:

l <- strsplit(names," ")
splitCaps <- function(x){
    ind <- x == toupper(x)
    list(upper = paste(x[ind],collapse = " "),
         lower = paste(x[!ind],collapse = " "))
}
> lapply(l,splitCaps)
[[1]]
[[1]]$upper
[1] "DOE"
[[1]]$lower
[1] "John"

[[2]]
[[2]]$upper
[1] "VAN DYKE"
[[2]]$lower
[1] "Dick"

[[3]]
[[3]]$upper
[1] "SMITH"
[[3]]$lower
[1] "Mary Jane"
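Since the question asks for two plain vectors rather than a nested list, the result can be flattened afterwards. This is just a sketch building on the function above; `res`, `last`, and `first` are names chosen here for illustration:

```r
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")

# Same helper as above: words equal to their uppercase form go to 'upper'
splitCaps <- function(x) {
    ind <- x == toupper(x)
    list(upper = paste(x[ind], collapse = " "),
         lower = paste(x[!ind], collapse = " "))
}

res <- lapply(strsplit(names, " "), splitCaps)

# Pull each component out into its own character vector
last  <- vapply(res, function(r) r$upper, character(1))
first <- vapply(res, function(r) r$lower, character(1))

last
# [1] "DOE"      "VAN DYKE" "SMITH"
first
# [1] "John"      "Dick"      "Mary Jane"
```

vapply is used instead of sapply only so that the return type is guaranteed to be a character vector.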

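Note that there are plenty of caveats here: if you start mixing unusual character sets, locales, symbols, etc., using toupper to pick out the all-uppercase words will be quite unreliable. But for very simple ASCII-type cases it should work fine.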
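One concrete instance of that caveat, even within ASCII: anything toupper leaves unchanged counts as "uppercase", including digits, punctuation, and initials. The made-up tokens below illustrate the difference from a regex-based test:

```r
# Hypothetical tokens: a surname, a first name, a digit, an initial, a symbol
x <- c("SMITH", "John", "3", "J.", "&")

# toupper-based test: TRUE for anything that has no lowercase letters
by_toupper <- x == toupper(x)
by_toupper
# [1]  TRUE FALSE  TRUE  TRUE  TRUE

# Regex-based test: TRUE only for tokens made up entirely of capital letters
by_regex <- grepl("^[[:upper:]]+$", x)
by_regex
# [1]  TRUE FALSE FALSE FALSE FALSE
```

Which behavior is preferable depends on the data; for a name such as "DOE J." the toupper test would put the initial with the last name.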
