How to split a data frame into a list of data frames based on column names in R



Suppose I have the following data frame:

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))

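I want to create a list of data frames, split by the first part of the column names: the columns starting with "BR" would be one element of the list, the columns starting with "USA" would be another, and so on.

I can get the column names and split them with strsplit. However, I am not sure what the best way is to iterate over the result and separate the data frame.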

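strsplit(names(df), "\\.")

gives me a list in which each top-level element corresponds to one column name and the second level holds the pieces of that name after splitting on ".".

How can I iterate over this list to get the indices of the columns that start with the same substring, and group those columns as elements of another list?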

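This only works if your column names always have the form you showed (pieces separated by ".") and you want to group by the identifier before the first ".".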

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))
## Grab the part of each name before the first "."
nm <- do.call(rbind, strsplit(colnames(df), "\\."))[,1]
## Build the list with lapply; drop = FALSE keeps single-column groups as data frames
datlist <- lapply(unique(nm), function(x){df[, nm == x, drop = FALSE]})
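
A more compact variant of the same idea, as a sketch in base R: split.default() treats the data frame as a list of columns and groups them by a factor, here the prefix left after stripping everything from the first "." onward. The names `prefix` and `groups` are illustrative only, and the sketch assumes every column name contains a ".".

```r
set.seed(1)
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Drop everything from the first "." onward to get each column's prefix
prefix <- sub("\\..*$", "", names(df))

# split.default() splits the columns (not the rows) by prefix;
# each element of the result is itself a data frame
groups <- split.default(df, prefix)

names(groups)          # one entry per prefix
lapply(groups, names)  # the columns that ended up in each group
```

Unlike the lapply version, the result is named by prefix, so a group can be pulled out with groups$BR or groups[["USA"]].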

Dason beat me to it, but here is a different flavor of the same conceptual approach:

library(plyr)
# Use regex to get the prefixes
# Pulls any letters, digits or underscores ("\\w*") from the beginning of the
# string ("^") to the first period ("\\.") into a group, then matches all the
# remaining characters (".*").  Then replaces with the first group ("\\1" = "(\\w*)").
# In other words, it matches the whole string but replaces with only the prefix.
prefixes <- unique(gsub(pattern = "^(\\w*)\\..*",
                        replacement = "\\1",
                        x = names(df)))
# Subset to the variables that match the prefix
# Iterates over the prefixes and subsets based on the variable names that
# match that prefix
llply(prefixes, .fun = function(x){
    y <- subset(df, select = names(df)[grep(names(df),
                                            pattern = paste("^", x, sep = ""))])
})
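
For readers who would rather not load plyr, the same subsetting can be sketched in base R. This is an illustrative variant, not part of the answer above: the name `by_prefix` is made up, and startsWith() requires R 3.3.0 or later. Matching on the prefix plus the "." also avoids building a regular expression from the prefix.

```r
set.seed(1)
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Same prefix extraction as above: keep only the part before the first "."
prefixes <- unique(sub("^(\\w*)\\..*", "\\1", names(df)))

# For each prefix, keep the columns whose names start with "<prefix>."
# setNames() makes the resulting list named by prefix
by_prefix <- lapply(setNames(prefixes, prefixes), function(p) {
  df[, startsWith(names(df), paste0(p, ".")), drop = FALSE]
})

str(by_prefix, max.level = 1)
```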

I believe these regular expressions should still give you the correct result even if a variable name contains another "." later on:

unique(gsub(pattern = "^(\\w*)\\..*",
            replacement = "\\1",
            x = c(names(df), "FRA.c.blahblah")))

Or if a prefix shows up later in a variable name:

# Add a USA variable with "FRA" in it
df2 <- data.frame(df, USA.FRANKLINS = rnorm(10))
prefixes2 <- unique(gsub(pattern = "^(\\w*)\\..*",
                        replacement = "\\1",
                        x = names(df2)))
llply(prefixes2, .fun = function(x){
    y <- subset(df2, select = names(df2)[grep(names(df2),
                                            pattern = paste("^", x, sep = ""))])
})
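
One case the "^prefix" pattern does not cover is a prefix that is itself the beginning of another prefix. The sketch below uses a hypothetical column name, FRANCE.a, which is not in the original data, to show the difference between anchoring only at the start and also requiring the "." right after the prefix.

```r
# Hypothetical column names; "FRANCE.a" is added purely for illustration
cols <- c("FRA.a", "FRA.b", "FRANCE.a", "USA.FRANKLINS")

# Anchoring at the start skips "USA.FRANKLINS" but still picks up "FRANCE.a"
loose <- grep("^FRA", cols, value = TRUE)

# Requiring a literal "." after the prefix keeps only the FRA group
strict <- grep("^FRA\\.", cols, value = TRUE)

loose   # "FRA.a" "FRA.b" "FRANCE.a"
strict  # "FRA.a" "FRA.b"
```

If such overlapping prefixes are possible in your data, the pattern in the llply call could be built as paste("^", x, "\\.", sep = "") instead.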
