R: Can extracted strings be saved into one column as separate character elements?



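Suppose I need to assign classes to people based on the sentences in a comment column. (The real data is more complicated than this; I have simplified it.) To do that, I extract strings from the comment sentences using regular expressions with regmatches(), gsub(), and gregexpr(). I then save the resulting lists into columns and combine them as characters, as shown below.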

>cbind.data.frame(level,software,month,stringsAsFactors = FALSE) 
level                         software             month
1  c("beginner1","beginner2")    c++                  Dec       
2                      NA        Java                 Jan       
3             "beginner3"        NA                   May   
4         "intermediate2"        NA                   NA      
5                      NA        Matlab               Mar    
6             "advanced1"        c("java","c++")      Jul     

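I want to combine all of the character elements into a single column, so that:

- a list such as c("beginner1","beginner2") is broken up into "beginner1","beginner2"

- NA values are dropped

- the values are kept as separate characters, as below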

newcol
"beginner1","beginner2","c++","Dec" 
"Java","Jan" 
"beginner3", "May"
"intermediate2" 
"Matlab", "Mar"    
"advanced1","java","c++","Jul"  

However, when I combine the columns, each row is collapsed into a single character string.

> newcol<-unite(combined, newcol, 1:ncol(combined), remove=TRUE, sep = ",")
"beginner1,beginner2,c++,Dec"  
"Java,Jan" 
"beginner3, May"
"intermediate2" 
"Matlab, Mar"    
"advanced1,java,c++,Jul"  

Is it possible to save multiple character strings into one column as separate elements?

Here is a base R solution:

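# Split one cell into its quoted pieces; cells without quotes are returned unchanged
f <- Vectorize(function(u) {
  # pull out every double-quoted substring, e.g. "beginner1" and "beginner2"
  z <- unlist(regmatches(u, gregexpr('".*?"', u, perl = TRUE)))
  if (length(z) > 0) {
    r <- gsub('"', "", z)  # strip the surrounding quotes
  } else {
    r <- u
  }
  r
})
# apply f to the non-NA cells of each row and store the result in a new column
df$newcol <- apply(df, 1, function(x) f(na.omit(x)))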

which gives

> df
level        software month                         newcol
1 c("beginner1","beginner2")             c++   Dec beginner1, beginner2, c++, Dec
2                       <NA>            Java   Jan                      Java, Jan
3                  beginner3            <NA>   May                 beginner3, May
4              intermediate2            <NA>  <NA>                  intermediate2
5                       <NA>          Matlab   Mar                    Matlab, Mar
6                  advanced1 c("java","c++")   Jul      advanced1, java, c++, Jul

where

> df$newcol
$`1`
$`1`$level
[1] "beginner1" "beginner2"
$`1`$software
[1] "c++"
$`1`$month
[1] "Dec"

$`2`
$`2`$software
[1] "Java"
$`2`$month
[1] "Jan"

$`3`
$`3`$level
[1] "beginner3"
$`3`$month
[1] "May"

$`4`
$`4`$level
[1] "intermediate2"

$`5`
$`5`$software
[1] "Matlab"
$`5`$month
[1] "Mar"

$`6`
$`6`$level
[1] "advanced1"
$`6`$software
[1] "java" "c++" 
$`6`$month
[1] "Jul"
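
As a rough alternative sketch, not the answerer's code: the same kind of list-column can also be built without Vectorize by looping over the rows with lapply(). The helper name split_cell is hypothetical, the example values are retyped from this thread, and here each row is stored as one flat, unnamed character vector rather than the nested per-column structure printed above.

```r
# Example data retyped from the thread; single-quoted strings hold the inner double quotes.
df <- data.frame(
  level    = c('c("beginner1","beginner2")', NA, "beginner3",
               "intermediate2", NA, "advanced1"),
  software = c("c++", "Java", NA, NA, "Matlab", 'c("java","c++")'),
  month    = c("Dec", "Jan", "May", NA, "Mar", "Jul"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the quoted pieces of one cell,
# or the cell itself when it contains no quoted pieces.
split_cell <- function(u) {
  z <- regmatches(u, gregexpr('"[^"]*"', u))[[1]]
  if (length(z) > 0) gsub('"', "", z, fixed = TRUE) else u
}

# Build one character vector per row, dropping NA cells.
df$newcol <- lapply(seq_len(nrow(df)), function(i) {
  cells <- unlist(df[i, c("level", "software", "month")], use.names = FALSE)
  cells <- cells[!is.na(cells)]
  unlist(lapply(cells, split_cell), use.names = FALSE)
})

# Each element of newcol is a character vector of separate strings.
print(df$newcol[[1]])
print(length(df$newcol[[6]]))
```

Because newcol is a list, each row keeps its strings as separate elements, so something like "c++" %in% df$newcol[[1]] can be tested directly instead of searching inside a comma-joined string.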

Data

df <- structure(list(level = c("c(\"beginner1\",\"beginner2\")", NA, 
"beginner3", "intermediate2", NA, "advanced1"), software = c("c++", 
"Java", NA, NA, "Matlab", "c(\"java\",\"c++\")"), month = c("Dec", 
"Jan", "May", NA, "Mar", "Jul")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Does this help?

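A <- data.frame(a = c("a", "b", "c"), b = c("a", "b", "c"), c = c("a", "b", "c"))
# MARGIN = 2 works column by column: each column is pasted into one comma-separated string
apply(A, 2, paste, collapse = ",")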
