>I have a data frame in which some columns contain comma-separated strings:
colA colB
1 a,b,c,ñ d,b,e
2 f,g,h f,g,m,p
3 i,j,k f,o,j
I want to get the elements that the two columns have in common within each row. So my desired output is:
colA colB
1 b b
2 f,g f,g
3 j j
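I tried converting these columns to a list of lists so that I could intersect them afterwards, but I ran into some problems, so I wonder whether there is a simpler way. How can I get this?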
We can split both strings in each row and keep the elements of colA that also appear in colB:
df[,1:2] <- apply(df,1, function(X) paste(unlist(strsplit(X[1],","))[unlist(strsplit(X[1],",")) %in% unlist(strsplit(X[2],","))],collapse=",") )
> df
colA colB
1 b b
2 f,g f,g
3 j j
Data:
df <- structure(list(colA = structure(1:3, .Label = c("a,b,c,ñ", "f,g,h",
"i,j,k"), class = "factor"), colB = structure(1:3, .Label = c("d,b,e",
"f,g,m,p", "f,o,j"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
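The one-liner above can also be unpacked into separate steps with `strsplit` and `intersect`. This is only a minimal base R sketch, assuming the columns are plain character vectors (the data is rebuilt with `stringsAsFactors = FALSE`); the names `a`, `b`, and `common` are hypothetical helpers, not part of the original answer.

```r
# Example data from the question, stored as character columns
df <- data.frame(colA = c("a,b,c,ñ", "f,g,h", "i,j,k"),
                 colB = c("d,b,e", "f,g,m,p", "f,o,j"),
                 stringsAsFactors = FALSE)

# Split each column into a list holding one character vector per row
a <- strsplit(as.character(df$colA), ",", fixed = TRUE)
b <- strsplit(as.character(df$colB), ",", fixed = TRUE)

# Intersect row by row, then collapse each result back into one string
common <- vapply(Map(intersect, a, b), paste, character(1), collapse = ",")

# Write the shared elements into both columns, as in the desired output
df$colA <- common
df$colB <- common
df
```

Unlike `%in%`, `intersect` also drops duplicates, so an element repeated within colA would appear only once in the result.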
If you are working with a larger dataset, you could try `cSplit_l` from the "splitstackshape" package:
library(splitstackshape)
library(data.table)  # provides setnames(), in case it is not already attached
temp <- cSplit_l(df, names(df), ",", stripWhite = TRUE, type.convert = FALSE, drop = TRUE)
temp[, 1:2] <- vapply(Map(intersect, temp[[1]], temp[[2]]), toString, character(1L))
setnames(temp, names(df))[]
## colA colB
## 1: b b
## 2: f, g f, g
## 3: j j
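It is not clear why you want the same content in both columns.
Another option is `str_extract_all`, using the elements of colB as a regex alternation pattern: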
library(stringr)
library(dplyr)
library(purrr)
df %>%
transmute(colA = map_chr(str_extract_all(colA,
str_replace_all(colB, ",", "|")), toString),
colB = colA)
# colA colB
#1 b b
#2 f, g f, g
#3 j j
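One caveat with the regex approach: it matches substrings, not whole elements, so it may behave differently from a set intersection when elements are longer than one character. The sketch below uses hypothetical values (`a`, `b`, `regex_hits`, and `set_hits` are not from the question) to show the difference; it also assumes the elements contain no regex metacharacters.

```r
library(stringr)

# Hypothetical values with a multi-character element (not from the question)
a <- "ab,c"
b <- "a,c"

# Build the alternation from b the same way as above: "a|c"
pattern <- str_replace_all(b, ",", "|")

# Regex matching finds "a" inside "ab", although "ab" is not an element of b
regex_hits <- str_extract_all(a, pattern)[[1]]

# A set-based comparison only keeps whole elements
set_hits <- intersect(strsplit(a, ",", fixed = TRUE)[[1]],
                      strsplit(b, ",", fixed = TRUE)[[1]])

regex_hits
set_hits
```

For the single-character elements in the question both approaches agree, so this only matters for data with longer elements.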