Splitting a character string in R for every observation of a variable



I have a data frame, and for one of its variables I need to split each observation on ",".

I used:

y <- strsplit(as.character(x), ",")

I get a data set in which each split piece sits on a row of its own, instead of staying in the same row it came from.

I have this: "a,b,c,d…" and need this: "a" "b" "c" … for each row.

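strsplit returns a list of vectors. If the elements contain different numbers of ",", the lengths of the list elements will differ. In that case (the general case), pad NA at the end of each element up to the max of the lengths of the list, then rbind the elements to create a matrix in base R.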

# assuming the data.frame object is named 'df1', split the column x
# by `,` followed by zero or more spaces (`\\s*`)
lst1 <- with(df1, strsplit(as.character(x), ",\\s*"))
# find the max of the lengths of the list elements
mx <- max(lengths(lst1))
# pad NA at the end of the shorter elements with `length<-`
# and rbind the list elements into a matrix
out <- do.call(rbind, lapply(lst1, `length<-`, mx))
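As a rough worked example of the steps above, the sketch below runs them on a small made-up data frame and attaches the resulting matrix back to the original rows. The sample values and the column names y1, y2, ... are assumptions chosen for illustration only; they are not from the question.

```r
# Hypothetical sample data; 'df1' and column 'x' follow the naming used above
df1 <- data.frame(x = c("a,b,c", "ab, c", "d"), stringsAsFactors = FALSE)

# Split on a comma followed by optional whitespace; the result is a list
lst1 <- strsplit(as.character(df1$x), ",\\s*")
print(lengths(lst1))  # number of pieces per row, which differs between rows

# Pad every element with NA up to the longest one, then stack into a matrix
mx <- max(lengths(lst1))
out <- do.call(rbind, lapply(lst1, `length<-`, mx))

# Illustrative column names (an assumption), then bind back to the data frame
colnames(out) <- paste0("y", seq_len(mx))
res <- cbind(df1, out, stringsAsFactors = FALSE)
print(res)
```

Each row of res keeps its original x value, with the pieces spread across the new columns and NA filling the positions where a row had fewer pieces.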

This can also be done in the tidyverse after splitting into a list column.

library(dplyr)
library(tidyr)
df1 %>%
  # split x into a list column y
  mutate(y = strsplit(as.character(x), ",\\s*")) %>%
  # spread each list element across columns y1, y2, ...
  unnest_wider(y, names_sep = "")

You can use separate() from tidyr, alongside dplyr.

library(tidyr)
library(dplyr)
# Create data
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))
data %>%
  separate(var1, into = c("var2", "var3", "var4"),  # Names of new columns
           sep = ",",        # Separate at each comma
           fill = "right",   # Pad missing pieces on the right with NA
           remove = FALSE)   # Keep original variable
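separate() needs the new column names up front, which is awkward when the number of pieces per row is not known in advance. One possible way around this, sketched below, is to count the pieces first with strsplit() and build the names from that count. The sample data and the names n_parts and new_cols are hypothetical, introduced only for this illustration.

```r
library(tidyr)

# Hypothetical sample data with a varying number of comma-separated pieces
data <- tibble(var1 = c("a,b,c", "ab,c", "d"))

# Largest number of pieces found in any row
n_parts <- max(lengths(strsplit(data$var1, ",")))

# Build as many column names as needed: var2, var3, ...
new_cols <- paste0("var", seq_len(n_parts) + 1)

out <- separate(data, var1,
                into = new_cols,
                sep = ",",
                fill = "right",   # rows with fewer pieces get NA on the right
                remove = FALSE)   # keep the original column
print(out)
```

This keeps the call to separate() the same as above and only derives the into argument from the data.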
