Splitting the characters of each observation of a variable in R

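I have a data frame, and for one of its variables I need to split each observation on ",".

I used:

y <- strsplit(as.character(x), ",")

I get a dataset that shows each split character on a row of its own, instead of on the same row as before.

I have this: "a,b,c,d…" and need this: "a" "b" "c" … for each row.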

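strsplit returns a list of vectors. If the elements contain different numbers of commas, the lengths of the list elements will differ. In that (general) case, pad the shorter elements with NA at the end, up to the max of the lengths of the list, and then rbind them to create a matrix in base R.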

# assuming the data.frame object is named 'df1', split the column x
# by `,` followed by zero or more whitespace characters (`\\s*`)
lst1 <- with(df1, strsplit(as.character(x), ",\\s*"))
# find the max of the lengths of the list elements
mx <- max(lengths(lst1))
# pad NA at the end of the shorter elements with `length<-`
# and rbind the list elements into a matrix
out <- do.call(rbind, lapply(lst1, `length<-`, mx))
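
To see the padding step end to end, here is a minimal sketch on made-up data. The contents of df1, the id column, and the y1, y2, ... column names are assumptions for illustration only; the question does not show the real data.

```r
# Hypothetical sample data; only the names 'df1' and 'x' come from the answer
df1 <- data.frame(id = 1:3,
                  x = c("a, b, c", "ab,c", "d"),
                  stringsAsFactors = FALSE)

# Split on a comma followed by optional whitespace
lst1 <- strsplit(as.character(df1$x), ",\\s*")

# Pad every element with NA up to the longest length, then stack into a matrix
mx <- max(lengths(lst1))
out <- do.call(rbind, lapply(lst1, `length<-`, mx))

# Name the new columns (hypothetical names) and attach them to the original rows
colnames(out) <- paste0("y", seq_len(mx))
res <- cbind(df1, out)
res
```

Each row of res keeps its original x value, and the pieces sit in y1 to y3 on that same row, with NA wherever a row had fewer pieces.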

This can also be done in the tidyverse, after splitting into a list column:

library(dplyr)
library(tidyr)
df1 %>%
  # split x into a list column of character vectors
  mutate(y = strsplit(as.character(x), ",\\s*")) %>%
  # spread each vector across columns y1, y2, ...
  unnest_wider(y, names_sep = "")

You can use separate() from tidyr, together with dplyr:

library(tidyr)
library(dplyr)
# Create data
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))
data %>%
  separate(var1, into = c("var2", "var3", "var4"),  # Names of new columns
           sep = ",",       # Separate at the comma
           fill = "right",  # Pad missing pieces on the right with NA
           remove = FALSE)  # Keep the original variable
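
In tidyr 1.3.0 and later, separate() is marked as superseded in favour of separate_wider_delim(). The following is a minimal sketch of the same split with that function, assuming tidyr 1.3.0 or newer is installed. It reuses the sample data and the column names var2 to var4 from the separate() example.

```r
library(tidyr)

# Same sample data as in the separate() example
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))

res <- separate_wider_delim(
  data,
  cols = var1,
  delim = ",",
  names = c("var2", "var3", "var4"),
  too_few = "align_start",  # rows with fewer pieces get NA on the right
  cols_remove = FALSE       # keep the original var1 column
)
res
```

Here too_few = "align_start" plays the role of fill = "right", and cols_remove = FALSE corresponds to remove = FALSE.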
