Split a string into multiple columns in R, instead of into a vector in one column



I want to split one column of my dataset into multiple columns based on the delimiter "|".

My dataset looks like this:

vname<-c("x1", "x2", "x3","x4")
label<-c("1,Eng |2,Man", "1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")
df<-data.frame(vname, label)

So I want to split the column label on the symbol "|". I used the stringr function str_split(), and my code looks like this:

cd2 <- df %>%
  select(vname, everything()) %>%
  mutate(label = str_split(value, " \\| "))

However, the result stores a vector in each cell of the label column. It looks like this:

vname  label
x1     c("1,Eng","2,Man")
x2     c("1,yes","2,no", "3,dont know")
....
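The reason is that str_split() returns a list with one character vector per input string; inside mutate() that list becomes a list column, so each cell prints as c(...). A minimal check on two of the question's strings (the variable names are just for illustration):

```r
library(stringr)

# Two strings taken from the question's `label` column
x <- c("1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always")

# str_split() returns a list: one character vector per input string
parts <- str_split(x, "\\|")
str(parts)

# The pieces have different lengths, so they cannot fit in one ordinary column
lengths(parts)
```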

My question is how to get the expected result, like this:

vname  label1   label2       label3       label4
x1     1,Eng    2,Man
x2     1,yes    2,no         3,dont know
x3     1,never  2,sometimes  3,usually    4,always
...

Thanks for your help!


dput(head(cd2, 10)) 
structure(list(variable = c("x2", "x8", "x9", "x10", "x13", "x14", 
"x15", "x20", "x22", NA), vname = c("consenting_language", "county", 
"respondent", "residence", "language", "int_q1", "int_q2", "int_q4", 
"int_q5", "int_q6"), label = c("Consenting Language", "County", 
"Respondent Type", "Residence", "Interview language ", "1. What was your sex at birth?", 
"2. How would you describe your current sexual orientation?", 
"4. What is the highest level of education you completed?", "5. What is your current marital status?", 
"<div class=\"rich-text-field-label\"><p>6. Is <span style=\"color: #3598db;\">regular </span>your partner currently living with you now, or does s/he stay elsewhere?</p></div>"
), value = c("1, English | 2, Kiswahili", "1, County011 | 2, County014  | 3, County002| 4, County006  | 5, County010 | 6, County008  | 7, County005  | 8, County003 | 9, County012| 10, County004 | 11, County009  | 12, County001 | 13, County015 | 14, County007 | 15, County012", 
"1, FSW | 2, MSM | 3, AGYW", "1, Urban | 2, Peri urban | 3, Rural", 
"1, English | 2, Kiswahili", "1, Male | 2, Female", "1, Homosexual/Gay | 2, Bisexual | 3, Heterosexual/Straight | 4, Transgender Male | 5, Transgender Female | 96, Other | 98, Don't Know | 99, Decline to state", 
"1,None  | 2,Nursery/kindergarten | 3,Primary | 4,Secondary | 5,Tertiary/Vocational | 6,College/University | 7,Adult education | 96,Other", 
"1, Single/Not married | 2, Married | 3, Cohabiting | 4, Divorced | 5, Separated | 6, Widowed | 7, In a relationship", 
"1, Living with You | 2, Staying Elsewhere")), row.names = c(NA, 
10L), class = "data.frame")

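With your code, str_split() returns a list column. The pattern should allow zero or more spaces around the pipe, because in the example most labels have no spaces around it, so " \\| " never matches. We can then use unnest_wider() to spread the list column into new columns: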

library(dplyr)
library(stringr)
library(tidyr)
df %>%
  select(vname, everything()) %>%
  mutate(label = str_split(label, "\\s*\\|\\s*")) %>%  # split on "|" with optional surrounding whitespace
  unnest_wider(where(is.list), names_sep = "")          # label -> label1, label2, ...
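The whitespace-tolerant pattern matters because the first row, "1,Eng |2,Man", has a space before the pipe. A quick comparison on that single string:

```r
library(stringr)

x <- "1,Eng |2,Man"

# Splitting on the bare pipe keeps the space that precedes it
plain <- str_split(x, "\\|")[[1]]
plain    # "1,Eng " "2,Man"

# Allowing zero or more whitespace characters on either side removes it
trimmed <- str_split(x, "\\s*\\|\\s*")[[1]]
trimmed  # "1,Eng" "2,Man"
```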

Output:

# A tibble: 4 × 5
vname label1  label2      label3      label4  
<chr> <chr>   <chr>       <chr>       <chr>   
1 x1    1,Eng   2,Man       <NA>        <NA>    
2 x2    1,yes   2,no        3,dont know <NA>    
3 x3    1,never 2,sometimes 3,usually   4,always
4 x4    1,yes   2,No        3,dont know <NA>    

This can also be done with separate():

library(stringr)
library(tidyr)
df %>%
  separate(label,
           # one output column per piece: number of "|" + 1
           into = str_c('label', seq_len(max(str_count(.$label, fixed("|"))) + 1)),
           sep = "\\|", fill = "right")
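separate() needs the output column names up front. The into argument counts the pipes in each string, takes the maximum, and adds one, because n delimiters produce n + 1 pieces. The same computation spelled out on the question's labels:

```r
library(stringr)

label <- c("1,Eng |2,Man", "1,yes|2,no|3,dont know",
           "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")

# fixed("|") matches a literal pipe, so no regex escaping is needed
n_pipes <- str_count(label, fixed("|"))
n_pipes   # 1 2 3 2

# n delimiters give n + 1 pieces; the longest row decides the column count
n_cols <- max(n_pipes) + 1
cols <- str_c("label", seq_len(n_cols))
cols      # "label1" "label2" "label3" "label4"
```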

Output:

  vname  label1      label2      label3   label4
1    x1  1,Eng        2,Man        <NA>     <NA>
2    x2   1,yes        2,no 3,dont know     <NA>
3    x3 1,never 2,sometimes   3,usually 4,always
4    x4   1,yes        2,No 3,dont know     <NA>

Or, using the OP's data 'cd2', allow for optional spaces before and after the |:
cd2new <- cd2 %>%
  separate(value,
           into = str_c('value', seq_len(max(str_count(.$value, fixed("|"))) + 1)),
           sep = "\\s*\\|\\s*", fill = "right")  # "|" with optional whitespace on both sides

Output:

> head(cd2new, 2)
variable               vname               label       value1       value2       value3       value4       value5
1       x2 consenting_language Consenting Language   1, English 2, Kiswahili         <NA>         <NA>         <NA>
2       x8              county              County 1, County011 2, County014 3, County002 4, County006 5, County010
value6       value7       value8       value9       value10       value11       value12       value13
1         <NA>         <NA>         <NA>         <NA>          <NA>          <NA>          <NA>          <NA>
2 6, County008 7, County005 8, County003 9, County012 10, County004 11, County009 12, County001 13, County015
value14       value15
1          <NA>          <NA>
2 14, County007 15, County012
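For comparison, the same reshaping can be done without the tidyverse. This is a minimal base R sketch (not taken from the answers here) on the question's df: strsplit() with the same whitespace-tolerant pattern, NA padding via `length<-`, then rbind to build the wide columns.

```r
# The question's example data
df <- data.frame(
  vname = c("x1", "x2", "x3", "x4"),
  label = c("1,Eng |2,Man", "1,yes|2,no|3,dont know",
            "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know"),
  stringsAsFactors = FALSE
)

# Split each label on "|" with optional surrounding whitespace
parts <- strsplit(df$label, "\\s*\\|\\s*", perl = TRUE)

# Pad every piece vector with NA up to the length of the longest one
n <- max(lengths(parts))
padded <- lapply(parts, `length<-`, n)

# Stack the padded vectors as rows and name the new columns label1..labeln
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("label", seq_len(n))

out <- cbind(df["vname"], wide)
out
```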

You can simply use separate() from {tidyr}:

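library(tidyverse)
# `dat` is assumed to hold the OP's data (the dput() output above)
dat %>% as_tibble() %>%
  separate(value, sep = "\\s*\\|\\s*",
           # number of pieces = max number of delimiters + 1
           into = paste0("value", seq_len(max(str_count(.$value, "\\s*\\|\\s*")) + 1)),
           fill = "right")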
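Since tidyr 1.3.0, separate() is superseded by separate_wider_delim(), which works out the number of output columns itself. A sketch on the question's df, assuming tidyr 1.3.0 or later is installed:

```r
library(stringr)  # for regex()
library(tidyr)    # separate_wider_delim() needs tidyr >= 1.3.0; tidyr also re-exports %>%

df <- data.frame(
  vname = c("x1", "x2", "x3", "x4"),
  label = c("1,Eng |2,Man", "1,yes|2,no|3,dont know",
            "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know"),
  stringsAsFactors = FALSE
)

out <- df %>%
  separate_wider_delim(
    label,
    delim = regex("\\s*\\|\\s*"),  # treat the delimiter as a regex: "|" with optional spaces
    names_sep = "",                 # new names: "label" + "" + index -> label1, label2, ...
    too_few = "align_start"         # shorter rows are padded with NA on the right
  )
out
```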
