我想根据分配的分隔符号"|"将数据集中的一列拆分为多个列。
我的数据集是这样的:
vname<-c("x1", "x2", "x3","x4")
label<-c("1,Eng |2,Man", "1,yes|2,no|3,dont know", "1,never|2,sometimes|3,usually|4,always", "1,yes|2,No|3,dont know")
df<-data.frame(vname, label)
所以,我想将column: label in拆分为基于符号"|"。我用stringr:函数,和我的代码就像:
cd2<-df %>%
select(vname, everything())%>%
mutate(label=str_split(value, " \| "))
但是,结果返回label列中的向量。它看起来像这样:
vname label
x1 c("1,Eng","2,Man")
x2 c("1,yes","2,no", "3,dont know")
....
我的问题是如何得到这样一个预期的结果:
vname label1 label2 label3 label4
x1 1,Eng 2,Man
x2 1,yes 2,no 3, dont know
x3 1,never. 2,sometimes, 3,usually. 4,always
...
谢谢你的帮助~~~
dput(head(cd2, 10))
structure(list(variable = c("x2", "x8", "x9", "x10", "x13", "x14",
"x15", "x20", "x22", NA), vname = c("consenting_language", "county",
"respondent", "residence", "language", "int_q1", "int_q2", "int_q4",
"int_q5", "int_q6"), label = c("Consenting Language", "County",
"Respondent Type", "Residence", "Interview language ", "1. What was your sex at birth?",
"2. How would you describe your current sexual orientation?",
"4. What is the highest level of education you completed?", "5. What is your current marital status?",
"<div class="rich-text-field-label"><p>6. Is <span style="color: #3598db;">regular </span>your partner currently living with you now, or does s/he stay elsewhere?</p></div>"
), value = c("1, English | 2, Kiswahili", "1, County011 | 2, County014 | 3, County002| 4, County006 | 5, County010 | 6, County008 | 7, County005 | 8, County003 | 9, County012| 10, County004 | 11, County009 | 12, County001 | 13, County015 | 14, County007 | 15, County012",
"1, FSW | 2, MSM | 3, AGYW", "1, Urban | 2, Peri urban | 3, Rural",
"1, English | 2, Kiswahili", "1, Male | 2, Female", "1, Homosexual/Gay | 2, Bisexual | 3, Heterosexual/Straight | 4, Transgender Male | 5, Transgender Female | 96, Other | 98, Don't Know | 99, Decline to state",
"1,None | 2,Nursery/kindergarten | 3,Primary | 4,Secondary | 5,Tertiary/Vocational | 6,College/University | 7,Adult education | 96,Other",
"1, Single/Not married | 2, Married | 3, Cohabiting | 4, Divorced | 5, Separated | 6, Widowed | 7, In a relationship",
"1, Living with You | 2, Staying Elsewhere")), row.names = c(NA,
10L), class = "data.frame")
使用的代码,它返回一个list
(也许我们必须确保有零个或多个空格,因为在示例中没有空格),我们可以将unnest_wider
转换为新的列
library(dplyr)
library(stringr)
library(tidyr)
df %>%
select(vname, everything())%>%
mutate(label=str_split(label, "\s*\|\s*")) %>%
unnest_wider(where(is.list), names_sep = "")
与产出
# A tibble: 4 × 5
vname label1 label2 label3 label4
<chr> <chr> <chr> <chr> <chr>
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
这可能也适用于separate
library(tidyr)
df %>%
separate(label, into = str_c('label',
seq_len(max(str_count(.$label, fixed("|"))) + 1)),
sep = "\|", fill = "right")
与产出
vname label1 label2 label3 label4
1 x1 1,Eng 2,Man <NA> <NA>
2 x2 1,yes 2,no 3,dont know <NA>
3 x3 1,never 2,sometimes 3,usually 4,always
4 x4 1,yes 2,No 3,dont know <NA>
或者使用OP的数据'cd2' -在|
cd2new <- cd2 %>%
separate(value, into = str_c('value',
seq_len(max(str_count(.$value, fixed("|"))) + 1)),
sep = "\s*\|\s*", fill = "right")
与产出
> head(cd2new, 2)
variable vname label value1 value2 value3 value4 value5
1 x2 consenting_language Consenting Language 1, English 2, Kiswahili <NA> <NA> <NA>
2 x8 county County 1, County011 2, County014 3, County002 4, County006 5, County010
value6 value7 value8 value9 value10 value11 value12 value13
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 6, County008 7, County005 8, County003 9, County012 10, County004 11, County009 12, County001 13, County015
value14 value15
1 <NA> <NA>
2 14, County007 15, County012
您可以简单地使用separate()
from {tidyr}
library(tidyverse)
dat %>% as_tibble() %>%
separate(value, sep = "\s*\|\s*",
into = paste0("value", seq(str_count(.$value, "\s*\|\s*"))))