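I'm trying to split strings into multiple columns. Currently I have this list in a data frame. Each string has a different length, which I can't always predict. I want to split on ",", remove the "(" and ")", and have each value in its own column.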
x
(1,2,3,4,5)
(1,2,3,4,5,6)
(1,2,3,4,5,6,7)
I have tried this, but it doesn't work:
y = strsplit(as.character(df$x),',')
Desired output:
x x x x x x x
1 2 3 4 5 n/a n/a
1 2 3 4 5 6 n/a
1 2 3 4 5 6 7
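For context, here is a minimal sketch of what the strsplit attempt in the question returns. The data frame df below is a reconstruction for illustration (the question does not show how it was built). The call itself runs, but it yields a list of character vectors of unequal length, and the first and last pieces still carry the parentheses, so nothing has been placed into columns yet.

```r
# Hypothetical reconstruction of the question's data frame
df <- data.frame(x = c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)"))

y <- strsplit(as.character(df$x), ",")

class(y)     # a list with one element per row, not a data frame
lengths(y)   # the number of pieces differs per row
y[[1]]       # the first and last pieces still contain "(" and ")"
```

Padding the shorter vectors and binding them by row is the step that is still missing.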
Try:
library(dplyr)
library(tidyr)
x %>%
  # remove `(` and `)`:
  mutate(V = gsub("\\(|\\)", "", V)) %>%
  # split `V` into separate columns:
  separate(V, into = paste0('x', 1:7), fill = 'right', remove = TRUE, sep = ',')
x1 x2 x3 x4 x5 x6 x7
1 1 2 3 4 5 <NA> <NA>
2 1 2 3 4 5 6 <NA>
3 1 2 3 4 5 6 7
Data:
x <- data.frame(
V = c("(1,2,3,4,5)","(1,2,3,4,5,6)","(1,2,3,4,5,6,7)")
)
Edit:
If the number of digits, and therefore of columns, is unknown, you can do this:
library(dplyr)    # mutate(), select(), %>%
library(stringr)  # str_count()
x_new <- x %>%
  # remove `(` and `)`:
  mutate(V = gsub("\\(|\\)", "", V)) %>%
  # count number of digits:
  mutate(N = str_count(V, "\\d"))
x_new %>%
  # split `V` into separate columns:
  separate(V, into = paste0('x', 1:max(x_new$N, na.rm = TRUE)), fill = 'right', remove = TRUE, sep = ',') %>%
  select(-N)
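One caveat, sketched below under the assumption that values may have more than one digit: counting digits then overestimates the number of columns, which leaves extra all-NA columns. Counting the comma-separated fields avoids that. The vector V and the names n_digits, n_fields and into are made up for this illustration.

```r
# Hypothetical input, parentheses already removed, with multi-digit values
V <- c("10,20,30", "1,2,3,4")

# Digit-based count: strips everything but digits and measures what is left
n_digits <- max(nchar(gsub("\\D", "", V)))

# Field-based count: number of pieces after splitting on the literal comma
n_fields <- max(lengths(strsplit(V, ",", fixed = TRUE)))

# Column names sized to the widest row
into <- paste0("x", seq_len(n_fields))
```

Here n_digits is larger than n_fields, so a vector like into is the better fit for the into argument of separate().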
You could do:
x <- c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)")
x <- lapply(strsplit(gsub("\\(|\\)", "", x), ","), as.numeric)
x <- lapply(x, function(y) c(y, rep(NA, max(lengths(x)) - length(y))))
setNames(as.data.frame(x), c("x1", "x2", "x3"))
#> x1 x2 x3
#> 1 1 1 1
#> 2 2 2 2
#> 3 3 3 3
#> 4 4 4 4
#> 5 5 5 5
#> 6 NA 6 6
#> 7 NA NA 7
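Created on 2022-05-28 by the reprex package (v2.0.1)
Split on non-digits, drop the first (empty) element, pad to a common length, and convert to a data frame:
lapply(strsplit(dat$V1, '\\D'), `[`, -1) |>
  {\(.) lapply(., `length<-`, max(lengths(.)))}() |>
  do.call(what=rbind) |> as.data.frame()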
# V1 V2 V3 V4 V5 V6 V7
# 1 1 2 3 4 5 <NA> <NA>
# 2 1 2 3 4 5 6 <NA>
# 3 1 2 3 4 5 6 7
Data:
dat <- structure(list(V1 = c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)"
)), class = "data.frame", row.names = c(NA, -3L))
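The <NA> entries in that output indicate character columns. If numeric columns are wanted, one possible follow-up is base R's type.convert(), sketched here with the same steps written out without the pipe. The names parts, res and res_num are made up for this illustration.

```r
dat <- data.frame(V1 = c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)"))

# Split on non-digits and drop the leading empty string produced by "("
parts <- lapply(strsplit(dat$V1, "\\D"), `[`, -1)

# Pad every vector with NA up to the longest length
parts <- lapply(parts, `length<-`, max(lengths(parts)))

# Bind by row into a character matrix, then convert to a data frame
res <- as.data.frame(do.call(rbind, parts))

# Let R pick a suitable type for each column (as.is = TRUE avoids factors)
res_num <- type.convert(res, as.is = TRUE)
sapply(res_num, class)
```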
In base R, it is easier to strip the parentheses with gsub and then read the result with read.csv:
read.csv(text = gsub("[()]", "", x), header = FALSE, fill = TRUE)
V1 V2 V3 V4 V5 V6 V7
1 1 2 3 4 5 NA NA
2 1 2 3 4 5 6 NA
3 1 2 3 4 5 6 7