R - Splitting a string into multiple columns



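I am trying to split strings into multiple columns. At the moment I have this list in a data frame. Each string has a different length, which I can't always predict. I want to split on "," and remove the surrounding parentheses, so that each value ends up in its own column.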

x
(1,2,3,4,5)
(1,2,3,4,5,6)
(1,2,3,4,5,6,7)

I have tried this, but it doesn't work:

y = strsplit(as.character(df$x),',')
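
A quick sketch of why that attempt falls short, assuming the column holds the strings shown above (the data frame `df` below is a reconstruction, not the asker's actual object). `strsplit` returns a list of character vectors of different lengths rather than columns, and the parentheses stay attached to the first and last pieces:

```r
# Hypothetical reconstruction of the data frame from the question
df <- data.frame(x = c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)"))

y <- strsplit(as.character(df$x), ",")

class(y)    # a list, not a data frame
lengths(y)  # one vector per row, each with a different length
y[[1]]      # the first and last pieces still carry "(" and ")"
```

So two steps are still missing: stripping the parentheses and padding the shorter vectors before binding them into columns.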

Desired output:

x   x   x   x   x   x   x
1   2   3   4   5   n/a n/a
1   2   3   4   5   6   n/a
1   2   3   4   5   6   7

For a start:

library(dplyr)  # mutate() and %>%
library(tidyr)  # separate()
x %>%
  # remove `(` and `)`:
  mutate(V = gsub("\\(|\\)", "", V)) %>%
  # split `V` into separate columns:
  separate(V, into = paste0('x', 1:7), fill = 'right', remove = TRUE, sep = ',')
x1 x2 x3 x4 x5   x6   x7
1  1  2  3  4  5 <NA> <NA>
2  1  2  3  4  5    6 <NA>
3  1  2  3  4  5    6    7

Data:

x <- data.frame(
V = c("(1,2,3,4,5)","(1,2,3,4,5,6)","(1,2,3,4,5,6,7)")
)

Edit:

If the number of digits and columns is unknown, you can do this:

library(stringr)  # str_count()
x_new <- x %>%
  # remove `(` and `)`:
  mutate(V = gsub("\\(|\\)", "", V)) %>%
  # count number of digits (equals the number of values while every value is a single digit):
  mutate(N = str_count(V, "\\d"))
x_new %>%
  # split `V` into separate columns:
  separate(V, into = paste0('x', 1:max(x_new$N, na.rm = TRUE)), fill = 'right', remove = TRUE, sep = ',') %>%
  select(-N)
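
Counting digits matches the number of values only while every value is a single digit. A minimal sketch of an alternative count, using made-up multi-digit values (the vector `v` is hypothetical and not from the question): counting the comma-separated fields gives the width of the widest row directly.

```r
# Hypothetical values with more than one digit, parentheses already removed
v <- c("1,20,3", "10,2,3,4")

# What counting digits gives
n_digits <- lengths(regmatches(v, gregexpr("[0-9]", v)))
# Number of comma-separated values per string
n_fields <- lengths(strsplit(v, ",", fixed = TRUE))

max(n_digits)  # 5: one column more than needed
max(n_fields)  # 4: matches the widest row
```

With `fill = 'right'` the extra column would only hold `NA`, so the digit count is not wrong here, just wider than necessary. A field count like `max(n_fields)` could stand in for `max(x_new$N)` in the `into =` argument.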

You could do this:

x <- c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)")
x <- lapply(strsplit(gsub("\\(|\\)", "", x), ","), as.numeric)
x <- lapply(x, function(y) c(y, rep(NA, max(lengths(x)) - length(y))))
setNames(as.data.frame(x), c("x1", "x2", "x3"))
#>   x1 x2 x3
#> 1  1  1  1
#> 2  2  2  2
#> 3  3  3  3
#> 4  4  4  4
#> 5  5  5  5
#> 6 NA  6  6
#> 7 NA NA  7

Created on 2022-05-28 by the reprex package (v2.0.1)
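
The result above has one column per input string, whereas the desired output in the question has one row per string. A possible variation, sketched here on the same input, stacks the padded vectors as rows with `rbind` instead (the names `parts`, `padded` and `res` are illustrative only):

```r
x <- c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)")

# strip the parentheses, split on commas, convert to numeric
parts <- lapply(strsplit(gsub("[()]", "", x), ","), as.numeric)
n <- max(lengths(parts))

# pad every vector with NA up to the longest length, then stack as rows
padded <- lapply(parts, function(y) c(y, rep(NA, n - length(y))))
res <- as.data.frame(do.call(rbind, padded))
names(res) <- paste0("x", seq_len(n))
res
```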

Split on non-digits, drop the first element, adjust the lengths, and convert to a data frame:

# requires R >= 4.1 for the native pipe `|>` and the `\(x)` lambda syntax
lapply(strsplit(dat$V1, '\\D'), `[`, -1) |>
  (\(.) lapply(., `length<-`, max(lengths(.))))() |>
  do.call(what = rbind) |> as.data.frame()
#   V1 V2 V3 V4 V5   V6   V7
# 1  1  2  3  4  5 <NA> <NA>
# 2  1  2  3  4  5    6 <NA>
# 3  1  2  3  4  5    6    7

Data:

dat <- structure(list(V1 = c("(1,2,3,4,5)", "(1,2,3,4,5,6)", "(1,2,3,4,5,6,7)"
)), class = "data.frame", row.names = c(NA, -3L))
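
A small sketch of the two tricks that answer relies on, shown on a single made-up string: splitting on non-digits leaves an empty first element because the string starts with "(", and `length<-` pads a vector with `NA` when it is extended.

```r
s <- strsplit("(1,2,3)", "\\D")[[1]]
s  # "" "1" "2" "3": the leading "(" yields an empty first element,
   # while the trailing ")" adds nothing

# drop the empty element, then extend to length 5; the new slots are NA
padded <- `length<-`(s[-1], 5)
padded
```

Note that the values stay character strings here, which is why the output above prints missing entries as `<NA>`.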

In base R, once the parentheses have been removed with gsub, the strings are easy to read with read.csv:

read.csv(text = gsub("[()]", "", x), header = FALSE, fill = TRUE)
V1 V2 V3 V4 V5 V6 V7
1  1  2  3  4  5 NA NA
2  1  2  3  4  5  6 NA
3  1  2  3  4  5  6  7
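
One caveat worth keeping in mind: according to the `read.table` documentation, the number of columns is inferred from the first five lines of input unless `col.names` is supplied and is longer. The sketch below uses made-up input (not the question's data) in which the widest row comes sixth, and passes explicit column names so that every field gets a column:

```r
# Hypothetical input where the widest row comes after the fifth line
x <- c("(1)", "(1,2)", "(1)", "(1,2)", "(1)", "(1,2,3,4)")
txt <- gsub("[()]", "", x)

# width of the widest row, computed over all lines
n <- max(lengths(strsplit(txt, ",", fixed = TRUE)))

res <- read.csv(text = txt, header = FALSE, fill = TRUE,
                col.names = paste0("V", seq_len(n)))
res
```

For the three-row example in the question this is not needed, since all rows fall within the first five lines.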
