R - How do I split a string on commas but keep the dates intact?



I have a string like this in R:
ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,

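I want to do something like str.split(), splitting on every combination of commas and quotes into an array of strings, but keeping the commas inside the quoted dates, so that I get: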

ABCDE
January 10, 2010
F
GH
March 9, 2009

Thanks.

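Here is one way. Reading every column as character keeps the lone F from being converted to the logical FALSE, and na.strings = "" turns the empty fields into NA so that na.omit() can drop them:

data.frame(list = na.omit(
  unname(unlist(read.csv(
    text = 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,',
    header = FALSE, colClasses = "character", na.strings = "")))))
              list
1            ABCDE
2 January 10, 2010
3                F
4               GH
5    March 9, 2009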

You should probably use a CSV parser here, but if you want a pure regex approach, you could try:

library(stringr)
library(dplyr)
x <- 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,'
y <- str_match_all(x, '"(.*?)"|[^,]+')[[1]]
output <- coalesce(y[,2], y[,1])
output
[1] "ABCDE"            "January 10, 2010" "F"                "GH"
[5] "March 9, 2009"

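The regex pattern uses an alternation trick, which says to match:

  • "(.*?)" matches a date in quotes, capturing only the text between the quotes
  • | OR
  • [^,]+ matches a single CSV item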
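The same alternation can be sketched in base R, without stringr or dplyr. This is only an illustration under a few assumptions: the variable names m and fields are made up here, the pattern uses "[^"]*" in place of the lazy "(.*?)", and the surrounding quotes are stripped in a second step because regmatches() returns whole matches rather than capture groups.

```r
x <- 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,'

# Alternation: a complete quoted field first, otherwise a run of non-comma characters.
m <- regmatches(x, gregexpr('"[^"]*"|[^,]+', x, perl = TRUE))[[1]]

# Whole matches still carry their quotes, so remove them afterwards.
fields <- gsub('"', "", m, fixed = TRUE)
print(fields)
```

Empty fields never match either alternative, so they are dropped without any extra filtering.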

If the pattern is as shown, one regex option is to create a delimiter and use read.table:

read.table(text = gsub('"', '', gsub('("[^,"]+,)(*SKIP)(*FAIL)|,',
   '\n', trimws(gsub(",{2,}", ",", str1), whitespace = ","), perl = TRUE)),
   header = FALSE, fill = TRUE, sep = "\n")

Output:

                V1
1            ABCDE
2 January 10, 2010
3                F
4               GH
5    March 9, 2009
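To make the nested call easier to follow, here is a rough step-by-step sketch of the same transformation. The intermediate names s1, s2, s3 and fields are illustrative only, and strsplit() stands in for read.table() so that the result is a plain character vector. It assumes R 3.6.0 or later for the whitespace argument of trimws().

```r
str1 <- 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,'

# 1. Collapse runs of two or more commas (the empty fields) into one comma.
s1 <- gsub(",{2,}", ",", str1)

# 2. Strip leading and trailing commas.
s2 <- trimws(s1, whitespace = ",")

# 3. Skip any comma that directly follows a quote-opened run of non-comma,
#    non-quote text (the comma inside a date); turn every other comma into a newline.
s3 <- gsub('("[^,"]+,)(*SKIP)(*FAIL)|,', "\n", s2, perl = TRUE)

# 4. Remove the quotes and split on the newline delimiter.
fields <- strsplit(gsub('"', "", s3, fixed = TRUE), "\n", fixed = TRUE)[[1]]
print(fields)
```

The (*SKIP)(*FAIL) verbs need perl = TRUE, and the approach relies on each quoted field containing at most one comma, as in the sample string.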

Or with scan:

data.frame(V1 = setdiff(scan(text = str1, sep = ",",
what = character()), ""))

Output:

                V1
1            ABCDE
2 January 10, 2010
3                F
4               GH
5    March 9, 2009
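One caveat with the scan approach: setdiff() also removes duplicated values, which may not be wanted if the same field can occur twice. A possible variant, sketched here with an illustrative variable name fields, filters the empty strings with nzchar() instead and spells out the quote character explicitly:

```r
str1 <- 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,'

# scan() honours the quote character, so commas inside quoted fields are kept.
fields <- scan(text = str1, sep = ",", what = character(),
               quote = '"', quiet = TRUE)

# Drop only the empty fields; repeated values would be preserved.
fields <- fields[nzchar(fields)]
print(fields)
```

Because scan() is asked for character data, the lone F stays a string instead of becoming FALSE.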

Data

str1 <- 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,'

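Another option could be the following. Note that read.csv() converts column types by default, so the lone F is read as a logical and shows up as FALSE; passing colClasses = "character" (together with na.strings = "" so that the empty fields still become NA) avoids this: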

na.omit(stack(read.csv(text = str1, header = FALSE)))[1]
values
1            ABCDE
2 January 10, 2010
3            FALSE
4               GH
5    March 9, 2009

txt <- 'ABCDE,"January 10, 2010",F,,,,GH,"March 9, 2009",,,'
