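I have a data export from a Progress OpenEdge system that I want to parse with JavaScript. I would like to use a regular expression to find all the fields in the export.
I have tried many things along the lines of: /("[^"]*")|[^\s]+/g
I have also tried a negative lookahead, (?!""), but so far without success.
An example of the export output might look like this: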
12345 24,25 0 2015-06-30T14:53:14.891 "12345" "24,25" "0" "2015-06-30T14:53:14.891" "" yes no ? "String with ""quoted"" word" "String
with a multi
line string. "" <- Just a quote
" " This is the last value "
6789 35,36 0 2016-07-31T15:54:15.892 "6789" "35,36" "0" "2016-07-31T15:54:15.892" "" no yes ? "Just a simple string" ? ?
The fields are:
DEFINE TEMP-TABLE tt_test NO-UNDO
FIELD valueA AS INTEGER
FIELD valueB AS DECIMAL
FIELD valueC AS INTEGER
FIELD valueD AS DATETIME
FIELD valueE AS CHARACTER
FIELD valueF AS CHARACTER
FIELD valueG AS CHARACTER
FIELD valueH AS CHARACTER
FIELD valueI AS CHARACTER
FIELD valueJ AS LOGICAL
FIELD valueK AS LOGICAL
FIELD valueL AS LOGICAL
FIELD valueM AS CHARACTER
FIELD valueN AS CHARACTER
FIELD valueO AS CHARACTER
.
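The export format is as follows: all fields are separated by spaces. Strings are enclosed in double-quote characters ("). A quote inside a string is escaped as two double-quote characters (""). An empty string is also written as two double-quote characters (""), but it stands alone as a field with separators around it.
The actual data types, and the fact that this is a Progress system, are not important; they are only here to give my question some background.
So, to summarize: how do I write a (JavaScript-compatible) regular expression that successfully separates the different parts of the exported data while ignoring escaped double quotes inside strings?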
I don't think this is doable with a single regular expression. You need a parser here. Fortunately, it is easy to write, for example:
str = `12345 24,25 0 2015-06-30T14:53:14.891 "12345" "24,25" "0" "2015-06-30T14:53:14.891" "" yes no ? "String with ""quoted"" word" "String
with a multi
line string. "" <- Just a quote
" " This is the last value "
6789 35,36 0 2016-07-31T15:54:15.892 "6789" "35,36" "0" "2016-07-31T15:54:15.892" "" no yes ? "Just a simple string" ? ?`;
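// Swap every doubled quote for a placeholder so the quoted-string pattern
// below cannot stop at an escaped quote. This assumes '@' never occurs in the data.
str = str.replace(/""/g, '@');
// A token is a quoted string (may span lines), a run of non-whitespace, or a newline.
const matches = str.match(/"([\s\S]*?)"|\S+|\n/g);
const rows = [[]];
for (let m of matches) {
    if (m === '\n') {
        // A newline outside quotes starts a new record.
        rows.push([]);
        continue;
    }
    if (m === '@') {
        // A lone placeholder was an empty string ("").
        m = '';
    }
    if (m[0] === '"') {
        // Strip the surrounding quotes.
        m = m.slice(1, -1);
    }
    // Restore the escaped quotes inside the value.
    m = m.replace(/@/g, '"');
    rows[rows.length - 1].push(m);
}
console.log(rows);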
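
The placeholder approach above relies on '@' never appearing in the data, and it can misread a string that begins with an escaped quote. As a possible alternative, here is a minimal sketch that lets the quoted-string pattern consume doubled quotes directly, so no placeholder is needed. The name parseExport is a hypothetical helper, not part of any library, and the sketch assumes an environment that supports String.prototype.matchAll (ES2020).

```javascript
// Sketch only: parseExport is a hypothetical helper name.
function parseExport(text) {
  // Alternative 1: a quoted string whose body is any mix of non-quote
  //   characters and doubled quotes (""), so it may span several lines.
  // Alternative 2: an unquoted run of characters that are neither
  //   whitespace nor quotes (numbers, dates, yes/no, ?).
  // Alternative 3: a line break outside quotes, which ends the record.
  const token = /"((?:[^"]|"")*)"|[^\s"]+|\r?\n/g;
  const rows = [[]];
  for (const match of text.matchAll(token)) {
    const [whole, quotedBody] = match;
    if (whole === '\n' || whole === '\r\n') {
      rows.push([]);
    } else if (quotedBody !== undefined) {
      // Quoted field: un-escape doubled quotes; "" alone yields ''.
      rows[rows.length - 1].push(quotedBody.replace(/""/g, '"'));
    } else {
      rows[rows.length - 1].push(whole);
    }
  }
  return rows;
}

console.log(parseExport('1 "a ""b"" c" "" ?\n2 "x\ny" no'));
// → [ [ '1', 'a "b" c', '', '?' ], [ '2', 'x\ny', 'no' ] ]
```

All values come back as strings, including the unknown value ?, so converting to numbers, dates, or booleans is left to the caller. A trailing line break in the input leaves an empty final row, which can be filtered out if it is unwanted.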