Split a string at regex matches and build an array using JavaScript



I have a string containing multiple-choice questions and answers, like this:

(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Meherpur। Ans (B) Rangpur

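I need to create JSON from the string above, so that each question becomes a separate entry together with its answer.

The final JSON would look something like: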

{
  "questions": [
    {
      "options": ["Dhaka", "Rangpur", "Chittagong", "Comilla"],
      "body": "Capital of Bangladesh is-",
      "answers": ["A"]
    },
    {
      "options": ["Mirpur", "Rangpur", "Chittagong", "Comilla"],
      "body": "Largest city of Bangladesh is-",
      "answers": ["C"]
    }
  ]
}

I tried:

var result = reader.result.split('\n');
for (var index = 0; index < result.length; index++) {
var question = result[index]
if(question.match("/[(/)]/g")){
questions.push = question
}
else {
questions.push = question
}
}
console.log(questions)

How can I turn it into that structure?

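Try this.

We need the /u flag to handle Unicode, and .+ rather than a narrower character class, because the option labels may be multi-byte characters.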

const str = `(1) The main language of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি  (C) Hindi (D) French। Ans (ক) বাংলা
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Meherpur। Ans (B) Rangpur`;
const obj = str.split(/\n/u).reduce((acc, line, i) => {
  if (i % 2 === 0) {
    // even lines hold the question: remove the leading (X) from it
    acc.questions.push({ body: line.match(/\(.+?\) (.*)/u)[1] });
  } else {
    const curItem = acc.questions[acc.questions.length - 1]; // last pushed object
    let [optionStr, answer] = line.split(/। /u); // split on this special character
    // assuming 4 options; lazy .+? and \s+ keep stray spaces out of the captures
    curItem.options = optionStr
      .match(/\(.+?\)\s+(.+?)\s+\(.+?\)\s+(.+?)\s+\(.+?\)\s+(.+?)\s+\(.+?\)\s+(.+)/u)
      .slice(1); // drop the first element from the result (full match)
    answer = answer.match(/\((.+?)\)/u)[1]; // just get the letter from the bracket
    curItem.answers = [answer];
  }
  return acc;
}, { questions: [] });
console.log(obj)
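
The snippet above assumes exactly four options per line. If the number of options can vary, one possible alternative (a sketch only, not part of the original answer) is to collect label/text pairs with matchAll. The helper name parseOptions and the shape of its return value are assumptions made for illustration.

```javascript
// Hypothetical helper: extracts the options from a line such as
// "(A) Dhaka (B) Rangpur", however many options it contains.
function parseOptions(optionStr) {
  // \(([^()]+)\) captures the label; ([^()]+) captures the text up to the next "(".
  const pairs = optionStr.matchAll(/\(([^()]+)\)\s*([^()]+)/gu);
  return Array.from(pairs, m => ({ label: m[1], text: m[2].trim() }));
}

const parsed = parseOptions("(ক) বাংলা (খ) ইংরেজি  (C) Hindi");
console.log(parsed.map(o => o.text)); // [ 'বাংলা', 'ইংরেজি', 'Hindi' ]
```

Keeping the label next to the text also makes it easy to look up the answer by its label later.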

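You can also use a single pattern that captures the question part and the options part in groups. The options part can then be split on the uppercase characters between parentheses.

Pattern with capture groups: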

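^\(\d+\) (.+)\n(\([A-Z]\).*?)। Ans \(([A-Z])\)
  • ^ Start of the string (or of each line when the m flag is set)
  • \(\d+\) Match 1+ digits between parentheses, followed by a space
  • (.+)\n Capture group 1, match the rest of the line, followed by a newline
  • (\([A-Z]\).*?) Capture group 2, match an uppercase char between parentheses followed by as few chars as possible
  • । Ans Match literally
  • \(([A-Z])\) Capture an uppercase char between parentheses in group 3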
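
As a quick check of what each group holds, here is a minimal sketch that applies the pattern to the first question from the sample data. The variable names sample, pattern and match are only illustrative.

```javascript
// First question from the sample data, written with an explicit \n.
const sample =
  "(1) Capital of Bangladesh is-\n" +
  "(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka";

// The m flag lets ^ match at the start of each line.
const pattern = /^\(\d+\) (.+)\n(\([A-Z]\).*?)। Ans \(([A-Z])\)/m;

const match = sample.match(pattern);
console.log(match[1]); // Capital of Bangladesh is-
console.log(match[2]); // (A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla
console.log(match[3]); // A
```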

Or use Unicode categories (if supported):

^\(\p{Nd}+\)\s+(.+)\n(\(\p{L}\).*?)।\s+Ans\s+\((\p{L})\)
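
Note that \p{...} is only treated as a Unicode property escape when the u flag is set. A small sketch of the difference, using one label from the sample data (the variable names are illustrative):

```javascript
// Illustrative only: shows how the u flag changes what \p{L} means.
const label = "(ক)";

// Without the u flag, \p is just an escaped "p", so this looks for the literal text "(p{L})".
const withoutFlag = /\(\p{L}\)/.test(label);

// With the u flag, \p{L} matches any single Unicode letter.
const withFlag = /\(\p{L}\)/u.test(label);

console.log(withoutFlag, withFlag); // false true
```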

In the code, the value of group 1 is referenced as i[1], and so on.

const str = `(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি । Ans (B) Rangpur`;
const regex = /^\(\p{Nd}+\)\s+(.+)\n(\(\p{L}\).*?)।\s+Ans\s+\((\p{L})\)/gum;
let result = {
  questions: Array.from(str.matchAll(regex)).map(i =>
    ({
      // split on the "(X)" labels; filter(Boolean) drops the empty leading piece
      options: i[2].trim().split(/\s*\(\p{L}\)\s*/u).filter(Boolean),
      body: i[1],
      answers: [i[3]]
    })
  )
};
console.log(result);

Or an example that uses the negated character class [^()]+ to match whatever is between the parentheses.

const str = `(1) Capital of Bangladesh is-
(A) Dhaka (B) Rangpur (C) Chittagong (D) Comilla। Ans (A) Dhaka
(2) Largest city of Bangladesh is-
(A) Mirpur (B) Rangpur (C) Chittagong (D) Comilla। Ans (C) Chittagong
(3) Smallest city of Bangladesh is-
(ক) বাংলা (খ) ইংরেজি । Ans (B) Rangpur`;
const regex = /^\([^()]+\)\s+(.+)\n(\([^()]+\).*?)।\s+Ans\s+\(([^()]+)\)/gm;
let result = {
  questions: Array.from(str.matchAll(regex)).map(i =>
    ({
      // split on the "(X)" labels; filter(Boolean) drops the empty leading piece
      options: i[2].trim().split(/\s*\([^()]+\)\s*/).filter(Boolean),
      body: i[1],
      answers: [i[3]]
    })
  )
};
console.log(result);
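
The snippets above build a plain object. Since the question asks for JSON, the object can then be serialized with JSON.stringify. A minimal sketch, with a small hand-written object (parsedResult, a name chosen for illustration) standing in for result:

```javascript
// Stand-in for the object built above (one question only, for brevity).
const parsedResult = {
  questions: [
    {
      options: ["Dhaka", "Rangpur", "Chittagong", "Comilla"],
      body: "Capital of Bangladesh is-",
      answers: ["A"]
    }
  ]
};

// The third argument pretty-prints the output with a 2-space indent.
const json = JSON.stringify(parsedResult, null, 2);
console.log(json);

// Parsing the text back yields an equivalent object.
console.log(JSON.parse(json).questions[0].answers[0]); // A
```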
