Convert questions in a text file to JSON format using a shell script



I have a text file with the following content:

//input.txt
1) What is the first month of the year?
a) March
b) February
c) January
d) December
Answer: c) January

2) What is the last month of the year?
a) July
b) December
c) August
d) May
Answer: b) December
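I want to write a shell script that loops through this file, input.txt (which contains many entries in the same format), and produces JSON output like the following: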
[
{
"question": "What is the first month of the year?",
"a": "March",
"b": "February",
"c": "January",
"d": "December",
"answer": "January",
},
{
"question": "What is the last month of the year?",
"a": "July",
"b": "December",
"c": "August",
"d": "May",
"answer": "December",
},
]

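I started writing a bash script that loops through the file, wraps each group of lines separated by a blank line in curly braces, puts each item inside the braces in quotes, and separates the items with commas, but it does not work: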

#!/bin/bash
output=""
while read line; do
if [ -z "$line" ]; then
output+="}\n"
else
output+="\"${line}\","
if [ $(echo "$output" | tail -n 1) == "" ]; then
output+="{"
fi
fi
done < input.txt
output+="}"
echo "$output" > output.txt

Here is one way to do it using jq:

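jq -R -s '
  # drop trailing newlines, then split the file into blank-line-separated blocks
  sub("\n+$"; "") |
  split("\n\n") | map(
    # split each line at ") " into a [label, text] pair
    split("\n") | map(split(") ")) | [
      {question: .[0][1]},
      (.[1:-1][] | {(.[0]): .[1]}),
      {answer: .[-1][1]}
    ] | add
  )' input.txt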

Trying to generate correct JSON with Bash alone will drive you mad.

First, your example JSON output is not valid JSON. A trailing , is not allowed in arrays or objects. So your example needs to be:

[{
"question": "What is the first month of the year?",
"a": "March",
"b": "February",
"c": "January",
"d": "December",
"answer": "January"
},
{
"question": "What is the last month of the year?",
"a": "July",
"b": "December",
"c": "August",
"d": "May",
"answer": "December"
}
]

(Note that there is no , after each "answer" value or after the last }. Use a tool such as jsonlint to check that the JSON is valid.)
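As a rough sketch of such a check, assuming jq is installed: `jq empty` parses its input, prints nothing, and exits with a non-zero status on a parse error, so it can double as a validity test. The function name `is_valid_json` is made up for this illustration.

```shell
#!/bin/bash
# Sketch: succeed only if the JSON document on stdin parses.
# Assumption: jq is installed and on PATH.
is_valid_json() {
  jq empty >/dev/null 2>&1
}

# A trailing comma makes the document invalid.
printf '%s' '{"answer": "January",}' | is_valid_json || echo "invalid: trailing comma"

# Without the trailing comma the same document parses.
printf '%s' '{"answer": "January"}' | is_valid_json && echo "valid"
```

The same function can be applied to a generated file, for example `is_valid_json < output.txt`.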

There are many tools that can generate JSON from your input. For me, the easiest is Ruby:

ruby -00 -r json -ne '
BEGIN{out=[]}
sub(/\A\d+\)\s+/,"question)")
sub(/Answer: [a-z]/,"answer")
out << $_.split(/\R/).map{|l| l.split(/[):]\s*/,2)}.to_h
END{puts JSON.pretty_generate(out)}' file

Prints:

[
{
"question": "What is the first month of the year?",
"a": "March",
"b": "February",
"c": "January",
"d": "December",
"answer": "January"
},
{
"question": "What is the last month of the year?",
"a": "July",
"b": "December",
"c": "August",
"d": "May",
"answer": "December"
}
]

Two different approaches (with the same result) using the JSON parser xidel:

$ xidel -s input.txt -e '
  array{
    for $x in tokenize($raw,"\n\n")
    let $a:=tokenize($x,"\n")
    return
    map:merge((
      {"question":substring-after($a[1],") ")},
      $a[position() = 2 to 5] ! {substring-before(.,")"):substring-after(.,") ")},
      {"answer":substring-after($a[6],") ")}
    ))
  }
'
$ xidel -s input.txt -e '
  array{
    for $x in tokenize($raw,"\n\n") return
    map:merge(
      for $y at $i in tokenize($x,"\n") return {
        if ($i eq 1) then "question"
        else if ($i eq 6) then "answer"
        else substring-before($y,")"):
        substring-after($y,") ")
      }
    )
  }
'
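
For comparison with the tool-based answers, the loop described in the question can also be sketched in plain Bash. This is only an illustration, not a robust JSON generator. The function names `json_escape` and `txt_to_json` are made up here, entries are assumed to be separated by blank lines, and only backslashes and double quotes are escaped (control characters, tabs, and CRLF line endings are not handled).

```shell
#!/bin/bash
# Sketch: convert blank-line-separated question blocks on stdin to a JSON array.

# Escape the two characters this sketch handles: backslash and double quote.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}   # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}   # then double quotes
  printf '%s' "$s"
}

txt_to_json() {
  local line key value obj="" sep=""
  printf '[\n'
  # "|| [ -n "$line" ]" also processes a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then
      # A blank line closes the current object, if one is open.
      if [ -n "$obj" ]; then
        printf '%s  {\n%s\n  }' "$sep" "$obj"
        sep=$',\n'
        obj=""
      fi
      continue
    fi
    case $line in
      Answer:*)   key=answer ;;
      [0-9]*")"*) key=question ;;
      *)          key=${line%%")"*} ;;   # option letter before the first ")"
    esac
    value=${line#*") "}                  # text after the first ") "
    value=$(json_escape "$value")
    if [ -n "$obj" ]; then
      obj+=$',\n'
    fi
    obj+="    \"$key\": \"$value\""
  done
  # Emit the last object when the input does not end with a blank line.
  if [ -n "$obj" ]; then
    printf '%s  {\n%s\n  }' "$sep" "$obj"
  fi
  printf '\n]\n'
}
```

With the functions defined, `txt_to_json < input.txt > output.txt` writes the array to output.txt. Commas are written before each object or key after the first, rather than after every one, which is what avoids the trailing-comma problem.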