I want to use Scala to reformat the contents of a text file. For example, given this sample file:
"good service"
Tom Martin (USA) 17th October 2015
4
Hi my name is
Tom.
I love boardgames.
Aircraft TXT-102
"not bad"
M Muller (Canada) 22nd September 2015
6
Hi
I
like
boardgames.
Aircraft TXT-101
Type Of Customer Couple Leisure
Cabin Flown FirstClass
Route IND to CHI
Date Flown September 2015
Seat Comfort 12345
Cabin Staff Service 12345
.
.
I want to turn it into this:
"good service"
Tom Martin (USA) 17th October 2015
4
Hi my name is Tom. I love boardgames.
Aircraft TXT-102
"not bad"
M Muller (Canada) 22nd September 2015
6
Hi I like boardgames.
Aircraft TXT-101
Type Of Customer Couple Leisure
Cabin Flown FirstClass
Route IND to CHI
Date Flown September 2015
Seat Comfort 12345
Cabin Staff Service 12345
.
.
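I have identified the pattern in my file: the multi-line string sits between a number and words separated by a tab. For example, the multi-line text of the first block lies between 4 and Aircraft TXT-102. The multi-line text of the second block lies between 6 and Aircraft TXT-101.
Also, blocks are separated by two newlines.
I know that pattern matching with regular expressions can help, but I don't know how to process the file.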
Here is what I would do, in pseudocode:
while more lines available {
lines_so_far = read input until a number is seen
output(lines_so_far)
lines_to_join = read input until "Aircraft" is seen
output(joined lines_to_join)
}
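A regex for a line consisting only of digits is ^\d+$; for a line starting with "Aircraft", use ^Aircraft .* (if the separator after "Aircraft" is a tab rather than a space, ^Aircraft\s.* is safer). A convenient method to look at is takeWhile.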
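The pseudocode above could be sketched in Scala roughly as follows. This is only a minimal sketch under a few assumptions: the object name ReviewReformatter and the method name reformat are made up for illustration, the rating line is assumed to contain nothing but digits, and the line that ends the review body is assumed to start with "Aircraft" followed by whitespace (a space or a tab). It uses span, which does the work of takeWhile and dropWhile in a single pass.

```scala
import scala.annotation.tailrec

// Hypothetical helper object; the names are illustrative, not from the question.
object ReviewReformatter {
  // A line made up only of digits, e.g. the rating "4".
  private val NumberLine = """^\d+$""".r
  // A line starting with "Aircraft" followed by a space or a tab (assumption).
  private val AircraftLine = """^Aircraft\s.*""".r

  private def isNumber(line: String): Boolean =
    NumberLine.findFirstIn(line.trim).isDefined

  private def isAircraft(line: String): Boolean =
    AircraftLine.findFirstIn(line).isDefined

  def reformat(lines: List[String]): List[String] = {
    @tailrec
    def loop(rest: List[String], acc: Vector[String]): Vector[String] = {
      // Copy everything up to (not including) the next digits-only line.
      val (header, fromNumber) = rest.span(line => !isNumber(line))
      fromNumber match {
        case Nil =>
          // No further rating line: emit the remaining lines unchanged.
          acc ++ header
        case number :: afterNumber =>
          // Collect the review body up to (not including) the "Aircraft" line.
          val (body, remaining) = afterNumber.span(line => !isAircraft(line))
          val joined = body.map(_.trim).filter(_.nonEmpty).mkString(" ")
          val withNumber = (acc ++ header) :+ number
          val next = if (joined.isEmpty) withNumber else withNumber :+ joined
          loop(remaining, next)
      }
    }
    loop(lines, Vector.empty).toList
  }
}
```

To run it on a file, one option is to read the lines with scala.io.Source.fromFile(path).getLines().toList, pass them to ReviewReformatter.reformat, and write the result back out with mkString("\n"). Note that a review body containing a line made up only of digits would be misread as a rating line; the sketch does not guard against that.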