Restructuring a text file based on a pattern in the file



I want to reformat the contents of a text file using Scala. For example, given this sample file:

"good service"
Tom Martin (USA) 17th October 2015    
4    
Hi my name is
Tom.
I love boardgames.
Aircraft    TXT-102   
"not bad"
M Muller (Canada) 22nd September 2015
6
Hi
I
like
boardgames.
Aircraft    TXT-101
Type Of Customer    Couple Leisure
Cabin Flown FirstClass
Route   IND to CHI
Date Flown  September 2015
Seat Comfort    12345
Cabin Staff Service 12345
.
.

it should be reformatted to this:

"good service"
Tom Martin (USA) 17th October 2015    
4    
Hi my name is Tom. I love boardgames.
Aircraft    TXT-102    
"not bad"
M Muller (Canada) 22nd September 2015
6
Hi I like boardgames.
Aircraft    TXT-101
Type Of Customer    Couple Leisure
Cabin Flown FirstClass
Route   IND to CHI
Date Flown  September 2015
Seat Comfort    12345
Cabin Staff Service 12345
.
.

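I have already identified the pattern in my file: the multi-line string sits between a line containing only a number and a line of tab-separated words. For example, the multi-line content of the first block lies between 4 and Aircraft TXT-102, and that of the second block lies between 6 and Aircraft TXT-101. In addition, blocks are separated by two newlines.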

I know that pattern matching with regular expressions can help, but I don't know how to process the file.

Here is what I would do, in pseudocode:

while more lines available { 
    lines_so_far = read input until a number is seen
    output(lines_so_far)
    lines_to_join = read input until "Aircraft" is seen
    output(joined lines_to_join)
}

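A regular expression for lines consisting only of digits is ^\d+$; for lines starting with "Aircraft", use ^Aircraft.* . Note that some numeric lines in the sample carry trailing whitespace, so either trim each line before matching or use ^\d+\s*$ instead. A convenient method to look at is takeWhile.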
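The pseudocode above could be sketched in Scala roughly as follows. This is only a minimal sketch under a few assumptions that are not stated in the question: the rating line contains digits and optional surrounding whitespace only, the comment block always ends at the first line starting with "Aircraft", and the whole file fits in memory. The names Reformat, isNumber and isAircraft are made up for illustration. It uses span, which returns the takeWhile part and the remainder in a single call.

```scala
object Reformat {
  // Assumption: a rating line is digits only, ignoring surrounding whitespace.
  private def isNumber(line: String): Boolean = line.trim.matches("""\d+""")

  // Assumption: the multi-line comment ends at the first line starting with "Aircraft".
  private def isAircraft(line: String): Boolean = line.startsWith("Aircraft")

  def reformat(lines: List[String]): List[String] = {
    @annotation.tailrec
    def loop(rest: List[String], acc: Vector[String]): Vector[String] =
      if (rest.isEmpty) acc
      else {
        // Everything before the next numeric line is copied through unchanged.
        val (header, fromNumber) = rest.span(l => !isNumber(l))
        fromNumber match {
          case Nil =>
            // No further numeric line: the remaining lines are output as-is.
            acc ++ header
          case number :: afterNumber =>
            // Lines between the number and the "Aircraft" line get joined.
            val (toJoin, tail) = afterNumber.span(l => !isAircraft(l))
            val parts = toJoin.map(_.trim).filter(_.nonEmpty)
            val joined = if (parts.isEmpty) Vector.empty[String] else Vector(parts.mkString(" "))
            val withNumber = (acc ++ header) :+ number
            // The "Aircraft" line stays in tail and is emitted with the next header.
            loop(tail, withNumber ++ joined)
        }
      }
    loop(lines, Vector.empty).toList
  }
}
```

To apply it to a file, the lines could be read with scala.io.Source.fromFile("reviews.txt").getLines().toList (the file name is hypothetical) and the result written back with mkString("\n"). If a numeric line is never followed by an "Aircraft" line, this sketch joins everything up to the end of the input, which may or may not be what you want.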
