I have a text file:
Some comment on the 1st line of the file.
processing date: 31.8.2016
amount: -1.23
currency: EUR
balance: 1234.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY
processing date: 30.8.2016
amount: -2.23
currency: EUR
balance: 12345.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 28.08.2016 Place: 123456789XY
processing date: 29.8.2016
amount: -3.23
currency: EUR
balance: 123456.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 27.08.2016 Place: 123456789XY
I need to process the file so that I get the values on the right-hand side (`31.8.2016`, `-1.23`, `EUR`, `1234.56`, etc.) and store them in a MySQL database.
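So far I have only managed to return the first line containing a specific string, or all such lines using `find` or `find_all`. That isn't enough, because I need to somehow identify each block that starts with "processing date:" and ends with "additional info:", process the values in it, then move on to the next block, and the next, until the end of the file.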
Any hints on how to achieve this?
I'd start with this:
File.foreach('data.txt', "\n\n") do |li|
  next unless li[/^processing/]
  puts "'#{li.strip}'"
end
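If "data.txt" contains your content, with the blocks separated by blank lines, `foreach` will read the file and yield paragraphs of text in `li` rather than individual lines. Once you have those, you can manipulate them however you like. This is fast and efficient, and because `foreach` reads one chunk at a time instead of loading the whole file into memory, it avoids the scalability problems that `readlines` or any `read`-based I/O can have.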
Output:
'processing date: 31.8.2016
amount: -1.23
currency: EUR
balance: 1234.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'
'processing date: 30.8.2016
amount: -2.23
currency: EUR
balance: 12345.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 28.08.2016 Place: 123456789XY'
'processing date: 29.8.2016
amount: -3.23
currency: EUR
balance: 123456.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 27.08.2016 Place: 123456789XY'
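You can see from the wrapping `'` characters that the file is read in blocks, or paragraphs, delimited by `"\n\n"`, and that each block is then stripped to remove the trailing whitespace, including the separator newlines.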
See the `foreach` documentation for more information.
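The paragraph approach depends on blank lines between records, and the sample in the question shows none. If the real file has no such separators, one alternative (not part of the answer above) is to read it line by line and start a new group at every "processing date:" line using `Enumerable#slice_before`. The helper name `group_records` is hypothetical, and the input lines below are a shortened excerpt of the question's data.

```ruby
# Hypothetical helper: start a new record at each "processing date:" line.
# Lines before the first such line (e.g. the comment on line 1) form their own
# group, which is dropped by the final select.
def group_records(lines)
  lines.map(&:chomp)
       .reject(&:empty?)
       .slice_before { |l| l.start_with?('processing date:') }
       .select { |group| group.first.start_with?('processing date:') }
end

lines = [
  "Some comment on the 1st line of the file.\n",
  "processing date: 31.8.2016\n",
  "amount: -1.23\n",
  "additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY\n",
  "processing date: 30.8.2016\n",
  "amount: -2.23\n"
]

records = group_records(lines)
p records.size  # => 2
p records.first # => ["processing date: 31.8.2016", "amount: -1.23", "additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"]
```

With a real file you could pass `File.foreach('data.txt')`; note that the `map` step then builds an array of all lines in memory, which trades away some of the streaming benefit described above.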
`split(':', 2)` is your friend:
'processing date: 31.8.2016'.split(':', 2) # => ["processing date", " 31.8.2016"]
'amount: -1.23'.split(':', 2) # => ["amount", " -1.23"]
'currency: EUR'.split(':', 2) # => ["currency", " EUR"]
'balance: 1234.56'.split(':', 2) # => ["balance", " 1234.56"]
'payer reference: /VS123456/SS0011223344/KS1212'.split(':', 2) # => ["payer reference", " /VS123456/SS0011223344/KS1212"]
'type of the transaction: Some type of the transaction 1'.split(':', 2) # => ["type of the transaction", " Some type of the transaction 1"]
'additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'.split(':', 2) # => ["additional info", " Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"]
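To see why the limit argument of `2` matters, compare splitting the "additional info" line with and without it. That value contains colons of its own, so only the first colon should separate key from value.

```ruby
line = 'additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'

# Without a limit, every colon splits the string and the value falls apart.
without_limit = line.split(':')
# With a limit of 2, only the first colon splits; the rest stays in the value.
with_limit = line.split(':', 2)

p without_limit # => ["additional info", " Amount", " 1.23 EUR 29.08.2016 Place", " 123456789XY"]
p with_limit    # => ["additional info", " Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"]
```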
You could do this:
text = 'processing date: 31.8.2016
amount: -1.23
currency: EUR
balance: 1234.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'
text.lines.map{ |li| li.split(':', 2).map(&:strip) }.to_h
# => {"processing date"=>"31.8.2016", "amount"=>"-1.23", "currency"=>"EUR", "balance"=>"1234.56", "payer reference"=>"/VS123456/SS0011223344/KS1212", "type of the transaction"=>"Some type of the transaction 1", "additional info"=>"Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"}
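Combining the paragraph reading with the `split(':', 2)` conversion, a minimal sketch that turns a whole file into an array of hashes, one per record, might look like this. The helper name `parse_records` is hypothetical, the sample is a shortened version of the question's data with blank lines between records, and it assumes every line inside a record contains a colon (a line without one would make `to_h` raise an `ArgumentError`).

```ruby
require 'tempfile'

# Hypothetical helper: one Hash per blank-line-separated block that starts with "processing".
def parse_records(path)
  File.foreach(path, "\n\n")
      .map(&:strip)
      .grep(/\Aprocessing/)
      .map { |block| block.lines.map { |li| li.split(':', 2).map(&:strip) }.to_h }
end

sample = <<~TXT
  Some comment on the 1st line of the file.

  processing date: 31.8.2016
  amount: -1.23
  currency: EUR

  processing date: 30.8.2016
  amount: -2.23
  currency: EUR
TXT

records = Tempfile.create('data') do |f|
  f.write(sample)
  f.flush
  parse_records(f.path)
end

records.each { |r| p r }
# {"processing date"=>"31.8.2016", "amount"=>"-1.23", "currency"=>"EUR"}
# {"processing date"=>"30.8.2016", "amount"=>"-2.23", "currency"=>"EUR"}
```

The exact `p` formatting of a Hash differs slightly between Ruby versions, but the keys and values are the same.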
There are many ways to go on and parse the information into more useful data, but that's up to you.
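As one possibility for that next step, here is a minimal sketch that converts a parsed record into typed values before it is stored. The helper name `typed_record`, the symbol keys, and the `%d.%m.%Y` date format (inferred from the sample's `31.8.2016`) are all assumptions. `Date` and `BigDecimal` would map naturally onto MySQL `DATE` and `DECIMAL` columns, but the actual table schema is not given in the question.

```ruby
require 'date'
require 'bigdecimal'

# Hypothetical helper: String => String record in, typed values out.
# BigDecimal avoids binary floating-point rounding for money amounts.
def typed_record(rec)
  {
    processing_date: Date.strptime(rec['processing date'], '%d.%m.%Y'),
    amount:          BigDecimal(rec['amount']),
    currency:        rec['currency'],
    balance:         BigDecimal(rec['balance'])
  }
end

rec = {
  'processing date' => '31.8.2016',
  'amount'          => '-1.23',
  'currency'        => 'EUR',
  'balance'         => '1234.56'
}

typed = typed_record(rec)
p typed[:processing_date].to_s # => "2016-08-31"
p typed[:amount].to_s('F')     # => "-1.23"
```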