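I have a large file in which "sections" are separated by *** markers.
I have to create a hash for each section and, for each section, write a new file in a new format (I may write multiple files per section). Each section needs its own logic to be converted into a hash (splitting on whitespace, "=", or /some pattern/).
I'm looking for a way to identify the sections and apply the appropriate logic to convert each one into a hash. I can write the individual pieces of logic, but should they be separate methods or classes that are invoked based on a pattern?
Because the file is large, I'm trying to read, manipulate, and write line by line. I've seen approaches that scan for the lines between sections, but I'm not particularly keen on that type of solution. I'm a bit confused about how to grab the lines between sections and apply the different bits of logic line by line when appropriate.
Any direction is appreciated. Thanks!
Here is some of the input file: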
*** Summary ***
Job Name = test Date created: Mon Jan 14 15:48:33 2013
*** Analysis Information
Steady State is ON
Turbulent Incompressible Flow is ON
Static Temperature Equation is ON
Mixed Convection is ON
*** Field Variable Results Summary For Iteration 300
Var Mean at Max at Min
Vx Vel +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s
Vy Vel +3.40035e+002 158922 +2.79257e+004 1319731 -1.42855e+004 mm/s
Vz Vel -7.17959e+002 1318038 +1.62986e+004 1319053 -2.21582e+004 mm/s
Press -2.05980e+001 50858 +5.19412e+003 50905 -1.44865e+003 N/m^2
Temp +4.60000e+001 10965 +4.60000e+001 315867 +4.60000e+001 C
TurbK +1.19616e+006 1319490 +1.44421e+008 10966 +1.81700e-008 mm^2/s^2
TurbD +1.71412e+009 1319490 +2.88554e+011 233065 +5.37798e-004 mm^2/s^3
Scal1 +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000
PTotl -5.91285e+000 50858 +5.19412e+003 50905 -1.44865e+003 N/m^2
EVisc +2.52037e-004 1320370 +1.14488e-002 2229 +0.00000e+000 g/mm-s
ECond +1.05355e-002 1352833 +5.88890e-002 2229 +0.00000e+000 W/mm-K
Dens +2.34793e-004 58024 +3.43080e-003 315867 +1.20473e-006 g/mm^3
Visc +1.62605e-005 10965 +1.81700e-005 2229 +0.00000e+000 g/mm-s
Cond +2.50840e-002 2229 +2.04000e-001 315867 +2.56300e-005 W/mm-K
SpecH +1.01202e+000 38432 +1.81000e+000 10249 +1.00500e-003 J/g-K
Emiss +8.94911e-001 10965 +1.00000e+000 2229 +0.00000e+000
Transmiss +0.00000e+000 0 +0.00000e+000 0 +0.00000e+000
WRough +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000 mm
SeeBeck +0.00000e+000 0 +0.00000e+000 0 +0.00000e+000 V/K
GenT +1.11977e+003 223286 +1.18027e+005 584515 +3.19558e-013 1/s
*** Openings ***
*** Outlet 1 ***
Surface ID = 2329
Node near Minimum X,Y,Z of opening = 11761
Minimum X,Y,Z of opening = 369.964000, 11.275438, -98.433898
Mass Flow Out = -1.55703 g/s
Volume Flow Out = -1.29242e+006 mm^3/s
Reynolds Number = 1303.45
Outlet Bulk Pressure = -0 N/m^2
Outlet Bulk Temperature = 46 C
Outlet Mach Number = 0.00734951
*** Outlet 2 ***
Surface ID = 2332
Node near Minimum X,Y,Z of opening = 11125
Minimum X,Y,Z of opening = 369.964000, 73.727289, -114.615876
Mass Flow Out = -20.4612 g/s
Volume Flow Out = -1.6984e+007 mm^3/s
Reynolds Number = 11182.5
Outlet Bulk Pressure = -0 N/m^2
Outlet Bulk Temperature = 46 C
Outlet Mach Number = 0.0079087
*** Outlet 3 ***
Surface ID = 2335
Node near Minimum X,Y,Z of opening = 10924
Minimum X,Y,Z of opening = 369.964000, 164.751344, 40.640056
Mass Flow Out = -32.8714 g/s
Volume Flow Out = -2.72852e+007 mm^3/s
Reynolds Number = 17965
Outlet Bulk Pressure = -0 N/m^2
Outlet Bulk Temperature = 46 C
Outlet Mach Number = 0.00750077
*** Fluid Energy Balance Information:
MdotIn x Cp x (TOut - TIn) = 663.69 Watts
(Numerical) Energy Out - Energy In = 0.36447 Watts
Heat Transfer from Wall To Fluid = 761.35 Watts
Heat Transfer Due to Sources In Fluid = 0 Watts
*** Solid Energy Balance Information:
Heat Transfer from Exterior To Solid = 0 Watts
Heat Transfer Due to Sources In Solid = 761 Watts
Heat Transfer From Fluid To Solid = -761.31 Watts
*** Sum of Fluid Forces on Walls ***
ShearX, PressX = 68651 78199 microNewtons
ShearY, PressY = 39030 6.9349e+006 microNewtons
ShearZ, PressZ = -19749 -4.1017e+006 microNewtons
*** Data for internal fans
Fan Part Id = 16 Fan Name = fname1
Operating Pressure Rise = 0.46945 Inches of Water
Operating FlowRate = 36.0109 CFM
Fan Part Id = 94 Fan Name = fname2
Operating Pressure Rise = 0.309645 Inches of Water
Operating FlowRate = 2.33407 CFM
Fan Part Id = 95 Fan Name = fname3
Operating Pressure Rise = 0.267133 Inches of Water
Operating FlowRate = 8.78264 CFM
*** Analysis Statistics:
Input: 461 seconds
Analysis: 12686 seconds
Output: 179 seconds
Total: 13326 seconds
Here is what I have so far:
sum_file = File.new('sum_file.sum', 'r')
sum_file_hashed = File.new('sum_file_hashed', 'w')
inSection = false # flag: are we in or out of a section?
while (line = sum_file.gets) # read one line at a time
  case line
  when /\*{3}/ # found a section header (the * must be escaped)
    inSection = true # in a section
    l = line.gsub('*', '').strip
    sum_file_hashed.puts('Found a section: ' + l) # write section name
  end
  ### I'm not sure how to introduce specific logic when in a certain type of section ###
end
sum_file.close
sum_file_hashed.close
I'm now trying something like this (pseudocode):
while /found section/
  if /match pattern a/
    call parsera
  if /match pattern b/
    call parserb
end

parsera
  while not /a section/
    do stuff
  return?
Ruby's Enumerable includes slice_before, which is well suited to this kind of task: breaking a file into chunks based on some marker.
require 'pp'
blocks = DATA.readlines.map(&:strip).reject{ |l| l == '' }.slice_before(/\A\*{3}/)
pp blocks.to_a
__END__
*** Summary ***
Job Name = test Date created: Mon Jan 14 15:48:33 2013
*** Analysis Information
Steady State is ON
Turbulent Incompressible Flow is ON
Static Temperature Equation is ON
Mixed Convection is ON
*** Field Variable Results Summary For Iteration 300
Var Mean at Max at Min
Vx Vel +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s
Vy Vel +3.40035e+002 158922 +2.79257e+004 1319731 -1.42855e+004 mm/s
Vz Vel -7.17959e+002 1318038 +1.62986e+004 1319053 -2.21582e+004 mm/s
Press -2.05980e+001 50858 +5.19412e+003 50905 -1.44865e+003 N/m^2
Temp +4.60000e+001 10965 +4.60000e+001 315867 +4.60000e+001 C
TurbK +1.19616e+006 1319490 +1.44421e+008 10966 +1.81700e-008 mm^2/s^2
TurbD +1.71412e+009 1319490 +2.88554e+011 233065 +5.37798e-004 mm^2/s^3
Scal1 +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000
PTotl -5.91285e+000 50858 +5.19412e+003 50905 -1.44865e+003 N/m^2
EVisc +2.52037e-004 1320370 +1.14488e-002 2229 +0.00000e+000 g/mm-s
ECond +1.05355e-002 1352833 +5.88890e-002 2229 +0.00000e+000 W/mm-K
Dens +2.34793e-004 58024 +3.43080e-003 315867 +1.20473e-006 g/mm^3
Visc +1.62605e-005 10965 +1.81700e-005 2229 +0.00000e+000 g/mm-s
Cond +2.50840e-002 2229 +2.04000e-001 315867 +2.56300e-005 W/mm-K
SpecH +1.01202e+000 38432 +1.81000e+000 10249 +1.00500e-003 J/g-K
Emiss +8.94911e-001 10965 +1.00000e+000 2229 +0.00000e+000
Transmiss +0.00000e+000 0 +0.00000e+000 0 +0.00000e+000
WRough +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000 mm
SeeBeck +0.00000e+000 0 +0.00000e+000 0 +0.00000e+000 V/K
GenT +1.11977e+003 223286 +1.18027e+005 584515 +3.19558e-013 1/s
I shortened the data because the full sample was more than the example needs.
Running the code outputs:
[["*** Summary ***",
  "Job Name = test Date created: Mon Jan 14 15:48:33 2013"],
 ["*** Analysis Information",
  "Steady State is ON",
  "Turbulent Incompressible Flow is ON",
  "Static Temperature Equation is ON",
  "Mixed Convection is ON"],
 ["*** Field Variable Results Summary For Iteration 300",
  "Var Mean at Max at Min",
  "Vx Vel +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s",
  "Vy Vel +3.40035e+002 158922 +2.79257e+004 1319731 -1.42855e+004 mm/s",
  "Vz Vel -7.17959e+002 1318038 +1.62986e+004 1319053 -2.21582e+004 mm/s",
  "Press -2.05980e+001 50858 +5.19412e+003 50905 -1.44865e+003 N/m^2",
  "Temp +4.60000e+001 10965 +4.60000e+001 315867 +4.60000e+001 C",
  "TurbK +1.19616e+006 1319490 +1.44421e+008 10966 +1.81700e-008 mm^2/s^2",
  "TurbD +1.71412e+009 1319490 +2.88554e+011 233065 +5.37798e-004 mm^2/s^3",
  "Scal1 +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000",
  "PTotl -5.91285e+000 50858 +5.19412e+003 50905 -1.44865e+003 N/m^2",
  "EVisc +2.52037e-004 1320370 +1.14488e-002 2229 +0.00000e+000 g/mm-s",
  "ECond +1.05355e-002 1352833 +5.88890e-002 2229 +0.00000e+000 W/mm-K",
  "Dens +2.34793e-004 58024 +3.43080e-003 315867 +1.20473e-006 g/mm^3",
  "Visc +1.62605e-005 10965 +1.81700e-005 2229 +0.00000e+000 g/mm-s",
  "Cond +2.50840e-002 2229 +2.04000e-001 315867 +2.56300e-005 W/mm-K",
  "SpecH +1.01202e+000 38432 +1.81000e+000 10249 +1.00500e-003 J/g-K",
  "Emiss +8.94911e-001 10965 +1.00000e+000 2229 +0.00000e+000",
  "Transmiss +0.00000e+000 0 +0.00000e+000 0 +0.00000e+000",
  "WRough +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000 mm",
  "SeeBeck +0.00000e+000 0 +0.00000e+000 0 +0.00000e+000 V/K",
  "GenT +1.11977e+003 223286 +1.18027e+005 584515 +3.19558e-013 1/s"]]
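The file has been converted into an array of arrays. Leading and trailing whitespace, including newlines and carriage returns, has been stripped, and blank lines have been removed.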
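Since the question mentions that the real file is large, here is a minimal sketch of the same slice_before idea applied lazily, so that the whole file is not read with readlines first. The method name sections_of and the StringIO sample are assumptions made for illustration; with a real file, the io argument would be something like File.open('sum_file.sum').

```ruby
require 'stringio'

# Hypothetical helper: returns a lazy enumerator of blocks, where each block
# is an array of stripped, non-blank lines starting at a *** header.
def sections_of(io)
  io.each_line
    .lazy
    .map(&:strip)
    .reject(&:empty?)
    .slice_before(/\A\*{3}/)
end

# Small stand-in for the real file, taken from the sample data.
sample = StringIO.new(<<~TEXT)
  *** Summary ***
  Job Name = test

  *** Analysis Information
  Steady State is ON
  Mixed Convection is ON
TEXT

# to_a is only for display here; on a large file, iterate with each instead.
sections = sections_of(sample).to_a
sections.each { |block| puts "#{block.first}: #{block.size - 1} line(s)" }
```

Iterated with each, this should hold roughly one section in memory at a time, because slice_before hands over a block as soon as the next header is seen.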
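Further processing of the file is done with a loop over the outer array, testing the first line of each sub-array to decide how to process that block. Something like this would be a starting point: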
hash = {}
blocks.each do |block|
  case block.shift
  when /Summary/
    # process the summary information
  when /Analysis Information/
    # process the analysis information
    hash[:analysis_information] = Hash[block.map{ |r| r.split(/ +is +/) }]
  when /Field Variable Results/
    # process the field variable results
  end
end
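The Field Variable Results branch is left as a stub above. As a rough sketch of one way to fill it in, the helper below splits a row into a variable name and its numbers. The name parse_field_row and the hash keys are made up for illustration, and the column meaning (mean, node at max, max, node at min, min, optional units) is only my reading of the "Var Mean at Max at Min" header.

```ruby
# Hypothetical helper: turns one data row into [name, values_hash].
def parse_field_row(row)
  tokens = row.split
  # The name may be one or two words, so find where the numbers start.
  # Signed floats in the report look like +5.71519e+002.
  first_number = tokens.index { |t| t =~ /\A[+-]\d+\.\d+e[+-]\d+\z/ }
  name = tokens[0...first_number].join(' ')
  mean, max_node, max, min_node, min, units = tokens[first_number..-1]
  [name, { mean: mean.to_f, max_node: max_node.to_i, max: max.to_f,
           min_node: min_node.to_i, min: min.to_f, units: units }]
end

# A block as it looks after block.shift has removed the *** header.
block = [
  'Var Mean at Max at Min',
  'Vx Vel +5.71519e+002 1320103 +3.02718e+004 1319857 -2.66582e+004 mm/s',
  'Scal1 +0.00000e+000 10965 +0.00000e+000 315867 +0.00000e+000'
]

# Drop the column-header row, then build a hash keyed by variable name.
field_results = block.drop(1).map { |row| parse_field_row(row) }.to_h
```

Rows without a unit, such as Scal1, end up with units set to nil.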
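The idea is that when the code finishes, hash will contain the parsed data as a hash of hashes or a hash of arrays, ready to be written out. I'd suggest considering YAML, since it makes short work of serializing the data to a file.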
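To illustrate the YAML suggestion, here is a small sketch that serializes a parsed hash and reads it back. The parsed hash is stand-in data built from the Analysis Statistics section of the sample, and the output file name in the comment is hypothetical.

```ruby
require 'yaml'

# Stand-in for the parsed result. Plain string keys keep the YAML free of
# Ruby-specific symbol syntax, which makes it easier to load elsewhere.
parsed = {
  'analysis_statistics' => {
    'Input' => 461, 'Analysis' => 12686, 'Output' => 179, 'Total' => 13326
  }
}

yaml_text = YAML.dump(parsed)
puts yaml_text

# To write it out instead (hypothetical file name):
# File.write('sum_file_hashed.yml', yaml_text)

# Round trip: loading the text gives back an equal hash.
restored = YAML.load(yaml_text)
```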
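I'm not going to add more, because the question sounds a lot like homework and parsing the lines isn't hard. Breaking the file into blocks is the bigger task, and that part has been done for you.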
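You can add all the lines of a section to an array. When the next section starts, or when you reach the end of the file, call the section-specific method with this array as its argument. That way you don't have to deal with the per-section logic while detecting sections.
Edit: not tested, but this should give you the idea: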
def parse_section_name(name)
  name.gsub(/\*/, '').strip # remove *'s, then whitespace from both sides
end

def call_section_logic(name, lines)
  case name
  when ...
    ...
  else # unknown name
  end
end

section_name = nil # no section seen yet
section_lines = []
lines = file.readlines + ['***'] # sentinel so the last section is handled, too
lines.each do |line|
  if line =~ /^\s*\*{3}/ # detect section
    # hand over the previous section, if there was one
    call_section_logic(section_name, section_lines) if section_name
    section_name = parse_section_name(line)
    section_lines.clear
  else
    section_lines << line # add line to array
  end
end
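On the question of whether the parsers should be separate methods or classes chosen by pattern, one possible shape for call_section_logic is a lookup table from header pattern to parser. This is only a sketch: the lambdas, the parsers table, and the extra table argument are assumptions for illustration and cover just two of the section types in the sample.

```ruby
# Hypothetical parser for "key = value" bodies such as the Outlet sections.
parse_key_value = lambda do |lines|
  lines.map(&:strip).reject(&:empty?).map { |l| l.split(/\s*=\s*/, 2) }.to_h
end

# Hypothetical parser for "Something is ON" bodies.
parse_flags = lambda do |lines|
  lines.map(&:strip).reject(&:empty?).map { |l| l.split(/\s+is\s+/, 2) }.to_h
end

# Header pattern => parser. The first matching pattern wins.
parsers = {
  /\AOutlet \d+/           => parse_key_value,
  /\AAnalysis Information/ => parse_flags
}

# Variant of call_section_logic that takes the table as an argument.
def call_section_logic(name, lines, table)
  _pattern, parser = table.find { |pattern, _| name =~ pattern }
  parser ? parser.call(lines) : nil # unknown section: skipped in this sketch
end

outlet = call_section_logic(
  'Outlet 1', ["Surface ID = 2329\n", "Reynolds Number = 1303.45\n"], parsers
)
flags = call_section_logic('Analysis Information', ['Steady State is ON'], parsers)
unknown = call_section_logic('Analysis Statistics:', ['Input: 461 seconds'], parsers)
```

Adding a section type then means adding one table entry. Lines that do not fit a parser's pattern (for example the fan rows, which contain two "=" signs) would need their own parser, and a line with no separator at all would make to_h raise.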