Using Python to process and create input files for external software

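When programming, I often use external software for heavy computation and then analyze the results with Python. These programs are usually written in Fortran, C, or C++ and are driven by input files. An input file can be a small file telling the program which mode to run for a given calculation, or a large data file it has to process. These files often use a fixed format (so many spaces between data columns). As an example, the data file I am currently working with is shown below.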

This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9

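My question is whether there is a Python library that creates such input files by reading a template (provided by a colleague or taken from the external software's documentation).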

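Typically I have all the columns as NumPy arrays and want to pass them to a function that creates the input file, using the template as an example. I am not looking for a brute-force approach, which gets ugly quickly.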

I am not sure what to search for here, so any help is appreciated.

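I can basically replicate your sample with savetxt. Its fmt argument gives me the same kind of format control that FORTRAN code uses for reading and writing files. It preserves spacing the same way FORTRAN and C print statements do.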

import numpy as np

# '...' stands for the remaining data rows; replace it with them before running,
# because float() cannot parse it.
example = """
This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
...
"""
lines = example.split('\n')[1:]  # drop the empty string before the first newline
header = lines[0]
data = []
for line in lines[1:]:
    if len(line):
        data.append([float(x) for x in line.split()])
data = np.array(data)
fmt = '%10.3f %9.1f %9.2f %9.3f %20.1f'  # similar to a FORTRAN format statement
filename = 'stack21865757.txt'
with open(filename, 'w') as f:
    np.savetxt(f, data, fmt, header=header)  # savetxt prefixes the header with '# '
with open(filename) as f:
    print(f.read())

which produces:

# This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                 11.2
  7353.510      26.0      4.73    -1.570                  3.5
...

Edit

Here is a rough script that converts a sample line into a format string:

import re
tmplt = '  7352.103      26.0      2.61    -8.397                         11.2'
def fmt_from_template(tmplt):
    pat = r'( *-?\d+\.(\d+))'  # one number with its leading blanks and decimals
    fmt = []
    while tmplt:
        match = re.search(pat, tmplt)
        if not match:
            break  # nothing numeric left; avoids looping forever
        x = len(match.group(1))  # width of the whole field
        d = len(match.group(2))  # number of decimals
        fmt.append('%%%d.%df' % (x, d))
        tmplt = tmplt[match.end():]
    return ''.join(fmt)
print(fmt_from_template(tmplt))
# %10.3f%10.1f%10.2f%10.3f%29.1f

Adapting hpaulj's answer to extract the fmt for savetxt automatically:

from __future__ import print_function
import numpy as np
import re
example = """
This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9
"""
def extract_format(line):
    # one '%<width>.<decimals>f' spec per number; the width includes leading blanks
    return "".join(
        "%{}.{}f".format(len(m.group(0)), len(m.group(1)))
        for m in re.finditer(r"\s+-?\d+\.(\d+)", line)
    )
lines = example.split('\n')[1:]  # drop the empty string before the first newline
header = lines[0]
data = []
for line in lines[1:]:
    if len(line):
        data.append([float(x) for x in line.split()])
data = np.array(data)
fmt = extract_format(lines[1])  # similar to a FORTRAN format statement
filename = 'stack21865757.txt'
with open(filename, 'w') as f:
    print(header, file=f)  # write the header ourselves so it gets no '# ' prefix
    np.savetxt(f, data, fmt)
with open(filename) as f:
    print(f.read())

which produces:

This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9
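
Since the question starts from separate NumPy columns rather than a parsed text block, the same idea can be wrapped in a small helper. This is only a sketch: the names format_from_template and write_input_file are made up for illustration, and it assumes every column is a fixed-point number written like the sample line (no integers, no exponent notation, no text fields).

```python
import io
import re

import numpy as np


def format_from_template(line):
    """Derive a printf-style format from one sample data line (hypothetical helper)."""
    specs = []
    for match in re.finditer(r"\s*-?\d+\.(\d+)", line):
        width = len(match.group(0))     # field width, including leading blanks
        decimals = len(match.group(1))  # digits after the decimal point
        specs.append("%{}.{}f".format(width, decimals))
    return "".join(specs)


def write_input_file(target, header, template_line, columns):
    """Write equal-length 1-D arrays as fixed-width columns below a header line."""
    fmt = format_from_template(template_line)
    data = np.column_stack(columns)  # one row per record, one column per array
    # comments="" stops savetxt from putting '# ' in front of the header
    np.savetxt(target, data, fmt=fmt, header=header, comments="")


template = "  7352.103      26.0      2.61    -8.397                         11.2"
columns = [
    np.array([7352.103, 7353.510]),
    np.array([26.0, 26.0]),
    np.array([2.61, 4.73]),
    np.array([-8.397, -1.570]),
    np.array([11.2, 3.5]),
]

buffer = io.StringIO()  # a file name or an open file handle works as well
write_input_file(buffer, "This is a header. The first line is always a header...",
                 template, columns)
print(buffer.getvalue())
```

Because the widths are taken from the template, the first data row should come out character for character like the template line. A value that needs more characters than its field allows will still be written, but it pushes the following columns to the right, so it is worth checking the value ranges before handing the file to the external program.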

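If your header is always the same, you could take a look at pandas. Because it knows the column names from the header, it lets you move columns around very easily. Even if the header is not always the same, as long as you can get it from the template, you can still rearrange the columns.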
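
A rough sketch of that idea, assuming named columns: the column names below are invented purely for illustration (the sample file does not say what its columns mean), and the format string is the one derived from the sample line earlier in the thread. pandas handles the reordering by name, and np.savetxt does the fixed-width writing.

```python
import io

import numpy as np
import pandas as pd

# Columns collected in arbitrary order; all names here are hypothetical.
table = pd.DataFrame({
    "loggf": [-8.397, -1.570],
    "ew": [11.2, 3.5],
    "wavelength": [7352.103, 7353.510],
    "element": [26.0, 26.0],
    "ep": [2.61, 4.73],
})

# The order the external program expects, e.g. read from a template header.
column_order = ["wavelength", "element", "ep", "loggf", "ew"]
ordered = table[column_order]  # reorder by name instead of by position

fmt = "%10.3f%10.1f%10.2f%10.3f%29.1f"  # derived from the sample data line
buffer = io.StringIO()
np.savetxt(buffer, ordered.to_numpy(), fmt=fmt,
           header="This is a header. The first line is always a header...",
           comments="")  # comments="" keeps the header free of a '# ' prefix
print(buffer.getvalue())
```

The advantage over plain arrays is that a different column layout only means changing the list of names, not the code that assembles the data.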

I apologize if I have misunderstood the question, but more specific data or a longer example would probably get you more help.
