Replacing numbers in a file with fileinput while keeping the original file format and ignoring commas



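Hi, I have a configuration file from another program that I want to manipulate with Python. The only thing that needs to be replaced is the numbers in the file. The file itself contains tabs, spaces, and commas. I was able to read the file with pandas and keep only the relevant information. Now I want to replace the numbers in the file with the numbers from the pandas DataFrame. This is my code so far: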

import pandas as pd
import fileinput

params = pd.read_csv("params.txt", skip_blank_lines=True, delim_whitespace=True, skiprows=1, header=None, names=["paramname", "equal", "param1", "param2"])
#params.columns = params.columns.str.strip()
# drop the "equal" column, then strip commas, tabs and spaces from the values
params.drop(params.columns[1], axis=1, inplace=True)
params.replace(',','', regex=True, inplace=True)
params.replace('\t','', regex=True, inplace=True)
params.replace(' ','', regex=True, inplace=True)
it = 0
rowidx = 0
colidx = "param1"

with fileinput.FileInput("params.txt", inplace=True, backup='.bak') as file:
    for line in file:
        if (it % 2 == 0):
            colidx = "param1"
        else:
            colidx = "param2"
        print(' '.join([params.loc[rowidx, colidx] if x.lstrip('-,').isnumeric() else x for x in line.split()]))
        if (it % 2):
            rowidx += 1
        it += 1

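Most of it works as intended, but I still have a few problems I can't find a solution for. I was able to handle negative numbers with lstrip, but some numbers are followed by a comma, and lstrip('-,') does not seem to handle that. The other problem is that, although the numbers are replaced correctly, rewriting the file changes its original formatting (the tabs are removed).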

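My questions are:

  1. How can I replace a number that is followed by a comma without replacing the comma itself?
  2. How can I keep the original formatting of the input file (tabs, etc.)?
  3. How can I also handle floating-point numbers such as 0.5?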

The file looks like this:

Parameter File
QQ  =   7,  6
RR  =   5,  5
SS  =   0,  0
ay_on   =   0,  0
by_on   =   0,  1
cvc_on  =   1
mvc_on  =   1
rc      =   0
adus    =   -200,   -200
bdus    =   200,    200
au      =   -5, -2
bu      =   5,  2
ay      =   7,  0
by      =   10, 0.5
con_Ayl =
con_byl =
Hp      =   100
Hu      =   20
Hw      =   1
Tsamp   =   100
model   =   1
soft    =   3
Qfac    =   5
sgm_r   =   0.01,   0.01
Tobs    =   -1
mpc_on  =   1

The DataFrame looks like this:

   paramname param1  param2
0         QQ      7    6.00
1         RR      5    5.00
2         SS      0    0.00
3      ay_on      0    0.00
4      by_on      0    1.00
5     cvc_on      1     NaN
6     mvc_on      1     NaN
7         rc      0     NaN
8       adus   -200 -200.00
9       bdus    200  200.00
10        au     -5   -2.00
11        bu      5    2.00
12        ay      7    0.00
13        by     10    0.50
14   con_Ayl    NaN     NaN
15   con_byl    NaN     NaN
16        Hp    100     NaN
17        Hu     20     NaN
18        Hw      1     NaN
19     Tsamp    100     NaN
20     model      1     NaN
21      soft      3     NaN
22      Qfac      5     NaN
23     sgm_r   0.01    0.01
24      Tobs     -1     NaN
25    mpc_on      1     NaN

Edit: I solved the comma and float problems with this:

print(' '.join([params.loc[rowidx, colidx] if (x.lstrip('-').isnumeric() or isfloat(x.lstrip('-'))) else params.loc[rowidx, colidx] + ',' if x.endswith(',') else x for x in line.split()]))

The problem that all tabs are removed from the file remains. How can I prevent that?

I would prefer to parse the file first and then build records for a pd.DataFrame.

import pandas as pd
import numpy as np

def parse_file_to_records(file_name):
    records = []
    with open(file_name, 'r') as f:
        lines = f.readlines()
    # skip the first line
    for line in lines[1:]:
        # remove all whitespace, tabs included
        line = ''.join(line.split())
        if not line:
            continue
        paramname, paramvalue = line.split('=')
        param2 = np.nan
        try:
            param1, param2 = paramvalue.split(',')
        except ValueError:
            # only one value, or none at all
            param1 = paramvalue or np.nan
        records.append({
            'paramname': paramname,
            'param1': param1,
            'param2': param2,
        })
    return records

df = pd.DataFrame.from_records(parse_file_to_records('params.txt'))
print(df)

which gives the following result:

   paramname param1 param2
0         QQ      7      6
1         RR      5      5
2         SS      0      0
3      ay_on      0      0
4      by_on      0      1
5     cvc_on      1    NaN
6     mvc_on      1    NaN
7         rc      0    NaN
8       adus   -200   -200
9       bdus    200    200
10        au     -5     -2
11        bu      5      2
12        ay      7      0
13        by     10    0.5
14   con_Ayl    NaN    NaN
15   con_byl    NaN    NaN
16        Hp    100    NaN
17        Hu     20    NaN
18        Hw      1    NaN
19     Tsamp    100    NaN
20     model      1    NaN
21      soft      3    NaN
22      Qfac      5    NaN
23     sgm_r   0.01   0.01
24      Tobs     -1    NaN
25    mpc_on      1    NaN
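
Parsing the file this way does not by itself answer the question about the lost tabs. The tabs disappear in the original code because `line.split()` discards the whitespace and `' '.join(...)` puts single spaces back. One possible way around this, shown here only as a sketch, is to substitute the numbers in place with `re.sub` and leave every other character alone. The names `NUMBER` and `replace_numbers` are made up for this illustration, and `new_values` is assumed to hold the non-missing replacement values for one line, in order.

```python
import re

# Matches an optionally signed integer or decimal such as 7, -200 or 0.5.
# The lookarounds skip digits that are part of a name or a longer token.
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])")


def replace_numbers(line, new_values):
    """Replace each number in `line` with the next item of `new_values`.

    Tabs, spaces, commas and '=' are left exactly as they were.
    Numbers beyond the end of `new_values` are kept unchanged.
    """
    values = iter(new_values)

    def substitute(match):
        # Fall back to the matched text when no replacement is left.
        return str(next(values, match.group(0)))

    return NUMBER.sub(substitute, line)


line = "adus\t=\t-200,\t-200\n"
print(repr(replace_numbers(line, [-150, 150])))
```

Inside the `fileinput` loop the result would presumably be written with `print(new_line, end='')`, since the line already carries its own newline.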

If you want to check whether a string is a valid float, you can write a simple function like this:

def isfloat(num):
    try:
        float(num)
        return True
    except ValueError:
        return False

print(isfloat('1.23'))
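
As for the trailing commas: `lstrip('-,')` only removes characters from the left end of a string, so `'7,'` keeps its comma and fails the numeric check. A small sketch of one way to handle this is to strip the comma from the right with `rstrip(',')` and remember it for later. The helper name `split_number_token` is hypothetical. Because `float()` already accepts a leading minus sign, no `lstrip('-')` is needed.

```python
def isfloat(num):
    """Return True if `num` can be converted by float()."""
    try:
        float(num)
        return True
    except ValueError:
        return False


def split_number_token(token):
    """Split a token such as '-200,' into ('-200', ',').

    Returns None when the token is not a number, so the caller can
    leave it unchanged.
    """
    core = token.rstrip(',')
    if not isfloat(core):
        return None
    # Whatever was stripped from the right is the trailing comma part.
    return core, token[len(core):]


for token in ['7,', '-200,', '0.5', '=', 'QQ']:
    print(token, split_number_token(token))
```

Note that `float()` also accepts strings such as `'nan'`, `'inf'` and `'1e5'`, so a stricter check may be needed if a parameter name could look like one of those.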
