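Hi, I have a configuration file from another program that I want to manipulate with Python. The only things that need to be replaced are the numbers in the file. The file itself contains some tabs, spaces, and commas. I was able to read the file so that pandas keeps only the relevant information. Now I want to replace the numbers in the file with the numbers from the pandas DataFrame. Here is my code so far: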
import pandas as pd
import fileinput
params = pd.read_csv("params.txt", skip_blank_lines=True, delim_whitespace=True, skiprows=1, header=None, names=["paramname", "equal", "param1", "param2"])
#params.columns = params.columns.str.strip()
params.drop(params.columns[1], axis=1, inplace=True)
params.replace(',','', regex=True, inplace=True)
params.replace('\t','', regex=True, inplace=True)
params.replace(' ','', regex=True, inplace=True)
it = 0
rowidx = 0
colidx = "param1"
with fileinput.FileInput("params.txt", inplace=True, backup='.bak') as file:
    for line in file:
        if (it % 2 == 0):
            colidx = "param1"
        else:
            colidx = "param2"
        print(' '.join([params.loc[rowidx, colidx] if x.lstrip('-,').isnumeric() else x for x in line.split()]))
        if (it % 2):
            rowidx += 1
        it += 1
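Most of it works as intended, but I still have a few problems that I cannot find a solution for. I was able to replace negative numbers by using lstrip, but some numbers are followed by a comma, and lstrip('-,') does not seem to handle those. The other problem is that, although the numbers are replaced correctly, rewriting the file seems to discard the original formatting (the tabs are removed).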
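My questions are:
- How can I replace numbers that are followed by a comma without replacing the comma itself?
- How can I keep the original formatting of the input file (tabs, etc.)?
- How can I also handle floats such as 0.5?
The file looks like this: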
Parameter File
QQ = 7, 6
RR = 5, 5
SS = 0, 0
ay_on = 0, 0
by_on = 0, 1
cvc_on = 1
mvc_on = 1
rc = 0
adus = -200, -200
bdus = 200, 200
au = -5, -2
bu = 5, 2
ay = 7, 0
by = 10, 0.5
con_Ayl =
con_byl =
Hp = 100
Hu = 20
Hw = 1
Tsamp = 100
model = 1
soft = 3
Qfac = 5
sgm_r = 0.01, 0.01
Tobs = -1
mpc_on = 1
The DataFrame looks like this:
paramname param1 param2
0 QQ 7 6.00
1 RR 5 5.00
2 SS 0 0.00
3 ay_on 0 0.00
4 by_on 0 1.00
5 cvc_on 1 NaN
6 mvc_on 1 NaN
7 rc 0 NaN
8 adus -200 -200.00
9 bdus 200 200.00
10 au -5 -2.00
11 bu 5 2.00
12 ay 7 0.00
13 by 10 0.50
14 con_Ayl NaN NaN
15 con_byl NaN NaN
16 Hp 100 NaN
17 Hu 20 NaN
18 Hw 1 NaN
19 Tsamp 100 NaN
20 model 1 NaN
21 soft 3 NaN
22 Qfac 5 NaN
23 sgm_r 0.01 0.01
24 Tobs -1 NaN
25 mpc_on 1 NaN
Edit: I solved the comma and float problems with:
print(' '.join([params.loc[rowidx, colidx] if (x.lstrip('-').isnumeric() or isfloat(x.lstrip('-'))) else params.loc[rowidx, colidx] + ',' if x.endswith(',') else x for x in line.split()]))
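The problem of all tabs being removed from the file remains. How can I prevent that?
I would parse the file first and then build the records for the pd.DataFrame.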
import pandas as pd
import numpy as np

def parse_file_to_records(file_name):
    records = []
    with open(file_name, 'r') as f:
        lines = f.readlines()
    # skip the first line
    for line in lines[1:]:
        # drop all whitespace (spaces and tabs) from the line
        line = ''.join(line.split())
        if not line:
            continue
        paramname, paramvalue = line.split('=')
        param2 = np.nan
        try:
            param1, param2 = paramvalue.split(',')
        except ValueError:
            # only one value, or no value at all
            param1 = paramvalue or np.nan
        records.append({
            'paramname': paramname,
            'param1': param1,
            'param2': param2,
        })
    return records

df = pd.DataFrame.from_records(parse_file_to_records('params.txt'))
print(df)
This gives the following result:
paramname param1 param2
0 QQ 7 6
1 RR 5 5
2 SS 0 0
3 ay_on 0 0
4 by_on 0 1
5 cvc_on 1 NaN
6 mvc_on 1 NaN
7 rc 0 NaN
8 adus -200 -200
9 bdus 200 200
10 au -5 -2
11 bu 5 2
12 ay 7 0
13 by 10 0.5
14 con_Ayl NaN NaN
15 con_byl NaN NaN
16 Hp 100 NaN
17 Hu 20 NaN
18 Hw 1 NaN
19 Tsamp 100 NaN
20 model 1 NaN
21 soft 3 NaN
22 Qfac 5 NaN
23 sgm_r 0.01 0.01
24 Tobs -1 NaN
25 mpc_on 1 NaN
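Note that the values in param1 and param2 above are still strings, because they come straight from the text file. If numeric columns are wanted, one option is pd.to_numeric. The following is only a minimal sketch on a small hand-made frame (the four rows are copied from the output above, not read from the file), assuming that unparseable entries should become NaN:

```python
import numpy as np
import pandas as pd

# Small stand-in for the parsed frame; values are strings, as in the parser output.
df = pd.DataFrame({
    "paramname": ["QQ", "by", "cvc_on", "con_Ayl"],
    "param1": ["7", "10", "1", np.nan],
    "param2": ["6", "0.5", np.nan, np.nan],
})

value_cols = ["param1", "param2"]
# errors="coerce" turns anything that is not a number into NaN instead of raising.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df)
```

Because both columns contain NaN, they end up as floating-point columns, so whole numbers are displayed with a decimal part.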
If you want to check whether a string is a valid float, you can write a simple function like this:
def isfloat(num):
    try:
        float(num)
        return True
    except ValueError:
        return False
print(isfloat('1.23'))
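As for the tabs: they disappear in the question's code because line.split() followed by ' '.join(...) collapses every run of whitespace into a single space. One possible way around this is to leave the line untouched and substitute only the numeric tokens with a regular expression. The sketch below is not the asker's method; replace_numbers, rewrite_lines, and the updates dictionary are hypothetical names introduced for illustration, and it assumes that every value line has the form name = value[, value] and that the new values are already strings.

```python
import re

# Optional minus sign, digits, optional decimal part; must not be glued
# to a letter, digit, underscore, or dot on either side.
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])")

def replace_numbers(line, new_values):
    """Replace the numbers after '=' with new_values, keeping tabs, spaces and commas."""
    name, sep, rest = line.partition("=")
    if not sep:
        return line  # header line or anything else without '='
    values = iter(new_values)

    def substitute(match):
        # If there are fewer new values than numbers, keep the original number.
        return next(values, match.group(0))

    return name + sep + NUMBER.sub(substitute, rest)

def rewrite_lines(lines, updates):
    """Apply updates (hypothetical mapping: parameter name -> list of value strings)."""
    result = []
    for line in lines:
        key = line.partition("=")[0].strip()
        result.append(replace_numbers(line, updates.get(key, [])))
    return result

# Sample lines with tab separators, modelled on the file in the question.
sample = ["Parameter File\n", "QQ\t=\t7, 6\n", "by\t=\t10, 0.5\n", "con_Ayl\t=\n"]
updates = {"QQ": ["8", "9"], "by": ["-1.5", "2"]}
new_lines = rewrite_lines(sample, updates)
print("".join(new_lines), end="")
```

Since only the matched digits are replaced, the trailing commas, negative signs, and floats such as 0.5 are handled by the same pattern. The updates mapping could be built from the DataFrame rows, skipping NaN entries, and the resulting lines written back to the file.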