Python - Convert a table from a text file into a dictionary



I tried searching online, but I couldn't find anything about what I want to do.

I have a text document that contains a table, and I want to convert it into a dictionary. The table looks like this:

Test Name   Cycles   Operations      Result  Errors   Last Error
Network: eno1 (172.0.10.1)   9289     81.751 Million  PASS    0        No errors
Network: eno2 (172.0.10.1)   9289     81.750 Million  PASS    0        No errors
Network: eno3 (172.0.10.1)   9362     82.387 Million  PASS    0        No errors
Network: eno4 (172.0.10.1)   9411     82.818 Million  PASS    0        No errors
USB 2: PMOKZQ35 (1:10)   5        58.328 Million  PASS    0        No errors
USB 3: PMU34QG452 (2:1)   2        2.690 Billion   PASS    0        No errors
USB 3: PMU356Q2K0 (2:3)   2        2.403 Billion   PASS    0        No errors
Serial Port: ttyS1   4        224200          PASS    0        No errors

I want to create a dictionary so that when I access a particular column I get its values, for example:

ability_test['Test Name'][0]
# returns Network: eno1 (172.0.10.1)
ability_test['Cycles'][1]
# returns 9289

So far I have only managed to put this information into a dictionary, but I can't split it into columns.

My code:

ability_test = {}
with open(f"result.txt", "r") as f:
    for line in f:
        count = 2
        try:
            k, v = line.strip().split(":")
            if k in ability_test.keys():
                ability_test[k + f"{count}".strip()] = v.strip()
                count = count + 1
            else:
                ability_test[k.strip()] = v.strip()
        except:
            pass

I would appreciate any information or suggestions on how to proceed.
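A quick way to see why that code only half-works is to look at what `line.strip().split(":")` returns for a few lines of the table. This sketch uses lines copied from the sample above; the helper name `colon_parts` is hypothetical and only there for illustration.

```python
def colon_parts(line):
    """Return the pieces that the question's code gets from line.strip().split(":")."""
    return line.strip().split(":")


header = "Test Name   Cycles   Operations      Result  Errors   Last Error"
network = "Network: eno1 (172.0.10.1)   9289     81.751 Million  PASS    0        No errors"
usb = "USB 2: PMOKZQ35 (1:10)   5        58.328 Million  PASS    0        No errors"

for line in (header, network, usb):
    parts = colon_parts(line)
    # `k, v = parts` only works when there are exactly two pieces; otherwise it
    # raises ValueError, which the bare `except: pass` silently swallows.
    print(len(parts), repr(parts[0]))
```

So the header row (no colon) and every USB row (two colons) are dropped, and the rows that do unpack use the text before the colon ("Network", "Serial Port") as the key, leaving the rest of the row as one unsplit string. Splitting on the gaps between columns rather than on ":" avoids this.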

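This is very simple with pandas and its read_fwf method (edited to read from the file). By default it infers the fixed column widths, and in this case it gets them right. If it doesn't, optional parameters such as colspecs or widths let you specify the columns yourself.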

import pandas as pd
df = pd.read_fwf('result.txt')
print(df)
print(df['Test Name'][0])
print(df['Cycles'][1])

Output:

Test Name  Cycles      Operations Result  Errors Last Error
0  Network: eno1 (172.0.10.1)    9289  81.751 Million   PASS       0  No errors
1  Network: eno2 (172.0.10.1)    9289  81.750 Million   PASS       0  No errors
2  Network: eno3 (172.0.10.1)    9362  82.387 Million   PASS       0  No errors
3  Network: eno4 (172.0.10.1)    9411  82.818 Million   PASS       0  No errors
4      USB 2: PMOKZQ35 (1:10)       5  58.328 Million   PASS       0  No errors
5     USB 3: PMU34QG452 (2:1)       2   2.690 Billion   PASS       0  No errors
6     USB 3: PMU356Q2K0 (2:3)       2   2.403 Billion   PASS       0  No errors
7          Serial Port: ttyS1       4          224200   PASS       0  No errors
Network: eno1 (172.0.10.1)
9289
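If pandas isn't available, the same fixed-width idea can be sketched in plain Python. This is a simpler technique than read_fwf's inference (which works out the widths from the rows themselves): it assumes each column starts where its header name starts and cuts every row at those offsets, so it only works when every value begins at or after its header's first character. The helper names `column_starts`, `parse_fixed_width` and `fmt` are hypothetical, and the sample rows are padded with `fmt` so that they are aligned.

```python
import re


def column_starts(header_line):
    """(offset, name) for each header; a single space inside a name such as 'Test Name' is kept."""
    return [(m.start(), m.group()) for m in re.finditer(r"\S+(?: \S+)*", header_line)]


def parse_fixed_width(lines):
    """Build {header: [values...]} by slicing each row at the header start offsets."""
    header, *rows = lines
    starts = column_starts(header)
    table = {name: [] for _, name in starts}
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        for i, (start, name) in enumerate(starts):
            # Each column runs up to where the next header starts (or to the end of the line).
            end = starts[i + 1][0] if i + 1 < len(starts) else None
            table[name].append(row[start:end].strip())
    return table


def fmt(*cells):
    """Pad cells to fixed widths to build an aligned sample (widths chosen for this example)."""
    widths = (28, 8, 16, 8, 8)
    return "".join(f"{c:<{w}}" for c, w in zip(cells, widths)) + cells[-1]


sample = [
    fmt("Test Name", "Cycles", "Operations", "Result", "Errors", "Last Error"),
    fmt("Network: eno1 (172.0.10.1)", "9289", "81.751 Million", "PASS", "0", "No errors"),
    fmt("Network: eno2 (172.0.10.1)", "9289", "81.750 Million", "PASS", "0", "No errors"),
    fmt("USB 3: PMU34QG452 (2:1)", "2", "2.690 Billion", "PASS", "0", "No errors"),
]

ability_test = parse_fixed_width(sample)
print(ability_test["Test Name"][0])  # Network: eno1 (172.0.10.1)
print(ability_test["Cycles"][1])     # 9289
```

With the real file you would pass `f.read().splitlines()` from inside a `with open("result.txt") as f:` block. Note that, unlike pandas, every value stays a string.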

There are several problems.

First, you need to capture the header names in their own list so you can keep track of them.

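Second, the data appears to be splittable wherever there are two or more consecutive whitespace characters. You can use a regular expression for this: re.compile(r"\s\s+")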
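To see what that pattern does on a single row, here is a small sketch using one line from the sample table. It assumes, as the table suggests, that columns are always separated by at least two whitespace characters and that no value contains two spaces in a row.

```python
import re

# Two or more consecutive whitespace characters mark a column boundary;
# single spaces inside values such as "58.328 Million" are left alone.
splitter = re.compile(r"\s\s+")

line = "USB 2: PMOKZQ35 (1:10)   5        58.328 Million  PASS    0        No errors"
fields = splitter.split(line)
print(fields)
# ['USB 2: PMOKZQ35 (1:10)', '5', '58.328 Million', 'PASS', '0', 'No errors']

# By contrast, splitting on any whitespace breaks multi-word values apart.
print(len(line.split()))  # 11
```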

For example:

import re
splitter = re.compile(r"\s\s+")
ability_test = {}
with open("result.txt", "r") as f:
    # Use `next` to pop off the first line of headers
    headers = splitter.split(next(f).strip())
    for header in headers:
        ability_test[header] = []
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        # For each value, append it to the list for its header
        values = splitter.split(line.strip())
        for header, value in zip(headers, values):
            ability_test[header].append(value)
for header, values in ability_test.items():
    print(header, values)

Output:

Test Name ['Network: eno1 (172.0.10.1)', 'Network: eno2 (172.0.10.1)', 'Network: eno3 (172.0.10.1)', 'Network: eno4 (172.0.10.1)', 'USB 2: PMOKZQ35 (1:10)', 'USB 3: PMU34QG452 (2:1)', 'USB 3: PMU356Q2K0 (2:3)', 'Serial Port: ttyS1']
Cycles ['9289', '9289', '9362', '9411', '5', '2', '2', '4']
Operations ['81.751 Million', '81.750 Million', '82.387 Million', '82.818 Million', '58.328 Million', '2.690 Billion', '2.403 Billion', '224200']
Result ['PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS', 'PASS']
Errors ['0', '0', '0', '0', '0', '0', '0', '0']
Last Error ['No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors', 'No errors']
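One thing to watch: every value collected this way is a string, so `ability_test['Cycles'][1]` gives '9289' rather than the number 9289. If you need numbers, you can convert columns after parsing. This is a sketch under assumptions: `parse_operations` and `SCALE` are hypothetical names, and it assumes "Million" and "Billion" are the only suffixes in the Operations column and mean 10^6 and 10^9.

```python
from decimal import Decimal

# Columnar data in the shape printed above (every value is still a string).
ability_test = {
    "Test Name": ["Network: eno1 (172.0.10.1)", "USB 3: PMU34QG452 (2:1)", "Serial Port: ttyS1"],
    "Cycles": ["9289", "2", "4"],
    "Operations": ["81.751 Million", "2.690 Billion", "224200"],
    "Errors": ["0", "0", "0"],
}

# Assumed meaning of the word suffixes in the Operations column.
SCALE = {"Million": 10**6, "Billion": 10**9}


def parse_operations(text):
    """Convert e.g. '81.751 Million' to 81751000; a bare number is returned as an int."""
    number, _, unit = text.partition(" ")
    # Decimal avoids float rounding (81.751 * 10**6 is not exact as a float).
    return int(Decimal(number) * SCALE.get(unit, 1))


ability_test["Cycles"] = [int(v) for v in ability_test["Cycles"]]
ability_test["Errors"] = [int(v) for v in ability_test["Errors"]]
ability_test["Operations"] = [parse_operations(v) for v in ability_test["Operations"]]

print(ability_test["Cycles"][0])   # 9289
print(ability_test["Operations"])  # [81751000, 2690000000, 224200]
```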

This data is still a bit awkward to work with. I think a better pattern might be to produce one dict per row. You could do it like this:

import re
splitter = re.compile(r"\s\s+")
ability_test = []
with open("result.txt", "r") as f:
    headers = splitter.split(next(f).strip())
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        values = splitter.split(line.strip())
        ability_test.append(dict(zip(headers, values)))
for item in ability_test:
    print(item)

Output:

{'Test Name': 'Network: eno1 (172.0.10.1)', 'Cycles': '9289', 'Operations': '81.751 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno2 (172.0.10.1)', 'Cycles': '9289', 'Operations': '81.750 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno3 (172.0.10.1)', 'Cycles': '9362', 'Operations': '82.387 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Network: eno4 (172.0.10.1)', 'Cycles': '9411', 'Operations': '82.818 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 2: PMOKZQ35 (1:10)', 'Cycles': '5', 'Operations': '58.328 Million', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 3: PMU34QG452 (2:1)', 'Cycles': '2', 'Operations': '2.690 Billion', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'USB 3: PMU356Q2K0 (2:3)', 'Cycles': '2', 'Operations': '2.403 Billion', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
{'Test Name': 'Serial Port: ttyS1', 'Cycles': '4', 'Operations': '224200', 'Result': 'PASS', 'Errors': '0', 'Last Error': 'No errors'}
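The row-per-dict form can still be turned into the column access pattern asked for in the question, since it is just a transpose. A minimal sketch, assuming every row has the same keys as the first one; `to_columns` is a hypothetical helper and the rows are trimmed to three columns for brevity.

```python
# A few rows in the shape produced above (trimmed to three columns for brevity).
rows = [
    {"Test Name": "Network: eno1 (172.0.10.1)", "Cycles": "9289", "Result": "PASS"},
    {"Test Name": "Network: eno2 (172.0.10.1)", "Cycles": "9289", "Result": "PASS"},
    {"Test Name": "Serial Port: ttyS1", "Cycles": "4", "Result": "PASS"},
]


def to_columns(rows):
    """Turn a list of row dicts into {header: [values...]}, using the first row's key order."""
    if not rows:
        return {}
    return {header: [row.get(header) for row in rows] for header in rows[0]}


columns = to_columns(rows)
print(columns["Test Name"][0])  # Network: eno1 (172.0.10.1)
print(columns["Cycles"][1])     # 9289

# The row form is handy for filtering, e.g. only the network tests:
network_rows = [row for row in rows if row["Test Name"].startswith("Network:")]
print(len(network_rows))        # 2
```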

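First, you should make sure every line of the input file is split by the same delimiter. The code below assumes the fields are separated by a tab character ("\t"):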

import collections
spliter = '\t'
with open("./result.txt", "r") as f:
    # The first line holds the headers; each header gets an empty list
    ability_test = collections.OrderedDict([(key, []) for key in f.readline().strip().split(spliter)])
    print(ability_test.keys())
    for line in f:
        # Append each field to the list of the header in the same position
        for l, v in zip(ability_test.values(), line.strip().split(spliter)):
            l.append(v)
print(ability_test)
print(ability_test['Test Name'][0])
print(ability_test['Cycles'][1])
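If the file really is tab-separated, the standard library's csv module can do the splitting. This sketch uses `csv.DictReader` with `delimiter="\t"`; the `read_columns` helper and the tab-separated sample are hypothetical stand-ins for result.txt.

```python
import csv
import io


def read_columns(f, delimiter="\t"):
    """Read a delimited table with a header row into {header: [values...]}."""
    reader = csv.DictReader(f, delimiter=delimiter)
    columns = {name: [] for name in reader.fieldnames}  # reading fieldnames consumes the header row
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])
    return columns


# Hypothetical tab-separated stand-in for result.txt.
sample = (
    "Test Name\tCycles\tOperations\tResult\tErrors\tLast Error\n"
    "Network: eno1 (172.0.10.1)\t9289\t81.751 Million\tPASS\t0\tNo errors\n"
    "Network: eno2 (172.0.10.1)\t9289\t81.750 Million\tPASS\t0\tNo errors\n"
)

ability_test = read_columns(io.StringIO(sample))
print(ability_test["Test Name"][0])  # Network: eno1 (172.0.10.1)
print(ability_test["Cycles"][1])     # 9289
```

With the real file, open it as `open("result.txt", newline="")`, as the csv documentation recommends, and pass the file object to `read_columns`.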
