Convert a text table structure into lists



Can anyone tell me how to convert a table like this:

Device      Type       Model          Description      Vendor
--------------------------------------------------------------
Device1    Network1     Model2       Network Device1     bla bla
Device2    Network2     Model2       Network Device2     bla bla

into this:

Device = [Device1, Device2]
Type = [Network1, Network2]
Model = [Model2, Model2]
Description = [Network Device1 , Network Device2]
Vendor = [bla bla, bla bla]

I tried using:

networkdata = open("./bin/data.txt",'r').read()
for row in networkdata:
    row = networkdata.rstrip('\n').split(" ")
    networkdataTable= [r.strip() for r in row if r != '']

But no luck. Can anyone help me?

Assuming the columns are separated by two or more spaces and there are no "empty" cells, you can do it like this:

import re
with open("./bin/data.txt") as f:
    rows = map(str.strip, f)
    networkdata_table = [re.split(r'\s\s+', row) for row in rows if row][2:]
    # slice [2:] removes first two lines, which are the table header
Device, Type, Model, Description, Vendor = zip(*networkdata_table)

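The two important parts are re.split(r'\s\s+', ...), which splits a string wherever two or more whitespace characters occur, and zip(*...), which transposes the list of rows into one tuple per column.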

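Note that you should generally open files in a with block, that there is no need to specify 'r' as the mode because it is the default, and that you can iterate over the file handle f directly to get one line at a time.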
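To see what the two key calls do in isolation, here is a small sketch that runs on an in-memory string instead of ./bin/data.txt. The names sample, rows and columns are made up for this illustration and are not part of the answer above.

```python
import re

# Hypothetical sample; in the answer above the lines come from ./bin/data.txt
sample = """\
Device     Type         Model        Description         Vendor
--------------------------------------------------------------
Device1    Network1     Model2       Network Device1     bla bla
Device2    Network2     Model2       Network Device2     bla bla"""

# Split each data line on runs of two or more whitespace characters,
# so the single spaces inside "Network Device1" and "bla bla" survive
rows = [re.split(r'\s\s+', line.strip()) for line in sample.splitlines()[2:]]

# zip(*rows) transposes: one tuple per column instead of one list per row
columns = list(zip(*rows))

print(rows[0])
print(columns[0])
```

Since zip produces tuples, wrap each column in list(...) if you need real lists as in the question.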

Updated answer:

I am adopting the excellent regex-based solution given by @kaya3 and including it in my answer.

import re
txt = '''
Device     Type         Model        Description         Vendor
--------------------------------------------------------------
Device1    Network1     Model2       Network Device1     bla bla
Device2    Network2     Model2       Network Device2     bla bla'''
Device      = []
Type        = []
Model       = []
Description = []
Vendor      = []
# strip() drops the leading newline, so the header is line 0 and the --- line is line 1
for i, t in enumerate(txt.strip().split('\n')):
    if i < 2: continue  # ignore header and the line with ---
    x = re.split(r'\s\s+', t.strip())
    Device.append(x[0].strip())
    Type.append(x[1].strip())
    Model.append(x[2].strip())
    Description.append(x[3].strip())
    Vendor.append(x[4].strip())
print(Device)
print(Type)
print(Model)
print(Description)
print(Vendor)

Here the data is split into separate fields wherever two or more whitespace characters occur. The header row and the dashed line are skipped.

The output of this will be:

['Device1', 'Device2']
['Network1', 'Network2']
['Model2', 'Model2']
['Network Device1', 'Network Device2']
['bla bla', 'bla bla']

Previous answer: you can iterate over the string and extract the data by position.

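d = []
# strip() drops the leading newline of the triple-quoted string
for t in txt.strip().split('\n'):
    y = []
    # slice boundaries are the column start offsets in txt: 0, 11, 24, 37, 57
    y.append(t[0:11].strip())
    y.append(t[11:24].strip())
    y.append(t[24:37].strip())
    y.append(t[37:57].strip())
    y.append(t[57:].strip())
    d.append(y)
d.pop(1)  # remove the line with ---
print(d)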

The output of this will be:

[['Device', 'Type', 'Model', 'Description', 'Vendor'], ['Device1', 'Network1', 'Model2', 'Network Device1', 'bla bla'], ['Device2', 'Network2', 'Model2', 'Network Device2', 'bla bla']]
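Hard-coded slice positions break as soon as the column widths change. As a rough sketch, the offsets can instead be read from the header line. This assumes every header is a single word and every cell starts in the same column as its header. The names lines, header, starts and ends are introduced only for this example.

```python
import re

txt = '''
Device     Type         Model        Description         Vendor
--------------------------------------------------------------
Device1    Network1     Model2       Network Device1     bla bla
Device2    Network2     Model2       Network Device2     bla bla'''

lines = txt.strip().split('\n')
header = lines[0]

# A column starts where a header word starts (assumes single-word headers)
starts = [m.start() for m in re.finditer(r'\S+', header)]
# Each column ends where the next one starts; the last runs to end of line
ends = starts[1:] + [None]

# Slice every line except the dashed separator at the computed offsets
d = [[line[s:e].strip() for s, e in zip(starts, ends)]
     for line in lines if not line.startswith('-')]

print(starts)
print(d)
```

Unlike the regex split, this positional approach would also cope with empty cells, as long as the columns stay aligned.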

If you want to store them in separate variables, you can write:

Device      = []
Type        = []
Model       = []
Description = []
Vendor      = []
for x in d:
    Device.append(x[0])
    Type.append(x[1])
    Model.append(x[2])
    Description.append(x[3])
    Vendor.append(x[4])

print (Device)
print (Type)
print (Model)
print (Description)
print (Vendor)

The output of this will be:

['Device', 'Device1', 'Device2']
['Type', 'Network1', 'Network2']
['Model', 'Model2', 'Model2']
['Description', 'Network Device1', 'Network Device2']
['Vendor', 'bla bla', 'bla bla']
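If you would rather not maintain five separate variables, one possible alternative is a dictionary keyed by the header names. This is only a sketch: d is written out literally so the snippet runs on its own, and the name table is hypothetical.

```python
# d as produced by the position-based loop above (header row first)
d = [['Device', 'Type', 'Model', 'Description', 'Vendor'],
     ['Device1', 'Network1', 'Model2', 'Network Device1', 'bla bla'],
     ['Device2', 'Network2', 'Model2', 'Network Device2', 'bla bla']]

# Separate the header row from the data rows
header, *data = d

# zip(*data) yields one tuple per column; pair each with its header name
table = {name: list(col) for name, col in zip(header, zip(*data))}

print(table['Device'])
print(table['Description'])
```

Here the header row supplies the keys instead of ending up as the first element of every list.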