Format arbitrary data as CSV



I have a file in an arbitrary format:

Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi

It is actually a list of pods from a Kubernetes cluster.

I need to convert the data to CSV like this:

Name,Image,cpu,memory
pod1,image1,2,1000Mi
pod1,image1,300m,1000Mi
pod2,image2,2,1000Mi
pod2,image2,300m,1000Mi

where rows 2 and 4 repeat the first two values of rows 1 and 3.

I would prefer a bash solution that combines grep/sed/awk, but that is just me being picky. I am happy with any solution in Python or even PowerShell.

Thanks a lot!

Assuming the order is fixed: whenever I see a "memory" line, I print a complete record.

awk '
BEGIN {FS = ":"; OFS = ","; print "Name","Image","cpu","memory"}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["Image"], record["cpu"], record["memory"]}
' file

A generic, but Python-based, solution:

from pandas import DataFrame
text = """Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi"""
lines = text.splitlines(keepends=False)
record = dict()
records = list()
for line in lines:
    # split on the first colon only, so values may contain ':'
    field, value = line.split(':', 1)
    if field in record:
        # we have seen this field before thus this record is complete
        # and the field value pair belongs to the next record
        records.append(record)
        # create a new empty record
        record = dict()
    # set the value for the current record
    record[field] = value
# the loop never appends the record still being built, so add it here
if record:
    records.append(record)
dataframe = DataFrame(records)
dataframe.to_csv('here_we_go.csv', index=False)

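Note: this solution also handles missing or out-of-order fields, although a combination of the two may break it. Because it starts a new record whenever a field repeats and does not carry Name and Image forward, the rows it produces for the sample input above differ from the requested output.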
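
The repeated-field rule is easier to see in isolation on a small made-up input. This is a minimal sketch without pandas, assuming one field:value pair per line. The function name split_records and the fields a and b are hypothetical and only serve the illustration:

```python
def split_records(lines):
    """Group field:value lines, starting a new record when a field repeats."""
    records, record = [], {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        field, value = line.split(':', 1)
        if field in record:
            # a repeated field means the current record is complete
            records.append(record)
            record = {}
        record[field] = value
    if record:
        records.append(record)  # keep the record still being built
    return records


# fields in a different order per record are still grouped correctly
print(split_records(["a:1", "b:2", "b:3", "a:4"]))
# a missing field just leaves a shorter record
print(split_records(["a:1", "b:2", "a:3"]))
```

A record only ends when one of its own fields shows up again, which is why a field that is both missing and out of order can end up attached to the wrong record.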

Since the order is always consistent, a row can be written whenever the memory key is seen:

import csv

with open('input.txt', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=['Name', 'Image', 'cpu', 'memory'])
    csv_output.writeheader()
    block = {}

    for row in csv.reader(f_input, delimiter=':'):
        if len(row) == 2:   # skip blank lines
            # later keys overwrite earlier ones; Name and Image persist
            block[row[0]] = row[1]

            if row[0] == 'memory':
                csv_output.writerow(block)

This gives:

Name,Image,cpu,memory
pod1,image1,2,1000Mi
pod1,image1,300m,1000Mi
pod2,image2,2,1000Mi
pod2,image2,300m,1000Mi
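
The same idea can be tried in memory, without touching any files. The following is only a sketch under a few assumptions: the function name pods_to_csv is hypothetical, the input is passed as a string, and each line is split on its first colon only, so a value that itself contains a colon (an image tag, for example) stays intact:

```python
import csv
import io


def pods_to_csv(text):
    """Return CSV text with one row per memory line, repeating Name and Image."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=['Name', 'Image', 'cpu', 'memory'],
                            lineterminator='\n')
    writer.writeheader()
    block = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')  # split on the first colon only
        if not sep:
            continue  # skip blank lines and lines without a colon
        block[key] = value
        if key == 'memory':
            # Name and Image are still in block from earlier lines
            writer.writerow(block)
    return out.getvalue()


sample = "Name:pod1\nImage:image1\ncpu:2\nmemory:1000Mi\ncpu:300m\nmemory:1000Mi\n"
print(pods_to_csv(sample), end='')
```

Note that csv.DictWriter raises ValueError by default if the input contains a key that is not listed in fieldnames, so an input with additional fields would need either a longer fieldnames list or extrasaction='ignore'.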
